Arnie
Folks,
We ran into a pretty significant performance penalty when casting floats.
We've identified a code workaround that we wanted to pass along, but we were also
wondering whether others have run into this and whether there is a better
solution.
-jeff
.....
I'd like to share some findings about the C# (float) cast.
While converting our code from double to float, we ran into several slowdowns.
We found that the C# (float) cast can be costly depending on how it is written.
------------------------------------------------------------
Slow cases
------------------------------------------------------------
(A)
private void someMath(float[] input, float[] output)
{
    int length = input.Length;
    for (int i = 0; i < length; i++)
    {
        output[i] = (float)Math.Log10(input[i]); // <--- inline (float) cast is slow!
    }
}
(B)
private void Copy(double[] input, float[] output)
{
    int length = input.Length;
    for (int i = 0; i < length; i++)
    {
        output[i] = (float)input[i]; // <--- inline (float) cast is slow!
    }
}
In these examples, the "inline" (float) cast is executed on the same line as
another operation, such as Math.Log10() or a simple fetch from the input array.
These are slow, even in a Release build.
(A): Takes 3 to 6% longer than the double[] case. ;-)
(B): Takes twice(!) as long as the double[] case. ;-)
As far as I understand, and according to articles on the Net, the slowdown comes from
writing the intermediate value back to memory, as follows. The extra trips are costly.
(A)  CPU/FPU   +--> fetch --> Math.Log10 --+        +--> (float) --+
               |                           |        |              |
               |                           |        |              |
               |                           V        |              V
     memory  input                    written back to heap      output
                                      Extra memory access!
(B)  CPU/FPU   +--> fetch --+        +--> (float) --+
               |            |        |              |
               |            |        |              |
               |            V        |              V
     memory  input     written back to heap      output
                       Extra memory access!
------------------------------------------------------------
Fast cases
------------------------------------------------------------
To avoid the extra memory access, we can store the intermediate value in a
temporary variable.
The temporary variable is kept in a CPU register, so the speed stays fast.
(C)
private void someMath(float[] input, float[] output)
{
    int length = input.Length;
    for (int i = 0; i < length; i++)
    {
        double tmp = Math.Log10(input[i]); // <-- store in a temporary variable in CPU register
        output[i] = (float)tmp;            // <-- then (float) cast. Fast!
    }
}
(D)
private void Copy(double[] input, float[] output)
{
    int length = input.Length;
    for (int i = 0; i < length; i++)
    {
        double tmp = input[i];  // <-- store in a temporary variable in CPU register
        output[i] = (float)tmp; // <-- then (float) cast. Fast!
    }
}
In these improved versions, the intermediate data is not written back to
memory.
The improved versions are actually slightly faster than the double[] case.
(C): 1% faster than the double[] case.
(D): 3% faster than the double[] case.
(C)  CPU/FPU   +--> fetch --> Math.Log10 --> stays in -----> (float) --+
               |                             CPU register              |
               |                             Fast!                     |
               |                                                       V
     memory  input                                                  output
(D)  CPU/FPU   +--> fetch --> stays in -----> (float) --+
               |              CPU register              |
               |              Fast!                     |
               |                                        V
     memory  input                                   output
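For anyone who wants to reproduce the comparison, here is a minimal timing sketch of cases (B) and (D). It is not our actual benchmark: the class name CastBench, the helper TimeMs, the array size, and the repeat count are all made up for illustration, and the numbers will likely depend on the runtime, the JIT, and whether the process is 32-bit or 64-bit, so the sketch prints that as well.

```csharp
using System;
using System.Diagnostics;

public static class CastBench
{
    // Case (B): (float) cast on the same line as the array read.
    public static void CopyInline(double[] input, float[] output)
    {
        int length = input.Length;
        for (int i = 0; i < length; i++)
        {
            output[i] = (float)input[i];
        }
    }

    // Case (D): read into a double temporary first, then cast.
    public static void CopyViaTemp(double[] input, float[] output)
    {
        int length = input.Length;
        for (int i = 0; i < length; i++)
        {
            double tmp = input[i];
            output[i] = (float)tmp;
        }
    }

    // Hypothetical helper: runs `copy` once as a warm-up (so JIT compilation
    // is not timed), then times `repeats` further calls in milliseconds.
    public static double TimeMs(Action<double[], float[]> copy,
                                double[] input, float[] output, int repeats)
    {
        copy(input, output);
        Stopwatch sw = Stopwatch.StartNew();
        for (int r = 0; r < repeats; r++)
        {
            copy(input, output);
        }
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds;
    }

    public static void Main()
    {
        const int n = 1000000;   // assumed array size, not from our tests
        const int repeats = 200; // assumed repeat count, not from our tests

        double[] input = new double[n];
        for (int i = 0; i < n; i++)
        {
            input[i] = i * 0.5;
        }
        float[] outInline = new float[n];
        float[] outTemp = new float[n];

        Console.WriteLine("64-bit process: " + Environment.Is64BitProcess);
        Console.WriteLine("inline cast: " + TimeMs(CopyInline, input, outInline, repeats) + " ms");
        Console.WriteLine("via temp:    " + TimeMs(CopyViaTemp, input, outTemp, repeats) + " ms");
    }
}
```

Build it in Release mode and run it outside the debugger; otherwise the JIT optimizations are disabled and the comparison says little.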
OK, this is what we found from benchmarking and googling.
The same applies to ArraySegment<float> as well,
because the issue is with the float elements, not with the array
itself.
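As a rough sketch of what the temporary-variable version of (C) could look like for ArraySegment<float>: the class name SegmentCast and the method name Log10ViaTemp are invented for this example, and it indexes through the Array and Offset properties rather than assuming any particular runtime version.

```csharp
using System;

public static class SegmentCast
{
    // Writes log10 of each input element into the output segment,
    // keeping the double result in a temporary before the (float) cast.
    // Processes only as many elements as both segments can hold.
    public static void Log10ViaTemp(ArraySegment<float> input, ArraySegment<float> output)
    {
        float[] src = input.Array;
        float[] dst = output.Array;
        int count = Math.Min(input.Count, output.Count);
        for (int i = 0; i < count; i++)
        {
            double tmp = Math.Log10(src[input.Offset + i]); // double temporary
            dst[output.Offset + i] = (float)tmp;            // then (float) cast
        }
    }

    public static void Main()
    {
        float[] data = { 1f, 10f, 100f, 1000f };
        float[] result = new float[4];

        // Process only the middle two elements of `data`.
        Log10ViaTemp(new ArraySegment<float>(data, 1, 2),
                     new ArraySegment<float>(result, 1, 2));

        Console.WriteLine(string.Join(", ", result));
    }
}
```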
You could say this is a .NET compiler optimization issue.
If you know of optimization flags or anything else that can fix this on the
compiler side, please let us know.
That would be a great help!
(By the way, a plain Release build does not help.)
Otherwise, we will need to optimize our code by hand using the temporary
variable technique shown in the examples,
and we have many instances of this kind of "inline" cast in our code.