Surprising results with subtraction of floats on VC

pavan · Dec 17, 2003

See the following code:

#include <iostream>

using namespace std;

float f1 = 1.0e+6;

// This function just returns f1.
float value1()
{
    return f1;
}

int main()
{
    // Both operands use f1 directly.
    float f3 = f1 * f1 - f1 * f1;
    cout << "f3 = " << f3 << endl;

    // Same expression, but one factor comes from a function call.
    float f4 = (f1 * value1()) - (f1 * value1());
    cout << "f4 = " << f4 << endl;

    return 0;
}

The result is:
f3 = 0
f4 = -4096
I expected the second result to be zero as well, since both
expressions are the same, but I am getting -4096.
Can you tell me the reason for this behavior?

This behavior does not occur with double or long double.
Thanks,
pavan.

William M. Miller · Dec 17, 2003

The basic problem is that 1.0e12 cannot be represented
exactly in a float -- its 24-bit mantissa doesn't have
enough bits. Your first calculation is carried out directly
in the floating-point registers, which have sufficient
bits in the mantissa to represent the intermediate results
in the calculation, so you get an exact answer. In the
second calculation, what happens is that the program
computes the result of "f1 * value1()" and then has to
save that value in a temporary before calling value1() again
to compute the other operand of "-". The temporary is a
float, which doesn't have enough bits to represent the value
exactly, so the intermediate result is rounded to the nearest
float, 999999995904. The rounded result is 4096 less than the
exact value, and subtracting the exact second product from it
gives -4096.

You can see that this is exactly what is happening by adding
the following to your code:

float f7 = f1 * f1;
float f8 = f7 - f1 * f1;
cout <<"f8 = "<<f8<<endl;

The printed result is "-4096", exactly as in your second
expression.
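
The size of that rounding step can also be checked without
depending on how the compiler uses the floating-point registers.
The following is only a sketch, assuming IEEE 754 float and
double; the helper names float_rounding_error and
float_spacing_near are made up for illustration and are not
part of the original program.

```cpp
#include <cmath>
#include <limits>

// Rounds `exact` to float and returns (stored float value - exact value).
// The volatile store forces the value through a real 32-bit float,
// imitating the temporary that the compiler saves to memory.
double float_rounding_error(double exact)
{
    volatile float stored = static_cast<float>(exact);
    return static_cast<double>(stored) - exact;
}

// Distance from `x` to the next larger float, i.e. the spacing
// between adjacent representable floats near `x`.
double float_spacing_near(float x)
{
    float next = std::nextafter(x, std::numeric_limits<float>::infinity());
    return static_cast<double>(next) - static_cast<double>(x);
}
```

Under those assumptions, float_rounding_error(1.0e12) should
come out as -4096 and float_spacing_near(1.0e12f) as 65536:
floats near 1.0e12 are 65536 apart, and 1.0e12 happens to lie
4096 above the nearest one.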

If you compile with "-Og" (which turns on global analysis
and common subexpression elimination), the function will only
be called once, so there's no need to store a temporary; the
entire calculation will be carried out in the floating point
registers and the result of both of your calculations will be
"0".

Welcome to the fun world of floating point rounding errors! :-)
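
If you would rather not depend on the optimizer to get "0",
one possible approach is to make the precision of both
intermediate products explicit in the source. This is a rough
sketch, assuming IEEE 754 float and double; the helper names
difference_in_double and difference_in_float are hypothetical.

```cpp
float f1 = 1.0e+6f;

// Same role as in the original program: just returns f1.
float value1()
{
    return f1;
}

// Option A: widen both products to double explicitly.
// 1.0e6 * 1.0e6 = 1.0e12 is exactly representable in a double,
// so both operands of "-" hold the same exact value.
float difference_in_double()
{
    double left = static_cast<double>(f1) * value1();
    double right = static_cast<double>(f1) * value1();
    return static_cast<float>(left - right);
}

// Option B: force both products through a 32-bit float.
// Each product is rounded the same way before the subtraction,
// so the rounding errors cancel.
float difference_in_float()
{
    volatile float left = f1 * value1();
    volatile float right = f1 * value1();
    return left - right;
}
```

Either helper should return 0 whether or not the compiler
spills a temporary, because both operands of the subtraction
are held at the same precision.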

-- William M. Miller

pavan · Dec 18, 2003

Thanks a lot for a very informative reply.

pavan.