Float !?

Chris · Nov 20, 2003

Hi

The following code is giving strange results...........

#include <iostream>
#include <stdlib.h>   // _fcvt (Microsoft C runtime only)
using namespace std;

int main()
{
    float fTest = 536495.61f;
    cout << "float: " << fTest << endl;

    int dec, sign;
    char* pszTest = _fcvt(fTest, 10, &dec, &sign);
    cout << "string: " << pszTest << endl;
    cout << "dec: " << dec << endl;
    cout << "sign: " << sign << endl;
}

It seems that the stored value for the float is 536495.625 and not
536495.61........... Why?
And why would the "cout" of a float not give the digits after the decimal
point?

Regards
Chris

Chris · Nov 20, 2003

OK, back to the basics: it's an internal representation problem.
I'm still wondering why "cout" of a float would not give the digits after
the decimal point...

Wil · Nov 20, 2003

It seems that the stored value for the float is 536495.625 and not 536495.61........... Why?

Because a float has only 32 bits, it holds 24 significant
bits, which is roughly 7 significant decimal digits, and
536495.61 needs 8. Specifically, a normalized IEEE 754
single-precision number is stored as

(+/-) 1.m1m2m3...m23 x 2^(e1e2...e8 - 127)

so you get 1 bit for the sign of the number, 8 bits for
the exponent (stored with a bias of 127, so it needs no
sign bit; the stored values 1 to 254 give exponents in the
range [-126, 127], while 0 and 255 are reserved for zero,
denormals, infinity and NaN), and 23 bits for the mantissa.
(The leading "1" is not stored, which gives an extra bit of
precision, 24 in all.) If you need more than 24 bits of
precision in your binary number, use a double instead of a
float; it has 53.

In your particular case, 536495 = 2^19 + 2^13 + 2^11 +
2^10 + 2^9 + 2^8 + 2^7 + 2^5 + 2^3 + 2^2 + 2^1 + 2^0, so
the exponent is 19 and the leading 2^19 is the suppressed
"1". The remaining powers 2^18 down to 2^0 use up 19 of the
23 mantissa bits, leaving only 23 - 19 = 4 bits (2^-1 down
to 2^-4) for the fraction 0.61. Truncating, 2^-1 + 2^-4 =
0.5625 leaves a remainder of 0.0475, which is more than
half of the smallest step 2^-4 = 0.0625. So the machine
does the best it can with 4 bits and rounds to the nearest
representable value, writing your 0.61 as 2^-1 +
(0 x 2^-2) + 2^-3 + (0 x 2^-4) = 0.625. Thus, your number
winds up being stored as 536495.625, as you discovered.

And why would the "cout" of a float not give the digits
after the decimal point?

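Here the machine does "the best it can" in a different
sense: by default a stream prints 6 significant digits,
and the integer part 536495 already uses all six, so the
value is rounded to 536496 and no digits appear after the
decimal point. If you want to see the 3 digits to the
right of the decimal point, do something like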

cout << setw(12) << setprecision(3) << fixed;   // needs <iomanip>
cout << fTest;                                   // prints "  536495.625"

Your old FORTRAN hacker,
Wil