Why does 28.08 show up as 28.080000000000002 in DataGridView

  • Thread starter Thread starter Daniel Manes
  • Start date Start date
I have a fair idea. And they're the same people who get a surprise when they
light the barbecue after pouring petrol (gasoline) on the charcoal.

It's the best argument that there is for not allowing users to muck about
with things that they don't understand.



fact that they knew the nature of its conversion and storage. Do you know
how many people I have seen choose FLOAT for storing monetary values like
prices, or even quantities? And do you know how surprised they are to
find out that SQL Server changed 28.08 to 28.0800000000002? SQL Server

</snip>
 
Do you think it would be hard to explain that it's the *conversion*
which loses the data, rather than giving the impression that FLOAT is
inherently inexact?

I don't think it would be too hard, but I think it would be useless. As I
said before, most of the people don't care why it happened, they just want
to know how to fix it. When you get your oil changed, do you want to know
exactly how impure your oil was and how this affected your engine and gas
mileage down to the nano-details? Or do you just want new oil?

I've had an article about this since 2003, and you're the first person I've
ever heard even raise a peep about it.
http://classicasp.aspfaq.com/general/why-does-3-2-1-5-4-7000000000000002.html

And I still argue that it is valid to call the data type approximate,
because frankly, conversion is a part of the data type, as far as I'm
concerned. YMMV.

A
 
Jon Skeet said:
And that's great, so long as the numbers are originally specified in
base 10 - you don't need a conversion.

I distinguish between an approximate *conversion* and whether or not a
*type* is exact.

There's absolutely a conversion. Every time you feed information into the
computer and store it as one of these data types it is converted from the
input format to the internal representation. Every single time. Being a CS
guy you should know that :) It just so happens that with the NUMERIC(4, 2)
type, for instance, there is a single lossless conversion from the character
string "28.08" to the internal representation [shown in Base-16 here]:
"0x04020001F80A0000". For REAL and FLOAT there is a lossy conversion from
the character string to an approximate Base-2 representation of
"0x403C147AE147AE14", which represents a number that is exactly not "28.08".

As for distinguishing between the "type" and the "conversion", do you
specifically target the "conversion" in C# when you write a statement like
this?

float f = 28.08;

Technically this is a conversion of the character string "28.08" to an
internal IEEE-standard floating-point format and subsequent assignment of
the result of that conversion to the variable "f". However, since there is
*always* an implicit conversion most people take it for granted. It might
make a good trivia question, but beyond that it's not important to specify
this particular implicit conversion every time you reference a variable
assignment. Same goes for SQL.

To say that the fault lies completely with the conversion process and the
storage format bears no blame, is that an implication that the conversion
process can be fixed so that this information loss does not occur? I tend
to think that the conversion process is necessitated by the limitations of
the storage format itself and not the other way around.
 
Aaron Bertrand said:
I don't think it would be too hard, but I think it would be useless. As I
said before, most of the people don't care why it happened, they just want
to know how to fix it. When you get your oil changed, do you want to know
exactly how impure your oil was and how this affected your engine and gas
mileage down to the nano-details? Or do you just want new oil?

I guess it depends on what kind of person you are. If I get results I
don't expect, I want to get a good, *accurate* description of what's
going on - in this case, where I'm losing information. The fact that
it's the conversion which is lossy would be important to me.
I've had an article about this since 2003, and you're the first person I've
ever heard even raise a peep about it.
http://classicasp.aspfaq.com/general/why-does-3-2-1-5-4-7000000000000002.html

I have no problems with that article.
And I still argue that it is valid to call the data type approximate,
because frankly, conversion is a part of the data type, as far as I'm
concerned. YMMV.

The conversion from decimal is part of what you can do with the type,
but I don't think that's enough to deem the type itself as approximate.
To go back to the base 3 case, if there *were* a conversion from base 3
to decimal, would that make the decimal type in some way approximate in
a way that it isn't at the moment? It would still be able to represent
the same data it can today, so how would it have become less exact?

(No, I'm still not saying that a base 3 type will ever come about -
it's merely a thought experiment to examine how reasonable the "exact"
vs "approximate" distinction is.)
 
To say that the fault lies completely with the conversion process and the
storage format bears no blame, is that an implication that the conversion
process can be fixed so that this information loss does not occur? I tend
to think that the conversion process is necessitated by the limitations of
the storage format itself and not the other way around.

YES, I agree!

Next topic, please...
 
The conversion from decimal is part of what you can do with the type,
but I don't think that's enough to deem the type itself as approximate.

It is if the limitations of the type affect how certain data is stored
there. I'm not going to keep beating dead horse here (and once again, I
don't really care about base 3, if I hadn't already beat that dead horse
enough).

A
 
[Finally remembered to remove the VB newsgroup as requested by Cor.]

Mike C# said:
And that's great, so long as the numbers are originally specified in
base 10 - you don't need a conversion.

I distinguish between an approximate *conversion* and whether or not a
*type* is exact.

There's absolutely a conversion. Every time you feed information into the
computer and store it as one of these data types it is converted from the
input format to the internal representation. Every single time. Being a CS
guy you should know that :) It just so happens that with the NUMERIC(4, 2)
type, for instance, there is a single lossless conversion from the character
string "28.08" to the internal representation [shown in Base-16 here]:
"0x04020001F80A0000".

Okay, agreed - apologies. I suppose I was considering the conversion
(or lack thereof) from one base to another. I should have been more
precise.
For REAL and FLOAT there is a lossy conversion from
the character string to an approximate Base-2 representation of
"0x403C147AE147AE14", which represents a number that is exactly not "28.08".
Yup.

As for distinguishing between the "type" and the "conversion", do you
specifically target the "conversion" in C# when you write a statement like
this?

float f = 28.08;

Technically this is a conversion of the character string "28.08" to an
internal IEEE-standard floating-point format and subsequent assignment of
the result of that conversion to the variable "f". However, since there is
*always* an implicit conversion most people take it for granted. It might
make a good trivia question, but beyond that it's not important to specify
this particular implicit conversion every time you reference a variable
assignment. Same goes for SQL.

Yes, I would regard there as being a compile-time conversion going on -
and it's defined by the spec:

<quote>
The value of a real literal having type float or double is determined
by using the IEC 60559 =3Fround to nearest=3F mode.
</quote>

(The spec also talks about the rounding conversion used, if any is
required, in converting a literal into a decimal.)
To say that the fault lies completely with the conversion process and the
storage format bears no blame, is that an implication that the conversion
process can be fixed so that this information loss does not occur?

Not at all, just as I wouldn't say that the conversion from "28.08" to
an INTEGER can be "fixed" so that the information loss doesn't occur.
I tend to think that the conversion process is necessitated by the
limitations of the storage format itself and not the other way
around.

The storage format for FLOAT has a set of values it can represent, just
the same as INTEGER and DECIMAL. I still don't understand why it seems
to be "forgiveable" for INTEGER and DECIMAL (in that it doesn't stop
them from being labelled "exact", but not for FLOAT.

You wrote elsewhere:

<quote>
The fact that you cannot accurately store a value that is not a member
of the set of valid values for a type is of no significance.
</quote>

The set of valid values for FLOAT doesn't include 28.08 in the same way
that the set of valid values for INTEGER doesn't include 28.08 and in
the same way that the set of valid values for NUMERIC(2,1) doesn't
include 28.08. I see those three situations as being the same, but
you're highlighting FLOAT as being "different". Why?
 
Jon Skeet said:
Do you think it would be hard to explain that it's the *conversion*
which loses the data, rather than giving the impression that FLOAT is
inherently inexact?

SQL is an abstraction, as are C#, C++, VB, yadda yadda. Part of that
abstraction is designed to pull people away from the underbelly of the
machine and give them tools they can use to get a particular job done.
While it's useful from a CS perspective to understand the IEEE floating
point standard, how chip designers and manufacturers implement FPU's, the
differences between binary floating point/binary coded decimal/packed binary
coded decimal, the Assembly Language instructions happening under the hood,
etc., it's just not *necessary* to have this intimate knowledge to do
99.9999999999999999% of what most SQL developers, DBA's, etc., do. Just
like it's not necessary to have Ph.D.'s in mechanical engineering, civil
engineering, physics, mathematics, and electrical engineering to drive a
car. It might be nice to have all that knowledge, but it's not really
necessary.
I don't doubt that people use FLOAT inappropriately, and that they are
indeed surprised by the lossy conversion, but I don't see how
describing the type as inaccurate is any clearer than saying
(accurately) that it's the conversion which loses the information.

Speaking of "inaccurate", the FLOAT type is "approximate", not
"inaccurate". I don't recall seeing anyone call the FLOAT type "inaccurate"
until this post.

As for why it's described as "approximate"; because SQL is an abstraction,
and in SQL (which is not object-oriented) you cannot override operators like
the assignment operator. The operators are closely tied to the data types,
and to almost all users (and the vast majority of developers) it's enough to
know that if you assign a FLOAT variable a value and later retrieve it, it
might not be the same value you assigned. That's the level of abstraction
brought by SQL - it's several layers of abstraction higher than Assembler
Language, C++, or even C#.
 
[Removed VB group]

Aaron Bertrand said:
It is if the limitations of the type affect how certain data is stored
there. I'm not going to keep beating dead horse here (and once again, I
don't really care about base 3, if I hadn't already beat that dead horse
enough).

I think we can work towards a compromise then. How about the following
statements:

<proposal>
FLOAT is an approximate type with respect to some values which are
exactly representable in base 10. It is an exact type with respect to
values which are exactly representable in base 2.

DECIMAL is an approximate type with respect to some values which are
exactly representable in base 3. It is an exact type with respect to
values which are exactly representable in base 10.
</proposal>

Is there anything in there that you'd actually disagree with (rather
than just not caring about)?

The difference between those statements and the simplistic statement
that "FLOAT is approximate, DECIMAL is exact" is that the above make it
perfectly clear that it's only the base involved which determines the
"exactness", not anything else about the type.
 
DECIMAL is an approximate type with respect to some values which are
exactly representable in base 3. It is an exact type with respect to
values which are exactly representable in base 10.
</proposal>

Is there anything in there that you'd actually disagree with (rather
than just not caring about)?

I don't think there's any reason to point out that decimal can't store base
3 precisely, since this is completely irrelevant in SQL Server. If you
really like those definitions, then propose them to the standards bodies and
to Microsoft (connect.microsoft.com/sql). I don't feel very inclined to
adopt them.
The difference between those statements and the simplistic statement
that "FLOAT is approximate, DECIMAL is exact" is that the above make it
perfectly clear that it's only the base involved which determines the
"exactness", not anything else about the type.

Irrelevant in the context of this discussion, imho, since you cannot alter
the base for any of the numeric data types in SQL Server.
 
Jon Skeet said:
That means it's deterministic, not that it's exact. It certainly *is*
an approximation, and it's losing information. I'd certainly be
perfectly happy to argue the "approximate" side of that debate :)

We could debate the exactness there :) After all, the conversion process
boils down to a bunch of machine-level instructions that perform the exact
same task each and every time. So that in addition to being deterministic,
they perform a very exact process. There's nothing approximate about the
machine language instructions that perform the conversion; they are exactly
defined, exactly performed, and you retrieve an exact result.
Do either you or Aaron actually disagree that when you separate the
idea of "type" from "conversion" it's the *conversion* which is
approximate rather than the FLOAT type itself? If you do, it's probably
worth continuing to debate (despite my 5am post!) - if not, it's
presumably just a matter of how much we all do/don't care about making
the distinction between types and conversions. That's a matter of
opinion and probably not worth arguing about.

I can agree with that to an extent; however, the lossy conversion is
necessary because of the limitations of the storage format. Consider BCD
format which stores each digit in 3 bits. Absolutely no loss occurs during
the character conversion process from "28.08" to the internal BCD
representation, precisely because the format is not limited in that respect.

SQL is a highly abstract language, and as I mentioned most users and
developers don't care about the intricacies of IEEE floating point
representation, mantissas (mantissae?), etc. As Aaron pointed out, SQL
users tend to focus on higher level aspects, like if I put in 28.08 will I
get back 28.08? Or will I get 28.079237402371? That's the power of higher
level languages - you don't need to know the intricate details of the
hardware to do useful things.
 
Aaron Bertrand said:
I don't think there's any reason to point out that decimal can't store base
3 precisely, since this is completely irrelevant in SQL Server.

The reason for pointing out the situations in which decimal would be
"approximate" is to throw more light on why float is "approximate" with
regards to base 10. It shows (correctly, IMO) that there's not much
*fundamental* difference between the two types - just the base
involved. (The precision and scale being fixed or floating is another
difference, of course, but not related to this issue IMO.)
If you really like those definitions, then propose them to the
standards bodies and to Microsoft (connect.microsoft.com/sql). I
don't feel very inclined to adopt them.

I was merely trying to find some common ground between us.
Irrelevant in the context of this discussion, imho, since you cannot alter
the base for any of the numeric data types in SQL Server.

The context of this discussion, as far as I'm concerned at least, is a
consistent treatment of "approximate" compared with "exact". My
statements gave a consistent treatment, whereas I see inconsistency in
the way that float and decimal are described otherwise. There's a
context of "with respect to numbers which are exactly representable in
base 10" in the simplistic statement which I feel would be better made
explicit than implicit.
 
The reason for pointing out the situations in which decimal would be
"approximate" is to throw more light on why float is "approximate" with
regards to base 10. It shows (correctly, IMO) that there's not much
*fundamental* difference between the two types - just the base
involved.

And since the base is fixed and not negotiable in SQL Server, we'll just
have to agree to disagree.
I was merely trying to find some common ground between us.

I don't think you will, because I still consider FLOAT and REAL to be
approximate data types in SQL Server. I don't think you'll convince many
people to change their opinions by trying to segregate conversion from
storage, because quite frankly, 99% of them don't care, and the other 1% are
likely busy solving more important problems. :-)

A
 
Aaron Bertrand said:
And since the base is fixed and not negotiable in SQL Server, we'll just
have to agree to disagree.

Do you see no value in thought experiments then? Not interested in what
the impact of a base 3 type *would* be if it were introduced in the
next version of SQL Server? (No, I'm still not suggesting it will be.)

But you're right, we will have to agree to disagree.
I don't think you will, because I still consider FLOAT and REAL to be
approximate data types in SQL Server. I don't think you'll convince many
people to change their opinions by trying to segregate conversion from
storage, because quite frankly, 99% of them don't care, and the other 1% are
likely busy solving more important problems. :-)

I'll try for one last time to find common ground: do you agree that
your statements about exact/approximate are only true from a base 10
perspective?
 
You wrote elsewhere:

<quote>
The fact that you cannot accurately store a value that is not a member
of the set of valid values for a type is of no significance.
</quote>

The set of valid values for FLOAT doesn't include 28.08 in the same way
that the set of valid values for INTEGER doesn't include 28.08 and in
the same way that the set of valid values for NUMERIC(2,1) doesn't
include 28.08. I see those three situations as being the same, but
you're highlighting FLOAT as being "different". Why?

Just replying to myself now that I've found both parts of the
inconsistency. Here's Mike about FLOAT:

<quote>
T-SQL defines real and float as
"approximate" data types because using the IEEE standard binary
representation not all numbers in the range of valid values can be
stored in an exact representation.

....

Decimal and numeric are "exact" data types because they can represent
all the numbers in their range of valid values in an exact
representation. I.e., "28.08" is stored exactly as 28.08.
</quote>

Compare that with the quote quoted above (if you see what I mean).

It seems that in the case of INTEGER and DECIMAL, "the range of valid
values" is deemed to be "those which can be stored exactly" whereas for
FLOAT Mike deems it to be "anything between the minimum value and the
maximum value".


Either 28.08 shouldn't be regarded as being "in the range of valid
values" for FLOAT, *or* it should be regarded as being "in the range of
valid values" for DECIMAL(1,2). I'm happy to use either definition, so
long as it's used consistently.



If you apply the definition of "range of valid values" Mike has used
for FLOAT to INTEGER/DECIMAL, you would have to concede that they are
not exact: 1/3 is between the min and max values of DECIMAL(1,2) but is
not exactly representable, for example.

In reverse, if you apply the definition of "range of valid values" Mike
has used for INTEGER/DECIMAL to FLOAT, then your claim that "not all
members in the range of valid values can be stored in an exact
representation" is clearly incorrect (by definition).


I suspect that any definition which can be applied *consistently* to
INTEGER, DECIMAL and FLOAT but which deems FLOAT to be "approximate"
and INTEGER/DECIMAL to be "exact" by examining the "range of values"
has to specifically refer to base 10 representations of numbers. Of
course, I'd be happy to be disproved - Mike, care to construct such a
definition?
 
Jon said:
The reason for pointing out the situations in which decimal would be
"approximate" is to throw more light on why float is "approximate" with
regards to base 10. It shows (correctly, IMO) that there's not much
*fundamental* difference between the two types - just the base
involved.

It is justified, however, for reasons of human culture and in particular
accounting and legal requirements in accounting.

It's not a technical slight or insult; it's about people.

-- Barry
 
Thanks, everyone, I think :) At least I can't complain that no one
responded to my question! Anyway, I definitely need to rethink when I
use float/double and when I use decimal. As for when to use numeric,
SQL Server docs says "numeric is functionally equivalent to decimal" so
I'm thinking...never?

As for the debate over what to call the different data types, the
conclusion that formed in my mind was something like this: The decimal
data type can store base-10 numbers that don't repeat infinitely
"exactly", while the float data type can store binary numbers that
don't repeat infinitely "exactly." Ask either type to store a base they
weren't designed for, however, and you could lose information during
the conversion, resulting in the stored number being an "inexact"
representation of the original number. Since we live in a base-10
world, and saying "inexact data type" is, well, a lot quicker than what
I just said, I think ANSI and whoever else decided to call floats
"inexact" data types were more than justified, even if their semantics
were imperfect.

-Dan
 
Thanks, everyone, I think :) At least I can't complain that no one
responded to my question! Anyway, I definitely need to rethink when I
use float/double and when I use decimal. As for when to use numeric,
SQL Server docs says "numeric is functionally equivalent to decimal" so
I'm thinking...never?

As for the debate over what to call the different data types, the
conclusion that formed in my mind was something like this: The decimal
data type can store base-10 numbers that don't repeat infinitely
"exactly", while the float data type can store binary numbers that
don't repeat infinitely "exactly." Ask either type to store a base they
weren't designed for, however, and you could lose information during
the conversion, resulting in the stored number being an "approximate"
representation of the original number. Since we live in a base-10
world, and saying "approximate data type" is, well, a lot quicker than
what
I just said, I think ANSI and whoever else decided to call floats
"approximate" data types were more than justified, even if their
semantics
were imperfect.

-Dan
 
Barry Kelly said:
It is justified, however, for reasons of human culture and in particular
accounting and legal requirements in accounting.

It's not a technical slight or insult; it's about people.

The reason DECIMAL exists is certainly about people/accounting/etc -
but I don't see that the differences between DECIMAL and FLOAT need to
made to seem greater than they are.

(The impression that decimal is exact and float/double are approximate
occurs in more than SQL land - it's a common belief in the C#/.NET
communities as well, usually where people don't really understand how
the types work under the hood.)
 
Back
Top