-------------------------------------------------
Jon Skeet wrote (at this indent level):
-------------------------------------------------
John Bentley at this level.
Note some snipping without notification.
Fair enough. I still prefer my terms as I think they're more commonly
used, but it's really just a matter of taste - the main thing is that
we can understand each other
That, absolutely, is the main thing.
One alternative way of putting it for me would be "floating binary
point" or "floating decimal point" - how does that grab you?
I've just thought of a better reason to avoid the "floating" in these phrases.
It is true that the three .Net datatypes we've been talking about are all
floating point datatypes. However there has been, in VBA for example, a
nonintegral datatype that was a fixed point datatype: the Currency datatype,
which was fixed at 4 decimal places. Presumably, being a currency, the scaling
would have been decimal (base 10). Presumably, all that was stored was the
mantissa. The implicit exponent would have always been -4, that is, there
would have always been a Base 10, decimal, scaling of 10^-4 (that is, division by 10,000).
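A minimal VB.NET sketch of that idea (not the real Currency implementation, and
assuming it sits inside a console app's Main): only an integer mantissa is
stored, and the base 10 exponent of -4 is implied.

    Dim storedMantissa As Long = 1257800              ' all that is "stored": an integer
    Dim value As Decimal = storedMantissa / 10000D    ' apply the implicit 10^-4 scaling
    Console.WriteLine(value)                          ' 125.78
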
If we are interested in coming up with the most generic terms, ones that could
apply to any computer language and that get at a common quality, then a
datatype's being a "floating point datatype" is not the important thing. In
languages like VBA we have fixed and floating point datatypes, but the important
difference is not that one. It is the base of the scaling. You could, for example,
have a language with fixed and floating point datatypes all of which scale with
the same base. Maybe in the future, when processors are 100 times faster than
they are now, all nonintegrals will scale in base 10. (Maybe then there will be
only one nonintegral datatype to choose from.)
As an aside, I have thought of a quick way to appreciate the difference between
a Base 2 scaling and a Base 10 scaling. Base 2 scaling slides the (binary) point
along the mantissa written in base 2; Base 10 scaling slides the (decimal) point
along the mantissa written out in base 10. You know this, I'm just drilling it
into my own understanding.
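One quick way to see the difference in practice (a VB.NET fragment, assuming it
sits in a console app's Main): 0.1 has no finite representation in base 2, so a
Double, which scales in base 2, holds only an approximation, while a Decimal,
which scales in base 10, holds it exactly.

    Console.WriteLine(0.1 + 0.2 = 0.3)       ' False: base 2 scaled approximations
    Console.WriteLine(0.1D + 0.2D = 0.3D)    ' True: base 10 scaling holds these exactly
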
It's worth noting that 1/3 *isn't* an irrational number though - and
indeed if we had a base 3 floating point number type, it would be
exactly representable (but 1/2 wouldn't be).
Yes, thanks. That clears up a confusion. 1/3 is representable as a fraction. I've
just done it. It is just not exactly representable with a finite number of digits
in base 10, though it is in base 3. Pi and e are not representable as fractions.
Yes, 1/3 *isn't* an irrational number.
I think the difference is that most of the time financial calculations
*don't* require divisions like that. They usually include a lot of
addition and subtraction, and multiplication - but rarely actual
division. On the other hand, I haven't done much financial work, so
that's really speculation. Certainly as soon as division comes in,
you're likely to get inaccuracies.
If we think about our savings accounts then I agree that division never comes in
(as far as I can see). We deposit and withdraw exact amounts most of the time.
Occasionally we get an interest payment. Unless the bank is cruel to its
developers, the interest figure will be exactly representable in base 10,
something like 4.1% as opposed to 4 1/3 %:
   125.78     ' Initial balance
 *    4.1%
 ---------
   5.15698    ' Interest
 + 125.78
 ---------
 130.93698    ' Final balance
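Checking that with the Decimal type (a VB.NET fragment, assuming a console app's
Main): both the balance and the rate are exactly representable in base 10, so
the multiplication comes out exact.

    Dim balance As Decimal = 125.78D
    Dim rate As Decimal = 0.041D               ' 4.1%
    Dim interest As Decimal = balance * rate
    Console.WriteLine(interest)                ' 5.15698
    Console.WriteLine(balance + interest)      ' 130.93698
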
As an aside, there are a few interesting issues here. What, exactly, is your
entitlement? 130.93, 130.94, 130.9367 or 130.93698? I imagine Jon Skeet walking
into his local branch and asking "What is my exact balance?"; the teller replies,
and Jon enquires further: "I mean my exact string representation. I can supply
the algorithm if you wish."
I speculate, and this might require a new thread, that there is some IEEE
standard for financial transactions which says that currency amounts shall be
stored to 4 decimal places, rounded according to "banker's rounding" (which
rounds an exact half toward the even digit). So, for example, if an interest
payment yields an intermediate balance of 157.34865, this gets *stored* in your
account as 157.3486. The 0.0086 is kept there for future rounding operations.
However, if you closed your account with 157.3486 in it you would actually be
handed 157.35. If you closed your account on a day with 157.3428 in it, you
would get back 157.34.
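I don't know whether such a standard exists, but the rounding itself is easy to
sketch: Math.Round on a Decimal uses banker's rounding (round half to even) by
default (a VB.NET fragment, assuming a console app's Main, using the figures
above):

    Dim intermediate As Decimal = 157.34865D
    Dim stored As Decimal = Math.Round(intermediate, 4)    ' 157.3486: the half rounds to the even digit 6
    Console.WriteLine(stored)
    Console.WriteLine(Math.Round(stored, 2))               ' 157.35 handed over on closing
    Console.WriteLine(Math.Round(157.3428D, 2))            ' 157.34 on the other day
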
The question here, as an aside: Is there a standard for financial transactions
which specifies a maximum number of digits to which you should store currency
amounts? (I have posted this in a new thread:
Standard for financial applications specifying maximum number of decimal
places?)
Back to our main thread. My point is: yes, for some financial applications
division may never come into things.
However, in many it will. Instead of the scholarship example, think of a
company's annual profit that must be paid out to shareholders as dividends.
I've misled us a little bit with this scholarship example (which we could easily
change into a dividend example). Wondering what we do with the extra cent is a
problem that is not due to any limitation of the Decimal datatype, but rather a
limitation due to the nature of currency: it is a quantity that, at the point
where money has to change hands, requires exactness at a limited precision. If
we had a magic computer with an infinitely precise datatype, we would still have
the same problem.
The problem arises not only for division but, as my previous interest example
shows, for multiplication too. How should we specify the problem? Perhaps it is
only a problem of how we should round a more precise number to a less precise
number.
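To make the dividend version concrete, here is a hypothetical VB.NET fragment
(my own made-up figures, assuming a console app's Main): however precise the
datatype, once the per-share amount is rounded to whole cents the totals no
longer add up, and someone has to decide where the leftover cent goes.

    Dim profit As Decimal = 1000000.01D                       ' annual profit to distribute
    Dim shares As Integer = 3
    Dim perShare As Decimal = Math.Round(profit / shares, 2)  ' 333333.34 per share
    Console.WriteLine(perShare * shares)                      ' 1000000.02: a cent too much
    Console.WriteLine(profit - perShare * shares)             ' -0.01 to be absorbed somewhere
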
I'm not sure what you mean by a "sparse value" here. Put it this way:
Suppose we had two data types, one of which represented a thousand
numbers between 0 and 10000, and another of which represented a hundred
numbers between 0 and 10000000 - there will be more of a "gap" between
represented numbers in the second type than in the first type, on
average.
Let us, so that we can have more numbers that are representable by both
datatypes, have:
SuperCool 50 (Decimal analogy): 11 numbers between 0 and 50 (0, 5, 10, 15, ... 50)
SuperCool 100 (Double analogy): 6 numbers between 0 and 100 (0, 20, 40, 60, 80, 100)
In common we have (0, 20, 40).
Under this scheme, yes, the datatype SuperCool 100 takes up less memory and also
represents numbers more sparsely (I like this phrase of yours).
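The same sparseness shows up with real Doubles (a VB.NET fragment, assuming a
console app's Main; the particular values are just ones I picked): near 1 the
representable Doubles are about 2.2E-16 apart, but near 1E+17 they are 16 apart,
so two differently written values can land on the same stored Double.

    Console.WriteLine(1.0 = 1.000000000000001)                        ' False: still distinguishable near 1
    Console.WriteLine(100000000000000000.0 = 100000000000000001.0)    ' True: the same stored Double
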
However, if we ask "what number can SuperCool 50 hold that SuperCool 100 cannot,
within the range of SuperCool 50?" we can give a number: 5, for example. If, on
the other hand, I ask you "what number can a Decimal hold that a Double cannot,
within the range (and precision) of a Decimal?" can you come up with any
number?
It could - but I suspect it's unlikely.
If it turns out that there is a standard for storing a maximum number of digits
in a financial application, then it will never happen. That is, the Decimal
datatype's precision would presumably be big enough to hold the intermediate
numbers during a calculation before they are rounded off to the nearest (say)
fourth decimal place.
There will be no *massive* error using double. The error is likely to
be smaller than the engineers would be able to cope with anyway. The
real world doesn't tend to be as precise as a double, in other words.
I suspect that no matter how far we wish to pursue this discussion, the basic
rule of thumb will be: for financial apps use the Decimal datatype, for
scientific apps use a Double.
I won't mind even if I can't see the exhaustive list of reasons why this should
be so. The final motivation is to guide my .NET programming choices. With this
end in mind, we could shift our discussion to focus more directly on the
programming rules and work backwards, rather than starting with axioms and
moving toward the rules.
A fuller list of rules could be:
Choosing between a Binary Scaled Datatype (Single, Double in VB.NET) and the
Decimal Scaled Datatype (Decimal) is governed by these considerations:
1. Size: Use a Binary Scaled Datatype (Double or Single) if you need to store a
number that is too large or too small for the Decimal Scaled Datatype (Decimal).
2. Exactness: If you deal with quantities that start life with, and require,
exact representation, like the price of a shirt, use the Decimal Scaled Datatype
(Decimal). If you deal with quantities that start life with imprecise
representations, and can never have a precise representation, like the diameter
of a tyre, use the Binary Scaled Datatypes (Double and Single in VB.NET). Rough
guide: for financial applications use the Decimal Scaled Datatype, for
scientific applications use the Binary Scaled Datatypes.
3. Tolerating round off errors: The Decimal Scaled Datatype (Decimal) is less
prone to round off errors.
4. Performance: The Binary Scaled Datatypes (Double and Single in VB.NET) are on
the order of 40 times faster than the Decimal Scaled Datatype (Decimal).
What do ya reckon?
A further issue: Double V Single (Float in C#). At the moment I'd be inclined
always to choose a Double by default, even though it is slower and takes more
memory. In any given app its speed (or lack of it) is always apparent, while an
inaccuracy might not be until a disaster occurs. With processor power and memory
becoming larger and cheaper, I think coding optimizations become less important.
In database apps, to take a specific type of app, the major speed bottleneck
will be the number of records (and the number of fields) coming across the
network rather than the choice of a Double over a Single.
In any case have you done any performance tests of Doubles V Singles?
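For what it's worth, here's a rough sketch of how I might measure it myself in
VB.NET, using System.Diagnostics.Stopwatch (the loop body and iteration count
are arbitrary, and the numbers will vary by machine and runtime, so any ratio it
prints is indicative only). I threw Decimal in as well since it bears on rule 4
above.

    Imports System.Diagnostics

    Module PerfSketch
        Sub Main()
            Const N As Integer = 10000000

            Dim s As Single = 0.0F
            Dim swSingle As Stopwatch = Stopwatch.StartNew()
            For i As Integer = 1 To N
                s = s + 0.0001F                  ' Single additions
            Next
            Console.WriteLine("Single:  " & swSingle.ElapsedMilliseconds & " ms")

            Dim d As Double = 0.0
            Dim swDouble As Stopwatch = Stopwatch.StartNew()
            For i As Integer = 1 To N
                d = d + 0.0001                   ' Double additions
            Next
            Console.WriteLine("Double:  " & swDouble.ElapsedMilliseconds & " ms")

            Dim m As Decimal = 0.0D
            Dim swDecimal As Stopwatch = Stopwatch.StartNew()
            For i As Integer = 1 To N
                m = m + 0.0001D                  ' Decimal additions
            Next
            Console.WriteLine("Decimal: " & swDecimal.ElapsedMilliseconds & " ms")
        End Sub
    End Module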