Excel and the Math Coprocessor for DLLs

  • Thread starter Thread starter Lynn McGuire
  • Start date Start date
L

Lynn McGuire

I have a large Win32 DLL (10 MB) that is called from
my user interface (written in C++) or from VBA in MS
Excel. In my user interface, the DLL runs in its
own space and calculates correctly. Under Excel VBA,
my DLL is having problems with double precision
accuracy. The following test passes in my user
interface but fails under my bad pentium test:

double precision chptst
double precision divtwo
double precision top
double precision bottom
data top / 4195835.0D0 /
data bottom / 3145727.0D0 /
DIVTWO = top / bottom
CHPTST = (DIVTWO * bottom) - top

In my user interface, the chptst result is zero.
Under Excel VBA, the chptst result is 0.2851266E-09.

I have tried resetting the math coprocessor in my
DLL with the following code but it is not working:

unsigned old87Status = 0;
unsigned new87ControlWord = 0;
unsigned new87ControlMask = 0;
unsigned new87result = 0;
old87Status = _status87 ();
if (old87Status != 0)
new87result = _control87 (new87ControlWord, new87ControlMask);

I have verified this behavior in both Excel 2003
and 2010. Does anyone have any ideas here ?

Sincerely,
Lynn McGuire
 
Lynn,
I'm thinking that if your UI calls the DLL then it runs in-process
rather than in its own space. If your UI is a C++ EXE then the same
methodology is being applied to handle the DLL's returned value.

As I stated in your previous post on 3/15, Excel/VBA handles double
precision in its own way and so the return from your DLL got grabbed up
in that process. As JoeU suggests, I suspect Excel is modifying the
return for use with VBA. I use VB6 or PowerBasic DLLs and have no FPU
discrepancies.

You might get more help if you ask this in "comp.lang.C++"...

--
Garry

Free usenet access at http://www.eternal-september.org
ClassicVB Users Regroup!
comp.lang.basic.visual.misc
microsoft.public.vb.general.discussion
 
I have a large Win32 DLL (10 MB) that is called from
my user interface (written in C++) or from VBA in MS
Excel. In my user interface, the DLL runs in its
own space and calculates correctly. Under Excel VBA,
my DLL is having problems with double precision
accuracy. The following test passes in my user
interface but fails under my bad pentium test:

double precision chptst
double precision divtwo
double precision top
double precision bottom
data top / 4195835.0D0 /
data bottom / 3145727.0D0 /
DIVTWO = top / bottom
CHPTST = (DIVTWO * bottom) - top

In my user interface, the chptst result is zero.
Under Excel VBA, the chptst result is 0.2851266E-09.

I have tried resetting the math coprocessor in my
DLL with the following code but it is not working:

unsigned old87Status = 0;
unsigned new87ControlWord = 0;
unsigned new87ControlMask = 0;
unsigned new87result = 0;
old87Status = _status87 ();
if (old87Status != 0)
new87result = _control87 (new87ControlWord, new87ControlMask);

I think the problem is that your call to _control87(0,0) is a NOOP.

Untested but I think _control87( _PC_64+_RC_NEAR, _MCW_PC+_MCW_RC);

Ought to do the trick. Force 64 bit computation and nearest rounding.

It could also be the case that in a pure C/C++ environment the final
pass of the optimising compiler is smart enough to notice that your
expression is identically zero at compile time.
I have verified this behavior in both Excel 2003
and 2010. Does anyone have any ideas here ?

Hope this helps. See the following for details

http://msdn.microsoft.com/en-us/library/aa246719(v=VS.60).aspx
 
Lynn McGuire said:
double precision chptst
double precision divtwo
double precision top
double precision bottom
data top / 4195835.0D0 /
data bottom / 3145727.0D0 /
DIVTWO = top / bottom
CHPTST = (DIVTWO * bottom) - top In my user interface, the chptst
result is zero.
Under Excel VBA, the chptst result is 0.2851266E-09.

You are certainly looking at a floating-point anomaly; that is, sensitivity
to 64-bit rounding. But I do not believe it can be explained by the FPU
rounding mode alone.

And while I might be able to suggest a solution for this particular example,
in general, it is better to bullet-proof your arithmetic to correctly all
numerical examples and computations.

In general, you cannot expect 64-bit floating-point arithmetic to exactly
equal a mathematical solution. All computer arithmetic is limited by a
finite number of bits, whereas mathematics effectively relies on an infinite
representation of information.

In deference to the finite limitations of computer arithmetic in general,
and 64-bit floating-point in particular, it is prudent to explicitly round
any arithmetic result to the precision of accuracy that you require. That
is, any arithmetic result involving non-integer operands, division, or
integer operands or integer results greater than 2^53 in magnitude.

-----

To address your particular example....

We can simulate the two different results using VBA in a manner that sheds
some light on the problem. Consider the following two VBA procedures:

Sub testit1()
Const top As Double = 4195835#
Const bottom As Double = 3145727#
Dim chptst As Double
Dim divtwo As Double
divtwo = top / bottom
chptst = (divtwo * bottom) - top
MsgBox Format(chptst, "0.000E+00")
End Sub

Sub testit2()
Const top As Double = 4195835#
Const bottom As Double = 3145727#
Dim chptst As Double
Dim divtwo As Double
chptst = ((top / bottom) * bottom) - top
MsgBox Format(chptst, "0.000E+00")
End Sub

testit1 displays about 2.851E-10, whereas testit2 display 0.000E+00 --
exactly zero.

The difference is that in testit1, the 80-bit floating-point result of
top/bottom (the FPU of Intel CPUs use 80-bit floating-point internally) is
rounded to a 64-bit floating-point result stored into divtwo. Then VBA uses
the 64-bit divtwo in the computation of chptst.

But in testit2, VBA does all of the computation with 80-bit precision,
rounding to 64 bits only when storing the result into chptst.

I do not know anything about Microsoft C++ or how C++ DLLs might work when
called from VBA.

But based on your observations, I suspect that when the code is compiled and
linked in a C++ program, a better C++ compiler is used that optimizes the
computation of chptst to use the 80-bit result of top/bottom despite the
fact that you stored it into the 64-bit divtwo. However, when the DLL is
compiled and linked into VBA, obviously C++ is using the 64-bit divtwo, just
as testit1 does.

(Although I refer to "a better compiler" as if there are two, the difference
might actually be a difference in the behavior of __the__ so-called
"back-end compiler"; that is, a phase of the C++ compiler.)

Honestly, that does not make all that much sense to me based on my
experience with (Unix) compilers. But that conclusion seems to be supported
by my experiments below.

The more reasonable assumption is that the 80-bit rounding to 64-bit is
handled differently when the C++ DLL is called from VBA.

However, I cannot duplicate the results of testit2 even when I modify
testit1 in either of the following manners:

1. divtwo = top / bottom + 2^-52 ' add 1 to the least-significant bit
2. divtwo = top / bottom - 2^-52 ' sub 1 from the least-significant bit

Those modifications do have the intended effect, which we can see when we
look at the binary representation:

1. Original divtwo is 3FF55754,1C7C6B43.
2. Adding 2^-52, divtwo is 3FF55754,1C7C6B44.
3. Subtracting 2^-52, divtwo is 3FF55754,1C7C6B42.

But the testit1 results in chptst are:

1. With original divtwo, chptst is about 2.851E-10.
2. With divtwo + 2^-52, chptst is about 9.836E-10.
3. With divtwo - 2^-52, chptst is about -4.134E-10.

Since none is exactly zero, as we see in testit2, I conclude that altering
the rounding to 64-bit alone does determine the result in testit1, but the
additional precision of 80-bit representation does.

However, arguably, that is only conjecture.


Lynn McGuire said:
I have tried resetting the math coprocessor in my
DLL with the following code but it is not working:
unsigned old87Status = 0;
unsigned new87ControlWord = 0;
unsigned new87ControlMask = 0;
unsigned new87result = 0;
old87Status = _status87 ();
if (old87Status != 0)
new87result = _control87 (new87ControlWord, new87ControlMask);

I presume that the intended purpose of this code is to change the FPU
rounding mode.

My conclusion above should explain why that does not work.

Unfortunately, I do not know how to set and read the FPU control word in
VBA. So I can offer a dispositive explanation.
 
GS said:
As JoeU suggests, I suspect Excel is modifying the return for use with
VBA.

I don't believe I "suggested" any such thing.

What I believe I did say is: I would expect that each application sets the
FPU rounding mode according to its own requirements.

Whether Excel does that only one time at start-up or Excel and VBA restore
it after returning from each call to a DLL, I cannot say. That would be the
"defensive" thing to do.
 
Lynn,
I'm thinking that if your UI calls the DLL then it runs in-process rather than in its own space. If your UI is a C++ EXE then the
same methodology is being applied to handle the DLL's returned value.

As I stated in your previous post on 3/15, Excel/VBA handles double precision in its own way and so the return from your DLL got
grabbed up in that process. As JoeU suggests, I suspect Excel is modifying the return for use with VBA. I use VB6 or PowerBasic DLLs
and have no FPU discrepancies.

You might get more help if you ask this in "comp.lang.C++"...

My user interface calls a Win32 exe program
which in turn calls the DLL.

I am seeing different results for the internal
calculations in my DLL when called by Excel
VBA. So, it does not matter how the floating
values returned from my DLL are handled.

comp.lang.c++ is for C++ questions. Not apps
like Excel nor specific operating systems.

Thanks,
Lynn
 
You are certainly looking at a floating-point anomaly; that is, sensitivity to 64-bit rounding. But I do not believe it can be
explained by the FPU rounding mode alone.

And while I might be able to suggest a solution for this particular example, in general, it is better to bullet-proof your arithmetic
to correctly all numerical examples and computations.

In general, you cannot expect 64-bit floating-point arithmetic to exactly equal a mathematical solution. All computer arithmetic is
limited by a finite number of bits, whereas mathematics effectively relies on an infinite representation of information.

In deference to the finite limitations of computer arithmetic in general, and 64-bit floating-point in particular, it is prudent to
explicitly round any arithmetic result to the precision of accuracy that you require. That is, any arithmetic result involving
non-integer operands, division, or integer operands or integer results greater than 2^53 in magnitude.

-----

To address your particular example....

We can simulate the two different results using VBA in a manner that sheds some light on the problem. Consider the following two VBA
procedures:

Sub testit1()
Const top As Double = 4195835#
Const bottom As Double = 3145727#
Dim chptst As Double
Dim divtwo As Double
divtwo = top / bottom
chptst = (divtwo * bottom) - top
MsgBox Format(chptst, "0.000E+00")
End Sub

Sub testit2()
Const top As Double = 4195835#
Const bottom As Double = 3145727#
Dim chptst As Double
Dim divtwo As Double
chptst = ((top / bottom) * bottom) - top
MsgBox Format(chptst, "0.000E+00")
End Sub

testit1 displays about 2.851E-10, whereas testit2 display 0.000E+00 -- exactly zero.

The difference is that in testit1, the 80-bit floating-point result of top/bottom (the FPU of Intel CPUs use 80-bit floating-point
internally) is rounded to a 64-bit floating-point result stored into divtwo. Then VBA uses the 64-bit divtwo in the computation of
chptst.

But in testit2, VBA does all of the computation with 80-bit precision, rounding to 64 bits only when storing the result into chptst.

I do not know anything about Microsoft C++ or how C++ DLLs might work when called from VBA.

But based on your observations, I suspect that when the code is compiled and linked in a C++ program, a better C++ compiler is used
that optimizes the computation of chptst to use the 80-bit result of top/bottom despite the fact that you stored it into the 64-bit
divtwo. However, when the DLL is compiled and linked into VBA, obviously C++ is using the 64-bit divtwo, just as testit1 does.

(Although I refer to "a better compiler" as if there are two, the difference might actually be a difference in the behavior of
__the__ so-called "back-end compiler"; that is, a phase of the C++ compiler.)

Honestly, that does not make all that much sense to me based on my experience with (Unix) compilers. But that conclusion seems to be
supported by my experiments below.

The more reasonable assumption is that the 80-bit rounding to 64-bit is handled differently when the C++ DLL is called from VBA.

However, I cannot duplicate the results of testit2 even when I modify testit1 in either of the following manners:

1. divtwo = top / bottom + 2^-52 ' add 1 to the least-significant bit
2. divtwo = top / bottom - 2^-52 ' sub 1 from the least-significant bit

Those modifications do have the intended effect, which we can see when we look at the binary representation:

1. Original divtwo is 3FF55754,1C7C6B43.
2. Adding 2^-52, divtwo is 3FF55754,1C7C6B44.
3. Subtracting 2^-52, divtwo is 3FF55754,1C7C6B42.

But the testit1 results in chptst are:

1. With original divtwo, chptst is about 2.851E-10.
2. With divtwo + 2^-52, chptst is about 9.836E-10.
3. With divtwo - 2^-52, chptst is about -4.134E-10.

Since none is exactly zero, as we see in testit2, I conclude that altering the rounding to 64-bit alone does determine the result in
testit1, but the additional precision of 80-bit representation does.

However, arguably, that is only conjecture.




I presume that the intended purpose of this code is to change the FPU rounding mode.

My conclusion above should explain why that does not work.

Unfortunately, I do not know how to set and read the FPU control word in VBA. So I can offer a dispositive explanation.

That particular test is for detection of the Pentium
FPU FDIV bug:
http://en.wikipedia.org/wiki/Pentium_FDIV_bug
If that test does not round to exactly zero then the
FPU is having problems.

I am beginning to think that you are correct about
the rounding mode.

Thanks,
Lynn
 
I think the problem is that your call to _control87(0,0) is a NOOP.

Untested but I think _control87( _PC_64+_RC_NEAR, _MCW_PC+_MCW_RC);

Ought to do the trick. Force 64 bit computation and nearest rounding.

It could also be the case that in a pure C/C++ environment the final pass of the optimising compiler is smart enough to notice that
your expression is identically zero at compile time.


Hope this helps. See the following for details

http://msdn.microsoft.com/en-us/library/aa246719(v=VS.60).aspx

I will try this as I am thinking this way also.

Thanks,
Lynn
 
Lynn McGuire said:
That particular test is for detection of the Pentium
FPU FDIV bug:
http://en.wikipedia.org/wiki/Pentium_FDIV_bug

Yes, I remember that defect quite well. That bug appeared and was fixed
long ago (c. 1993 according to the wiki article). No modern Intel x32 or
x64 processor has that defect.


Lynn McGuire said:
If that test does not round to exactly zero then the
FPU is having problems.

I don't see that test or that statement in the wiki article. I suspect you
are relying on information you found by following one of the many links on
that webpage.

If you can point me to it, I'd appreciate it.

In any case, I am quite sure that it is referring to 80-bit FPU operations,
not rounding to 64-bit.

And as I demonstrated with the VBA procedure "testit2", the 80-bit FPU
operations do indeed result in exactly zero.

Note that the wiki article does clearly state that 4195835 / 3145727 should
result in 1.333820449136241002, and the FPU bug results in
1.333739068902037589 (forgive any typos).

First, I hasten to point out that that is beyond the formatting limits of
VBA, which will only format the first 15 significant digits.

But VBA arithmetic does indeed result in 1.33382044913624, essentially the
same as the corrected FPU result and not at all like the result due to the
FPU bug.

So the difference you see has nothing to do with the FPU bug. (But perhaps
you did not intend to imply otherwise.)

Actually, the exact 64-bit representation is
1.33382044913624,10930577198087121360003948211669921875.

Note the difference in the 4 digits following the first 15: 1093 v. 1002.

That reinforces my assertion that any expectations of zero for the complete
test is based on the 80-bit FPU arithemetic, not as it is represented in
64-bit storage.

Moreover, I can tell you that 1.333820449136241002 is only an approximation.
An exact conversion of any binary fractional part will end in 5, as
demonstrated by the exact conversion of the 64-bit representation above.


Lynn McGuire said:
I am beginning to think that you are correct about
the rounding mode.

For my edification, please articulate what you "beginning to think" I am
correct about. I have made a number of assertions.

I hope the point you have come to understand is: the difference between the
DLL called from C++ and the DLL called from VBA is not related to the FPU
rounding mode; instead, it is probably related to 80-bit operands v. a mix
of 80-bit and 64-bit operands.

Certainly the rounding to 64-bit causes the difference. But I do not
believe that any change in FPU rounding mode would make a difference.

However, I can only speculate. I could be wrong. So I will be very
interested in your results after applying Martin's corrections.
 
It happens that Lynn McGuire formulated :
comp.lang.c++ is for C++ questions. Not apps
like Excel nor specific operating systems.

Correct! However, it's highly possible that someone there has had
similar issues using C++ DLLs with MS Office automation. Same may be
the case in a Classic VB forum where people have used C++ DLLs and had
similar issues. No harm asking...

--
Garry

Free usenet access at http://www.eternal-september.org
ClassicVB Users Regroup!
comp.lang.basic.visual.misc
microsoft.public.vb.general.discussion
 
I think the problem is that your call to _control87(0,0) is a NOOP.

Untested but I think _control87( _PC_64+_RC_NEAR, _MCW_PC+_MCW_RC);

Ought to do the trick. Force 64 bit computation and nearest rounding.

It could also be the case that in a pure C/C++ environment the final pass of the optimising compiler is smart enough to notice that
your expression is identically zero at compile time.


Hope this helps. See the following for details

http://msdn.microsoft.com/en-us/library/aa246719(v=VS.60).aspx

Bummer, neither of
_control87( _PC_64+_RC_NEAR, _MCW_PC+_MCW_RC);
_control87 (_PC_64, _MCW_PC);
did not help.

Something is really weird here.

Thanks,
Lynn
 
Lynn McGuire said:
Bummer, neither of
_control87( _PC_64+_RC_NEAR, _MCW_PC+_MCW_RC);
_control87 (_PC_64, _MCW_PC);
did not help. Something is really weird here.

The only "really weird" thing here is that apparently you are not
comprehending my explanations. My bad: TMI!

The outcome with _control87 should come as no surprise because I already
demonstrated that no change in the rouned 64-bit value would result in
exactly zero.

I concluded, therefore, that it is the extended 80-bit precision that, by
coincidence, causes that particular example to become zero.

I say "by coincidence" because the 80-bit precision will not always cause
expressions of the form (x/y)*y-x to be zero, for integer x and y, just as
there are integer x and y where that expression is not zero using 64-bit
precision.

I never fully explained my reference to "80-bit precision". Perhaps you are
unaware....

Although type Double is represented in memory by 64-bit floating-point,
Intel CPUs use 80-bit floating-point registers to perform arithmetic.

Since the 80-bit FP registers are accessible to the CPU, compilers can take
advantage of them to store pairwise intermediate results.

For example, the compiler might put x into FP1 and y into FP2, compute
FP1/FP2 in FP3 (x/y), then compute FP3*FP2 ((x/y)*y) in FP3, and finally
compute FP3-FP1 in FP3 ((x/y)*y-x). Finally, FP3 would be rounded to 64-bit
and stored into a variable, chptst in your case.

That is what is happening in my VBA example procedure called testit2().

-----

As for why the C++ compiler might optimize the DLL when linked to a C++
application, but not when linked to an Excel/VBA application, both Martin
and I conjectured that that is simply how things work.

Perhaps someone who understands Microsoft C++ and DLLs better can offer a
more detailed factual explanation. I don't think we need any more wild
speculation ;-).
 
The only "really weird" thing here is that apparently you are not comprehending my explanations. My bad: TMI!

The outcome with _control87 should come as no surprise because I already demonstrated that no change in the rouned 64-bit value would
result in exactly zero.

I concluded, therefore, that it is the extended 80-bit precision that, by coincidence, causes that particular example to become zero.

I say "by coincidence" because the 80-bit precision will not always cause expressions of the form (x/y)*y-x to be zero, for integer x
and y, just as there are integer x and y where that expression is not zero using 64-bit precision.

I never fully explained my reference to "80-bit precision". Perhaps you are unaware....

Although type Double is represented in memory by 64-bit floating-point, Intel CPUs use 80-bit floating-point registers to perform
arithmetic.

Since the 80-bit FP registers are accessible to the CPU, compilers can take advantage of them to store pairwise intermediate results.

For example, the compiler might put x into FP1 and y into FP2, compute FP1/FP2 in FP3 (x/y), then compute FP3*FP2 ((x/y)*y) in FP3,
and finally compute FP3-FP1 in FP3 ((x/y)*y-x). Finally, FP3 would be rounded to 64-bit and stored into a variable, chptst in your case.

That is what is happening in my VBA example procedure called testit2().

-----

As for why the C++ compiler might optimize the DLL when linked to a C++ application, but not when linked to an Excel/VBA application,
both Martin and I conjectured that that is simply how things work.

Perhaps someone who understands Microsoft C++ and DLLs better can offer a more detailed factual explanation. I don't think we need
any more wild speculation ;-).

Yes, I comprehended your explanation. And I understand the
difference between 64 bit precision and 80 bit precision -
that is what the _PC_64 flag is for. I started programming
using 36 bit words for single precision (univac 1108). I've
been down this road before unfortunately.

BTW, I never said that my DLL was written in Visual C++.
My DLL is written in 700K lines of Fortran, C and C++ code.
But that is all built at compile and link time. The only
thing that can be a difference here is how the math
coprocessor is running.

I am currently intrigued by this conversation:
http://windowssecrets.com/forums/sh...nt-results-running-DLL-s-with-VB-(VB-Fortran)
"We finally solved the problem. What happened is the floating
point control of windows is set by each language according to
its own set of parameters. This affects how the math processor
rounds numbers and other various operations. C and Fortran
sets the floating point parameters equally, VB and VBA each
have their own set of parameters. We had to write a bit of
Assembly code to force C, Fortran, VB, and VBA to use the same
parameters for the floating point control. Now all four
languages now give the same answers. This was a doosie that
took 4 days to solve."

I wish that they had posted the code, it sounds very
interesting.

Thanks,
Lynn
 
Lynn McGuire was thinking very hard :
"We finally solved the problem. What happened is the floating
point control of windows is set by each language according to
its own set of parameters. This affects how the math processor
rounds numbers and other various operations. C and Fortran
sets the floating point parameters equally, VB and VBA each
have their own set of parameters. We had to write a bit of
Assembly code to force C, Fortran, VB, and VBA to use the same
parameters for the floating point control. Now all four
languages now give the same answers. This was a doosie that
took 4 days to solve."

This is what I was eluding to. Nice find!

--
Garry

Free usenet access at http://www.eternal-september.org
ClassicVB Users Regroup!
comp.lang.basic.visual.misc
microsoft.public.vb.general.discussion
 
Lynn McGuire said:
Yes, I comprehended your explanation. And I understand
the difference between 64 bit precision and 80 bit
precision - that is what the _PC_64 flag is for.

Okay. You said you tried these combinations:

_control87( _PC_64+_RC_NEAR, _MCW_PC+_MCW_RC);
_control87 (_PC_64, _MCW_PC);

In both cases, you have selected 80-bit arithmetic (64-bit mantissa). And I
quite sure that Excel (VBA) uses _PC_64+_RC_NEAR.

So there is one combination that remains:

_control87(_PC_53+_RC_NEAR, _MCW_PC+_MCW_RC)

I believe that forces the FPU to round each result to 64-bit floating-point.
It might even restrict the FPU to 64-bit floating-point.

In either case, that might mimick this behavior in VBA: each pairwise
operation is rounded to 64-bit floating-point.

Sub testit3()
Const top As Double = 4195835#
Const bottom As Double = 3145727#
Dim chptst As Double
Dim divtwo As Double
divtwo = top / bottom
divtwo = divtwo * bottom
chptst = divtwo - top
MsgBox Format(chptst, "0.000E+00")
End Sub

In this case, chptst is exactly zero.
 
OK. This looks like a rounding error in the 53 bit mantissa so I suspect
that the right coprocessor settings may be _PC_53.
REAL*8 in FORTRAN speak.
Bummer, neither of
_control87( _PC_64+_RC_NEAR, _MCW_PC+_MCW_RC);
_control87 (_PC_64, _MCW_PC);
did not help.

Try _PC_53 as the rounding mode (classic REAL*8) arithmetic. I think
what may be happening is that _PC_64 guard digits on the full precision
calculation using that nasty test case are causing trouble here.

When the result is stored to memory it is being rounded to nearest which
is not the same result computed at full 80bits (aka _PC_64) as at 64bits
(aka _PC_53). It doesn't help that some compilers generate optimised
code that works with intermediate results at full native width in the
FPU and rounds only when storing back to memory.
Something is really weird here.

You might also try to replicate the fault in the controlled C compiler
environment by doing it with each of the rounding modes and precisions.
 
Lynn McGuire said:
My DLL is written in 700K lines of Fortran, C and C++
code. But that is all built at compile and link time.

Not necessarily. In some architectures, a "late optimization phase" is
invoked when the DLL is loaded and linked to an application. This supports
a single binary DLL that can be used on a variety of architectures.

That is what I was alluding to earlier.

However, it's a moot point. I suspect we have established that the key
factor is _PC_53 v. _PC_64 mode.

I am still awaiting the results of your trying _PC_53 mode. But I believe
my explanation of your results with _PC_64 mode most likely point _PC_53
mode as the solution.
 
Not necessarily. In some architectures, a "late optimization phase" is invoked when the DLL is loaded and linked to an application.
This supports a single binary DLL that can be used on a variety of architectures.

That is what I was alluding to earlier.

However, it's a moot point. I suspect we have established that the key factor is _PC_53 v. _PC_64 mode.

I am still awaiting the results of your trying _PC_53 mode. But I believe my explanation of your results with _PC_64 mode most likely
point _PC_53 mode as the solution.

I tried _PC_53. Did not help.

Thanks,
Lynn
 
Note: This is a long article. Please do not quote the entire content in
any response. Instead, quote only the parts relevant to your respond, per
Usenet netiquette.


Lynn McGuire said:
I tried _PC_53. Did not help.

Works fine for me.

Since you provide no specifics about exactly what you did that did not work,
we could only speculate (wildly) about the mistakes you might be making.
That would not be fruitful.

First and foremost, I want to reiterate a point that I made earlier in this
thread: generally, you cannot expect an expression like (num/denom)*denom -
num to result in exactly zero. (Unless num is zero <wink>.)

I assume your objective is only for the DLL code called from Excel/VBA to
have the same result as the DLL code (or the exact same source) called from
an exe file, zero or not.

But the point again is: if you rounded floating-point arithmetic to the
precision that you expect, which is the correct thing to do in general, you
probably would not have this problem in the first place, notwithstanding
differences in FP modes in the two environments.

(However, arguably, explicit rounding might be difficult or might not apply
at all in your application. No way for us to know.)

So much for my "PSA". Now let's compare notes....

-----

I am using Visual C++ Express 2010 to create a DLL.

It is true that when the DLL code is executed in the VC++ IDE, the FP
control word is set to _PC_53 + _RC_NEAR.

(That surprises me. So I suspect that all Visual Studio languages behave
the same way in this respect.)

In contrast, the FP control word is set to _PC_64 + _RC_NEAR when Excel
executes.

AFAIK, there is no way to alter that, in part because I believe that Excel
and VBA execute in different threads, and the FP control word is
thread-specific like all CPU registers.

VBA also does start with the FP control word set to _PC_64 + _RC_NEAR.

And based on my experiments, it appears that VBA does reset the FP control
word to _PC_64 each time Excel calls a VBA function and when we first start
to execute a "macro" (sub).

(Alternatively, VBA might reset the FP control word to _PC_64 when "an
execution" exits to Excel or terminates. I don't believe we can tell the
difference.)

Aside.... Interestingly, VBA does not change the rounding mode (_MCW_RC).
If we change it from _RC_NEAR, it remains in effect until we change it again
(or terminate Excel of course).

In any case, the important take-away is: if we set the FP control word
during "an execution" in VBA, it remains in effect until "the execution"
terminates or exits.

So we can set the FP control word to _RC_53 at the beginning of "an
execution" in VBA by calling a DLL function for that specific purpose (see
setpr below).

Alternatively, we can set the FP control word to _RC_53 in each DLL routine
or just one or two "key" DLL routines called during "the execution" (see
lynntest2 below).

Consider the following VBA example....

Note: Be sure to see some important comments about the DLL code below.
They might be pertinent to your problems.

------

Const num As Double = 4195835
Const denom As Double = 3145727


Private Sub testit1()
Debug.Print "--------------------"
Debug.Print "fpcw: "; myHex(getfpcw())
Call lynn1
Call lynn2 ' sets _PC_53
Call lynn1
End Sub


Private Sub testit2()
Debug.Print "--------------------"
Debug.Print "fpcw: "; myHex(getfpcw())
Debug.Print "fpcw: "; myHex(setpr(2, 0, 1)) ' sets _PC_53
Call lynn1
Call lynn1
End Sub


Private Sub lynn1()
Dim res As Double
res = lynntest1(num, denom)
Debug.Print "num: "; num; " denom: "; denom; _
" res: "; Format(res, "0.0000E+00")
Debug.Print "fpcw: "; myHex(getfpcw())
End Sub


Private Sub lynn2()
Dim res As Double
res = lynntest2(num, denom)
Debug.Print "num: "; num; " denom: "; denom; _
" res: "; Format(res, "0.0000E+00")
Debug.Print "fpcw: "; myHex(getfpcw())
End Sub

-----

The output from testit1 is:

fpcw: 0x00000000
num: 4195835 denom: 3145727 res: 2.8513E-10
fpcw: 0x00000000
num: 4195835 denom: 3145727 res: 0.0000E+00
fpcw: 0x00010000
num: 4195835 denom: 3145727 res: 0.0000E+00
fpcw: 0x00010000

The output from testit2 is:

fpcw: 0x00000000
fpcw: 0x00010000
num: 4195835 denom: 3145727 res: 0.0000E+00
fpcw: 0x00010000
num: 4195835 denom: 3145727 res: 0.0000E+00
fpcw: 0x00010000

Note that "res" is non-zero only when the FP control word is _PC_64 +
_RC_NEAR (0x00000000).

In contrast, "res" is zero when the FP control word is _PC_53 + _RC_NEAR
(0x00010000), as it is when the DLL code is executed in an exe or the VC++
IDE.

-----

myHex is an irrelevant formatting function.

getfpcw, setpr, lynntest1 and lynntest2 are DLL routines implemented in
VC++. They are declared in VBA as follows:

Declare Function getfpcw Lib "C:\...\fpcwlib.dll" () As Double

Declare Function setpr Lib "C:\...\fpcwlib.dll" _
(rc As Double, pc As Double, flag As Double) As Double

' lynntest1 does not change fpcw
Declare Function lynntest1 Lib "C:\...\fpcwlib.dll" _
(num As Double, denom As Double) As Double

' lynntest2 sets fpcw to _PC_53
Declare Function lynntest2 Lib "C:\...\fpcwlib.dll" _
(num As Double, denom As Double) As Double

Note: I use ByRef and Double so that the DLL functions can be called
"directly" from an Excel worksheet (for academic, not practical purposes),
even though the FP control word is an unsigned int (32 bits). It's an
irrelevant personal design choice. You might prefer something different.

The VC++ code is (see some important comments below)....

-----

#include <float.h>
#pragma fenv_access (on) // disable fp compile-time optimization


double __stdcall lynntest1(double &num, double &denom)
{
double div;
div = num / denom;
return div*denom - num;
}


double __stdcall lynntest2(double &num, double &denom)
{
// same as lynntest1, but sets _PC_53 itself
unsigned int curfpcw;
_controlfp_s(&curfpcw, 0, 0);
_controlfp_s(&curfpcw, curfpcw, _MCW_PC + _MCW_RC); // see footnotes
_controlfp_s(&curfpcw, _PC_53, _MCW_PC);
return lynntest1(num, denom);
}


double __stdcall getfpcw()
{
unsigned int curfpcw;
_controlfp_s(&curfpcw, 0, 0);
_controlfp_s(&curfpcw, curfpcw, _MCW_PC + _MCW_RC); // see footnotes
return curfpcw & (_MCW_PC + _MCW_RC);
}


double __stdcall setpr(double &pc, double &rc, double &flag0)
{
// pc = 1, 2, 3 for _PC_64, _PC_53, _PC_24}
// rc = 1, 2, 3, 4 for _RC_NEAR, _RC_DOWN, _RC_UP, _RC_CHOP
// flag0 = 1, 2, 3 for MCW_PC, MCW_RC, MCW_PC + MCW_RC
unsigned int fpcw, mask, flag;
unsigned int curfpcw;
flag = (unsigned int)flag0;
fpcw = 0;
mask = 0;
if (flag & 1)
{
switch ((int)pc)
{
case 1: fpcw = _PC_64; mask = _MCW_PC; break;
case 2: fpcw = _PC_53; mask = _MCW_PC; break;
case 3: fpcw = _PC_24; mask = _MCW_PC; break;
}
}
if (flag & 2)
{
switch ((int)rc) {
case 1: fpcw += _RC_NEAR; mask += _MCW_RC; break;
case 2: fpcw += _RC_DOWN; mask += _MCW_RC; break;
case 3: fpcw += _RC_UP; mask += _MCW_RC; break;
case 4: fpcw += _RC_CHOP; mask += _MCW_RC; break;
}
}
_controlfp_s(&curfpcw, 0, 0);
_controlfp_s(&curfpcw, curfpcw, _MCW_PC + _MCW_RC); // see footnotes
if (mask != 0) _controlfp_s(&curfpcw, fpcw, mask);
return curfpcw & (_MCW_PC + _MCW_RC);
}

-----

Notes on DLL code....


1. I use _controlfp_s because _control87 and _controlfp are deprecated. I
also tested _controlfp, and I saw no difference. I did not test _control87.


2. For reliability reasons, I always get and re-set the current FP control
word before setting it per requirements. I discovered an inexplicable
problem when setting only _PC_64 when _PC_64 is already set and the rounding
mode is not _RC_NEAR. FP arithmetic behaves as if _PC_64 + _RC_NEAR is set,
even though getfpcw shows that _RC_NEAR is not set(!).

Based on my experiments, this is not a problem when setting only _PC_53.
But I leave the reliable implementation in place for your edification and
just in case you use this code to set other FP modes.


3. Note the #pragma fenv_access (on) directive. This is needed per VC++
documentation. Otherwise, compile-time FP optimization (e.g. evaluation of
constant FP subexpressions) might not have the effect intended by changing
the FP control word at runtime (of course).

I don't know if or how you can avoid compile-time FP optimization in other
Visual Studio languages.


4. setpr is designed to abstract (hide) the details of the FP control word.
It seems easier than remembering, combining and passing the equivalent hex
constants in VBA. This is an irrelevant personal choice. You might prefer
something else.
 
I tried _PC_53. Did not help.

You must have done it wrong then, or something else is afoot.

When I try it with a fairly trivial but incredibly ugly toy code to
force each possible state of the x87 I get the following output.

// toy87.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"
#include "float.h"

double testit(int tag1, int tag2)
{
double chptst;
double divtwo;

const double top = 4195835.0;
const double bottom = 3145727.0;
_control87( tag1+tag2, _MCW_PC+_MCW_RC);

divtwo = top / bottom;
chptst = (divtwo * bottom) - top ;
return chptst;
}


int _tmain(int argc, _TCHAR* argv[])
{
for (int prec = _PC_64; prec<=_PC_24; prec += _PC_53)
for (int round = _RC_NEAR; round <= _RC_CHOP; round += _RC_DOWN)
printf("testit(%i,%i) = %e\n", prec/_PC_53, round/_RC_DOWN,
testit(prec,round));
return 0;
}

*NB* If I allow the optimiser any freedom at all it compiles to fldz

Here is the output cycling though all rounding modes of each precision:

_64_
testit(0,0) = 2.851266e-010
testit(0,1) = -4.138201e-010
testit(0,2) = 2.851266e-010
testit(0,3) = -4.138201e-010

_53_
testit(1,0) = 0.000000e+000
testit(1,1) = -9.313226e-010
testit(1,2) = 9.313226e-010
testit(1,3) = -9.313226e-010

_24_
testit(2,0) = 0.000000e+000
testit(2,1) = -5.000000e-001
testit(2,2) = 5.000000e-001
testit(2,3) = -5.000000e-001

My guess would be that you have not forced the numeric CPU into the
right state. Round to nearest gives exactly 0.0 for both 32bit and 64bit
floating point but for full 80bit hardware precision it gives an "error"
in the 54th bit. The example was constructed to trigger a rare iteration
fault in a particular CPU series on FP divide ages ago.
 
Back
Top