Avoiding redundant calculations

  • Thread starter Thread starter Walter Briscoe
  • Start date Start date
W

Walter Briscoe

I am using Excel 2003 on Windows Vista.
Mostly, I do little calculation in worksheets. I do everything with VBA
in PERSONAL.XLS.
However, I have a new working formula:
<=INT(--(A2<>A1)*VLOOKUP($F2,Stations,38,TRUE)+--(A2<>A3)*VLOOKUP($G2,St
ations,38,TRUE))>
which is copied through to about 3000 rows.
i.e. it does about 6000 VLOOKUP calls and takes a few minutes.
--(A2<>A1) and -(A2<>A3) evaluate as 1 or 0.
The VLOOKUP calls are unnecessary if those expressions result in 0.
This applies to about 80% of the VLOOKUP calls.
I would like to avoid that 80%.
One thought is the IF function, but that does not do lazy evaluation.
e.g. =IF(True,Truepart,Falsepart) calculates both Truepart and
Falsepart.
I could use a UDF, but prefer the simplicity of calculations in
worksheets.
I would value light from microsoft.public.excel.worksheet.functions.
 
Hi Walter,

Am Sun, 18 Aug 2013 08:25:28 +0100 schrieb Walter Briscoe:
<=INT(--(A2<>A1)*VLOOKUP($F2,Stations,38,TRUE)+--(A2<>A3)*VLOOKUP($G2,St
ations,38,TRUE))

try:
=INT(IF(A2<>A1,VLOOKUP($F2,Stations,38,TRUE))+IF(A2<>A3,VLOOKUP($G2,Stations,38,TRUE)))


Regards
Claus B.
 
Hi Walter,

Am Sun, 18 Aug 2013 08:25:28 +0100 schrieb Walter Briscoe:
One thought is the IF function, but that does not do lazy evaluation.
e.g. =IF(True,Truepart,Falsepart) calculates both Truepart and
Falsepart.

start the Evaluate Formula dialog. Then you will see that the falsepart
will not be calculated


Regards
Claus B.
 
Walter Briscoe said:
One thought is the IF function, but that does not do lazy
evaluation. e.g. =IF(True,Truepart,Falsepart) calculates
both Truepart and Falsepart.

That would be correrct if IF() behaved like a real function. For example,
the problem with AND(ISNUMBER(B1),B1*B2>B3) is that AND() does indeed
calculate all of its parameter, even if the a left-hand parameter is FALSE
(e.g. B1 is text), making it unnecessary to calculate parameters to the
right.

However, IF() behaves like an operator whose syntax looks like a function.
In this case, it calculates its parameters left-to-right only as needed. We
rely on this "all the time". For example,
IF(ISNUMBER(B1),IF(B1*B2>B3,1,2),3) avoids the #VALUE we get with AND() when
B1 is not numeric.

Some other worksheet "functions" work the same way, notably CHOOSE().
CHOOSE() calculates only its first parameter and the parameter selected by
it.
 
In message <[email protected]> of Sun, 18 Aug 2013 08:39:21 in
microsoft.public.excel.worksheet.functions said:
That would be correrct if IF() behaved like a real function. For
example, the problem with AND(ISNUMBER(B1),B1*B2>B3) is that AND() does
indeed calculate all of its parameter, even if the a left-hand
parameter is FALSE (e.g. B1 is text), making it unnecessary to
calculate parameters to the right.

However, IF() behaves like an operator whose syntax looks like a
function. In this case, it calculates its parameters left-to-right only
as needed. We rely on this "all the time". For example, IF(ISNUMBER(B
1),IF(B1*B2>B3,1,2),3) avoids the #VALUE we get with AND() when B1 is
not numeric.

Some other worksheet "functions" work the same way, notably CHOOSE().
CHOOSE() calculates only its first parameter and the parameter selected
by it.

Thanks to both joeu2004 and Claus for your prompt replies.
I was wrong and you are both right about the operation of IF.
I was thinking of the VBA function IIF.
I did step through the calculations of IF and
IF(logical_test,value_if_true,value_if_false)
value_if_true is only evaluated if logical_test is TRUE.
This is different from the VBA function
IIf(expr, truepart, falsepart).

I changed my expression to refer to fixed columns. i.e. to
<=INT(--($A2<>$A1)*VLOOKUP($F2,Stations,38,TRUE)+--($A2<>$A3)*VLOOKUP($G2,Stations,38,TRUE))>
I timed the copy through at 13 minutes and 24 seconds.

I put Claus's idea in the next column and copied through.
<=INT(IF($A2<>$A1,VLOOKUP($F2,Stations,38,TRUE))+IF($A2<>$A3,VLOOKUP($G2,Stations,38,TRUE)))>
I was surprised the 2 formulae copied through in the same time. ;)

I am surprised.

I tried =countif(N2:N2899,0), but Excel seemed not to like countif on
furmulae resulting in 0.

I copied the data and pasted the values and found 1942 of the values
were 0. i.e. I would expect a saving of at least 70%.
Plainly I don't understand what is going on. ;(
 
[editing out the extraneous said:
I changed my expression to refer to fixed columns. i.e. to
=INT(--($A2<>$A1)*VLOOKUP($F2,Stations,38,TRUE)
+--($A2<>$A3)*VLOOKUP($G2,Stations,38,TRUE))
I timed the copy through at 13 minutes and 24 seconds.

I put Claus's idea in the next column and copied through.
=INT(IF($A2<>$A1,VLOOKUP($F2,Stations,38,TRUE))
+IF($A2<>$A3,VLOOKUP($G2,Stations,38,TRUE)))
I was surprised the 2 formulae copied through in the same time. [....]
Plainly I don't understand what is going on.

I agree.

FYI, on my old (read: slow) single-CPU laptop, 3000 cells with Claus's
formula calculates in under 0.04 seconds, even when the lookup column
comprises of 1,048,576 rows, the max in Excel 2007 and later, and all
conditional expressions are TRUE (i.e. 6000 VLOOKUPs).

Of course, that's with constants in all cells; and those are the only values
and formulas in the (new) workbook.

PS: If you are using Excel 2003 and you have a COUNTIF formula that
references the column into which you are copying the VLOOKUP formulas, the
COUNTIF might be causing the large recalculation time. IIRC, I found that
to be the case in Excel 2003 per se, at least sometimes. (I had difficulty
duplicating the behavior.) The problem seemed to be fixed in Excel 2007 and
later.


Walter Briscoe said:
I tried =countif(N2:N2899,0), but Excel seemed not to like
countif on furmulae resulting in 0.

I don't know "does not like" means. Anyway, in general, COUNTIF has no
problem with counting formulas that result in zero.
 
In message <[email protected]> of Mon, 19 Aug 2013 08:14:51 in
microsoft.public.excel.worksheet.functions said:
[editing out the extraneous said:
I changed my expression to refer to fixed columns. i.e. to
=INT(--($A2<>$A1)*VLOOKUP($F2,Stations,38,TRUE)
+--($A2<>$A3)*VLOOKUP($G2,Stations,38,TRUE))
I timed the copy through at 13 minutes and 24 seconds.

I put Claus's idea in the next column and copied through.
=INT(IF($A2<>$A1,VLOOKUP($F2,Stations,38,TRUE))
+IF($A2<>$A3,VLOOKUP($G2,Stations,38,TRUE)))
I was surprised the 2 formulae copied through in the same time. [....]
Plainly I don't understand what is going on.

I agree.

FYI, on my old (read: slow) single-CPU laptop, 3000 cells with Claus's
formula calculates in under 0.04 seconds, even when the lookup column
comprises of 1,048,576 rows, the max in Excel 2007 and later, and all
conditional expressions are TRUE (i.e. 6000 VLOOKUPs).

Of course, that's with constants in all cells; and those are the only
values and formulas in the (new) workbook.
stations is a reference to a worksheet in a closed workbook.
Stations refers to =[stations.xls]Sheet1!$A$1:$CB$427.
I can't remember how [stations.xls] references
C:\Users\IBM\AppData\Roaming\Microsoft\Excel\stations.xls

If I open that workbook, my 13:25 becomes ~0:10
How do you measure 0.04 seconds? I am using my wrist watch.
PS: If you are using Excel 2003 and you have a COUNTIF formula that
references the column into which you are copying the VLOOKUP formulas,
the COUNTIF might be causing the large recalculation time. IIRC, I
found that to be the case in Excel 2003 per se, at least sometimes. (I
had difficulty duplicating the behavior.) The problem seemed to be
fixed in Excel 2007 and later.

The COUNTIF was done to count the number of cells in which no VLOOKUP
call is actually made.
 
Clarifications....

That is 0.040 seconds (really 0.037 sec) to recalculate 3000 formulas
already entered into cells. It takes about 0.090 sec to copy-and-paste.
I'm not sure if you are measuring copy-and-paste time or simply
recalculation time. I will use recalculation times below

Both numbers are with ScreenUpdating=True (normal mode). The times are 0.26
sec and 0.70 sec respectively with ScreenUpdating=False. That is useful
only if you want to put the operation into a VBA macro. It might make only
a small (imperceptible?) difference in your case. I will leave
ScreenUpdating=True for my times below.

Also, I should note that all times are for a worst-case (longest) lookup.
But I'm not sure that makes a significant difference since with
VLOOKUP(...,TRUE), there are at most 20 (or 21?) comparisons when the lookup
column comprises 1,048,576 rows.

In any case, since you are using Excel 2003 and you indicate that the lookup
data comprises only 450 rows, I will use that scenario, since the file size
might have significant time in external file references.

Finally, I reiterate that YMMV depending on the complexity of the workbook
besides the formulas and data you tell us about, as well as on your
computer's characteristics.


Walter Briscoe said:
stations is a reference to a worksheet in a closed workbook.
Stations refers to =[stations.xls]Sheet1!$A$1:$CB$427.
I can't remember how [stations.xls] references
C:\Users\IBM\AppData\Roaming\Microsoft\Excel\stations.xls

That is quite a significant "little detail" to omit! That makes a __huge__
difference in the VLOOKUP time, especially if stations.xls is closed. (Not
recommended.)

If stations.xls is closed in the VLOOKUP Excel instance, 3000 statements
(6000 VLOOKUPs) takes 31 to 33 sec. When stations.xls is open in the
VLOOKUP Excel instance, it takes about 0.033 sec.

So the first major performance improvement would be to ensure that
stations.xls is open in the same Excel instance.

Note: My "data.xls" file contains only A1:CB450 data (contants). If your
stations.xls has a lot of other data, even if they are constants, that might
significantly affect the file size, which might increase the close-file
VLOOKUP reference time significantly. That might explain why my "long" time
is about 33 sec, whereas yours is about 13 min.


Walter Briscoe said:
If I open that workbook, my 13:25 becomes ~0:10

Ding!

Much ado about nothing. You had already come to the right conclusion
yourself.

FYI, that explains why the reference is =[stations.xls]Sheet1!$A$1:$CB$427
instead of
=[C:\Users\IBM\AppData\Roaming\Microsoft\Excel\stations.xls]Sheet1!$A$1:$CB$427.
Excel automagically adjusts the reference when you open the file in the same
Excel instance.


Walter Briscoe said:
How do you measure 0.04 seconds? I am using my wrist watch.

For long-ish times, i.e. a lot more than 15.625 milliseconds, you could do
something like the following:

Sub doit()
Dim t As Double
Dim st As Single, et As Single ' type of Timer
st = Timer
Range("b1:b3000").Dirty ' range with VLOOKUP formulas
et = Timer
t = et - st ' seconds
MsgBox Format(t, "0.000000")
End Sub

Caveats....

The elapsed time might be off by as much as almost 15.625 msec; 0.015625
sec.

Performance measurement and interpretation is part art as well as science.
For example, the time for Range("b1").Dirty is misleading because it
includes significant overhead due to "communciation" between Excel and VBA.
The time for Range("b1:b3000").Dirty overwhelms that overhead, so the
overhead becomes insignificant (we hope). However, it opens the door for a
lot of extraneous computer activity (other events). On the other hand,
arguably that is realistic. It depends on what exactly we want to measure.

Another issue is whether or not to disable ScreenUpdating and Automatic
recalculation. Again, it depends on what exactly we want to measure. In
this case, I purposely keep those features enabled in order to mimic the
"real world" situation.

But if we did disable Automatic recalculation, we must use Range.Calculate
instead of Range.Dirty.

Either way can result in significant variability of time measurements.
Moreover, I have found that there is signficant VBA overhead the first time
a code path is executed after any change. And some overhead arises each
time a macro is executed. So I routinely ignore the first execution of a
code path. For "serious" measurements, I collect a "sufficient" sampling
for statistical analysis.

Finally, usually I use a timer with greater resolution. Besides avoiding
the off-by-16-msec anomaly, it is useful for measuring small intervals (if
we can control overhead). Here are the macros that I use.

Option Explicit

Public Declare Sub Sleep Lib "kernel32" (ByVal msec As Long)
Public Declare Function QueryPerformanceFrequency Lib "kernel32" _
(ByRef freq As Currency) As Long
Public Declare Function QueryPerformanceCounter Lib "kernel32" _
(ByRef cnt As Currency) As Long

Private freq As Currency, df As Double

Function myTimer() As Currency
QueryPerformanceCounter myTimer
End Function

Function myElapsedTime(ByVal dt As Currency) As Double
If freq = 0 Then QueryPerformanceFrequency freq: df = freq
myElapsedTime = dt / df
End Function


Usage:

Dim t as Double
Dim st As Currency, et As Currency
st = myTimer
' ... activity to measure ...
et = myTimer
t = myElapsedTime(et - st) ' seconds
 
Back
Top