comparing two arrays

  • Thread starter Thread starter Matthew Dyer
  • Start date Start date
M

Matthew Dyer

There's got to be a faster way to do this, so I come to the brain trust asking for brainy guidance!
I am comparing two ranges, each with appx 7000 items in them. I want to seeif an item in list1 is also in list2. If it is, i want to hide that row. Right now, I am going through list1, item by item, and comparing it to list2item by item via a double loop to achieve this. The problem is that this results in appx 40,320,350 individual iterations of my loop, thus slowing things down considerably. Even with screen updating turned off this is takingquite a bit of time. Is there a way to make this run faster? below is my code:

Sub compare()
Application.ScreenUpdating = False
Dim todaycell As Range
Dim yesterdaycell As Range
For i = 1 To lastrow(Sheet9)
Set yesterdaycell = Sheet9.Range("a" & i)
For ii = 1 To lastrow(Sheet10)
Set todaycell = Sheet10.Range("a" & ii)
If yesterdaycell.Value = todaycell.Value Then
yesterdaycell.EntireRow.Hidden = True
End If
Next ii
Next i
Application.ScreenUpdating = True
End Sub
 
Matthew Dyer said:
I am comparing two ranges, each with appx 7000 items
in them. [....] The problem is that this results in
appx 40,320,350 individual iterations of my loop
Is there a way to make this run faster?

Several. First and foremost: copy the values of each range into variant
arrays. Also note the GoTo; that avoids continuing to compare with rows
that you have already hidden. Thus (untested):

Option Explicit

Sub compare()
Dim todaycells As Variant, yesterdaycells As Variant
Dim r as Range
Dim n9 as Long, n10 as Long, i as Long, ii as Long
Application.ScreenUpdating = False
n9 = lastrow(Sheet9)
n10 = lastrow(Sheet10)
Set r = Sheet9.Range("a1:a" & n9)
yesterdaycells = r
todaycells = Sheet10.Range("a1:a" & n10)
For i = 1 To n9
For ii = 1 To n10
If yesterdaycells(i) = todaycells(ii) Then
r(i).EntireRow.Hidden = True
GoTo getNext9
End If
Next ii

getNext9:

Next i
Application.ScreenUpdating = True
End Sub

It might also improve speed if you build a Union of rows to hide. But I
don't know what and if any size limits might apply.
 
Improvement, replacing GoTo with more-structured Exit For.

Option Explicit

Sub compare()
Dim todaycells As Variant, yesterdaycells As Variant
Dim r as Range
Dim n9 as Long, n10 as Long, i as Long, ii as Long
Application.ScreenUpdating = False
n9 = lastrow(Sheet9)
n10 = lastrow(Sheet10)
Set r = Sheet9.Range("a1:a" & n9)
yesterdaycells = r
todaycells = Sheet10.Range("a1:a" & n10)
For i = 1 To n9
For ii = 1 To n10
If yesterdaycells(i) = todaycells(ii) Then
r(i).EntireRow.Hidden = True
Exit For
End If
Next ii
Next i
Application.ScreenUpdating = True
End Sub
 
Improvement, replacing GoTo with more-structured Exit For. Option Explicit Sub compare() Dim todaycells As Variant, yesterdaycells As Variant Dim r as Range Dim n9 as Long, n10 as Long, i as Long, ii as Long Application.ScreenUpdating = False n9 = lastrow(Sheet9) n10 = lastrow(Sheet10) Set r= Sheet9.Range("a1:a" & n9) yesterdaycells = r todaycells = Sheet10.Range("a1:a" & n10) For i = 1 To n9 For ii = 1 To n10 If yesterdaycells(i) = todaycells(ii) Then r(i).EntireRow.Hidden = True Exit For End If Next ii Next i Application.ScreenUpdating = True End Sub

combining both of your reccomendations I was able to significantly cut downthe time by well over half! By setting both today and yesterday ranges in active memory then running my itterations that way it runs much more efficiently! Plus the Exit For line makes sure i'm not running additional iterations for a 'find' that already occured! Thanks for the help!

Sub compare()
Application.ScreenUpdating = False
Dim todaycell As Variant
Dim yesterdaycell As Variant
Dim todayrange As Range
Dim yesterdayrange As Range
Set todayrange = Sheet10.Range("a1:a" & lastrow(Sheet10))
Set yesterdayrange = Sheet9.Range("a1:a" & lastrow(Sheet9))

For i = 1 To lastrow(Sheet9)
Set yesterdaycell = yesterdayrange(i)
For ii = 1 To lastrow(Sheet10)
Set todaycell = todayrange(ii)
If yesterdaycell = todaycell Then
yesterdayrange(i).EntireRow.Hidden = True
Exit For
End If
Next ii
Next i
Application.ScreenUpdating = True
End Sub
 
Matthew Dyer said:
combining both of your reccomendations I was able
to significantly cut down the time by well over half!
By setting both today and yesterday ranges in active
memory then running [....]
Set todayrange = Sheet10.Range("a1:a" & lastrow(Sheet10))
Set yesterdayrange = Sheet9.Range("a1:a" & lastrow(Sheet9))
For i = 1 To lastrow(Sheet9)
Set yesterdaycell = yesterdayrange(i)
For ii = 1 To lastrow(Sheet10)
Set todaycell = todayrange(ii)
If yesterdaycell = todaycell Then
yesterdayrange(i).EntireRow.Hidden = True
Exit For
End If
Next ii
Next i

You can do much better. The savings by half is probably just due to the
Exit For.

You missed a key point of my suggestion, namely: to copy the values of each
range into variant arrays.

By use ``Set rangeVar = range``, you are not bring each into "active
memory". Instead, you are merely setting up a "pointer", very much like
what a variable __name__ is to its __value__.

The point is: with your implementation, each time you reference
todayrange(ii), VBA must ask Excel for the value. At a mimimum, you are
making about 14,000 such requests -- 7000 for each range. But in fact, you
are making up to about 42 million such requests because you might reference
the same todayrange(ii) for each of 7000 iterations through yesterdayrange.

These "requests" are much more expensive than merely indexing into an array
variable. Excel and VBA are separate threads, almost like separate
processes. Moving data between the two requires a lot of system overhead.

By comparison, what I wrote requires only two such requests (maybe three),
one for each range. (The additional requests to hide rows is the same in
both cases.)

(It is unclear whether each Set statement results in communication between
VBA and Excel. I think it does.)

Implement my suggestion exactly as I wrote it. But I forgot to include
timing code so that you can accurately measure the performance. I also
added .Value qualifiers; they are unnecessary in the context, but it might
make it clearer to you what the code is doing. To wit:

Option Explicit

Sub compare()
Dim todaycells As Variant, yesterdaycells As Variant
Dim r as Range
Dim n9 as Long, n10 as Long, i as Long, ii as Long
Dim st as Single, et as Single
Application.ScreenUpdating = False
st = Timer
n9 = lastrow(Sheet9)
n10 = lastrow(Sheet10)
Set r = Sheet9.Range("a1:a" & n9)
yesterdaycells = r.Value
todaycells = Sheet10.Range("a1:a" & n10).Value
For i = 1 To n9
For ii = 1 To n10
If yesterdaycells(i) = todaycells(ii) Then
r(i).EntireRow.Hidden = True
Exit For
End If
Next ii
Next i
et = Timer
Application.ScreenUpdating = True
MsgBox Format(et - st, "0.000") & " sec"
End Sub
 
"Matthew Dyer" <[email protected]> wrote: > combining both of yourreccomendations I was able > to significantly cut down the time by well over half! > By setting both today and yesterday ranges in active > memory then running [....] > Set todayrange = Sheet10.Range("a1:a" & lastrow(Sheet10)) > Set yesterdayrange = Sheet9.Range("a1:a" & lastrow(Sheet9)) > For i = 1 To lastrow(Sheet9) > Set yesterdaycell = yesterdayrange(i) > For ii = 1 To lastrow(Sheet10) > Set todaycell = todayrange(ii) > If yesterdaycell = todaycell Then > yesterdayrange(i).EntireRow.Hidden = True > Exit For > End If > Next ii > Next i You can do much better. The savings byhalf is probably just due to the Exit For. You missed a key point of my suggestion, namely: to copy the values of each range into variant arrays. By use ``Set rangeVar = range``, you are not bring each into "active memory".. Instead, you are merely setting up a "pointer", very much like what a variable __name__ is to its __value__. The point is: with your implementation,each time you reference todayrange(ii), VBA must ask Excel for the value. At a mimimum, you are making about 14,000 such requests -- 7000 for each range. But in fact, you are making up to about 42 million such requests because you might reference the same todayrange(ii) for each of 7000 iterations through yesterdayrange. These "requests" are much more expensive than merely indexing into an array variable. Excel and VBA are separate threads, almost like separate processes. Moving data between the two requires a lot of system overhead. By comparison, what I wrote requires only two such requests(maybe three), one for each range. (The additional requests to hide rows is the same in both cases.) (It is unclear whether each Set statement results in communication between VBA and Excel. I think it does.) Implement my suggestion exactly as I wrote it. But I forgot to include timing code so thatyou can accurately measure the performance. I also added .Value qualifiers; they are unnecessary in the context, but it might make it clearer to you what the code is doing. To wit: Option Explicit Sub compare() Dim todaycells As Variant, yesterdaycells As Variant Dim r as Range Dim n9 as Long, n10 as Long, i as Long, ii as Long Dim st as Single, et as Single Application.ScreenUpdating = False st = Timer n9 = lastrow(Sheet9) n10 = lastrow(Sheet10) Set r = Sheet9.Range("a1:a" & n9) yesterdaycells = r.Value todaycells = Sheet10.Range("a1:a" & n10).Value For i = 1 To n9 For ii =1 To n10 If yesterdaycells(i) = todaycells(ii) Then r(i).EntireRow.Hidden = True Exit For End If Next ii Next i et = Timer Application.ScreenUpdating = True MsgBox Format(et - st, "0.000") & " sec" End Sub

I tried copy/paste your code exactly and i get an error at the comparison statement (if x = x). When i said i cut the time by over half, i was under-estimating the time saved. the origional code would have taken appx 405 seconds to run... my version only takes 95 seconds, about 23.5% of the time. With the Exit For statement we also cut the number of comparisons down from45 mil to 21.5 mil.
 
"Matthew Dyer" wrote: > combining both of your reccomendations I was able> to significantly cut down the time by well over half! > By setting both today and yesterday ranges in active > memory then running [....] > Set todayrange = Sheet10.Range("a1:a" & lastrow(Sheet10)) > Set yesterdayrange = Sheet9.Range("a1:a" & lastrow(Sheet9)) > For i = 1 To lastrow(Sheet9)> Set yesterdaycell = yesterdayrange(i) > For ii = 1 To lastrow(Sheet10) > Set todaycell = todayrange(ii) > If yesterdaycell = todaycell Then> yesterdayrange(i).EntireRow.Hidden = True > Exit For > End If > Next ii > Next i You can do much better. The savings by half is probably just dueto the Exit For. You missed a key point of my suggestion, namely: to copy the values of each range into variant arrays. By use ``Set rangeVar = range``, you are not bring each into "active memory". Instead, you are merely setting up a "pointer", very much like what a variable __name__ is to its __value__. The point is: with your implementation, each time you reference todayrange(ii), VBA must ask Excel for the value. At a mimimum, you are making about 14,000 such requests -- 7000 for each range. But in fact, you are making up to about 42 million such requests because you might reference thesame todayrange(ii) for each of 7000 iterations through yesterdayrange. These "requests" are much more expensive than merely indexing into an array variable. Excel and VBA are separate threads, almost like separate processes.. Moving data between the two requires a lot of system overhead. By comparison, what I wrote requires only two such requests (maybe three), one for each range. (The additional requests to hide rows is the same in both cases.)(It is unclear whether each Set statement results in communication betweenVBA and Excel. I think it does.) Implement my suggestion exactly as I wrote it. But I forgot to include timing code so that you can accurately measure the performance. I also added .Value qualifiers; they are unnecessary in the context, but it might make it clearer to you what the code is doing. Towit: Option Explicit Sub compare() Dim todaycells As Variant, yesterdaycells As Variant Dim r as Range Dim n9 as Long, n10 as Long, i as Long, ii as Long Dim st as Single, et as Single Application.ScreenUpdating = False st= Timer n9 = lastrow(Sheet9) n10 = lastrow(Sheet10) Set r = Sheet9..Range("a1:a" & n9) yesterdaycells = r.Value todaycells = Sheet10.Range("a1:a" & n10).Value For i = 1 To n9 For ii = 1 To n10 If yesterdaycells(i) = todaycells(ii) Then r(i).EntireRow.Hidden = True Exit For End IfNext ii Next i et = Timer Application.ScreenUpdating = True MsgBox Format(et - st, "0.000") & " sec" End Sub

I tried a copy/paste of your code but get an error message at the comparison statement. I think it's because you are setting 'r' range to the entire range of sheet9, then you set yesterdaycells to r.value... that's one of thereasons i didn't use your code explicitly in my version, i saw the usage of the r variable as redundant.

in either case, your setting the ranges in vba and then doing the comparisons directly through vba and only directing back to excel in the hide row event was key to getting the compile time down. I was able to get my run timedown from over 6 min to only 95 second. thanks again for all your help!
 
Matthew Dyer said:
I tried copy/paste your code exactly and i get an
error at the comparison statement (if x = x).

Matthew Dyer said:
I tried a copy/paste of your code but get an error
message at the comparison statement. I think it's
because you are setting 'r' range to the entire range
of sheet9, then you set yesterdaycells to r.value

No. Your first assessment was correct. I had a common typo (for me):
missing the column index. When we copy even a column range into a variant
array, the result is a 2-dimensional array. Mea culpa!

See the correct and now-tested implementation below.


Matthew Dyer said:
I was able to get my run time down from over 6 min
to only 95 second. thanks again for all your help!

You probably can do a __lot__ better.

I tested with two ranges of 1000 cells each, resulting in hiding about 10%.

With your original algorithm, it took about 15 seconds. With your improved
algorithm, it took about 2 seconds. With my algorithm below, it took about
0.3 seconds.

Of course, YMMV due to the size of data and number of hidden rows as well as
differences between computers.

Minor improvements to note are:

1. lastrow() is called only __once__ for each range.

2. And again, VBA communicates with Excel to access each cell value only
__once__.

3. Declaring integer variables (e.g. i and ii) as type Long.

Note that I use Long, not Integer. It makes no performance difference in
modern computers. But importantly, it allows for the possibility that
ranges exceed the numerical range of type Integer.

PS: Setting yesterdaySheet and todaySheet is "good practice"; but it was a
convenience for me. You could Sheet9 and Sheet10 in multiple places, just
as you did before. There is no performance difference. It only makes it
easier to change the "names" of the worksheets, as I needed to do.


-----

Option Explicit

Sub compare()
Dim yesterdaySheet As Worksheet, todaySheet As Worksheet
Dim todaycells As Variant, yesterdaycells As Variant
Dim r As Range
Dim n9 As Long, n10 As Long, i As Long, ii As Long
Dim st As Single, et As Single
Application.ScreenUpdating = False
Set yesterdaySheet = Sheet9
Set todaySheet = Sheet10
st = Timer
n9 = lastRow(yesterdaySheet)
n10 = lastRow(todaySheet)
Set r = yesterdaySheet.Range("a1:a" & n9)
yesterdaycells = r.Value
todaycells = todaySheet.Range("a1:a" & n10).Value
For i = 1 To n9
For ii = 1 To n10
If yesterdaycells(i, 1) = todaycells(ii, 1) Then
r(i).EntireRow.Hidden = True
Exit For
End If
Next ii
Next i
et = Timer
Application.ScreenUpdating = True
MsgBox Format(et - st, "0.000") & " sec"
End Sub
 
wow... down to 5 seconds for a total of 5.125 sec for 21,595,420 individual comparisons. i think that's a bit more of an improvement! Thanks Again!
 
Matthew Dyer said:
could you describe the purpose of r in your example?

For the same reason that you did:

Set yesterdayrange = yesterdaySheet.Range("a1:a" & lastRow(yesterdaySheet))

It is more principle than practical. I usually avoid computing things
multiple times, especially inside a loop that might executed 7000 times in
your case.

There are many alternative codings. But I certainly would not call lastrow
multiple times unnecessarily.
 
Back
Top