how to find union of two arrays

  • Thread starter Thread starter Andrew
  • Start date Start date
GS said:
Peter T pretended :

Peter,
I think the time it takes lies in the fact that *Arrays* aren't being
compared, but rather the loops read the ranges directly.

If the 2 ranges were dumped into arrays where at least 1 array was sorted,
a binary search would cut the processing time considerably. Better,
though, (as you say) that both arrays are sorted. The bottleneck is trying
to loop the ranges directly while doing this in memory would run orders of
magnitude faster.<IMO>

Garry,

Indeed, reading cells is considerably longer than reading array values,
though there will be some overhead initially reading 2x50k cells to a array
or pair of arrays.

It really is important to sort the first array, whether using built-in Excel
functions or VBA. Also if using VBA extremely worthwhile to sort the 2nd
array. That's because for each cell in the 2nd array, code will only need to
look from the index+1 of the last match to the first item that doesn't
match, so much less work than available to Excel's built-in functions.
Despite Excel's normally faster methods, it's difficult to take advantage of
that approach in Excel (although could be done) so VBA might be
easier/faster.

As for sorting, as you say in your adjacent post "that depends on how you do
it". There are some extremely fast sort methods out there for VBA, some
compare well if not better than Excel's built-in method (excluding the time
to transfer from cells to array to cells), particularly if the data is all
say doubles.

However to keep things simple, maybe get VBA to leverage Excel's sort, then
compare the sorted data in VBA as above. Without testing not sure which'd be
fastest, VBA+Excel or just Excel.

Peter T
 
GS said:
<FWIW>
Here's a snippet from Professional Excel Development Ch.07 that explains
how memory leaks occur, and why it's considered *good practice* to set
objects we're done with to *= Nothing*.


<Quote>
"...Normally when you overwrite an object in VBA, VBA cleans up the old
version of the object and reclaims the memory that was used to hold it.
You can also set an object equal to Nothing to reclaim the memory used by
it. It is good practice to do this explicitly when you no longer need an
object, rather than relying on VBA to do it.



Set gclsCells = Nothing

Not sure I entirely agree, or rather it depends. There may be good reasons
to do that for other objectives, eg done with the object and want to test if
Nothing later. Also probably a good idea to clean up globals as you go when
done.

However simply good practice for the sake of being sure the object reference
is destroyed, perhaps not. Ideally there should never be any doubt about
when and where the reference will looses scope. VB/A's builtin garbage
collection works very efficiently, and explicitly setting to Nothing is
unnecessary work.

Regards,
Peter T
 
Peter T wrote on 5/24/2011 :
However to keep things simple, maybe get VBA to leverage Excel's sort, then
compare the sorted data in VBA as above. Without testing not sure which'd be
fastest, VBA+Excel or just Excel.

Peter,
This is what I usually do with large arrays as I find Excel's Sort much
faster than any array sort. I simply dump the array into a temp wks,
sort the data, dump the data back into the array, and delete the wks
after I'm done with it. Works a treat.

Otherwise, I use VB[A] methods as described in PED ch17 for
sorting/matching in arrays.
 
Here's a snippet from Professional Excel Development
Ch.07 that explains how memory leaks occur, and why
it's considered *good practice* to set objects we're
done with to *= Nothing*.

That's a matter of opinion, just as some people might say it is "good
practice" to explicitly initialize non-static local variables to zero
in a procedure instead of relying on the fact they are initialized to
zero each time a procedure is entered.

The development guide describes a complex situation where, I would
agree, it is necessary to set an object variable to Nothing, namely
objects that reference other objects.

But I was asking about a very limited case, namely: setting the
object variable to Nothing just before exiting a procedure, when the
object variable is a non-static local variable.

In that case, VBA help explains:

"When several object variables refer to the same object, memory and
system resources associated with the object to which the variables
refer are released only after all of them have been set to Nothing,
either explicitly using Set, __or_implicitly__ after the last object
variable set to Nothing goes out of scope."

Exiting a procedure is an example of non-static local variable going
out of scope. In that case, VBA should implicitly set the object
variable to Nothing, decreasing the object reference count. If that
causes the reference count to go to zero, the system should free the
memory associated with it.

When Isabelle mentioned a "memory leak", I was thinking of a systemic
memory leak (defect in VBA), not a program memory leak (programming
error).
 
I do not notice any such memory leak in XL2003 SP3 with
VBA 6.5.1024. [....] I tried the module below.

Not good enough. That's not the memory leak we are concerned with.
 
I do not notice any such memory leak in XL2003 SP3 with
VBA 6.5.1024.

Here is a better demonstration. Again, I must rely on Task Manager to
monitor process memory usage.

Copy the class object below into a class module names myTest1. Note
that all it does is allocate 1 MB of memory.

Copy the macros below into a regular module. Follow the instructions
at the top.

My observations (YMMV).... When I run testit1st, I get the following
results:

Mem Usage VM Size
18968 7680 before Set=New
20016 8712 global Set=New
21044 9740 local Set=New
20020 8712 exit testit2; local object freed
20020 8712 end testit1st; global object remains

Interpretation.... Local object memory is freed automatically when
the procedure exits; Set=Nothing is not needed. This is the situation
in Isabelle's example.

Global object memory is not free when the procedure exits. That's to
be expected, and it may be desired since often we want values of
global variables to persist between calls. Of course, if not desired,
we must do Set=Nothing. But that is not the situation in Isabelle's
example.

Note: Minor differences of a page or two (4K per page) probably
reflects Excel or VBA memory usage, e.g. for MsgBox, not a leak. The
key difference to look for is 1000K (1 MB), which reflects the
allocation of the class object.

When I run testit2nd, I get the following results:

Mem Usage VM Size
20040 8716 previous global Set=New
20040 8716 new global Set=New

Interpretation.... Set=New to an object variable that references
object memory causes the original reference count to be decremented,
and the original memory to be freed if the reference count goes to
zero. Thus, the new object allocation "replaces" the old object
allocation; Set=Nothing is not needed.


-----
class module myTest1:

Private xArr(1 To 1024 * 1024&) As Byte


-----
regular module:

'****************************************
'start Task Manager
'select Excel process in Processes tab
'scroll Excel process to the top
'position TM window low and to the side
' so that Excel MsgBox can be seen
'run testit1st
'then run testit2nd
'****************************************

Dim a As myTest1

Sub testit1st()
Dim s As String
'*** be sure no memory is in-use
Set a = Nothing
checkit "record"
'*** allocate memory
Set a = New myTest1
checkit "increase"
'*** demonstrate that testit2 exit
'*** frees memory for its myTest1 object
testit2
checkit "decrease"
'*** demonstrate that global myTest1 object
'*** remains in-use after test1 exits
checkit "unchanged later", False
End Sub

Sub testit2nd()
Dim s As String
'*** demonstrate that memory is replaced
'*** (probably freed, then re-allocated)
'*** when re-Set'g New object variable
checkit "record"
Set a = New myTest1
checkit "unchanged"
End Sub

Private Sub testit2()
Dim a As myTest1
Set a = New myTest1
checkit "increase"
If setNothing Then Set a = Nothing
End Sub

Private Sub checkit(tag As String, _
Optional doWait As Boolean = True)
If doWait Then
'*** time for Task Manager to change
Application.Wait Now() + TimeSerial(0, 0, 3)
End If
MsgBox tag
End Sub
 
Excellent example of the snippet I posted earlier!

I must re-iterate Rick's compliment to you about the effort and
research you do to provide such detailed replies. Most awesome! I hope
the casual posters appreciate those details as much as some of us
regulars do.

Personally, I was satisfied with the first sentence of your earlier
post:

"That's a matter of opinion..."

Some people may just decide to make it their personal rule that any
objects they create will be set "= Nothing" when done with so they are
covered for all situations, *regardless of context*. That, of course,
is entirely their choice.<g>
 
I haven't fully followed this thread so apologies in advance if I've missed
something, however I don't follow how this example relates to Isabelle's
wanting to explicitly destroy worksheet references.

The memory occupied by an object variable is trivial and very unlikely ever
to be noticed, it's only a pointer to the handle of an object. In the case
of a worksheet reference, even if global and not destroyed when say the
worksheet is deleted, no harm will be done (albeit bad practice for other
reasons). It will simply store a number, the old pointer to the worksheet.
That said, if the reference was to a sheet in an automated instance, failure
to release could leave the Excel instance hanging in memory.

In your example you created an object, and as a COM object (an instance of
the Class) it will 'stay alive' until the last reference to it is released
(explicitly or goes out of scope), then the object will destroy itself. Sure
the memory the object occupies (not the reference to it) while alive is
relative to the class, its methods etc, and not least allocation for any
declared variables.

FWIW issues can arise with classes and circular references (classes refer to
each other), then they can be difficult if not impossible to tear down and
release memory, but that's a different matter.

Regards,
Peter T


I do not notice any such memory leak in XL2003 SP3 with
VBA 6.5.1024.

Here is a better demonstration. Again, I must rely on Task Manager to
monitor process memory usage.

Copy the class object below into a class module names myTest1. Note
that all it does is allocate 1 MB of memory.

Copy the macros below into a regular module. Follow the instructions
at the top.

My observations (YMMV).... When I run testit1st, I get the following
results:

Mem Usage VM Size
18968 7680 before Set=New
20016 8712 global Set=New
21044 9740 local Set=New
20020 8712 exit testit2; local object freed
20020 8712 end testit1st; global object remains

Interpretation.... Local object memory is freed automatically when
the procedure exits; Set=Nothing is not needed. This is the situation
in Isabelle's example.

Global object memory is not free when the procedure exits. That's to
be expected, and it may be desired since often we want values of
global variables to persist between calls. Of course, if not desired,
we must do Set=Nothing. But that is not the situation in Isabelle's
example.

Note: Minor differences of a page or two (4K per page) probably
reflects Excel or VBA memory usage, e.g. for MsgBox, not a leak. The
key difference to look for is 1000K (1 MB), which reflects the
allocation of the class object.

When I run testit2nd, I get the following results:

Mem Usage VM Size
20040 8716 previous global Set=New
20040 8716 new global Set=New

Interpretation.... Set=New to an object variable that references
object memory causes the original reference count to be decremented,
and the original memory to be freed if the reference count goes to
zero. Thus, the new object allocation "replaces" the old object
allocation; Set=Nothing is not needed.


-----
class module myTest1:

Private xArr(1 To 1024 * 1024&) As Byte


-----
regular module:

'****************************************
'start Task Manager
'select Excel process in Processes tab
'scroll Excel process to the top
'position TM window low and to the side
' so that Excel MsgBox can be seen
'run testit1st
'then run testit2nd
'****************************************

Dim a As myTest1

Sub testit1st()
Dim s As String
'*** be sure no memory is in-use
Set a = Nothing
checkit "record"
'*** allocate memory
Set a = New myTest1
checkit "increase"
'*** demonstrate that testit2 exit
'*** frees memory for its myTest1 object
testit2
checkit "decrease"
'*** demonstrate that global myTest1 object
'*** remains in-use after test1 exits
checkit "unchanged later", False
End Sub

Sub testit2nd()
Dim s As String
'*** demonstrate that memory is replaced
'*** (probably freed, then re-allocated)
'*** when re-Set'g New object variable
checkit "record"
Set a = New myTest1
checkit "unchanged"
End Sub

Private Sub testit2()
Dim a As myTest1
Set a = New myTest1
checkit "increase"
If setNothing Then Set a = Nothing
End Sub

Private Sub checkit(tag As String, _
Optional doWait As Boolean = True)
If doWait Then
'*** time for Task Manager to change
Application.Wait Now() + TimeSerial(0, 0, 3)
End If
MsgBox tag
End Sub
 
I haven't fully followed this thread so apologies in
advance if I've missed something, however I don't follow
how this example relates to Isabelle's wanting to explicitly
destroy worksheet references.

You may be right. In a last-minute effort to simplify things, I
removed optional code that I believe would have related to Isabelle's
example better.

The VBA help page states: "memory and system resources associated
with the object to which the variables refer are released only after
all of them have been set to Nothing, either explicitly using Set, or
implicitly after the last object variable set to Nothing goes out of
scope."

I believe that says that non-static local object variables are
implicitly set to Nothing when a procedure is exited.

The better demonstration of that fact would be if in testit1st, I set
the global object variable "a" to Nothing explicitly some time after
returning from testit2, and use Task Manager to show that the memory
is freed.

That could only happen if the reference count for the object went to
zero at that point, which could only happen if the testit2 reference
were implicitly decreased when leaving the testit2 scope by implicitly
setting the local object variable "a" to Nothing.

In fact, I had done exactly that in one version of the example. "The
exercise is left to the student".

Not a very good example, I agree. I could clean all this up. But I
think I'm beating a dead horse at this point.
FWIW issues can arise with classes and circular
references (classes refer to each other), then they can
be difficult if not impossible to tear down and release
memory, but that's a different matter.

Yes it is. Nonetheless, I had intended to demonstrate that very
situation with a separate example anyway, primarily for GS's benefit.
I forgot.
if the reference was to a sheet in an automated instance,
failure to release could leave the Excel instance hanging
in memory.

Can you explain that with an example? I do not understand the
terminology "automated instance" and how that can lead to a memory
leak.

One example that I believe would "leave the Excel instance hanging in
memory" is if we Set a global object variable to a Worksheet or Range
object and neglect to Set the global object variable to Nothing.

That was really the point of the last step in my testit1st macro,
which goes beyond the question I had for Isabelle. It is certainly
the programmer's responsibility to know when it is necessary to Set a
global object variable to Nothing. It depends on the program design.

But that is not the kind of memory leak I thought Isabelle is
referring to. Since the VBA help documentation explains that we can
expect that exiting from the scope effectively sets non-static local
object variables to Nothing (i.e. decrease the object reference
count), I thought Isabelle might be referring to a __VBA__ defect, not
a programming error.

In any case, I believe that you and I are in "violent agreement" not
only on when we __must__ use Set=Nothing, but also when it is really
not necessary, notwithstanding some people's idea of "good practice".
 
joeu2004 said:
Can you explain that with an example? I do not understand the
terminology "automated instance" and how that can lead to a memory
leak.

"Automation" is where code (in particular object variables) in one app
refers to another application, whether after creating a new instance or
grabbing an existing one. All code in say VB6 doing anything with Excel is
"Automation", and any instance of Excel so referenced is an "automated
instance".
One example that I believe would "leave the Excel instance hanging in
memory" is if we Set a global object variable to a Worksheet or Range
object and neglect to Set the global object variable to Nothing.

*Only* if the reference is to an automated instance, eg code in Excel1
refers to Excel2, otherwise no problem (good practice is another matter).
Try this -

Private mWs As Worksheet
Sub Test1()
Dim xl As Excel.Application
Dim wb As Workbook
Set xl = New Excel.Application
Set wb = xl.Workbooks.Add
Set mWs = wb.Worksheets(1)
xl.Quit
End Sub
Sub CleanUp()
Set mWs = Nothing
End Sub

Look at Processes in Task Manager (maybe sort "Image Name"), step through
Test1 and see the 2nd Excel instance appear, and remain after doing xl.Quit.
Step through CleanUp and see the 2nd instance removed as mWs is destroyed.
Actually before doing CleanUp stop or break on 'end sub' and look at mWs in
Locals.

Now try this -
Sub Test2()
Dim wb As Workbook
Set wb = Workbooks.Add
Set mWs = wb.Worksheets(1)
wb.Close False
End Sub

See the difference with mWs this time in Locals. mWs retains the now
redundant pointer but that's all, no memory leak. However it's not good
practice, eg "mWs = Not Nothing" would give a misleading result. Of course
in this example no need to explicity release 'wb'.

In any case, I believe that you and I are in "violent agreement" not
only on when we __must__ use Set=Nothing, but also when it is really
not necessary, notwithstanding some people's idea of "good practice".

Violent, ?, -:) But sure, I think we said pretty much the same in response
to Garry's quote from the book.

Regards,
Peter T
 
hello gentlemen joeu2004, Peter and Garry

i would like first thank you, it's a real pleasure to read your discussions and also
learn something new, so thank you for taking time to do those research and then give us
to learn about it, it is very appreciated, guys's you're the best
 
joeu2004 said:
Can you explain that with an example?  I do not understand
the terminology "automated instance" and how that can lead
to a memory leak.

"Automation" is where code (in particular object variables)
in one app refers to another application, [...] and any
instance of Excel so referenced is an "automated instance".

Okay, I understand now what you mean by "automated instance". I
completely misunderstand what you meant by "Excel instance". My
fault, not yours; just a brain fart.

Your example is just another example of what I had explained before,
namely: of course, the memory and resources used by global objects
persist until they are explicitly released by setting the last object
reference to Nothing, or implicitly released either by setting them to
another object or by resetting VBA explicitly or implicitly.

That is the nature of global variables, which can be intentional.

It is a programming error only if the programmer did not intend for
the object memory and resources (including the Excel instance in your
example) to persist after exiting the macro.

That is in contrast to a memory leak caused by a VBA defect, which I
thought, perhaps incorrectly, that Isabelle was alluding to.

I reiterate that point for the benefit of other readers. I believe
you know that already.

Thanks again for taking the time to clear my misunderstanding about
the terminology.
 
Back
Top