max values in two ranges

  • Thread starter Thread starter Johnny Petran
  • Start date Start date
J

Johnny Petran

Hi

I want to find max of date1 for the latest date2 whitin each company.

In other words I have this:

date1,company_number,date2
1997-01-31,a,null
1997-02-28,a,null
1997-03-31,a,1997-02-28
1998-02-28,b,1998-01-31
1998-03-31,b,1998-02-28
1998-03-30,b,null
1998-04-30,b,null
1994-05-31,c,1993-02-28
1994-06-30,c,null
1994-07-31,c,1993-02-28
1994-08-30,c,null

and would like to have this:

date1,company_number,date2,max_date1
1997-01-31,a,null ,1997-03-31
1997-02-28,a,null ,1997-03-31
1997-03-31,a,1997-02-28 ,1997-03-31
1998-02-28,b,1998-01-31 ,1998-03-31
1998-03-31,b,1998-02-28 ,1998-03-31
1998-03-30,b,null ,1998-03-31
1998-04-30,b,null ,1998-03-31
1994-05-31,c,1993-02-28 ,1994-07-31
1994-06-30,c,null ,1994-07-31
1994-07-31,c,1993-02-28 ,1994-07-31
1994-08-30,c,null ,1994-07-31

Is there a way to do this in vba? I have about milion rows so it would
be much faster I believe.

TIA
Johnny
 
Johnny Petran said:
I want to find max of date1 for the latest date2
whitin each company.
In other words I [...] would like to have this:
date1,company_number,date2,max_date1
1997-01-31,a,null ,1997-03-31
1997-02-28,a,null ,1997-03-31
1997-03-31,a,1997-02-28 ,1997-03-31
1998-02-28,b,1998-01-31 ,1998-03-31
1998-03-31,b,1998-02-28 ,1998-03-31
1998-03-30,b,null ,1998-03-31
1998-04-30,b,null ,1998-03-31
1994-05-31,c,1993-02-28 ,1994-07-31
1994-06-30,c,null ,1994-07-31
1994-07-31,c,1993-02-28 ,1994-07-31
1994-08-30,c,null ,1994-07-31

You could write the following array formula [*] starting D2 and copy down:

=MAX(IF($B$2:$B$12=B2,IF($C$2:$C$12<>"",$A$2:$A$12)))

That assumes "date1" is column A, "company number" is column B, and "date2"
is column C. And it assumes that "null" means the null string ("");
otherwise, substitute the word "null" for "" in the formula.

[*] Enter an array formula by pressing ctrl+shift+Enter instead of Enter.
Excel will display an array formula surrounded by curly braces in the
Formula Bar, i.e. {=formula}. You cannot type the curly braces yourself.
If you make a mistake, select the cell, press F2 and edit, then press
ctrl+shift+Enter.
Is there a way to do this in vba? I have about
milion rows so it would be much faster I believe.

It probably would not be faster, especially if you use a user-defined
function. However, if you use a subroutine, which you might associate with
a button for ease of use, it would make recalculations and edits faster
because you would only calculate column D ("max date1") when you choose to.

Can we assume that all data for each company is in adjacent rows, as they
are in your example?

If so, the VBA subroutine algorithm can be very different. For example:


Option Explicit

Sub doit()

Dim lastrow As Long, n As Long
Dim p As Long, i As Long, j As Long
Dim date1, company, date2, maxDate1

' assume at least 2 rows of data
lastrow = Range("a2").End(xlDown).Row
date1 = Range("a2:a" & lastrow)
company = Range("b2:b" & lastrow)
date2 = Range("c2:c" & lastrow)

n = UBound(date1, 1)
ReDim date3(1 To n, 1 To 1)

p = 1
If date2(1, 1) <> "" _
Then maxDate1 = date1(1, 1) _
Else maxDate1 = 0

For i = 2 To n
If company(i, 1) = company(p, 1) Then
If date2(i, 1) <> "" And date1(i, 1) > maxDate1 _
Then maxDate1 = date1(i, 1)
Else
If maxDate1 <> 0 Then
For j = p To i - 1
date3(j, 1) = maxDate1
Next
End If
p = i
If date2(p, 1) <> "" _
Then maxDate1 = date1(p, 1) _
Else maxDate1 = 0
End If
Next

If maxDate1 <> 0 Then
For j = p To i - 1
date3(j, 1) = maxDate1
Next
End If
Range("d2:d" & lastrow) = date3

End Sub
 
hi,

array formula to validate with "ctrl + shift + enter"

=MIN(IF($B$2:$B$20=B2,$A$2:$A$20))
 
PS....
It probably would not be faster, especially if you
use a user-defined function. However, if you use a
subroutine, [...] it would make recalculations and
edits faster because you would only calculate column D
("max date1") when you choose to.

Of course, the downside is that you must manually choose when to recalculate
column D.

You could automate that by using a Worksheet_Change event macro that might
invoke the VBA subroutine that I provided whenever there is a change in
columns A, B or C.

However, that would defeat the purpose of the subroutine, and you would
probably be better off with the MAX array formula, unless the subroutine
truly is faster. That would surprise me; but it I would have to test both
approaches to know.

On the other hand, one advantage to using a Worksheet_Change event macro
with the subroutine provided is that there would be fewer formulas. That
is, you would avoid the "millions" of MAX array formulas, which can impact
Excel's recalculation behavior.
 
I said:
You could write the following array formula [*] starting D2 and copy down:
=MAX(IF($B$2:$B$12=B2,IF($C$2:$C$12<>"",$A$2:$A$12))) [....]
Is there a way to do this in vba? I have about
milion rows so it would be much faster I believe.

It probably would not be faster, especially if you use a user-defined
function. However, if you use a subroutine, which you might associate
with a button for ease of use, it would make recalculations and edits
faster

I wrote later:
You could automate that by using a Worksheet_Change
event macro that might invoke the VBA subroutine that
I provided whenever there is a change in columns A, B or C.
However, that would defeat the purpose of the subroutine,
and you would probably be better off with the MAX array
formula, unless the subroutine truly is faster. That
would surprise me

What an odd thing for me to say, on second thought.

The MAX array formula above would access the entire range of data 3 times
__for_each__ instance of the formula. So for "millions" of rows of data,
that would be 3 trillion(!) accesses, not to mention the additional
computations in the formula.

In contrast, the subroutine accesses the entire range data just one time --
a total of 1 million sets of computations for "millions" of rows of data.

The user-defined function below can be used in place of the MAX array
formula. It should be significantly better than the MAX array formula -- on
the order of 1 million sets of computations for "millions" of rows of data.
But it should be slower than the subroutine.

In an actual test with just about 12000 rows of data, the subroutine took
about 0.225 sec, and the UDF took about 1.128 sec for a full recalculation
(ctrl+alt+F9). I did not dare to measure the MAX array formula because of
its combinatorial performance characteristic. (All times are on my
computer. YMMV.)

However, oddly, any edit that causes a recalculation took significantly
longer -- about 94 sec(!). I cannot explain why that is longer than a
forced full recalculation (ctrl+alt+F9), which I confirmed does indeed call
the UDF for all 12000 formulas. And I cannot explain why I cannot duplicate
that long edit-induced recalculation by doing the same thing in a macro.

In any case, my point is: that is 1 to 94 sec for a mere 12000 rows.
Figure about 8000(!) times more for "millions" of rows.

That is why the manual subroutine might still be your best option. Even
then, we are talking about 30 min of computation (on my computer; YMMV).

For what it's worth, my user-defined function is as follows. Once again, I
assume that all data for a company are in adjacent rows, as they are in your
example.


Option Explicit

Function maxDate(date1 As Range, company As Range, date2 As Range)

Dim firstRow As Long, n As Long, myRow As Long
Dim myCompany, d1 As Long, i As Long
Dim maxDate1 As Long ' initially zero

firstRow = date1.Row
If firstRow <> company.Row Or firstRow <> date2.Row _
Then GoTo err

n = date1.Rows.Count
If n <> company.Rows.Count Or n <> date2.Rows.Count _
Then GoTo err
If date1.Columns.Count <> 1 Or company.Columns.Count <> 1 _
Or date2.Columns.Count <> 1 _
Then GoTo err

myRow = Application.Caller.Row - firstRow + 1
If myRow < 1 Or myRow > n Then GoTo err

myCompany = company(myRow)
If date2(myRow) <> "" Then maxDate1 = date1(myRow)

For i = myRow - 1 To 1 Step -1
If company(i) <> myCompany Then Exit For
If date2(i) <> "" Then
d1 = date1(i)
If d1 > maxDate1 Then maxDate1 = d1
End If
Next

For i = myRow + 1 To n
If company(i) <> myCompany Then Exit For
If date2(i) <> "" Then
d1 = date1(i)
If d1 > maxDate1 Then maxDate1 = d1
End If
Next

maxDate = IIf(maxDate1 <> 0, maxDate1, "")
Exit Function

err:

maxDate = CVErr(xlErrValue)
End Function
 
Back
Top