I said:
You could write the following array formula [*] starting D2 and copy down:
=MAX(IF($B$2:$B$12=B2,IF($C$2:$C$12<>"",$A$2:$A$12))) [....]
Is there a way to do this in vba? I have about
milion rows so it would be much faster I believe.
It probably would not be faster, especially if you use a user-defined
function. However, if you use a subroutine, which you might associate
with a button for ease of use, it would make recalculations and edits
faster
I wrote later:
You could automate that by using a Worksheet_Change
event macro that might invoke the VBA subroutine that
I provided whenever there is a change in columns A, B or C.
However, that would defeat the purpose of the subroutine,
and you would probably be better off with the MAX array
formula, unless the subroutine truly is faster. That
would surprise me
What an odd thing for me to say, on second thought.
The MAX array formula above would access the entire range of data 3 times
__for_each__ instance of the formula. So for "millions" of rows of data,
that would be 3 trillion(!) accesses, not to mention the additional
computations in the formula.
In contrast, the subroutine accesses the entire range data just one time --
a total of 1 million sets of computations for "millions" of rows of data.
The user-defined function below can be used in place of the MAX array
formula. It should be significantly better than the MAX array formula -- on
the order of 1 million sets of computations for "millions" of rows of data.
But it should be slower than the subroutine.
In an actual test with just about 12000 rows of data, the subroutine took
about 0.225 sec, and the UDF took about 1.128 sec for a full recalculation
(ctrl+alt+F9). I did not dare to measure the MAX array formula because of
its combinatorial performance characteristic. (All times are on my
computer. YMMV.)
However, oddly, any edit that causes a recalculation took significantly
longer -- about 94 sec(!). I cannot explain why that is longer than a
forced full recalculation (ctrl+alt+F9), which I confirmed does indeed call
the UDF for all 12000 formulas. And I cannot explain why I cannot duplicate
that long edit-induced recalculation by doing the same thing in a macro.
In any case, my point is: that is 1 to 94 sec for a mere 12000 rows.
Figure about 8000(!) times more for "millions" of rows.
That is why the manual subroutine might still be your best option. Even
then, we are talking about 30 min of computation (on my computer; YMMV).
For what it's worth, my user-defined function is as follows. Once again, I
assume that all data for a company are in adjacent rows, as they are in your
example.
Option Explicit
Function maxDate(date1 As Range, company As Range, date2 As Range)
Dim firstRow As Long, n As Long, myRow As Long
Dim myCompany, d1 As Long, i As Long
Dim maxDate1 As Long ' initially zero
firstRow = date1.Row
If firstRow <> company.Row Or firstRow <> date2.Row _
Then GoTo err
n = date1.Rows.Count
If n <> company.Rows.Count Or n <> date2.Rows.Count _
Then GoTo err
If date1.Columns.Count <> 1 Or company.Columns.Count <> 1 _
Or date2.Columns.Count <> 1 _
Then GoTo err
myRow = Application.Caller.Row - firstRow + 1
If myRow < 1 Or myRow > n Then GoTo err
myCompany = company(myRow)
If date2(myRow) <> "" Then maxDate1 = date1(myRow)
For i = myRow - 1 To 1 Step -1
If company(i) <> myCompany Then Exit For
If date2(i) <> "" Then
d1 = date1(i)
If d1 > maxDate1 Then maxDate1 = d1
End If
Next
For i = myRow + 1 To n
If company(i) <> myCompany Then Exit For
If date2(i) <> "" Then
d1 = date1(i)
If d1 > maxDate1 Then maxDate1 = d1
End If
Next
maxDate = IIf(maxDate1 <> 0, maxDate1, "")
Exit Function
err:
maxDate = CVErr(xlErrValue)
End Function