. . . With it absent, the function will return the last Ennn; and I believe the regex
engine will come to the solution in fewer steps with much less back-tracking.
Unclear. You mean that greedy * would find the last match faster than
non-greedy * would find the first? Unlikely. Also irrelevant if the
goal is the first match.
Maybe there's more backtracking in my function (picky: unlikely
there's any true backtracking, but there's A LOT of look ahead
checking), but it seems to run faster than your function. Maybe the
Matches collection overhead is that much of a drag. Also, rewriting
the regex to avoid non-greedy quantifiers requires something along the
lines of
re.Pattern = "^([^E]*(E\D)?)*E(\d+(\.\d+)?).*$"
t = re.Replace(s, "$3")
If t <> vbNullString Then hg = CDbl(t) Else hg = CVErr(xlErrNum)
which involves back-to-back-to-back 0-or-more quantifiers, so probably
even more backtracking.
FWLIW, it's a pity VBScript regexs lack a simple .Find method that
returns the position of the first match in the string as a Long, or
even better something like the awk approach
result = match(s, /E[0-9]*\.?[0-9]+/) ? substr(s, RSTART, RLENGTH) :
""
Constructing and dereferencing the Matches collection takes a lot of
time.
As I wrote before, I can certainly accept that the .Replace function will work
faster.
However, with regard to your regex, testing it in another program, against the
OP's test string, revealed that it took significantly more steps to match than
if you omit the lazy qualifier. However, upon closer examination, I discovered
that the number of steps was more dependent on the location of the matched
substring than on the use of a lazy vs greedy ".". Oh well, live and learn.
Now being curious, I ran some procedure timings. I did a run of 100 calls and
measured the elapsed time for four different routines. I repeated this several
times and averaged the results.
I then changed the position of the Ennn substring from near the end to near the
beginning of the string and did more runs.
The string was in A1; and the four function calls were in B1:E1
The four routines were:
foo: Your original with the lazy ".?"
foobar: Yours with the greedy "."
NumAfterE: Mine with the match collection
DoItInVBA: A VBA routine that I cobbled together (see below)
If my method of obtaining timing is accurate, the differences in the timing of
the various regex approaches seems trivial. The VBA routine is MUCH faster.
On my machine, the hi resolution counter frequency is 266,680 and these were
the results with the four different routines:
Ennn near
Start End
foo 1.914E-02 1.980E-02
foobar 1.832E-02 1.881E-02
NumAfterE 1.851E-02 1.888E-02
DoItInVBA 1.912E-04 1.825E-04
This is unexpected as I would have expected my regex routine to take
significantly longer. As a matter of fact, if I place the timer within the
function, it does seem to. However, the execution is so quick that we are
talking about differences of only a few counts of the timer.
One of these days, I'll do a bit more extensive testing.
======================================================
Option Explicit
Sub CheckTiming()
Dim oTime As New CHiResTimer
Const s As String =
"1http://
www.glassdoor.com/Jobs/First-Data-Account-Executive-Jobs-EI_IE1783.0,10_KO11,28.htm"
Const i As Long = 100
Dim j As Long
Dim res
oTime.StartTimer
For j = 1 To i
res = foo(s)
Next j
oTime.StopTimer
Debug.Print "foo took " & Format(oTime.Elapsed, "#.0000000") & " seconds."
oTime.StartTimer
For j = 1 To i
res = foobar(s)
Next j
oTime.StopTimer
Debug.Print "foobar took " & Format(oTime.Elapsed, "#.0000000") & " seconds."
oTime.StartTimer
For j = 1 To i
res = NumAfterE(s)
Next j
oTime.StopTimer
Debug.Print "NumAfterE took " & Format(oTime.Elapsed, "#.0000000") & "
seconds."
oTime.StartTimer
For j = 1 To i
res = DoItInVBA(s)
Next j
oTime.StopTimer
Debug.Print "DoItInVBA took " & Format(oTime.Elapsed, "#.0000000") & "
seconds."
End Sub
'----------------------------------------
Option Explicit
Function foo(s As String) As Variant
Dim re As Object
Set re = CreateObject("VBScript.Regexp")
re.Pattern = ".*?E([-+]?\d*\.?\d+).*"
If re.test(s) Then foo = CDbl(re.Replace(s, "$1")) _
Else foo = CVErr(xlErrNum)
Set re = Nothing 'paranoia
End Function
'-------------------------------------------
Function foobar(s As String) As Variant
Dim re As Object
Set re = CreateObject("VBScript.Regexp")
re.Pattern = ".*E([-+]?\d*\.?\d+).*"
If re.test(s) Then foobar = CDbl(re.Replace(s, "$1")) _
Else foobar = CVErr(xlErrNum)
Set re = Nothing 'paranoia
End Function
'----------------------------------------
Function NumAfterE(s As String) As Variant
Dim re As Object, mc As Object
Set re = CreateObject("vbscript.regexp")
re.Pattern = "E(\d+(\.\d+)?)"
If re.test(s) = True Then
Set mc = re.Execute(s)
NumAfterE = CDbl(mc(0).submatches(0))
Else: NumAfterE = CVErr(xlErrNum)
End If
Set re = Nothing
End Function
'-------------------------------------------
Function DoItInVBA(s As String) As Variant
Dim lStart As Long
lStart = 1
Do
lStart = InStr(lStart, s, "E") + 1
Loop Until Mid(s, lStart, 1) Like "#"
DoItInVBA = Val(Mid(s, lStart))
End Function
'------------------------------
The following is in a Class Module:
==================================
Option Explicit
'How many times per second is the counter updated?
Private Declare Function QueryFrequency Lib "kernel32" _
Alias "QueryPerformanceFrequency" ( _
lpFrequency As Currency) As Long
'What is the counter's value
Private Declare Function QueryCounter Lib "kernel32" _
Alias "QueryPerformanceCounter" ( _
lpPerformanceCount As Currency) As Long
'Variables to store the counter information
Dim cFrequency As Currency
Dim cOverhead As Currency
Dim cStarted As Currency
Dim cStopped As Currency
Private Sub Class_Initialize()
Dim cCount1 As Currency, cCount2 As Currency
'Get the counter frequency
QueryFrequency cFrequency
'Call the hi-res counter twice, to check how long it takes
QueryCounter cCount1
QueryCounter cCount2
'Store the call overhead
cOverhead = cCount2 - cCount1
End Sub
Public Sub StartTimer()
'Get the time that we started
QueryCounter cStarted
End Sub
Public Sub StopTimer()
'Get the time that we stopped
QueryCounter cStopped
End Sub
Public Property Get Elapsed() As Double
Dim cTimer As Currency
'Have we stopped or not?
If cStopped = 0 Then
QueryCounter cTimer
Else
cTimer = cStopped
End If
'If we have a frequency, return the duration, in seconds
If cFrequency > 0 Then
Elapsed = (cTimer - cStarted - cOverhead) / cFrequency
End If
End Property
=================================
--ron