Regular Expression Help on syntax

  • Thread starter Thread starter Jason
  • Start date Start date
J

Jason

Need help on regular expression, where we are trying to grab number values as
well as minus sign if negative. Currently we can grab all the numbers
correctly, just if numbers are negative, we only grab the number not the
number and the minus sign:

Example for:
-10 we get 10 but want -10 of course

I am new to Regular Expressions and have played with the syntax but cannot
get it work properly. Currently we are using
"\D+(\d+)"
 
Need help on regular expression, where we are trying to grab number values as
well as minus sign if negative. Currently we can grab all the numbers
correctly, just if numbers are negative, we only grab the number not the
number and the minus sign:

Example for:
-10 we get 10 but want -10 of course

I am new to Regular Expressions and have played with the syntax but cannot
get it work properly. Currently we are using
"\D+(\d+)"

I note that your expression will only capture integers. And also will not
capture an integer at the start of a line.

For integers with an optional sign, and to allow capture of positive integers
at the beginning of a line, try:

"[-+]?\b\d+\b"
--ron
 
Hello Dan,

Based on my researches, if we want to identify the negative number, we can
use the following regular expressions,
^(-)?\D+(\d+)

Best regards,
Ji Zhou
Microsoft Online Community
 
Neither of these suggestions would grab the negative sign, so maybe there is
more to it. I have posted what I believe to be the more relevant portions of
the code below:

c.Offset(0, i + 1).Value = RegexMid(myStr, sURLdate, "bl gb")
c.Offset(0, i + 2).Value = RegexMid(myStr, sURLdate, "br gb")
c.Offset(0, i + 3).Value = RegexMid(myStr, sURLdate, "class=gb")

Private Function RegexMid(s As String, sDate As String, sTempType As String)
As String
Dim re As Object, mc As Object
Set re = CreateObject("vbscript.regexp")
re.IgnoreCase = True
re.MultiLine = True
re.Global = True
re.Pattern = "\b" & sDate & "/DailyHistory[\s\S]+?" & sTempType & "\D+(\d+)"

If re.test(s) = True Then
Set mc = re.Execute(s)
RegexMid = mc(0).submatches(0)
End If
Set re = Nothing
End Function


***Here is an example of the data from the source page that we are trying to
pull from

</tr>
</tbody>
<tbody>
<tr>
<td><a href="/history/airport/KSTP/2010/1/1/DailyHistory.html">1</a></td>
<td class="bl gb">
8
</td>
<td class="gb">
0
</td>
<td class="br gb">
-8
</td>

Thank you!
 
Joel,

I tried using your suggestion, but it won't pull any numbers, much less the
negatives.

The closest I got was using:
"(\D+(-)?\d+)"

but it pulls
">-8
instead of
-8

or
instead of
-8

joel said:
You need to OR to expressions like I did below

"(\d+|([-+]\d+))"

the pipe character is an or function to either have the minus sign or
not have the minus sign


--
joel
------------------------------------------------------------------------
joel's Profile: 229
View this thread: http://www.thecodecage.com/forumz/showthread.php?t=168082

Microsoft Office Help

.
 
Joel,

I tried using your suggestion, but it won't pull any numbers, much less the
negatives.

The closest I got was using:
"(\D+(-)?\d+)"

but it pulls
">-8
instead of
-8

or
instead of
-8

I'm just curious about what happened with my suggestion?

"[-+]?\b\d+\b"
--ron
 
"[-+]?\b\d+\b"

This doesn't seem to grab any numbers at all. Perhaps there is more info I
can provide from the script below:

c.Offset(0, i + 1).Value = RegexMid(myStr, sURLdate, "bl gb")
c.Offset(0, i + 2).Value = RegexMid(myStr, sURLdate, "br gb")
c.Offset(0, i + 3).Value = RegexMid(myStr, sURLdate, "class=gb")

Private Function RegexMid(s As String, sDate As String, sTempType As String)
As String
Dim re As Object, mc As Object
Set re = CreateObject("vbscript.regexp")
re.IgnoreCase = True
re.MultiLine = True
re.Global = True
re.Pattern = "\b" & sDate & "/DailyHistory[\s\S]+?" & sTempType & "\D+(\d+)"

If re.test(s) = True Then
Set mc = re.Execute(s)
RegexMid = mc(0).submatches(0)
End If
Set re = Nothing
End Function


***Here is an example of the data from the source page that we are trying to
pull from

</tr>
</tbody>
<tbody>
<tr>
<td><a href="/history/airport/KSTP/2010/1/1/DailyHistory.html">1</a></td>
<td class="bl gb">
8
</td>
<td class="gb">
0
</td>
<td class="br gb">
-8
</td>

Thank you!
 
Ron,

Sorry I posted this reply earlier.

"[-+]?\b\d+\b"

This doesn't seem to grab any numbers at all. Perhaps there is more info I
can provide from the script below:

c.Offset(0, i + 1).Value = RegexMid(myStr, sURLdate, "bl gb")
c.Offset(0, i + 2).Value = RegexMid(myStr, sURLdate, "br gb")
c.Offset(0, i + 3).Value = RegexMid(myStr, sURLdate, "class=gb")

Private Function RegexMid(s As String, sDate As String, sTempType As String)
As String
Dim re As Object, mc As Object
Set re = CreateObject("vbscript.regexp")
re.IgnoreCase = True
re.MultiLine = True
re.Global = True
re.Pattern = "\b" & sDate & "/DailyHistory[\s\S]+?" & sTempType & "\D+(\d+)"

If re.test(s) = True Then
Set mc = re.Execute(s)
RegexMid = mc(0).submatches(0)
End If
Set re = Nothing
End Function


***Here is an example of the data from the source page that we are trying to
pull from

</tr>
</tbody>
<tbody>
<tr>
<td><a href="/history/airport/KSTP/2010/1/1/DailyHistory.html">1</a></td>
<td class="bl gb">
8
</td>
<td class="gb">
0
</td>
<td class="br gb">
-8
</td>

Thank you!
 
Ron,

Sorry I posted this reply earlier.

"[-+]?\b\d+\b"

This doesn't seem to grab any numbers at all.
Perhaps there is more info I
can provide from the script below:

c.Offset(0, i + 1).Value = RegexMid(myStr, sURLdate, "bl gb")
c.Offset(0, i + 2).Value = RegexMid(myStr, sURLdate, "br gb")
c.Offset(0, i + 3).Value = RegexMid(myStr, sURLdate, "class=gb")

Private Function RegexMid(s As String, sDate As String, sTempType As String)
As String
Dim re As Object, mc As Object
Set re = CreateObject("vbscript.regexp")
re.IgnoreCase = True
re.MultiLine = True
re.Global = True
re.Pattern = "\b" & sDate & "/DailyHistory[\s\S]+?" & sTempType & "\D+(\d+)"

If re.test(s) = True Then
Set mc = re.Execute(s)
RegexMid = mc(0).submatches(0)
End If
Set re = Nothing
End Function


***Here is an example of the data from the source page that we are trying to
pull from

</tr>
</tbody>
<tbody>
<tr>
<td><a href="/history/airport/KSTP/2010/1/1/DailyHistory.html">1</a></td>
<td class="bl gb">
8
</td>
<td class="gb">
0
</td>
<td class="br gb">
-8
</td>

Thank you!

There is no question that the regex I provided should return all signed and
unsigned integers, as I understand things.

There is also no question that your regex of \D+(\d+) will never capture a
negative integer into the capturing group, since the "-" will always be
captured by the \D.

The problem is in the rest of your regex. It could be that either sDate or
sTempType is not properly formed.

For example, the following three regex's return the desired numbers in
capturing group 1 when run against your test data:

"\b/2010/1/1/DailyHistory[\s\S]+?""bl gb""[\s\S]+?-?\b(\d+)\b"

--> 8

"\b/2010/1/1/DailyHistory[\s\S]+?""gb""[\s\S]+?-?\b(\d+)\b"

--> 0

"\b/2010/1/1/DailyHistory[\s\S]+?""br gb""[\s\S]+?(-?\b\d+)\b"

--> -8

Above is what your regex should look like after you finish concatenating things
together.

So in addition to your problem with the part of your regex that captures the
integer, which should be corrected by using my suggestion, you need to evaluate
the rest of the regex and how it is being constructed.
--ron
 
I tested Ron's code and it didn't return what you really need. Ron's
string returns everthing to the end of the number you are looking for.

My "code" was merely a regex which is quite different from your code.

If implemented properly, it returns only the number. If you are returning only
what you say, then you probably are not implementing it properly.

To demonstrate some code segments, we could set up the following, with the data
in an Excel worksheet. And code similar to what the OP posted, with a few
minor changes to correct his errors.

I just put MyStr into A1 for testing, not having access to the rest of the OP's
code. And, by setting c=A2 and leaving i=0, the OP's sub would place the
results in B2:D2.



A1: (the sample text given by the OP
</tr>
</tbody>
<tbody>
<tr>
<td><a href="/history/airport/KSTP/2010/1/1/DailyHistory.html">1</a></td>
<td class="bl gb">
8
</td>
<td class="gb">
0
</td>
<td class="br gb">
-8
</td>

Then use this routine -- quite similar to that of the OP:

====================================
Option Explicit
Sub TestExtract()
Dim i As Long
Dim c As Range
Dim myStr As String
Dim sURLdate As String
Set c = Range("A2")
myStr = Range("A1").Value
sURLdate = Format(CDate("1/1/2010"), "/yyyy/m/d/")


c.Offset(0, i + 1).Value = RegexMid(myStr, sURLdate, "bl gb")
c.Offset(0, i + 2).Value = RegexMid(myStr, sURLdate, "br gb")
c.Offset(0, i + 3).Value = RegexMid(myStr, sURLdate, "class=""gb""")
End Sub

'------------------------------------------------------------------
Private Function RegexMid(s As String, sDate As String, sTempType As String) _
As String
Dim re As Object, mc As Object

Set re = CreateObject("vbscript.regexp")
re.ignorecase = True
re.Pattern = sDate & "DailyHistory[\s\S]+?" & _
sTempType & "[\s\S]+?(-?\b\d+)\b"

If re.test(s) = True Then
Set mc = re.Execute(s)
RegexMid = mc(0).submatches(0)
End If
End Function
=================================

As expected, this returns into B2, C2 and D2 the signed integers following the
sTempType variables.

Note also the extra quote marks required in the third call, c/w the OP's.

Also note the terminal regex representing the signed integer.



I finally figure it out. You need to put parenthsis around the two sub
parts of the search string. the first part is the tag and the second
part is the number. Then use the submatches prperty to get the 2nd
submatch.

Note that your code for a signed integer:

(\d+|([-+]\d+))

could be more simply expressed as

([-+]?\d+)

Why did you choose to use alternation?


--ron
 
I tested Ron's code and it didn't return what you really need. Ron's
string returns everthing to the end of the number you are looking for.

My "code" was merely a regex which is quite different from your code.

If implemented properly, it returns only the number. If you are returning only
what you say, then you probably are not implementing it properly.

To demonstrate some code segments, we could set up the following, with the data
in an Excel worksheet. And code similar to what the OP posted, with a few
minor changes to correct his errors.

I just put MyStr into A1 for testing, not having access to the rest of the OP's
code. And, by setting c=A2 and leaving i=0, the OP's sub would place the
results in B2:D2.



A1: (the sample text given by the OP
</tr>
</tbody>
<tbody>
<tr>
<td><a href="/history/airport/KSTP/2010/1/1/DailyHistory.html">1</a></td>
<td class="bl gb">
8
</td>
<td class="gb">
0
</td>
<td class="br gb">
-8
</td>

Then use this routine -- quite similar to that of the OP:

====================================
Option Explicit
Sub TestExtract()
Dim i As Long
Dim c As Range
Dim myStr As String
Dim sURLdate As String
Set c = Range("A2")
myStr = Range("A1").Value
sURLdate = Format(CDate("1/1/2010"), "/yyyy/m/d/")


c.Offset(0, i + 1).Value = RegexMid(myStr, sURLdate, "bl gb")
c.Offset(0, i + 2).Value = RegexMid(myStr, sURLdate, "br gb")
c.Offset(0, i + 3).Value = RegexMid(myStr, sURLdate, "class=""gb""")
End Sub

'------------------------------------------------------------------
Private Function RegexMid(s As String, sDate As String, sTempType As String) _
As String
Dim re As Object, mc As Object

Set re = CreateObject("vbscript.regexp")
re.ignorecase = True
re.Pattern = sDate & "DailyHistory[\s\S]+?" & _
sTempType & "[\s\S]+?(-?\b\d+)\b"

If re.test(s) = True Then
Set mc = re.Execute(s)
RegexMid = mc(0).submatches(0)
End If
End Function
=================================

As expected, this returns into B2, C2 and D2 the signed integers following the
sTempType variables.

Note also the extra quote marks required in the third call, c/w the OP's.

Also note the terminal regex representing the signed integer.



I finally figure it out. You need to put parenthsis around the two sub
parts of the search string. the first part is the tag and the second
part is the number. Then use the submatches prperty to get the 2nd
submatch.

Note that your code for a signed integer:

(\d+|([-+]\d+))

could be more simply expressed as

([-+]?\d+)

Why did you choose to use alternation?


--ron

In order to get a proper match I have to use the \ character before
the / characters in the Format function, like this

sURLdate = Format(CDate("1/1/2010"), "\/yyyy\/m\/d\/")

Maybe this has something to do with the Regional and Language
settings.

I also have a question. What is the function of the two \b in the
regexp? I get a match even without them.
And the negative number is also matched.

Lars-Åke
 
I tested Ron's code and it didn't return what you really need. Ron's
string returns everthing to the end of the number you are looking for.

My "code" was merely a regex which is quite different from your code.

If implemented properly, it returns only the number. If you are returning only
what you say, then you probably are not implementing it properly.

To demonstrate some code segments, we could set up the following, with the data
in an Excel worksheet. And code similar to what the OP posted, with a few
minor changes to correct his errors.

I just put MyStr into A1 for testing, not having access to the rest of the OP's
code. And, by setting c=A2 and leaving i=0, the OP's sub would place the
results in B2:D2.



A1: (the sample text given by the OP
</tr>
</tbody>
<tbody>
<tr>
<td><a href="/history/airport/KSTP/2010/1/1/DailyHistory.html">1</a></td>
<td class="bl gb">
8
</td>
<td class="gb">
0
</td>
<td class="br gb">
-8
</td>

Then use this routine -- quite similar to that of the OP:

====================================
Option Explicit
Sub TestExtract()
Dim i As Long
Dim c As Range
Dim myStr As String
Dim sURLdate As String
Set c = Range("A2")
myStr = Range("A1").Value
sURLdate = Format(CDate("1/1/2010"), "/yyyy/m/d/")


c.Offset(0, i + 1).Value = RegexMid(myStr, sURLdate, "bl gb")
c.Offset(0, i + 2).Value = RegexMid(myStr, sURLdate, "br gb")
c.Offset(0, i + 3).Value = RegexMid(myStr, sURLdate, "class=""gb""")
End Sub

'------------------------------------------------------------------
Private Function RegexMid(s As String, sDate As String, sTempType As String) _
As String
Dim re As Object, mc As Object

Set re = CreateObject("vbscript.regexp")
re.ignorecase = True
re.Pattern = sDate & "DailyHistory[\s\S]+?" & _
sTempType & "[\s\S]+?(-?\b\d+)\b"

If re.test(s) = True Then
Set mc = re.Execute(s)
RegexMid = mc(0).submatches(0)
End If
End Function
=================================

As expected, this returns into B2, C2 and D2 the signed integers following the
sTempType variables.

Note also the extra quote marks required in the third call, c/w the OP's.

Also note the terminal regex representing the signed integer.



I finally figure it out. You need to put parenthsis around the two sub
parts of the search string. the first part is the tag and the second
part is the number. Then use the submatches prperty to get the 2nd
submatch.

Note that your code for a signed integer:

(\d+|([-+]\d+))

could be more simply expressed as

([-+]?\d+)

Why did you choose to use alternation?


--ron

In order to get a proper match I have to use the \ character before
the / characters in the Format function, like this

sURLdate = Format(CDate("1/1/2010"), "\/yyyy\/m\/d\/")

Maybe this has something to do with the Regional and Language
settings.

I also have a question. What is the function of the two \b in the
regexp? I get a match even without them.
And the negative number is also matched.

Lars-Åke


I found the explanation to \b (word boundary)
http://msdn.microsoft.com/en-us/library/ms974570.aspx

Lars-Åke
 
In order to get a proper match I have to use the \ character before
the / characters in the Format function, like this

sURLdate = Format(CDate("1/1/2010"), "\/yyyy\/m\/d\/")

Maybe this has something to do with the Regional and Language
settings.

I don't know. AFAIK, all characters except

[\^$.|?*+()

should get matched literally.

Preceding a character (that has no special meaning) by a backslash ("\") merely
represents a single instance of the second character.

If you type, into the immediate window:

?CDate("1/1/2010")

what is returned?
I also have a question. What is the function of the two \b in the
regexp? I get a match even without them.
And the negative number is also matched.

\b represents a word boundary, or, more specifically, it matches at the
position between a word character (anything matched by \w) and a non-word
character (anything matched by [^\w] or \W) as well as at the start and/or end
of the string (or line) if the first and/or last characters in the string (or
line) are word characters.
--ron
 
In order to get a proper match I have to use the \ character before
the / characters in the Format function, like this

sURLdate = Format(CDate("1/1/2010"), "\/yyyy\/m\/d\/")

Maybe this has something to do with the Regional and Language
settings.

I don't know. AFAIK, all characters except

[\^$.|?*+()

should get matched literally.

Preceding a character (that has no special meaning) by a backslash ("\") merely
represents a single instance of the second character.

If you type, into the immediate window:

?CDate("1/1/2010")

what is returned?
I also have a question. What is the function of the two \b in the
regexp? I get a match even without them.
And the negative number is also matched.

\b represents a word boundary, or, more specifically, it matches at the
position between a word character (anything matched by \w) and a non-word
character (anything matched by [^\w] or \W) as well as at the start and/or end
of the string (or line) if the first and/or last characters in the string (or
line) are word characters.
--ron

If I write ?CDate("1/1/2010") in the immediate window the following is
returned
2010-01-01

If I write ?Format(CDate("1/1/2010"), "/yyyy/m/d/") the following is
returned
-2010-1-1-

Only if I write ?Format(CDate("1/1/2010"), "\/yyyy\/m\/d\/" the wanted
result
/2010/1/1/
is achieved.

So, with my settings the / character seems to generate a - .
To have a / generated the / has to be escaped with \.

I found the explanation in the Excel help for Format:

"(/) Date separator. In some locales, other characters may be used to
represent the date separator. The date separator separates the day,
month, and year when date values are formatted. The actual character
used as the date separator in formatted output is determined by your
system settings."

And in my settings - (hyphen) is used as the date separator, in
accordance with ISO 8601 extended format
http://en.wikipedia.org/wiki/ISO_8601

Lars-Åke
 
In message <[email protected]> of Sun, 10 Jan 2010 04:55:10 in
microsoft.public.excel.programming said:
Ron: have you ever looked at the 1970's unix manual volume 2b under the
YACC topic. See this webpage. Look for the link for the PDF files and
use the link : v7vol2b.pdf (819KB)

Webpage
'7th Edition Manual PDF'
(http://plan9.bell-labs.com/7thEdMan/bswv7.html)

pdf file
http://plan9.bell-labs.com/7thEdMan/v7vol2b.pdf



It is the only good description of pattern matching that I have ever
seen.

The question mark indicates any single character.

Joel, How do you conclude that? (I expect "." to match any single
character within a line and ".|\s" to match any single character).

I found this table on page 49/250 (Joel is probably looking elsewhere):

Regular expressions in Lex use the following operators:

....
x? an optional x.
x* 0,1,2, ... instances of x.
x+ 1,2,3, ... instances of x.
x?y an x or a y.
....

I am confused by that x?y. I think it means an optional x followed by a
literal y. I think there may be a glyph confusion. I suspect it is
intended to be x|y where the character - for which I know no name -
between x and y means "or" as & - ampersand - means "and".

The symbols used by Set RE = CreateObject("VBScript.RegExp")
RE.Pattern = ... are a superset of the BRE described in <http://opengrou
p.org/onlinepubs/007908775/xbd/re.html>. My own view is that thinks like
\d is an unnecessary shorthand for [0-9]. I concede it is reasonable if
a range is not allowed in a character set. i.e. if [0123456789] is
needed. In VBA, I refer to <http://msdn.microsoft.com/en-
us/library/ms974570.aspx> which specifies everything I want to know
other than the meanings of the exceptions given.
<http://msdn.microsoft.com/en-us/library/xe43cc8d(VS.85).aspx>
specifies 5019, which I hit yesterday. ;)

For my part, I would start with "-?\d+" to grab a whole number. i.e a
whole number is a minus which is optional followed by a digit one or
more times. I found the OP's situation too complicated to want to follow
and offer a suggestion.
 
In order to get a proper match I have to use the \ character before
the / characters in the Format function, like this

sURLdate = Format(CDate("1/1/2010"), "\/yyyy\/m\/d\/")

Maybe this has something to do with the Regional and Language
settings.

I don't know. AFAIK, all characters except

[\^$.|?*+()

should get matched literally.

Preceding a character (that has no special meaning) by a backslash ("\") merely
represents a single instance of the second character.

If you type, into the immediate window:

?CDate("1/1/2010")

what is returned?
I also have a question. What is the function of the two \b in the
regexp? I get a match even without them.
And the negative number is also matched.

\b represents a word boundary, or, more specifically, it matches at the
position between a word character (anything matched by \w) and a non-word
character (anything matched by [^\w] or \W) as well as at the start and/or end
of the string (or line) if the first and/or last characters in the string (or
line) are word characters.
--ron

If I write ?CDate("1/1/2010") in the immediate window the following is
returned
2010-01-01

If I write ?Format(CDate("1/1/2010"), "/yyyy/m/d/") the following is
returned
-2010-1-1-

Only if I write ?Format(CDate("1/1/2010"), "\/yyyy\/m\/d\/" the wanted
result
/2010/1/1/
is achieved.

So, with my settings the / character seems to generate a - .
To have a / generated the / has to be escaped with \.

I found the explanation in the Excel help for Format:

"(/) Date separator. In some locales, other characters may be used to
represent the date separator. The date separator separates the day,
month, and year when date values are formatted. The actual character
used as the date separator in formatted output is determined by your
system settings."

And in my settings - (hyphen) is used as the date separator, in
accordance with ISO 8601 extended format
http://en.wikipedia.org/wiki/ISO_8601

Lars-Åke

Well, that is an interesting difference, and clearly related to
Excel/VBA/Regional Settings and not to the regex engine per se, as I originally
thought.

When I place ?Cdate("1/1/2010") in the immediate window, it returns
1/1/2010.

So it would seem that for code that will run properly, regardless of the
Windows Regional settings, it might be best to use "\/yyyy\/m\/d" in the Format
command to properly format the date string to match with the URL.

Here in the US, I don't generally have to deal with international requirements.
Thanks for pointing that out.
--ron
 
Ron: have you ever looked at the 1970's unix manual volume 2b under the
YACC topic. See this webpage. Look for the link for the PDF files and
use the link : v7vol2b.pdf (819KB)

Webpage
'7th Edition Manual PDF'
(http://plan9.bell-labs.com/7thEdMan/bswv7.html)

pdf file
http://plan9.bell-labs.com/7thEdMan/v7vol2b.pdf



It is the only good description of pattern matching that I have ever
seen.

The question mark indicates any single character.


I have not looked at that reference. However, if that is what it says, it is
wrong.

The question mark "?": makes the preceding item optional. Greedy, so the
optional item is included in the match if possible.

=================================
Looking at the reference you provided, I see that you have misread it. It does
*NOT* read, as you claim, that "the question mark indicates any single
character". It states the almost the SAME definition as *I* posted above:

"Optional Expressions . The operator ? indicates an optional element of an
expression. Thus ab?c matches either ac or abc."
===================================

There are some instances where "?" represents any single character, but it does
NOT do this with regard to regular expressions. Perhaps that is where you got
confused.
There are a number of problems with the above search string

1) why do you care that a blank occurs immediately after the last
digit. what happens if there is a return, tab or end of file

I don't know what you mean by a "blank". Perhaps you are confusing the use of
the "\b" token? Check that definition (which I previously posted). It is also
does NOT seem to be part of the LEX regular expression flavor which you are
quoting. Rather, in LEX, it would mean a backspace; but that is not the case
with more commonly used flavors.
2) The match is looking for a positive or negative sign folowed by any
character followed by a blank. Most people don't have a blank.

No it is not. See above. You are misinterpreting "\b".\
\d+|([-+]\d+)

It is where I found the saying

If you can't bring Mohammed to the mountain, bring the mountain to
Mohammed.

If Mohammed goes to the wrong mountain, he may get the wrong information.

You might try some of these "mountains", if the links are still valid:

Regular Expressions
http://www.regular-expressions.info/reference.html
http://support.microsoft.com/default.aspx?scid=kb;en-us;818802&Product=vbb
http://msdn2.microsoft.com/en-us/library/6wzad2b2.aspx
http://msdn2.microsoft.com/en-us/library/ms974619.aspx
http://www.regex-guru.info/
--ron
 
For my part, I would start with "-?\d+" to grab a whole number. i.e a
whole number is a minus which is optional followed by a digit one or
more times. I found the OP's situation too complicated to want to follow
and offer a suggestion.

I agree with a lot of what you wrote.

I would point out that the above construct will also extract whole numbers that
are embedded within other strings. That may or may not be desirable.

In other words, it can extract both integers from the string below:

abc123abc 123
--ron
 
Back
Top