RegExp not matching what I want


Phil Hibbs

This is my regular expression:
lineRegEx.Pattern = " *([^ ]+) ?(.*) *"

What I want is for it to strip leading and trailing spaces, and then
split the remainder on the first space that it encounters. What it
does in practice is to return the entire string as the first match.
Any ideas what I'm doing wrong?

I've tested it here:
http://www.regular-expressions.info/vbscriptexample.html

...and it shows the same behaviour, but the same pattern works here:
http://www.regular-expressions.info/javascriptexample.html

I can only assume that VBA is not using the regular expression
implementation that I have been led to expect.

Phil Hibbs.
 
Aha, I fixed my problem, I was iterating over the Matches instead of
the Matches.SubMatches!

Phil.
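
For anyone who hits the same thing, here is a minimal sketch of the difference between a Match and its SubMatches. It assumes late binding to VBScript.RegExp; the procedure name and the sample text are made up for illustration.

```vba
Sub ShowSubMatches()
    Dim re As Object, ms As Object, m As Object

    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = " *([^ ]+) ?(.*) *"

    ' Hypothetical sample input, not taken from the original post
    Set ms = re.Execute("  first rest of line  ")
    If ms.Count > 0 Then
        Set m = ms(0)
        ' The Match itself is the text matched by the whole pattern
        Debug.Print "[" & m.Value & "]"
        ' The captured groups live in SubMatches, which is zero-based
        Debug.Print "[" & m.SubMatches(0) & "]"
        Debug.Print "[" & m.SubMatches(1) & "]"
    End If
End Sub
```

With this pattern the second group keeps any trailing spaces, because the greedy (.*) consumes them before the final " *" gets a chance to.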
 
You don't show enough code to figure out what was wrong.
I'm glad you sorted it out.
The act of posting often stimulates thought.

I find it helpful to develop regular expression code in the Immediate
window. The Test method is useful. It is also useful to put the pattern
into English, e.g. zero or more spaces; remember one or more non-spaces;
an optional space; remember zero or more characters; zero or more
spaces. I would be inclined to try "^ *([^ ]+) ?(.*) *$". ^ means "start
of string" and $ means "end of string". I can then use "$1" and "$2" in
the Replace method to show the remembered chunks. I suspect the "?" is
superfluous. It is good to see someone else using regular expressions. I
find them clearer than a sequence of Left, Mid and Right calls. Also,
would you expect to match "ABC"? i.e. there is no space to split on.
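
A rough sketch of that kind of experiment, assuming late binding to VBScript.RegExp. The procedure name, the sample strings and the square brackets in the replacement text are only there for illustration.

```vba
Sub TryAnchoredPattern()
    Dim re As Object

    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = "^ *([^ ]+) ?(.*) *$"

    ' Test only reports whether the pattern matches at all
    Debug.Print re.Test("  first rest of line  ")
    ' Replace with $1 and $2 shows what each group remembered
    Debug.Print re.Replace("  first rest of line  ", "[$1] [$2]")

    ' The no-space case: the pattern still matches, with an empty second group
    Debug.Print re.Test("ABC")
    Debug.Print re.Replace("ABC", "[$1] [$2]")
End Sub
```

Run it from the Immediate window by typing TryAnchoredPattern and pressing Enter.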

In message <[email protected]s.com> of Mon, 19 Apr 2010 07:14:04
in microsoft.public.excel.programming said:
> This is my regular expression:
> lineRegEx.Pattern = " *([^ ]+) ?(.*) *"
>
> What I want is for it to strip leading and trailing spaces, and then
> split the remainder on the first space that it encounters. What it
> does in practice is to return the entire string as the first match.
> Any ideas what I'm doing wrong?
>
> I've tested it here:
> http://www.regular-expressions.info/vbscriptexample.html
>
> ...and it shows the same behaviour, but the same pattern works here:
> http://www.regular-expressions.info/javascriptexample.html
>
> I can only assume that VBA is not using the regular expression
> implementation that I have been led to expect.
>
> Phil Hibbs.
 
If I understand what you are trying to do, you can do it without using
regular expressions. I think you are saying you want to end up with an array
containing two elements... the first "word" (in front of the first space)
after leading and trailing spaces have been removed in the array's first
element and all the rest of the text in the array's second element. If that
is correct, then give this code a try (just substitute your text or a
variable/range/whatever that contains your text where I have written
"YourTextGoesHere")...

Dim Result() As String
.......
.......
' This is the only important line of code
Result = Split(Trim(YourTextGoesHere), " ", 2)
.......
.......
' These two lines are only here to show you it worked
MsgBox "First word: " & Result(0)
MsgBox "Rest of text: " & Result(1)

Note: The Split function will **always** return a zero-based array even if
you are using an "Option Base 1" statement.
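
One thing to watch with this approach: if the trimmed text contains no space, Split returns a single element and Result(1) raises a "Subscript out of range" error. A minimal sketch of a wrapper that always hands back two elements follows; the function name SplitFirstWord is hypothetical.

```vba
Function SplitFirstWord(ByVal s As String) As String()
    Dim parts() As String
    Dim result() As String

    ' Two elements, both empty strings to start with
    ReDim result(0 To 1)

    ' Limit of 2: split on the first space only
    parts = Split(Trim$(s), " ", 2)

    ' UBound is -1 for an empty string and 0 when there is no space
    If UBound(parts) >= 0 Then result(0) = parts(0)
    If UBound(parts) >= 1 Then result(1) = parts(1)

    SplitFirstWord = result
End Function
```

If the first word is followed by several spaces, the extra ones stay at the front of the second element, since only the first space is used as the delimiter.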
 
Rick said:
If I understand what you are trying to do, you can do it without using
regular expressions.

Sure, but I was replacing my current implementation with a regexp
version to see if it was more efficient. I thought that maybe a single
regexp call that returns the elements and their lengths would be more
efficient than separate trim, instr, mid, and length calls (or trim,
split, length calls in your suggestion). Turns out it isn't, partly
because I misunderstood the length thing. You don't get the lengths of
the sub-matches, just the length of the full expression match, so I
had to do the length call anyway, and the "rest of" can be very long
(up to 3MB).

Phil Hibbs.
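
To illustrate the point about lengths, here is a small sketch, assuming late binding to VBScript.RegExp, with made-up procedure name and sample text. The Match object exposes FirstIndex and Length for the whole match, while each sub-match is just a string, so its length still needs a Len call.

```vba
Sub ShowLengths()
    Dim re As Object, ms As Object, m As Object
    Dim i As Long

    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = "^ *([^ ]+) ?(.*) *$"

    ' Hypothetical sample input
    Set ms = re.Execute("  first rest of line  ")
    If ms.Count = 0 Then Exit Sub
    Set m = ms(0)

    ' Position (zero-based) and length of the whole match
    Debug.Print m.FirstIndex, m.Length

    ' Sub-matches carry no length of their own
    For i = 0 To m.SubMatches.Count - 1
        Debug.Print i, Len(m.SubMatches(i))
    Next i
End Sub
```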
 
Phil,

If I might suggest:

"^\s*(\S+)\s(.*?)\s*$"

or, if you don't want to return any leading spaces prior to the second
submatch:

"^\s*(\S+)\s+(.*?)\s*$"



--ron
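
A short sketch for comparing the two patterns side by side, assuming late binding to VBScript.RegExp. The procedure name and the sample text (note the three spaces after the first word) are made up for illustration.

```vba
Sub ComparePatterns()
    Dim re As Object, ms As Object
    Dim patterns As Variant, p As Variant
    Const SAMPLE As String = "  first   rest of line  "

    patterns = Array("^\s*(\S+)\s(.*?)\s*$", "^\s*(\S+)\s+(.*?)\s*$")
    Set re = CreateObject("VBScript.RegExp")

    For Each p In patterns
        re.Pattern = p
        Set ms = re.Execute(SAMPLE)
        If ms.Count > 0 Then
            ' Brackets make leading and trailing spaces visible
            Debug.Print p
            Debug.Print "  [" & ms(0).SubMatches(0) & "]"
            Debug.Print "  [" & ms(0).SubMatches(1) & "]"
        End If
    Next p
End Sub
```

With the first pattern the second sub-match keeps the two extra spaces in front of "rest"; with the second pattern they are absorbed by \s+. Both patterns require at least one whitespace character after the first word, so a string such as "ABC" does not match either of them.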
 
Hi. Don't know if this is any faster...

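Sub Demo()
    Dim S As String
    Dim v(1 To 2)
    Dim P As Long

    S = " This is a test "
    S = Trim(S)
    ' Position of the first space, or 0 if there is none
    P = InStr(1, S, Space(1))
    If P > 0 Then
        v(1) = Left$(S, P - 1)   ' first word
        v(2) = Mid$(S, P + 1)    ' rest of the text
    Else
        ' No space to split on: Left$(S, -1) would raise an error
        v(1) = S
        v(2) = vbNullString
    End If
End Sub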

= = = = = = =
HTH :>)
Dana DeLouis
 
Ron said:
If I might suggest:
"^\s*(\S+)\s(.*?)\s*$"

It wasn't the regular expression that was wrong, it's the way I was
using the return value. But that's probably a better regex than mine.

Phil Hibbs.
 
Dana said:
Hi.  Don't know if this is any faster...

That's almost exactly what my original code did. It has to process the
string three times (if you add a length call at the end), I was trying
to reduce the number of times.

Phil Hibbs.
 
OIC

By the way, reading through some other messages in the thread, I was wondering
about why you needed the LENgth calls? Is this for some subsequent processing
of the outputs? Because if you just want to trim off leading and trailing
spaces, that can be done with the regex.
--ron
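
As a rough sketch of trimming with a regex alone (one possible reading of the suggestion above, not necessarily what was meant): with Global switched on, a pattern that matches whitespace at either end of the string can be replaced with nothing. The function name RegexTrim is hypothetical, and late binding to VBScript.RegExp is assumed.

```vba
Function RegexTrim(ByVal s As String) As String
    Dim re As Object

    Set re = CreateObject("VBScript.RegExp")
    ' Global is needed so that both ends are handled, not just the first hit
    re.Global = True
    ' Whitespace at the start of the string, or whitespace at the end
    re.Pattern = "^\s+|\s+$"

    RegexTrim = re.Replace(s, "")
End Function
```

Unlike VBA's Trim, which only strips spaces, \s also removes tabs and line breaks at the ends of the string.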
 