Desperate for some help with regular expression pattern

  • Thread starter: Lisa Bogart

Lisa Bogart

I am trying to take a string and parse it out into multiple strings
based on a pattern but am stuck and am hoping someone can give me a
clue.

My pattern looks like so:
sMatch = "\d\d\d\d-\d\d-\d\d\s\d\d:\d\d\sby\s<a class=link href=" & Chr(34) & "javascript:jsOpen[\s\S]*"

What I want is to take a single string and divide it into multiple
strings, one for each section that starts with a date/time and
"<a class=link href=" & Chr(34) & "javascript:jsOpen".
The trouble seems to be the last part of the pattern, "[\s\S]*".
Anything can appear after the first part of the string, including
linefeeds, tabs, form-feeds, etc. The above pattern gives me the
whole string instead of the several strings I expected.

I have tried various combinations of Singleline and Multiline options.

sMatch = "^(\d\d\d\d-\d\d-\d\d\s\d\d:\d\d\sby\s<a class=link href=" & Chr(34) & "javascript:jsOpen)(.*)$"
 - gives me the whole string.
sMatch = "^(\d\d\d\d-\d\d-\d\d\s\d\d:\d\d\sby\s<a class=link href=" & Chr(34) & "javascript:jsOpen)[\s|\S]*"
 - gives me the whole string.
sMatch = "^(\d\d\d\d-\d\d-\d\d\s\d\d:\d\d\sby\s<a class=link href=" & Chr(34) & "javascript:jsOpen).*"
 - gives me several strings, but those strings end when the first Chr(10) is encountered after the first part of the pattern.
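A likely cause of the "whole string" result is that [\s\S]* is greedy: after the first header it consumes everything to the end of the input, later headers included. A minimal sketch of one fix, assuming VB.NET with System.Text.RegularExpressions and HTML that looks exactly as in the question: make the quantifier lazy (*?) and stop it with a lookahead for the next header or the end of the input (\z). The names EntrySplitter and GetEntries are hypothetical. Writing .*? together with RegexOptions.Singleline would behave the same as [\s\S]*?.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Public Module EntrySplitter
    ' One entry header as described in the question: date, time, "by", then the jsOpen link.
    ' Inside a VB string literal, "" stands for a single " character.
    Public Const Header As String = "\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}\sby\s<a class=link href=""javascript:jsOpen"

    ' [\s\S]*? is lazy: it stops at the first position where the lookahead sees
    ' the next header or the end of the input (\z), instead of running to the end.
    Private ReadOnly EntryRegex As New Regex(Header & "[\s\S]*?(?=" & Header & "|\z)")

    ' Returns each header together with everything up to the next header.
    Public Function GetEntries(text As String) As List(Of String)
        Dim entries As New List(Of String)
        For Each m As Match In EntryRegex.Matches(text)
            entries.Add(m.Value)
        Next
        Return entries
    End Function
End Module
```

Text that appears before the first header is not part of any match, so it is simply skipped.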

I have tried Regex.Split, splitting on patterns similar to the ones
above, and this gets me closer, but then I lose the first part of
each string, the part that matches the pattern.
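The lost header with Regex.Split can probably be avoided by splitting on a zero-width lookahead instead of on the header itself: the lookahead matches only the position just before each header, so no characters are removed. A sketch under the same assumptions as above; LookaheadSplit and SplitKeepingHeaders are illustrative names, not part of the framework.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Public Module LookaheadSplit
    ' (?=...) matches the position before each header without consuming it,
    ' so the header stays at the start of the piece that follows the split point.
    Private ReadOnly BeforeHeader As New Regex("(?=\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}\sby\s<a class=link href=""javascript:jsOpen)")

    Public Function SplitKeepingHeaders(text As String) As List(Of String)
        Dim pieces As New List(Of String)
        For Each piece As String In BeforeHeader.Split(text)
            ' When the text starts with a header, Split yields an empty first element; skip it.
            If piece.Length > 0 Then pieces.Add(piece)
        Next
        Return pieces
    End Function
End Module
```

Unlike Matches, this keeps any text before the first header as its own piece.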

Any ideas or advice would be appreciated.
 
Lisa

There are a number of ways to make this work. All are explained in Jeff Friedl's book, "Mastering Regular Expressions," from O'Reilly.

If I were doing this, I would either use Regex.Matches and see whether the Matches collection had collected more than one match, or use Regex.Match followed by Regex.Replace repeatedly: find the first hit, replace it with something the regex won't catch, and try to find another match in the same input. When m.Success = False you can stop looking.
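The "stop when m.Success = False" loop can also be written without modifying the input, by using Match.NextMatch, which resumes the search just after the previous match. This swaps NextMatch in for the Replace step described above; NextMatchLoop and HeaderPositions are hypothetical names, and the header pattern assumes the full date format from the question.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Public Module NextMatchLoop
    Private ReadOnly HeaderRegex As New Regex("\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}\sby\s<a class=link href=""javascript:jsOpen")

    ' Returns the start offset of every header, found one match at a time.
    Public Function HeaderPositions(text As String) As List(Of Integer)
        Dim positions As New List(Of Integer)
        Dim m As Match = HeaderRegex.Match(text)
        Do While m.Success          ' stop as soon as a search fails
            positions.Add(m.Index)
            m = m.NextMatch()       ' continue searching after the current match
        Loop
        Return positions
    End Function
End Module
```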

There is probably a really slick way to do what you want, but I've barely finished reading the book myself.

By the way, there is an excellent section in the back of the book about using regexes in .NET.

Hope this helps

JimT
 
Lisa

If you paste the following into a "button_click" subroutine, it will pop up a message box that says "Found 3 of 'em" when you click on the button. It looks pretty messy here, but pasting it into a VB program should straighten it out. If it doesn't, everything should be on a line by itself except the string s, which is three lines long.

Once you have the match collection, you can get the string index of each match, which tells you where your search string begins in the target string. Then s.Substring(index, ...) will peel out the piece you want.
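A sketch of the Index/Substring idea, assuming each piece should run from one header's Index up to the next header's Index, and the last piece to the end of the string. IndexSlicer and SliceAtHeaders are illustrative names; any text before the first header is dropped.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Public Module IndexSlicer
    Private ReadOnly HeaderRegex As New Regex("\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}\sby\s<a class=link href=""javascript:jsOpen")

    ' Cuts s at the start of every header match.
    Public Function SliceAtHeaders(s As String) As List(Of String)
        Dim mc As MatchCollection = HeaderRegex.Matches(s)
        Dim pieces As New List(Of String)
        For i As Integer = 0 To mc.Count - 1
            Dim start As Integer = mc(i).Index
            ' End at the next header, or at the end of the string for the last one.
            Dim finish As Integer = If(i + 1 < mc.Count, mc(i + 1).Index, s.Length)
            pieces.Add(s.Substring(start, finish - start))
        Next
        Return pieces
    End Function
End Module
```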

If this is what you need, wonderful! If not, send me an e-mail and we can try to figure it out together.

Good Luck

Jim

' Requires "Imports System.Text.RegularExpressions" at the top of the code file.
Dim r As New Regex("\d{4}\-\d{2}\s\d{2}:\d{2}\sby\s<a class=link href=" & Chr(34) & "javascript:jsOpen")
Dim mc As MatchCollection
Dim s As String = "2004-23 21:04 by <a class=link href=" & Chr(34) & "javascript:jsOpen" & " stuff " & _
"2002-04 08:59 by <a class=link href=" & Chr(34) & "javascript:jsOpen" & " nonsense " & _
"2001-99 12:00 by <a class=link href=" & Chr(34) & "javascript:jsOpen" & " garbage"
mc = r.Matches(s)
If mc.Count > 0 Then
    MsgBox("Found " & mc.Count.ToString() & " of 'em")
End If
 
Hi Lisa,

First of all, I would like to confirm my understanding of your issue.
From your description, I understand that you want to use a regular
expression to extract a matched string from a long string.
Have I fully understood you? If there is anything I misunderstood, please
feel free to let me know.

I think you can use a named group to do the job.

Imports System
Imports System.Text.RegularExpressions
Module Module2
    Public Sub Main()
        Dim Text As String = "2004-04-12 09:31 by <a class=link href=""javascript:jsOpen fsafsadfs"
        System.Console.WriteLine("text=[" & Text & "]")
        Dim r As Regex = New Regex("^(?<MatchedStringName>\d\d\d\d-\d\d-\d\d\s\d\d:\d\d\sby\s<a class=link href=""javascript:jsOpen)(.*)", RegexOptions.IgnoreCase)
        Dim mt As Match = r.Match(Text)
        Console.WriteLine(mt.Groups("MatchedStringName").ToString())
        'This line prints: 2004-04-12 09:31 by <a class=link href="javascript:jsOpen
    End Sub
End Module
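The named-group idea can be combined with Regex.Matches and a lazy quantifier plus lookahead, so that every entry in the input yields one match carrying both parts. A sketch under stated assumptions: the group names Header and Body, the module NamedGroupEntries, and the function GetBodies are all hypothetical, and RegexOptions.IgnoreCase is kept from the example above.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Public Module NamedGroupEntries
    Private Const Header As String = "\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}\sby\s<a class=link href=""javascript:jsOpen"

    ' One match per entry: the "Header" group holds the date/link prefix and the
    ' "Body" group holds everything up to the next header or the end of the input.
    Private ReadOnly EntryRegex As New Regex("(?<Header>" & Header & ")(?<Body>[\s\S]*?)(?=" & Header & "|\z)", RegexOptions.IgnoreCase)

    ' Returns the text following each header.
    Public Function GetBodies(text As String) As List(Of String)
        Dim bodies As New List(Of String)
        For Each m As Match In EntryRegex.Matches(text)
            bodies.Add(m.Groups("Body").Value)
        Next
        Return bodies
    End Function
End Module
```

m.Groups("Header").Value would give the matching prefix of the same entry.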

For detailed information about grouping and regular expressions, you may
take a look at the links below.
Grouping Constructs
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpgenref/html/cpcongroupingconstructs.asp
Regular Expression Language Elements
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpgenref/html/cpconregularexpressionslanguageelements.asp

Please apply my suggestion above and let me know if it helps resolve your
problem.

Best regards,

Peter Huang
Microsoft Online Partner Support

This posting is provided "AS IS" with no warranties, and confers no rights.
 