string/regex: extracting the context of a string match around the found search term?

Guest

I'm analyzing large strings and finding matches using the Regex class. I
want to find the context those matches are found in and to display excerpts
of that context, just as a search engine might. In terms of code, what's the
easiest way to make that happen? The code below works fine for identifying
the matches, but it doesn't try to extract the surrounding context or
display it:

currPageText = [method which grabs text from my source]
numberOfMatches = Regex.Matches(currPageText, pattern, RegexOptions.IgnoreCase).Count;
Response.Write("found " + numberOfMatches + "<br>");

Thank you,
-KF
 
Peter Duniho

I'm analyzing large strings and finding matches using the Regex class. I
want to find the context those matches are found in and to display excerpts
of that context, just as a search engine might. In terms of code, what's the
easiest way to make that happen?

The Regex.Matches() method returns a Matches instance, which is a
collection of Match instances. That's why you can look at the Count
property to see how many matches there were.

The Match class has the Index and Length properties (inherited from the
Capture class), which tell you where in the original string the text was
found. You can easily use that information to look at the larger region
of text containing the matching text.

So, just enumerate the MatchCollection returned by Matches(), and for each
Match instance look at the substring defined by expanding the Match.Index
to Match.Index+Match.Length range to be as large as you think is
appropriate.
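
A minimal sketch of that enumeration follows. The helper name GetExcerpts and the radius parameter are hypothetical (they are not from the original post); the clamping with Math.Max and Math.Min is one way to keep the expanded range inside the string when a match sits near the start or end:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ExcerptFinder
{
    // Hypothetical helper: returns one excerpt per match, with up to
    // `radius` characters of context on each side of the matched text.
    public static List<string> GetExcerpts(string text, string pattern, int radius)
    {
        var excerpts = new List<string>();
        foreach (Match m in Regex.Matches(text, pattern, RegexOptions.IgnoreCase))
        {
            // Expand the [Index, Index + Length) range, clamped to the string bounds.
            int start = Math.Max(0, m.Index - radius);
            int end = Math.Min(text.Length, m.Index + m.Length + radius);
            excerpts.Add(text.Substring(start, end - start));
        }
        return excerpts;
    }
}
```

Called as ExcerptFinder.GetExcerpts(currPageText, pattern, 30), each returned string could be written out in place of the bare count. HTML-encoding the excerpt first (for example with Server.HtmlEncode) is advisable if the source text is not trusted.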

Pete
 
Peter Duniho

The Regex.Matches() method returns a Matches instance [...]

Obviously, that should read "returns a MatchCollection instance".
Hopefully the later part of my post made that clear. Forry sor any
foncusion. :)
 
Ethan Strauss

You could write a Regex to give you some text around the match by
doing something like

Regex MatchWithContext = new Regex
("(?<ContextBeforeMatch>.{10})(?<ActualMatch>StringYouWantTofind)(?<ContextAfterMatch>.{10})");

This would allow you to capture three groups: an "ActualMatch" group, a
"ContextBeforeMatch" group of 10 characters, and a "ContextAfterMatch"
group of 10 characters. I am almost certain this would work the way you want
in the middle of the string, but it would not match anything fewer than
10 characters from the beginning or end of the string. I think that you
could fix that as follows

Regex MatchWithContext = new Regex
("(?<ContextBeforeMatch>.{0,10})(?<ActualMatch>StringYouWantTofind)(?<ContextAfterMatch>.{0,10})");

to allow the context groups to be as small as zero if needed, but I would
definitely test this one before using it...
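
One way to try the {0,10} version is sketched below. The class and method names are hypothetical, and two details are assumptions added for the test rather than part of the pattern above: Regex.Escape, so that a search term containing metacharacters is treated literally, and RegexOptions.Singleline, because "." does not match a newline by default and the context would otherwise stop at line breaks. Each match also consumes its context, so a second hit that falls inside the trailing context of the first would not be reported separately.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ContextRegexDemo
{
    // Hypothetical helper: returns { before, match, after } for each hit.
    public static List<string[]> Find(string text, string searchTerm)
    {
        var matchWithContext = new Regex(
            "(?<ContextBeforeMatch>.{0,10})" +
            "(?<ActualMatch>" + Regex.Escape(searchTerm) + ")" +
            "(?<ContextAfterMatch>.{0,10})",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        var results = new List<string[]>();
        foreach (Match m in matchWithContext.Matches(text))
        {
            // Read the three named groups back out of each match.
            results.Add(new[]
            {
                m.Groups["ContextBeforeMatch"].Value,
                m.Groups["ActualMatch"].Value,
                m.Groups["ContextAfterMatch"].Value
            });
        }
        return results;
    }
}
```

With the input "Please find StringYouWantTofind in here.", the three groups should come back as "ease find ", "StringYouWantTofind", and " in here.", the last being shorter than 10 characters because the string ends.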
Ethan
Ethan Strauss Ph.D.
 
