Regular Expression Help!

  • Thread starter: Steve Peterson

Steve Peterson

Hi - I'll admit it: I'm clueless about regular expressions. That's why I
was hoping someone could help me out a bit. I need to search an HTML file
and find strings that are between tags. For example, everything in between
the <body> and </body> tags. As you well know, this string could span many
lines, white space, tabs, etc. So I need an expression that will match the
<body> tag, ignore EVERYTHING, then match the closing </body> tag.

Anyone to the rescue?? :)

Thanks a lot in advance
Steve
 
Hi,

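I have been using this to do the job in C#:

// Requires: using System.Text.RegularExpressions;
private string getBodyText(string inputString)
{
    // [^>]* skips any attributes on the opening tag; .*? stops at the first </body>.
    string pattern = @"<body[^>]*>(?<body>.*?)</body>";
    // Singleline lets '.' match newlines, so the body may span many lines.
    Regex r = new Regex(pattern,
        RegexOptions.IgnoreCase | RegexOptions.Singleline | RegexOptions.Compiled);
    Match m = r.Match(inputString);
    if (m.Success)
    {
        return m.Groups["body"].Value;
    }
    else
    {
        // No <body> element found: fall back to the whole input.
        return inputString;
    }
}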

Regards,
Svend
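
For anyone trying this on a multi-line page, here is a minimal sketch of the same idea as a small console program. The class name, the ExtractBody helper and the sample HTML are all made up for illustration. It assumes the page has a single <body> element, possibly with attributes, and relies on RegexOptions.Singleline so that '.' also matches newlines.

```csharp
using System;
using System.Text.RegularExpressions;

static class BodyExtractDemo
{
    // Hypothetical helper: returns the text between <body ...> and </body>,
    // or null when no body element is found.
    public static string ExtractBody(string html)
    {
        Match m = Regex.Match(html, @"<body[^>]*>(?<body>.*?)</body>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        return m.Success ? m.Groups["body"].Value : null;
    }

    static void Main()
    {
        // Sample input (an assumption): upper-case tag, an attribute, tabs and newlines.
        string html = "<html>\n<BODY class=\"main\">\n  <p>Hello</p>\n\t<p>World</p>\n</BODY>\n</html>";
        Console.WriteLine(ExtractBody(html).Trim());
    }
}
```

Without Singleline the '.' stops at the first newline, so the match fails on input like this. A regex is still only a rough tool for HTML: a "</body>" inside a script block or comment would end the match early.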
 
Did the trick! Thanks!!!
Steve


 
Hi Steve,

You could host the Microsoft Web Browser (ShDocVw) and use the Document
Object Model to do the work for you.

In your case that would be
Browser.Document.Body.InnerHtml
(just as it would in JavaScript or VBScript).

Regards,
Fergus
 