Chris Wertman
Hello, I posted a slightly different version of this to dotnet.general,
where it was suggested I post it here.
Well, I have to say I'm getting excited about my app; it's almost there.
I have added a button to IE and am calling the current instance of IE
and grabbing the URL out just fine. I'm using the WebClient to grab the
HTML. So far so good, and I'm only half bald.
Now I am at the point where I need to extract a couple of fields from
the HTML itself. I have read about using regex to do this but am a
little confused; maybe I've just been staring at the screen too long.
I get this HTML returned:
<b>Binding:</b> Paperback<br> <b>Publisher:</b>
What I need to extract is the word Paperback from the above string.
Here is what I have so far:
Dim regex As New Regex("<b>Binding:</b>((.|\n)*?)<br> <b>Publisher:", RegexOptions.IgnoreCase)
MsgBox(regex.Match(html).ToString)
But that returns <b>Binding:</b>Paperback<br> <b>Publisher:
I am sure something is wrong with my regular expression, but can I
strip out multiple items using this method, say by naming them regex1,
regex2, etc.?
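For reference, a minimal sketch of why the snippet prints the whole string: Match.ToString returns the entire matched text, while the text captured by the first pair of parentheses is available as Match.Groups(1).Value. The sketch also shows one way to pull several fields without a separate regex1, regex2, and so on, by building the pattern from a label and a named group. It assumes each field looks like the sample HTML ("<b>Label:</b> value<br>"); the module name and the ExtractField helper are hypothetical illustrations, not part of the .NET Framework, and Console.WriteLine stands in for MsgBox.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module BindingExtractor
    ' Hypothetical helper: returns the text after "<b>label:</b>" up to the
    ' next "<br>", captured in the named group "value". Assumes the field
    ' layout matches the sample HTML; returns "" when there is no match.
    Function ExtractField(ByVal html As String, ByVal label As String) As String
        Dim pattern As String = "<b>" & Regex.Escape(label) & ":</b>\s*(?<value>.*?)\s*<br>"
        Dim m As Match = Regex.Match(html, pattern, RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        If m.Success Then
            Return m.Groups("value").Value
        End If
        Return String.Empty
    End Function

    Sub Main()
        Dim html As String = "<b>Binding:</b> Paperback<br> <b>Publisher:</b>"

        ' The original pattern, unchanged. Match.ToString would return the
        ' whole match; Groups(1) holds only what the first parentheses captured.
        Dim regex As New Regex("<b>Binding:</b>((.|\n)*?)<br> <b>Publisher:", RegexOptions.IgnoreCase)
        Console.WriteLine(regex.Match(html).Groups(1).Value.Trim())

        ' One helper called with different labels covers several fields.
        Console.WriteLine(ExtractField(html, "Binding"))
    End Sub
End Module
```

Both lines print Paperback. The \s* on either side of the named group drops the space that follows "</b>" in the sample, so no Trim is needed in the helper.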
Someone suggested I use the DOM via Microsoft.mshtml.
Is this the most efficient way?
Do I need to somehow put it into a StreamReader, or... well, what do I
do with it then?
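On the StreamReader question, a hedged sketch: WebClient.DownloadString(url) already returns a String, which Regex can search directly. Only if the page is fetched with WebClient.OpenRead(url), which returns a Stream, is a StreamReader needed to turn it into a string. Here a MemoryStream stands in for the downloaded stream so the sketch runs offline; the module name and the ReadAllHtml helper are hypothetical.

```vbnet
Imports System
Imports System.IO
Imports System.Text
Imports System.Text.RegularExpressions

Module StreamToString
    ' Hypothetical helper: reads an entire stream of HTML into a String.
    ' Disposing the reader also closes the underlying stream.
    Function ReadAllHtml(ByVal source As Stream) As String
        Using reader As New StreamReader(source)
            Return reader.ReadToEnd()
        End Using
    End Function

    Sub Main()
        ' Stand-in for the Stream that WebClient.OpenRead(url) would return.
        Dim bytes As Byte() = Encoding.UTF8.GetBytes("<b>Binding:</b> Paperback<br> <b>Publisher:</b>")
        Using stream As New MemoryStream(bytes)
            Dim html As String = ReadAllHtml(stream)
            ' Once the HTML is a String, the regex works as usual.
            Dim m As Match = Regex.Match(html, "<b>Binding:</b>\s*(.*?)\s*<br>", RegexOptions.IgnoreCase)
            Console.WriteLine(m.Groups(1).Value)
        End Using
    End Sub
End Module
```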
Chris