String manipulation

the dude

I need to find some text in a string. I don't know what the text is
that I need to find, I need to find the text between one string and
another string, like:

<b>Product Detail: "Text I need to find"</b>

I need it to start reading after "Product Detail: " and stop reading
at "</b>"

I wrote the function below, but the only problem is that
thecode.indexof("</b>") will find the first time "</b>" occurs in the
string, which is not what I want:

Function parsehtml(website as string, btag as string, etag as string) as string
dim thecode as string = readHtmlPage(website)
dim bstr as integer = thecode.indexof(btag) + btag.length
dim estr as integer = thecode.indexof(etag)
dim result as string = thecode.substring(bstr, (estr-bstr))

return result

end function
 
Regular expressions are the way to go, but I'm not sure what the exact
requirement is. Let me know and I'll write it for you. Is the rule
"everything after the ':' following the <b> tag, ending at the </b> tag"?
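
For illustration, here is a minimal sketch of the regular-expression route, assuming the rule is simply "everything between btag and the next etag after it". The names RegexSketch and ExtractBetween and the sample string are made up for the example; they are not from the original post.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module RegexSketch
    ' Hypothetical helper: returns the text between btag and the next etag,
    ' or Nothing when no such pair exists.
    Function ExtractBetween(html As String, btag As String, etag As String) As String
        ' Regex.Escape keeps any regex metacharacters in the delimiters literal.
        ' (.*?) is lazy, so the match ends at the first etag after btag.
        Dim pattern As String = Regex.Escape(btag) & "(.*?)" & Regex.Escape(etag)
        ' Singleline lets "." match newlines, in case the text spans lines.
        Dim m As Match = Regex.Match(html, pattern, RegexOptions.Singleline)
        If m.Success Then Return m.Groups(1).Value
        Return Nothing
    End Function

    Sub Main()
        ' Sample input (an assumption): an earlier </b> precedes the wanted text.
        Dim html As String = "<b>Title</b><b>Product Detail: Blue widget</b>"
        Console.WriteLine(ExtractBetween(html, "Product Detail: ", "</b>"))
        ' prints: Blue widget
    End Sub
End Module
```

The lazy quantifier is what keeps the match from running on to a later "</b>" further down the page.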
 
the dude wrote:
dim bstr as integer = thecode.indexof(btag) + btag.length
dim estr as integer = thecode.indexof(etag)

Replace the line above with this:

\\\
Dim estr As Integer = thecode.IndexOf(etag, bstr)
///
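
Putting that one-line change into context, the whole function might look like the sketch below. It takes the page text as a parameter instead of calling readHtmlPage (which is not shown in the thread), and the -1 checks and the name TextBetween are additions for the example, not part of the original code.

```vbnet
Imports System

Module ParseSketch
    ' Hypothetical variant of parsehtml that works on text already loaded.
    Function TextBetween(thecode As String, btag As String, etag As String) As String
        Dim bpos As Integer = thecode.IndexOf(btag)
        If bpos < 0 Then Return Nothing          ' start marker not found
        Dim bstr As Integer = bpos + btag.Length ' first character after btag
        ' Start searching for etag at bstr, so earlier occurrences are skipped.
        Dim estr As Integer = thecode.IndexOf(etag, bstr)
        If estr < 0 Then Return Nothing          ' no end marker after btag
        Return thecode.Substring(bstr, estr - bstr)
    End Function

    Sub Main()
        ' Sample input (an assumption): a </b> occurs before the wanted text.
        Dim page As String = "<b>Intro</b> <b>Product Detail: ""Text I need to find""</b>"
        Console.WriteLine(TextBetween(page, "Product Detail: ", "</b>"))
        ' prints: "Text I need to find"
    End Sub
End Module
```

Without the checks, a missing marker makes IndexOf return -1 and Substring can then throw or return the wrong span.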
 
The dude,
In addition to William's suggestion.

Are you saying that "</b>" occurs multiple times and you want to find each
occurrence?

Then you need a loop. You can use the startIndex parameter of the
overloaded String.IndexOf to continue searching from where the last match ended.

Dim startIndex As Integer = 0
Dim bstr As Integer = thecode.IndexOf(btag, startIndex) + btag.Length
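
A rough sketch of such a loop, assuming you want every piece of text that sits between btag and the following etag. The names LoopSketch and AllBetween and the sample string are invented for the example.

```vbnet
Imports System
Imports System.Collections.Generic

Module LoopSketch
    ' Hypothetical helper: collects every substring found between btag and
    ' the next etag, scanning the text from left to right.
    Function AllBetween(thecode As String, btag As String, etag As String) As List(Of String)
        Dim results As New List(Of String)
        ' Empty delimiters would never advance the search, so bail out early.
        If btag.Length = 0 OrElse etag.Length = 0 Then Return results

        Dim startIndex As Integer = 0
        Do
            Dim bpos As Integer = thecode.IndexOf(btag, startIndex)
            If bpos < 0 Then Exit Do              ' no more start markers
            Dim bstr As Integer = bpos + btag.Length
            Dim estr As Integer = thecode.IndexOf(etag, bstr)
            If estr < 0 Then Exit Do              ' unterminated last item
            results.Add(thecode.Substring(bstr, estr - bstr))
            startIndex = estr + etag.Length       ' continue after this match
        Loop
        Return results
    End Function

    Sub Main()
        ' Sample input (an assumption) with two matching sections.
        Dim page As String = "<b>Product Detail: A</b><i>x</i><b>Product Detail: B</b>"
        For Each item As String In AllBetween(page, "Product Detail: ", "</b>")
            Console.WriteLine(item)
        Next
        ' prints A, then B
    End Sub
End Module
```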

Or are you saying that "<b>" & "</b>" can be nested and you need to find the
last one?

Then you need a more sophisticated algorithm to parse the HTML. You may want
to consider finding an HTML parser on the web that you can use.

Hope this helps
Jay
 