Search and Replace while ignoring HTML formatting?

  • Thread starter: Josiwe

I have a search program that returns an HTML string, which I display to
the user. I want to highlight the search terms. However, a simple
search and replace on the HTML causes problems.

If the user searches on Georgia and I get back this:
<div style="font-name:Arial">Georgia, Alabama, and Louisiana</div>

It works fine:
<div style="font-name:Arial"><span style="background-color:yellow;">Georgia</span>, Alabama, and Louisiana</div>

However, if the HTML that comes back is this:
<div style="font-name:Georgia">Georgia, Alabama, and Louisiana</div>

I get a serious problem: the search term is also replaced inside the
style attribute, which breaks the formatting and looks terrible:
<div style="font-name:<span style="background-color:yellow;">Georgia</span>"><span style="background-color:yellow;">Georgia</span>, Alabama, and Louisiana</div>

The HTML I'm getting back is quite complex, with nested spans, style
tags, etc. I'm stuck on how to solve this problem. Is there a
regular expression I can use to match only the chunks of non-formatting
text, so I can do the replacement there? I have neither the time nor the
resources to write a full-blown HTML tokenizer.
 
You will probably want to use XHTML instead of HTML and let an XML parser
do the work. A parser keeps tag names and attribute values separate from
the displayed text, so you should be able to loop through the text nodes
and apply the search/replace only there.
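
As a rough illustration of the text-node approach, here is a minimal sketch in Python using the standard library's xml.dom.minidom. It assumes the input is well-formed XHTML (plain HTML with unclosed tags or entities such as &nbsp; would fail to parse), and the names highlight_term and SKIP_TAGS are made up for this example. It is not the only way to do this, just one way to walk the text nodes and wrap matches in a span.

```python
import re
from xml.dom import Node, minidom

# Hypothetical helper names; assumes well-formed XHTML input.
SKIP_TAGS = {"style", "script"}  # text inside these is not displayed text


def highlight_term(xhtml, term):
    """Wrap each occurrence of term found in text nodes in a highlight span."""
    if not term:
        return xhtml
    doc = minidom.parseString(xhtml)
    # The capturing group makes re.split keep the matched text in the result.
    pattern = re.compile("(" + re.escape(term) + ")", re.IGNORECASE)

    # Collect text nodes first so the tree is not modified while walking it.
    text_nodes = []

    def walk(node):
        for child in node.childNodes:
            if child.nodeType == Node.TEXT_NODE:
                text_nodes.append(child)
            elif child.nodeName.lower() not in SKIP_TAGS:
                walk(child)

    walk(doc)

    for node in text_nodes:
        parts = pattern.split(node.data)
        if len(parts) == 1:
            continue  # no match in this text node
        parent = node.parentNode
        for i, part in enumerate(parts):
            if not part:
                continue
            if i % 2:  # odd indices hold the matched text
                span = doc.createElement("span")
                span.setAttribute("style", "background-color:yellow;")
                span.appendChild(doc.createTextNode(part))
                parent.insertBefore(span, node)
            else:
                parent.insertBefore(doc.createTextNode(part), node)
        parent.removeChild(node)

    return doc.documentElement.toxml()


html = '<div style="font-name:Georgia">Georgia, Alabama, and Louisiana</div>'
print(highlight_term(html, "Georgia"))
```

Because attribute values never appear as text nodes, the "Georgia" inside the style attribute is left alone and only the visible text gets the span.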
 