Regular expressions


Guest

I need to parse some HTML and add links to some keywords (up to 1000) defined
in a DB table. I need to search for these keywords and convert each one to a
link only if it is not already a link and is not inside a particular
paragraph tag, i.e. <p class=tab>. There are also 8 other conditions that
decide whether text is converted to a link.

Is there an easy way to compare match collections that are returned from
separate regular expressions? If some text is matched by all 10 or so
regular expressions then I know it should be converted. Or am I better off,
regarding performance and maintainability, just building 1 huge regular
expression to do all the parsing in one go?
 
My recommendation is to build one big regular expression, because a single
pass will probably be more efficient than running ten or so separate
expressions, but the best choice is really whichever one you are more
comfortable with. Depending on how good you are with regular expressions,
you may want to document the expression so that you know why each part is
there in case you need to change it. Either way, make sure you understand
and test the expression, because it is easy to make a mistake. Good luck!
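
As a rough illustration of the single-expression idea, here is a minimal
Python sketch that compiles all keywords into one alternation and links them
in plain text. Python is used only for illustration, and the names
build_keyword_pattern, link_plain_text and the hrefs mapping are
hypothetical. It covers only the keyword matching, not the "already a link"
or <p class=tab> conditions.

```python
import re
from html import escape


def build_keyword_pattern(keywords):
    """Compile one case-insensitive pattern matching any keyword as a whole word."""
    # Longest keywords first, so "regular expression" is tried before "regular".
    ordered = sorted(set(keywords), key=len, reverse=True)
    # re.escape keeps characters such as "." or "+" in a keyword literal.
    alternation = "|".join(re.escape(k) for k in ordered)
    # The lookarounds reject matches that sit inside a longer word.
    return re.compile(r"(?<!\w)(?:" + alternation + r")(?!\w)", re.IGNORECASE)


def link_plain_text(text, pattern, hrefs):
    """Wrap every keyword match in an <a> element.

    hrefs maps a lower-cased keyword to its URL (an assumed data shape;
    the real mapping would come from the DB table).
    """
    def replace(match):
        href = hrefs[match.group(0).lower()]
        return '<a href="{}">{}</a>'.format(escape(href, quote=True), match.group(0))

    return pattern.sub(replace, text)
```

Building the pattern once and reusing it for every document avoids
recompiling a 1000-keyword alternation on each call.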
 
You could use XSL to transform the HTML.
The only catch is that the HTML *should* be well-formed XML; otherwise
I am not sure whether XSL can process the document.
 
That's what I should be doing, but it would be too much of a performance hit.
I just have to be very careful about the regexes that I'm writing.
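
One way to keep separate expressions easy to check is to give each condition
its own small pattern and intersect the spans they match, which is roughly
what comparing match collections amounts to. The sketch below is in Python
for illustration only; matched_by_all and the two sample patterns are
hypothetical. It assumes every pattern matches exactly the candidate text and
expresses its context through lookarounds, so that equal spans mean "the same
piece of text".

```python
import re


def spans(pattern, text):
    """Return the set of (start, end) positions matched by one pattern."""
    return {m.span() for m in re.finditer(pattern, text)}


def matched_by_all(patterns, text):
    """Return the sorted spans that every pattern in `patterns` matched."""
    common = None
    for pattern in patterns:
        found = spans(pattern, text)
        common = found if common is None else common & found
    return sorted(common or ())


# Condition 1: the text is the keyword itself.
IS_KEYWORD = r"\bcat\b"
# Condition 2: a word that is not directly followed by text ending in </a>,
# a crude stand-in for "not already a link".
NOT_IN_LINK = r"\b\w+\b(?![^<]*</a>)"

sample = 'a cat and <a href="x">cat</a>'
hits = matched_by_all([IS_KEYWORD, NOT_IN_LINK], sample)
```

Each pattern can then be tested on its own, at the cost of scanning the text
once per condition.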
 
: I need to parse some HTML and add links to some keywords (up to 1000)
: defined in a DB table. What I need to do is search for these keywords
: and if they are not already a link, and they are not inside a
: paragraph tag, ie <p class=tab>, I convert it to a link. There are
: also 8 other conditions to decide whether text is converted to a link.
:
: Is there an easy way to compare match collections that are returned
: from separate regular expressions? If some text is matched by all 10
: or so regular expressions then I know it should be converted. Or am I
: better off, regarding performance and maintainability, just building 1
: huge regular expression to do all the parsing in one go?

The first consideration is of course correctness. Regular expressions
tend to be blunt tools in this context, so cody's suggestion is
probably the best course.

Your HTML may be suitable for scanning by a regular expression as
opposed to general HTML for which you'd need a full-blown parser,
but we don't know because you didn't provide sample inputs.
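
For comparison, here is a minimal sketch of the parser route using Python's
standard html.parser module. Python is used only for illustration; the class
name KeywordLinker, the link_keywords helper and the hrefs mapping are
hypothetical. It links keywords only in text that is outside <a>,
<p class=tab>, <script> and <style>, and it assumes those elements are
explicitly closed (an omitted </p> would leave linking switched off).

```python
import re
from html import escape
from html.parser import HTMLParser


class KeywordLinker(HTMLParser):
    """Re-emit HTML, linking keywords found in eligible text nodes."""

    def __init__(self, hrefs):
        # Keep entities as written so they can be echoed back unchanged.
        super().__init__(convert_charrefs=False)
        self.hrefs = {k.lower(): v for k, v in hrefs.items()}
        words = sorted(self.hrefs, key=len, reverse=True)
        self.pattern = re.compile(
            r"(?<!\w)(?:" + "|".join(map(re.escape, words)) + r")(?!\w)",
            re.IGNORECASE,
        )
        self.out = []
        self.blocked = []  # stack of open tags that switch linking off

    @staticmethod
    def _suppresses(tag, attrs):
        if tag in ("a", "script", "style"):
            return True
        classes = (dict(attrs).get("class") or "").split()
        return tag == "p" and "tab" in classes

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())
        if self._suppresses(tag, attrs):
            self.blocked.append(tag)

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append("</" + tag + ">")
        if self.blocked and self.blocked[-1] == tag:
            self.blocked.pop()

    def handle_data(self, data):
        self.out.append(data if self.blocked else self.pattern.sub(self._link, data))

    def _link(self, match):
        href = self.hrefs[match.group(0).lower()]
        return '<a href="{}">{}</a>'.format(escape(href, quote=True), match.group(0))

    # Pass everything else through untouched.
    def handle_entityref(self, name):
        self.out.append("&" + name + ";")

    def handle_charref(self, name):
        self.out.append("&#" + name + ";")

    def handle_comment(self, data):
        self.out.append("<!--" + data + "-->")

    def handle_decl(self, decl):
        self.out.append("<!" + decl + ">")


def link_keywords(html_text, hrefs):
    parser = KeywordLinker(hrefs)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

The remaining conditions could be added to _suppresses or handle_data one at
a time, which keeps each rule separately testable.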

Greg
 