Repost: Can anyone help with this Regex problem?

  • Thread starter: Greg Vereschagin

Greg Vereschagin

I'm trying to figure out a regular expression that will match the
innermost tag and the contents in between. Specifically, the string
that I am attempting to match looks as follows:

....<table>...<table>...>Final<...</table>...</table>...

I want to match: <table>...>Final<...</table> from this example.

The string could also, of course, look like the following:

....<table>...<table>...</table>...<table>...>Final<...</table>...<table>...</table>...</table>...

I am looking for the innermost <table> </table> tags that have a
specific string in that table - in this case >Final<.
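For the exact case described here (the table holding >Final< has no table nested inside it), one ordinary regex may be enough. The sketch below is in Python, which has no .NET-style balancing groups, so it swaps in a different technique: a "tempered" token (?:(?!</?table>).) that refuses to step over any <table> or </table>, so only a leaf table can match. The function name is hypothetical, and it assumes bare, lower-case <table> tags without attributes.

```python
import re

# "<table>", then characters that never start "<table>" or "</table>",
# containing ">Final<", then "</table>". Because no table tag may occur
# inside the match, only an innermost (leaf) table can match.
INNERMOST_FINAL = re.compile(
    r"<table>(?:(?!</?table>).)*?>Final<(?:(?!</?table>).)*</table>",
    re.DOTALL,  # let "." cross line breaks in real HTML
)


def find_innermost_final(html):
    """Hypothetical helper: every leaf <table>...</table> containing '>Final<'."""
    return INNERMOST_FINAL.findall(html)


print(find_innermost_final(
    "....<table>...<table>...>Final<...</table>...</table>..."))
```

A limitation of this shortcut: if >Final< sits in an outer table next to (not inside) a nested table, nothing matches, because the outer table is not a leaf.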

Any help would be greatly appreciated. If there are other newsgroups
dedicated to regular expressions I would be happy to redirect my post
there.

Thanks in advance,
Greg
 
Try using or modifying the following expression:

<table>(?><table>(?<level>)|(?<contents-level>)</table>|.)*(?(level)(?!))</table>

This will give you the contents of the innermost table tags in the Captures
collection of the named group "contents". You could then just iterate
through them and find the ones that contain the string you are looking for.
You could probably modify this expression to match exactly what you want
without this step.
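The balancing groups in that pattern behave like a stack: (?<level>) pushes when a <table> is seen, and (?<contents-level>) pops when the matching </table> arrives, capturing what lay in between. Below is a minimal sketch of the same stack idea written out by hand in Python (a substitute, not a port, since Python's re has no balancing groups), which also does the "iterate and keep the ones containing the string" step. The function name and the tag pattern, which tolerates attributes and any letter case, are assumptions for illustration.

```python
import re

# Matches "<table ...>" (group 1 empty) or "</table>" (group 1 == "/"),
# in any letter case; attributes on the opening tag are tolerated.
TABLE_TAG = re.compile(r"<(/?)table\b[^>]*>", re.IGNORECASE)


def innermost_tables_containing(html, needle):
    """Hypothetical helper: each <table>...</table> block that contains
    `needle` and has no nested table that also contains it."""
    stack = []  # start offsets of the currently open <table> tags
    hits = []   # (start, end) spans of closed tables that contain needle
    for tag in TABLE_TAG.finditer(html):
        if not tag.group(1):        # opening tag: push (like (?<level>))
            stack.append(tag.start())
        elif stack:                 # closing tag: pop (like (?<contents-level>))
            start = stack.pop()
            end = tag.end()
            if needle in html[start:end]:
                hits.append((start, end))
    # Drop any hit that encloses another hit; what remains is innermost.
    return [
        html[s:e]
        for s, e in hits
        if not any(s < s2 and e2 < e for s2, e2 in hits)
    ]
```

Unlike a leaf-only regex, this also finds an outer table when >Final< sits beside, rather than inside, its nested tables.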


Hope this helps,

Brian Davis
http://www.knowdotnet.com
 
Cor,

1) I want to learn about regular expressions. I wrote a lot of code
to extract data from HTML using the VB string-processing functions
before I got to that chapter in Balena's book, and now I find that a
few lines of regex do the job of dozens of lines of my current code.
2) A few months ago, I asked a more general question along the same
lines as the one you have responded to, and it was suggested that
regexes were the way to go.
3) Please give me a suggestion as to how to use mshtml. I'm learning
VB.NET partly as a hobby (although I have some things I would like to
use it for in my day job). I was once a professional programmer, and
here I'm really going to date myself: I spent six years at IBM writing
tons of Fortran. So... in some aspects of programming I can hang in
there with anyone, but in others (anything that has become mainstream
in the last 20 years, say) I'm a newbie.

I am very appreciative of any help and guidance.

Greg
 
Greg,
The following sites provide a wealth of information on regular expressions.

A tutorial & reference on using regular expressions:
http://www.regular-expressions.info/

The MSDN's documentation on regular expressions:
http://msdn.microsoft.com/library/d...l/cpconRegularExpressionsLanguageElements.asp

Instead of writing your own parser or using RegEx, have you considered using
mshtml as Cor suggested or a SgmlReader (HTML reader)?

http://www.gotdotnet.com/Community/...mpleGuid=B90FDDCE-E60D-43F8-A5C4-C3BD760564BC
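The HTML-reader approach can be sketched with Python's standard html.parser module, used here only as a stand-in for mshtml or SgmlReader, whose APIs are not shown. It keeps one frame per open <table>, marks a table as a hit when it contains a text node equal to the needle (the >Final< in the question is the text Final between two tags), and reports a hit only if no nested table was also a hit. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class InnermostTableFinder(HTMLParser):
    """Collect the text of innermost <table> elements that contain a text
    node equal to `needle` (hypothetical helper class)."""

    def __init__(self, needle):
        super().__init__()
        self.needle = needle
        self.stack = []    # one frame per currently open <table>
        self.results = []  # space-joined text of each matching table

    def handle_starttag(self, tag, attrs):
        if tag == "table":  # HTMLParser lower-cases tag names
            self.stack.append({"text": [], "hit": False, "child_hit": False})

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        for frame in self.stack:  # text belongs to every open table
            frame["text"].append(text)
            if text == self.needle:
                frame["hit"] = True

    def handle_endtag(self, tag):
        if tag != "table" or not self.stack:
            return
        frame = self.stack.pop()
        if frame["hit"]:
            if not frame["child_hit"]:  # no nested table matched: innermost
                self.results.append(" ".join(frame["text"]))
            if self.stack:  # tell the enclosing table a child matched
                self.stack[-1]["child_hit"] = True


def innermost_tables_with_text(html, needle):
    finder = InnermostTableFinder(needle)
    finder.feed(html)
    finder.close()
    return finder.results
```

Because a parser sees tags and text separately, stray ">" characters or attributes on <table> do not confuse it the way they can confuse a hand-written regex.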

Hope this helps
Jay
 
Hi Greg,

I am one of those in this newsgroup who knows a bit more about the
Document Object Model (DOM).

When you are working with HTML, or rather DHTML, you need to know more
about DHTML.

Using the DOM you can work in an object-oriented way, while with a regex
the approach is more classically procedural. (Regexes are more something
you find in scripting languages.)

I am happy to guide you a little, but before you look at the tools I
think it is better to have a look at the Document Object Model itself.

The Document Object Model is described by the W3C, but in my opinion
that site is an endless road: you never find anything, because every
topic is described each time by someone in his own way.

On MSDN it is also hard to find, but it is better there. When you search,
always include the keyword "Object".

This is the document object itself
http://msdn.microsoft.com/library/d...thor/dhtml/reference/objects/obj_document.asp

The head object
http://msdn.microsoft.com/library/default.asp?url=/workshop/author/dhtml/reference/objects/head.asp


This is the body object
http://msdn.microsoft.com/library/default.asp?url=/workshop/author/dhtml/reference/objects/body.asp

mshtml contains the classes for accessing those objects in an
object-oriented way. However, there are a huge number of these classes,
and once referenced in your program each of them has a huge number of
members.

Never import the namespace in your project; instead, always qualify what
you need directly, for example mshtml.IHTMLDocument2 and so on.
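To show what "the DOM way" looks like for Greg's problem, here is a minimal sketch using Python's standard xml.dom.minidom instead of mshtml (a swapped-in library, not mshtml's API). It assumes the markup is well-formed XML, which real-world HTML often is not; the helper names are hypothetical. Rather than scanning characters, it asks the document for its table elements and keeps those whose text contains the needle and that have no matching table below them.

```python
from xml.dom import minidom


def text_of(node):
    """Concatenate all text nodes below `node` (hypothetical helper)."""
    if node.nodeType == node.TEXT_NODE:
        return node.data
    return "".join(text_of(child) for child in node.childNodes)


def innermost_tables_dom(markup, needle):
    """Return the <table> elements whose text contains `needle` and that
    have no descendant <table> whose text also contains it."""
    doc = minidom.parseString(markup)  # requires well-formed markup
    matches = [t for t in doc.getElementsByTagName("table")
               if needle in text_of(t)]
    matched_ids = {id(t) for t in matches}  # compare nodes by identity
    return [
        t for t in matches
        # getElementsByTagName on an element searches its descendants only
        if not any(id(inner) in matched_ids
                   for inner in t.getElementsByTagName("table"))
    ]
```

The result is a list of element objects, so you can keep navigating from them (parentNode, childNodes and so on) instead of re-parsing a matched string.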

When you are working with these classes in VS.NET, set the Help search
filter to all.

Have a look at those links

I hope this helps.

Cor
 