Regular Expression help - Extract links from certain tag


roberto321

Hi Guys,

I was wondering if someone could help me with the following
requirement:
<mydocument>
<div id="other">
<a href="linkother">linkother</a>
</div>

<div id="hello">
<a href="link1url">link1</a>
<a href="link2url">link2</a>
</div>
</mydocument>

If I want to extract all the links from the div with id="hello", how do I
go about it?
The desired result would be:
link1url
link2url

So far I'm extracting links with this pattern: <a href="[^"]+">[^<]+</a>
How do I restrict the matches to links inside one particular div?

Regards DotnetShadow
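One way to scope the existing pattern to a single div is a two-step match: first capture the body of the target div, then run the link pattern only inside that captured text. A minimal sketch in Python for illustration, assuming the div contains no nested <div> (the lazy match stops at the first </div>) and that attributes appear exactly as in the sample. The names `DOCUMENT` and `links_in_div` are hypothetical. A capture group around the href value is added so the result is just the URL.

```python
import re

# The sample document from the post above (hypothetical variable name).
DOCUMENT = """<mydocument>
<div id="other">
<a href="linkother">linkother</a>
</div>

<div id="hello">
<a href="link1url">link1</a>
<a href="link2url">link2</a>
</div>
</mydocument>"""

# The original link pattern, with a capture group around the href value.
LINK_PATTERN = re.compile(r'<a href="([^"]+)">[^<]+</a>')


def links_in_div(html: str, div_id: str = "hello") -> list[str]:
    """Return the href values of links inside <div id="div_id">."""
    # Step 1: capture everything between the opening div tag and the first
    # </div> after it. Lazy .*? plus re.DOTALL lets the match span lines.
    # Assumption: no nested <div> inside the target div.
    div_pattern = re.compile(
        r'<div id="' + re.escape(div_id) + r'">(.*?)</div>', re.DOTALL
    )
    block = div_pattern.search(html)
    if block is None:
        return []
    # Step 2: findall with one capture group returns just the href strings.
    return LINK_PATTERN.findall(block.group(1))


if __name__ == "__main__":
    for href in links_in_div(DOCUMENT):
        print(href)
```

This prints link1url and link2url. Because regular expressions cannot count nesting, this approach becomes fragile as soon as the div contains other divs; a parser-based approach avoids that.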

You could load it into a DOM tree (an XML DOM, if the document is
well-formed). Then navigate the tree to the div whose id attribute is
"hello" and fetch all the a elements under that node. XPath can express
this directly, e.g. //div[@id='hello']//a/@href
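The DOM/XPath suggestion can be sketched with Python's standard `xml.etree.ElementTree`, which supports a limited XPath subset including `//` and `[@attr='value']` predicates. This is an illustration only, assuming the input is well-formed XML like the sample (real-world HTML usually needs a lenient HTML parser instead) and that the id value contains no single quote. `DOCUMENT` and `hrefs_under_div` are hypothetical names. In .NET the same XPath expression could be passed to `XmlDocument.SelectNodes`.

```python
import xml.etree.ElementTree as ET

# The sample document from the question (hypothetical variable name).
DOCUMENT = """<mydocument>
<div id="other">
<a href="linkother">linkother</a>
</div>

<div id="hello">
<a href="link1url">link1</a>
<a href="link2url">link2</a>
</div>
</mydocument>"""


def hrefs_under_div(xml_text: str, div_id: str = "hello") -> list[str]:
    """Return href values of all <a> elements under <div id="div_id">."""
    root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    # .//div[@id='...'] finds the div anywhere below the root;
    # //a then selects every <a> descendant of that div, not just children.
    # Assumption: div_id contains no single quote (it is not escaped here).
    anchors = root.findall(f".//div[@id='{div_id}']//a")
    return [a.get("href") for a in anchors if a.get("href") is not None]


if __name__ == "__main__":
    for href in hrefs_under_div(DOCUMENT):
        print(href)
```

Unlike the regex approach, the parser understands nesting, so links inside nested elements of the div are still found and links in other divs are never matched.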
 