regex Question

  • Thread starter: eBob.com

eBob.com

I'm trying to come up with the right regular expression to pull the strings abc ...
pqu out of the following:

align="right" class="maintext">abc<br>def<br>ghi<br>jlk<br>lmn<br>pqu</div>

The problem is that the "<br>" separator does not occur before the first
string or after the last one. And there won't always be 6 strings
(corresponding to the abc ... pqu strings).

I'm looking for something like this ...

align="right" class="maintext">(?<attr>\w+<br>)+

but the expression has to insist on the terminating "</div>", and the <br>
separator is required between the strings but not before the terminating
</div>.

The expression above causes the "attr" values to pick up the <br> text, which
is not what I want, but I can deal with that later. What I am seeking help
on is how to insist on the <br> separator between strings but not before the
terminating </div>.

Thanks, Bob
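
For reference, the requirement above (a separator between items, none before the closing tag) is commonly written as "one item, then zero or more separator-plus-item pairs". The following is only a minimal sketch in Python, which is an assumption: the thread does not say which language or regex flavor is in use. The names ITEMS, extract_items, and body are hypothetical. Python's re module keeps only the last capture of a repeated group, so the sketch captures the whole run once and splits it on "<br>".

```python
import re

# Hypothetical names; the literal prefix mirrors the markup quoted in the post.
ITEMS = re.compile(
    r'align="right" class="maintext">'
    r'(?P<body>\w+(?:<br>\w+)*)'   # one item, then zero or more "<br>item" pairs
    r'</div>'                      # the closing tag must follow the last item
)

def extract_items(html):
    """Return the <br>-separated items, or [] if the markup does not match."""
    match = ITEMS.search(html)
    if match is None:
        return []
    return match.group("body").split("<br>")

sample = 'align="right" class="maintext">abc<br>def<br>ghi<br>jlk<br>lmn<br>pqu</div>'
print(extract_items(sample))
```

Because the separator only appears inside the repeated "<br>item" pair, a stray <br> directly before </div> makes the whole match fail, and a single item with no <br> at all still matches.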
 
After sleeping on this I thought that MAYBE I could have more than one
"attr" group in the expression, so I gave it a shot and that does work.
Then the problem was to make sure the <br> text was not captured in the attr
group. So what I have at the moment and will probably go with is ...

align="right" class="maintext">((?<attr>\w+)(<br>))+(?<attr>\w+)+</div>

It's not great because I end up with two numbered groups which I don't need.
But the attr group captures what I need so I think that will have to do.
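
The two unneeded numbered groups come from the plain parentheses around the repeated unit and around <br>. A non-capturing group, written (?:...), groups without numbering, and the <br> needs no parentheses at all. The sketch below shows the difference in Python, which is an assumption about tooling and not the flavor used above: Python spells named groups (?P<name>...) and does not allow the same name twice, so the hypothetical names item and last stand in for the two "attr" groups.

```python
import re

PREFIX = r'align="right" class="maintext">'

# Plain parentheses: the wrapper and the <br> each become a numbered group.
with_extra_groups = re.compile(
    PREFIX + r'((?P<item>\w+)(<br>))+(?P<last>\w+)</div>'
)

# (?:...) groups without capturing, and <br> is left unparenthesized.
without_extra_groups = re.compile(
    PREFIX + r'(?:(?P<item>\w+)<br>)+(?P<last>\w+)</div>'
)

sample = 'align="right" class="maintext">abc<br>def<br>ghi<br>jlk<br>lmn<br>pqu</div>'

# .groups counts every capturing group in the pattern, named ones included.
print(with_extra_groups.groups)
print(without_extra_groups.groups)
print(without_extra_groups.search(sample).group("last"))
```

Note that the + after the repeated unit requires at least one <br>, so input with only a single string would not match; using * there would allow it.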

Thanks, Bob
 