Regex Help needed (New)

  • Thread starter: barry

barry

Hi

<IMG style="FILTER:
progid:DXImageTransform.Microsoft.AlphaImageLoader(src='/images/IconExperience/ApplicationBasics/16x16/shadow/edit.png',
sizingMethod='scale'); WIDTH: 16px; HEIGHT: 16px" height=16
src="images/PNGPH.gif" width=16 align=absMiddle border=0>

Here is an update to my previous problem. The IMG tag above needs to become:

<IMG height=16 src="images/edit.png" width=16 align=absMiddle border=0>

Note that "images/PNGPH.gif" has changed to "images/edit.png".

Can someone help?

Barry
 
Hello Barry,
<IMG style="FILTER:
progid:DXImageTransform.Microsoft.AlphaImageLoader(src='/images/IconExperience/ApplicationBasics/16x16/shadow/edit.png',
sizingMethod='scale'); WIDTH: 16px; HEIGHT: 16px" height=16
src="images/PNGPH.gif" width=16 align=absMiddle border=0>


I took the liberty of guessing that the order of the attributes would not
always be the same. So I made the pattern robust enough to work around that:

<img(?=[^>]*(?<styleblock>style="((?!src=|").)+src='[^']+/(?<imgname>[^']+)'[^"]+"))(?=[^>]*(?<height>height=\d+))(?=[^>]*(?<width>width=\d+))(?=[^>]*(?<align>align=[a-z]+))[^>]+>

This pattern (run with the case-insensitive option, since the sample uses
<IMG and absMiddle) captures everything we need to form a new tag using the
following replacement pattern:

<img src="/images/${imgname}" ${width} ${height} ${align}>

It basically fetches each of the important values from within the IMG tag
and then puts a new tag together using all these different parts.
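
For anyone trying this outside .NET, here is a minimal sketch of the same pattern and replacement in Python, assuming the standard `re` module. Named groups are spelled `(?P<name>...)` there and the replacement uses `\g<name>` instead of `${name}`. The function name `rewrite_img` and the IGNORECASE flag are my own assumptions, not something from the post above.

```python
import re

# Python spelling of the pattern above: (?<name>...) becomes (?P<name>...).
# Each lookahead scans the whole tag, so attribute order does not matter.
IMG_PATTERN = re.compile(
    r"""<img(?=[^>]*(?P<styleblock>style="((?!src=|").)+src='[^']+/(?P<imgname>[^']+)'[^"]+"))"""
    r"""(?=[^>]*(?P<height>height=\d+))"""
    r"""(?=[^>]*(?P<width>width=\d+))"""
    r"""(?=[^>]*(?P<align>align=[a-z]+))"""
    r"""[^>]+>""",
    re.IGNORECASE,  # assumption: needed because the sample uses <IMG and absMiddle
)

# Python spelling of the replacement: ${name} becomes \g<name>.
REPLACEMENT = r'<img src="/images/\g<imgname>" \g<width> \g<height> \g<align>>'


def rewrite_img(html):
    """Rebuild every matching IMG tag from the captured parts."""
    return IMG_PATTERN.sub(REPLACEMENT, html)


sample = (
    "<IMG style=\"FILTER: progid:DXImageTransform.Microsoft.AlphaImageLoader("
    "src='/images/IconExperience/ApplicationBasics/16x16/shadow/edit.png', "
    "sizingMethod='scale'); WIDTH: 16px; HEIGHT: 16px\" height=16 "
    "src=\"images/PNGPH.gif\" width=16 align=absMiddle border=0>"
)
print(rewrite_img(sample))
```

Tags that lack any of the four parts (style block, height, width, align) are left untouched by this version, because every lookahead is mandatory.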
 
Thanks

Hi,

It works. I am trying to modify the source URLs of the images on some
webpages using this approach (similar to what Firefox does when you
"Save As"). It works on the tables at the left and top, but not on the tables
at the center and right. If you give me your email address, I can send you the
URL(s) for testing.

Regards
Barry


Hello Barry,
Those images don't have an alignment set. You basically have two options
here: either make the alignment optional, or run a second regex.
There were also quotes around the width and the height; the previous samples
didn't have those.
And lastly, there was no URL in a style block, just in the src.
These images also have other attributes that weren't in the original samples
you sent. I leave how to deal with those up to your imagination. With the
samples provided in this and previous posts you should be able to fix that
yourself.

This regex solves most of the above-mentioned things:
<img(?=[^>]*(?<styleblock>style="((?!src=|").)+src='[^']+/(?<newlink>[^']+)'[^"]+")|[^>]*src="[^"]*/(?<newlink>[^"]+)")(?=[^>]*(?<height>height=(\d+|"\d+")))(?=[^>]*(?<width>width=(\d+|"\d+")))(?=[^>]*(?<align>align=[a-z]+))?[^>]+>

To fix the alignment you basically have to make that part of the regex optional. Any of these:
(?=[^>]*(?<align>align=[a-z]+)|)
(?=[^>]*(?<align>align=[a-z]+))?
(?=([^>]*(?<align>align=[a-z]+))?)
should do; I chose the middle one.
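
To see the optional lookahead in isolation, here is a small Python sketch, assuming the standard `re` module and using the first of the three variants (the empty alternative). The pattern is cut down to just the align part, and the helper name `align_of` is hypothetical.

```python
import re

# First variant from above in Python spelling: the empty alternative after "|"
# lets the lookahead succeed even when the tag has no align attribute.
ALIGN_OPTIONAL = re.compile(
    r'<img(?=[^>]*(?P<align>align=[a-z]+)|)[^>]+>',
    re.IGNORECASE,
)


def align_of(tag):
    """Return the captured align attribute, '' if the tag has none, None if no IMG tag."""
    m = ALIGN_OPTIONAL.search(tag)
    if m is None:
        return None
    # The group is None when the empty alternative was taken.
    return m.group('align') or ''


print(align_of('<img src="a.png" align=absMiddle>'))
print(align_of('<img src="a.png" width=16>'))
```

The point is that the tag still matches without an alignment; the align group just comes back empty, so the replacement has to cope with that.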

On top of that, they don't have a style with a DXImageTransform filter on
them, just a src=.
I added a second alternative to the first lookahead that looks for a src
if the style block isn't found. That way it falls back to the normal
src:
(?=[^>]*(?<styleblock>style="((?!src=|").)+src='[^']+/(?<newlink>[^']+)'[^"]+")|[^>]*src="[^"]*/(?<newlink>[^"]+)")

And to handle the optional quotes around the width/height:
(?=[^>]*(?<height>height=(\d+|"\d+")))
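
Put together, the src fallback and the optional quotes can be sketched in Python roughly like this, assuming the standard `re` module. Python does not allow the same group name twice, so `newlink` is split into two hypothetical names, `from_style` and `from_src`, and the closing quote of the src is kept outside the capture. The names `rebuild` and `rewrite` are also my own, and the align part uses the empty-alternative variant.

```python
import re

PATTERN = re.compile(
    r"""<img
        # file name from the style block, or else from a plain src="..."
        (?=[^>]*style="(?:(?!src=|").)+src='[^']+/(?P<from_style>[^']+)'[^"]+"
          |[^>]*src="[^"]*/(?P<from_src>[^"]+)")
        # height and width, with or without quotes around the number
        (?=[^>]*(?P<height>height=(?:\d+|"\d+")))
        (?=[^>]*(?P<width>width=(?:\d+|"\d+")))
        # align is optional
        (?=[^>]*(?P<align>align=[a-z]+)|)
        [^>]+>""",
    re.IGNORECASE | re.VERBOSE,
)


def rebuild(match):
    """Assemble the new tag from whichever parts were captured."""
    name = match.group('from_style') or match.group('from_src')
    parts = ['src="/images/%s"' % name, match.group('width'), match.group('height')]
    if match.group('align'):
        parts.append(match.group('align'))
    return '<img ' + ' '.join(parts) + '>'


def rewrite(html):
    return PATTERN.sub(rebuild, html)


print(rewrite('<img src="pics/logo.gif" width="32" height="32" alt="x">'))
```

Like the original, the fallback only fires when the src contains at least one slash, and any attributes beyond these four are dropped from the rebuilt tag.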

I hope you're doing this for a one time conversion, because if you want some
screen scraping solution, or want to reach some other 'in production' goal,
I'd step away from this line of thought. You're better off with the HTML
Agility pack (www.codeplex.com/htmlagilitypack/), or by just re-rendering
the page from the database content it was originally created with.

Kind Regards,

Jesse