Peter Rilling
I am trying to write a regular expression that locates href attributes in
some html content. I used the example in the .NET documentation as a
starting point
("background\\s*=\\s*(?:\"(?<webpath>[^\"]*)\"|(?<webpath>\\S+))"). This
works well for most cases, but I ran into a problem when the href is not
quoted and is the last attribute in the tag (meaning that a greater-than
sign (>) rather than a space immediately follows the URL).
How might I modify the expression so that it allows all of the following?
href="..."
href=... <--space right here
href=...>
Also, are there any other formats that I need to be aware of?
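
One possible way to cover the three forms above is to stop an unquoted value at whitespace or at the closing ">", instead of only at whitespace as \S+ does. The sketch below is in Python rather than .NET, so the syntax differs: Python writes named groups as (?P<name>...) and does not allow the same group name twice, which is why three hypothetical names (dq, sq, bare) are used in place of the single webpath group. The names HREF_RE and find_hrefs are made up for illustration. It also accepts single-quoted values, which is another format that turns up in real HTML. It is a rough sketch under those assumptions, not a full HTML parser.

```python
import re

# Hypothetical pattern: the value may be double-quoted, single-quoted,
# or unquoted. An unquoted value ends at whitespace, '>' or a quote.
HREF_RE = re.compile(
    r"""\bhref\s*=\s*(?:"(?P<dq>[^"]*)"|'(?P<sq>[^']*)'|(?P<bare>[^\s>"']+))""",
    re.IGNORECASE,
)


def find_hrefs(html):
    """Return the href values found in html, in document order."""
    results = []
    for match in HREF_RE.finditer(html):
        # Only one of the three named groups takes part in any match;
        # the others are None.
        value = match.group("dq")
        if value is None:
            value = match.group("sq")
        if value is None:
            value = match.group("bare")
        results.append(value)
    return results


if __name__ == "__main__":
    sample = '<a href="a.html">A</a> <a href=b.html class=x>B</a> <a href=c.html>C</a>'
    print(find_hrefs(sample))
```

In the .NET version the same idea would presumably mean replacing the \S+ in the unquoted branch with a character class that also excludes ">".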