finding href with regex.


Peter Rilling

I am trying to write a regular expression that locates href attributes in
some HTML content. I used the example in the .NET documentation as a
starting point
("background\\s*=\\s*(?:\"(?<webpath>[^\"]*)\"|(?<webpath>\\S+))"). This
works well for most cases, but I ran into a problem when the href is not
quoted and it is the last attribute in the tag (meaning that a
greater-than sign (>) rather than a space immediately follows the URL).
How might I modify the pattern so that it allows all of the following?

href="..."
href=... <--space right here
href=...>

Also, are there any other formats that I need to be aware of?
 
Hi,
[inline]
----- Original Message -----
From: "Peter Rilling" <[email protected]>
Newsgroups: microsoft.public.dotnet.languages.csharp
Sent: Friday, December 26, 2003 8:04 PM
Subject: finding href with regex.

> I am trying to write a regular expression that locates href attributes in
> some HTML content. I used the example in the .NET documentation as a
> starting point
> ("background\\s*=\\s*(?:\"(?<webpath>[^\"]*)\"|(?<webpath>\\S+))"). This

Try this instead:
"background\\s*=\\s*(?:\"(?<webpath>[^\"]*)\"|(?<webpath>[^\\s>]+)[\\s>])"

\\S+ matches one or more characters, none of which is whitespace.
[^\\s>]+ matches one or more characters, none of which is whitespace or >.

> works well for most cases, but I ran into a problem when the href is not
> quoted and it is the last attribute in the tag (meaning that a
> greater-than sign (>) rather than a space immediately follows the URL).
> How might I modify the pattern so that it allows all of the following?
>
> href="..."
> href=... <--space right here
> href=...>
>
> Also, are there any other formats that I need to be aware of?

I don't think there are other "valid" formats, apart from single-quoted
values (href='...'), which HTML also allows.
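
For what it's worth, here is a rough C# sketch of how the pattern above could be wrapped up and tried against the three forms from the question. A few things in it are my own assumptions rather than anything from the documentation sample: the AttributeFinder class and its method names are made up for illustration, the attribute name is passed in so the same code serves "href" or "background", a branch for single-quoted values is added, and the trailing [\\s>] is dropped because [^\\s>]+ already stops in front of a space or > (which also lets a value at the very end of the input match).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of .NET.
public static class AttributeFinder
{
    // Builds a pattern for one attribute name (e.g. "href" or "background").
    public static Regex Build(string attributeName)
    {
        string pattern =
            "\\b" + Regex.Escape(attributeName) + "\\s*=\\s*(?:" +
            "\"(?<webpath>[^\"]*)\"" +   // href="..."
            "|'(?<webpath>[^']*)'" +     // href='...' (added branch)
            "|(?<webpath>[^\\s>]+)" +    // href=... ended by whitespace, > or end of input
            ")";
        return new Regex(pattern, RegexOptions.IgnoreCase);
    }

    // Returns the captured value of every occurrence of the attribute.
    public static List<string> FindValues(string html, string attributeName)
    {
        var values = new List<string>();
        foreach (Match m in Build(attributeName).Matches(html))
        {
            // .NET allows the same group name in several branches;
            // Groups["webpath"] holds whichever branch matched.
            values.Add(m.Groups["webpath"].Value);
        }
        return values;
    }
}
```

Calling AttributeFinder.FindValues(html, "href") on a string containing href="a.html", href=b.html followed by a space, href=c.html followed by >, and href='d.html' should give back the four file names in that order. It is still only a regex, so malformed markup (an unclosed quote, for instance) can produce odd captures.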

HTH,
Greetings
 