Bradley Plett
I'm hopeless at regular expressions (I just don't use them often
enough to gain/maintain knowledge), but I need one now and am looking
for help. I need to parse through a document to find a URL, and then
reconstruct another URL based on it. For example, I need to scan a
web page looking for something like <a
href="some_dir/list_20050815100225.csv">. I don't know in advance
what the date/time in the file name will be. I need to take the
result of that and construct a URL out of it so that I can automate
the download of this file on a regular basis. The replace can be done
by replacing "<token>" in
"http://www.whatever.com/some_dir/list_<token>" with the result from
above. However, I would like the directory information included in
the search result so that I don't have to hard-code it (i.e. I'd
rather look for a URL with "list_<datetime>.csv" in it).
I have a regular expression that comes close:
"href=""some_dir/list_(?:(?<1>[^""]*)""|(?<1>\S+))". I got that by
tweaking the example at
http://msdn.microsoft.com/library/d...uide/html/cpconexamplechangingdateformats.asp.
If I can't find a cleaner sample, that will have to do. However,
there are two minor problems with this expression: 1) I would rather
be returning the complete URL in the href (to make it easier to
capture variable subdirectories, for example), and 2) it would require
a two-step process (the match followed by the replace). Is it
possible to have a single regular expression to do both? That would
simplify configuration of my program, since the intent is that none of
this be hard-coded.
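To make the goal concrete, here is a rough sketch of the "one pattern plus one template" idea. It is written in Python rather than .NET, so the named-group syntax differs ((?P<path>...) instead of (?<1>...)). The names PATTERN, TEMPLATE and build_download_url are made up for illustration, the 14-digit timestamp is an assumption based on the sample file name, and the host is the placeholder from above. Both strings could live in configuration, so nothing about the directory is hard-coded.

```python
import re

# Both strings are meant to be configuration values (hypothetical names).
# PATTERN captures the whole relative href, including any subdirectory,
# assuming the timestamp is always 14 digits as in the sample file name.
PATTERN = r'href\s*=\s*"(?P<path>[^"]*list_\d{14}\.csv)"'
# TEMPLATE says how to rebuild the download URL from the captured group.
TEMPLATE = r"http://www.whatever.com/\g<path>"


def build_download_url(html, pattern=PATTERN, template=TEMPLATE):
    """Return the rebuilt URL for the first matching href, or None."""
    match = re.search(pattern, html)
    if match is None:
        return None
    # Match.expand fills the template from the captured groups, so the
    # match and the "replace" happen without any hard-coded directory.
    return match.expand(template)


page = '<a href="some_dir/list_20050815100225.csv">latest</a>'
print(build_download_url(page))
# prints http://www.whatever.com/some_dir/list_20050815100225.csv
```

The closest .NET equivalent is probably Match.Result with a replacement pattern, though that is untested here.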
Any help would be appreciated.
Thanks!
Brad.
P.S. If there's a better place to post this kind of question, I'd
love to hear about it. I was tempted to cross-post, but....