JoJo
Folks:
I have an HTML file (see the general structure below) that is about 100 pages
long. Scattered throughout this document, in a really disorganized fashion,
are 6 or 7 categories or fields of information, such as: Name of Article,
Author of Article, Date Published, Comment, Full Story, Printer Version, etc.
Some of this information is in the form of HTML hyperlinks. I am attempting
to extract some of it in such a way that the HTML links are preserved and not
automatically converted to plain text.
Specifically, I am interested in the DOS code that would extract the
following 2 pieces of information from my HTML document and write them to a
separate document:
(1) Names of the articles (each always starts with ">>" followed by the
actual article name in HTML format)
(2) Date published ( | Published 08/9/2009 | ). Note that this information
is contained between the "|" characters.
* I am interested in the DOS code to extract these 2 pieces of
information listed above.
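To make the two patterns above concrete, here is a minimal sketch of the matching logic. It is written in Python rather than DOS batch, purely as an illustration of what needs to be matched. It assumes that each article name appears in the HTML source as ">>" (or its escaped form "&gt;&gt;") followed by an <a ...>...</a> link, and that the date always has the form "| Published MM/D/YYYY |". The names TITLE_RE, DATE_RE and extract are hypothetical, and the real markup in the file may differ.

```python
import re

# Assumption: each article title is a literal ">>" (or its HTML-escaped form
# "&gt;&gt;") followed by an <a ...>...</a> link; the real markup may differ.
TITLE_RE = re.compile(
    r'(?:>>|&gt;&gt;)\s*(<a\b[^>]*>.*?</a>)',
    re.IGNORECASE | re.DOTALL,
)

# The date sits between two "|" characters, as in "| Published 08/9/2009 |".
DATE_RE = re.compile(r'\|\s*Published\s+(\d{1,2}/\d{1,2}/\d{4})\s*\|')


def extract(html):
    """Return (titles, dates): title links kept as raw HTML, dates as text."""
    titles = TITLE_RE.findall(html)
    dates = DATE_RE.findall(html)
    return titles, dates


if __name__ == "__main__":
    # Made-up sample line pair, shaped like the structure described above.
    sample = (
        '<p>&gt;&gt; <a href="goal.html">Do you have a written goal?</a></p>\n'
        '<p>By James Johnson | Published 08/9/2009 |</p>\n'
    )
    titles, dates = extract(sample)
    for title, date in zip(titles, dates):
        print(title + " | " + date)
```

Because the title is captured together with its <a> tag, the hyperlink survives as HTML instead of being flattened to text. Writing the pairs to a separate document would then just be a matter of sending the printed lines to a new file.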
-------------- HTML file that is about 100 pages long --------------
Articles by this Author - Page 1 (Page 1 of 100) « Back | 1 | 2 | 3 | 4 | 5 | Next »
By James Johnson | Published 08/9/2009 |
Do you have a written goal of what you expect to make from your trading this
year?
full story printer version
.............................................................................
.................................................................
----------------------------------------------------------------------------