Creating Table from Web-Based Material

  • Thread starter Thread starter Ken
  • Start date Start date
K

Ken

This is a large question, and I appreciate any level of response that
can be given, including whether what I seek to accomplish is possible
or practical.

I am trying to determine how completely I can automate the process of
taking material from a web site and putting it into an Excel table.
The web-based tables show names and "x" for a vote under "Aye" or
"Nay." An example may be found at
http://frwebgate.access.gpo.gov/cgi...8&directory=/diskb/wais/data/108_cong_reports
by searching the page for fc-14 (there are other votes below this
one). As you will see, these vote tables are a small portion of a
larger report.

I would like each vote to be pulled into an Excel sheet and displayed
as follows:

Name Aye(vote1) Nay(vote1) Aye(vote2) Nay (vote2) etc.

Name1 x x
Name2 x x
etc.

Even better would be to pull the votes from multiple reports which may
be found by going to
http://www.gpoaccess.gov/serialset/creports/index.html then checking
House and putting "financial services" in the search box and clicking
submit. Not all of the reports that come up show votes. There are,
of course, a number of committees, but start with this one.

This can be done, of course, by a web query to the report in Excel,
but non-table text must be deleted, the votes lined up with "text to
columns" (fixed width), each set of votes copied and pasted
horizontally next to the preceding and repeated names (across)
deleted.

Since search engine spiders grab huge amounts of information and
e-mail addresses and other information are culled from the web, what I
am trying to do seems possible to this non-programmer. Your advice is
appreciated.


Ken
 
Hi Ken,

Actually I don't know, how deep you want to analyze the data embedded
in the html, but if you want a custom-solving, you have to write an
HTML-Parser (as Add-In or dll).

I needed a similar thing with Access once, but I have to admit, if your
knowledge is not far enough with VBA, it will be very difficult to code
it. And data in irregular tables is komplex to extract. I think there
are some standart solutions for Excel on the market which you can buy
and easily install.
 
Ken,

Your sample page is not an HTML document as such, just a lot of plain
text wrapped in <pre></pre> to preserve the formatting. This makes it
more difficult to extract data, since there's no real tabular
structure to attack.

To get at the plain text you'd have to come up with some sort of
parameterized parsing routine: is definitely doable if the tables are
have similar structure. If structures vary slightly then the routine
might be able to compensate for (e.g.) different column widths, but
beyond that the amount of coding required would increase.

So yes, it's possible but as a "non-programmer" it would present a
challenge.

Regards,
Tim.
 
Back
Top