Get info from a page

  • Thread starter: James

James

Hi everyone,
The fragment below is from a table on a page that I pull (scrape) information
from. The fragment is one row of what can be several rows.

These are the items with ids that I can get:
id="dgdBusSchedule__ctl1_lblDepartureDate" yields "06:11"
id="dgdBusSchedule__ctl1_lnkRouteNumber" yields "6"

The ids above change with each row (..._ctl2_..., ..._ctl3_..., and so on), so
they are easy to loop through.
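Because only the row number in the id changes, one way to walk the rows is to build each id from a counter and stop at the first row that is missing. A minimal JavaScript sketch against the raw page HTML, assuming the ids always follow the `dgdBusSchedule__ctlN_...` pattern shown above, that numbering starts at 1 and has no gaps; `departureTimes` is a hypothetical helper name and the second row in the sample is invented.

```javascript
// Hypothetical helper: collects the departure times by trying
// dgdBusSchedule__ctl1_..., __ctl2_..., ... until an id is not found.
function departureTimes(html) {
  const times = [];
  for (let row = 1; ; row++) {
    const id = `dgdBusSchedule__ctl${row}_lblDepartureDate`;
    // Match the element carrying this exact id and capture its text.
    const m = html.match(new RegExp(`id="${id}"[^>]*>([^<]*)<`));
    if (m === null) break; // assumes row numbers have no gaps
    times.push(m[1]);
  }
  return times;
}

// Made-up sample: the 06:11 row from the page plus an invented second row.
const sample =
  '<span id="dgdBusSchedule__ctl1_lblDepartureDate">06:11</span>' +
  '<span id="dgdBusSchedule__ctl2_lblDepartureDate">06:41</span>';
console.log(departureTimes(sample).join(", ")); // 06:11, 06:41
```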

These ids mark the main data items for my app. What I cannot seem to get is
the last cell in the row: the text "University", which is the destination of
route 6. This cell seems to have no distinguishing id, name or tag; even its
class is the generic one used throughout this page.

What methods are used to iterate over a table and all of its rows/cells?

All opinions appreciated!



<td scope="row" class="gAltContentSection">
<span
id="dgdBusSchedule__ctl1_lblDepartureDate">06:11</span></td>
<td scope="row" class="gAltContentSection">
<a id="dgdBusSchedule__ctl1_lnkRouteNumber"
href="javascript:document.forms[&quot;frmBusStopScheduleResults&quot;][&quot;txtRouteId&quot;].value
=&quot;30&quot;;document.forms[&quot;frmBusStopScheduleResults&quot;][&quot;txtRouteDepartureTime&quot;].value
=&quot;371&quot;;gfnTransit_SwitchPostBackUrl(&quot;BusStopSchedule_Results.aspx|BusStopDetail.aspx&quot;,&quot;RouteSchedule_Results.aspx&quot;,&quot;frmBusStopScheduleResults&quot;,&quot;_blank&quot;,&quot;NOVIEWSTATE&quot;);RouteSchedulePostBack(&quot;&quot;,&quot;&quot;,&quot;frmBusStopScheduleResults&quot;,&quot;txtRouteNumber&quot;,&quot;6&quot;);">6</a></td>
<td scope="row" class="gAltContentSection">University</td>
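Although the destination cell has no id of its own, it can still be found by position: it is the first `<td>` after the cell that holds the `lnkRouteNumber` link. A minimal JavaScript sketch of that idea, run against the fragment above (shortened: the long `href` is replaced by `#`). `destinationAfter` is a hypothetical name, and the sketch assumes the cells always appear in this order.

```javascript
// Hypothetical helper: given the page HTML and a row number, return the text
// of the <td> that follows the cell containing the route-number link.
function destinationAfter(html, row) {
  const id = `dgdBusSchedule__ctl${row}_lnkRouteNumber`;
  const start = html.indexOf(`id="${id}"`);
  if (start === -1) return null; // no such row
  // From the link onward: skip to the end of its cell, then read the next cell.
  const rest = html.slice(start);
  const m = rest.match(/<\/td>\s*<td[^>]*>([^<]*)<\/td>/);
  return m ? m[1].trim() : null;
}

const fragment = `<td scope="row" class="gAltContentSection">
<a id="dgdBusSchedule__ctl1_lnkRouteNumber"
href="#">6</a></td>
<td scope="row" class="gAltContentSection">University</td>`;
console.log(destinationAfter(fragment, 1)); // University
```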
 
James said:
What methods are used to iterate over a table and all of its rows/cells?

If you want to loop over the table, you would have to parse the HTML
into some kind of object tree. I would suggest that you just use a
regular expression to get the data from the code instead.

Something like:

// requires: using System.Text.RegularExpressions;
MatchCollection m = Regex.Matches(page,
@"<span[^>]+?id=""[^""]+?lblDepartureDate""[^>]*?>([^<]+?)</span>.*?<a[^>]+?id=""[^""]+?lnkRouteNumber""[^>]*?>(\d+)</a>",
RegexOptions.Singleline);
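Göran's pattern captures the time and the route number but stops at the `</a>`, so it does not yet return the destination. One possible extension, sketched in JavaScript rather than .NET: after the link, match the end of its cell and capture the text of the next `<td>`. The `s` flag plays the role of .NET's `RegexOptions.Singleline`, letting `.*?` cross the line breaks between cells. The added part of the pattern is an assumption based on the single row shown in the question and may need adjusting for the real page; `scheduleRows` is a hypothetical name.

```javascript
// Göran's pattern, plus a third capture group for the next cell's text.
// "s" lets "." match newlines; "g" is required by matchAll.
const rowPattern = new RegExp(
  '<span[^>]+?id="[^"]+?lblDepartureDate"[^>]*?>([^<]+?)</span>' +
  '.*?<a[^>]+?id="[^"]+?lnkRouteNumber"[^>]*?>(\\d+)</a>' +
  '\\s*</td>\\s*<td[^>]*>([^<]*)</td>',
  "gs"
);

// Returns one { time, route, destination } object per matched row.
function scheduleRows(page) {
  const rows = [];
  for (const m of page.matchAll(rowPattern)) {
    rows.push({ time: m[1], route: m[2], destination: m[3].trim() });
  }
  return rows;
}

const page = `<td scope="row" class="gAltContentSection"><span
id="dgdBusSchedule__ctl1_lblDepartureDate">06:11</span></td>
<td scope="row" class="gAltContentSection">
<a id="dgdBusSchedule__ctl1_lnkRouteNumber"
href="#">6</a></td>
<td scope="row" class="gAltContentSection">University</td>`;
console.log(JSON.stringify(scheduleRows(page)));
// [{"time":"06:11","route":"6","destination":"University"}]
```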
 
Thanks Göran,

Regex seems quite powerful. On MSDN I have found information on the Regex
methods, but I lack the knowledge to write a pattern. Is there a page that
teaches how to write regular expressions? Once I learn the syntax, I agree
that getting the information I need is possible.
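While learning the syntax, it can help to try the pieces of Göran's pattern in isolation. The JavaScript sketch below exercises three of them, all of which behave the same way in .NET: a negated character class `[^>]`, a lazy quantifier `+?`, and a capture group `( )`. The sample strings are made up.

```javascript
// [^>]+ : one or more characters that are not ">" (stays inside one tag)
const tag = /<span[^>]+>/.exec('<span id="x">06:11</span>');
console.log(tag[0]); // <span id="x">

// .+ vs .+? : the greedy form runs as far as it can, the lazy form stops early
const text = "<b>one</b><b>two</b>";
console.log(/<b>.+<\/b>/.exec(text)[0]);  // <b>one</b><b>two</b>
console.log(/<b>.+?<\/b>/.exec(text)[0]); // <b>one</b>

// ( ) : a capture group; its text is element 1 of the match result
const m = /id="([^"]+)"/.exec('<a id="lnkRouteNumber">6</a>');
console.log(m[1]); // lnkRouteNumber
```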

Thanks Göran!

Found a page! http://www.regular-expressions.info/reference.html

You can also use JavaScript to iterate over rows and cells:
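<script>
function delete_all(table_element)
{
  // Walk the rows from last to first, so deleting a row does not
  // shift the indexes of the rows that have not been visited yet.
  for (var i = table_element.rows.length - 1; i > -1; i--)
  {
    var cells = table_element.rows[i].cells;
    // ... check row content, e.g. cells[cells.length - 1].innerHTML,
    // and call table_element.deleteRow(i) if the row should go ...
  }
}
</script>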
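The `rows` collection used above has a companion: each row exposes its cells through `cells`. The sketch below collects the text of every cell, row by row, so the last entry of each row would hold the destination. `tableToArray` is a hypothetical name; like the script above, it has to run in the browser on the page itself, not in a program that downloads the page as text. The usage comment assumes the grid's table carries the id `dgdBusSchedule`, which is only a guess based on the cell ids.

```javascript
// Hypothetical helper: returns [[cell text, ...], ...] for an HTML table.
// Relies only on the standard table.rows and row.cells collections.
function tableToArray(table) {
  const result = [];
  for (let i = 0; i < table.rows.length; i++) {
    const cells = table.rows[i].cells;
    const row = [];
    for (let j = 0; j < cells.length; j++) {
      row.push(cells[j].textContent.trim());
    }
    result.push(row);
  }
  return result;
}

// In a browser, e.g. (table id assumed):
//   const data = tableToArray(document.getElementById("dgdBusSchedule"));
//   const destinations = data.map(r => r[r.length - 1]);
```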


... more at http://www.siccolo.com/articles.asp