Trolling a site for data

  • Thread starter: Steffan A. Cline

Steffan A. Cline

I was trying to find a way to troll/poll/scrape a site for data.
Unfortunately the site uses AJAX with ASP.NET (.aspx), which I have never
worked with before. If this site used PHP, Lasso or the like, it would be easy
to grab the URL and query the data directly. I have done similar things before
where the pages use a plain form and then paginate through the results (1-100,
101-200, etc.).

The example would be this site:
http://www.tblaw.com/FsSales/PendingSales.aspx

I can simply fetch the URL with the likes of curl or something, but that only
gets the first 60 records. No matter what I do, I can't find out how to get
more than the 60.

On another site, for example, I hit the page first to get the cookie and
event and action so that I can keep posting them to the next page with the
page parameters and then parse the results.

Sorry if I am not explaining this very well.

Any suggestions?

Thanks,
Steffan
 

You have to learn how the AJAX on that page works. Usually it's JavaScript
that requests some data from the server and renders the returned data into
the page. That means you need to find out how it is implemented in each
particular case and read the output of the script/page that returns the data.
 
In reply to Alexey Smirnov's post of 11/9/09 1:03 AM:

Right. I get that. The problem is that ASP.NET does an outstanding job of
obfuscating. With a normal JS-based AJAX query, you can easily see the URL and
parameters being sent. The deal is that ASP.NET sends waaaay more data.

I was hoping someone could help figure out exactly how ASP.NET is doing it.
I tried parsing the headers and had no luck.

Thanks,
Steffan
 
One area where ASP.NET is different is its postback model. There are
hidden fields __EVENTTARGET and __EVENTARGUMENT that contain info on the
postback control. __VIEWSTATE contains state information. Before you
can do a form post to an ASP.NET server, you must do a GET to get a valid
viewstate.

In your case you need to do a GET to get the page one data and a viewstate,
then a form post (filling in __EVENTTARGET) to get page two and the
viewstate for page three.
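
The GET-then-POST sequence above could be sketched roughly as follows. This is only an illustration under assumptions: the sample HTML and its __VIEWSTATE value are made up, the helper names (`hidden_fields`, `build_postback`) are hypothetical, and the event target is a placeholder borrowed from the pager ids discussed in this thread. A real page may carry additional hidden fields, which this sketch simply copies along.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenFieldParser(HTMLParser):
    """Collects name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def hidden_fields(html):
    """Return every hidden form field found in the page as a dict."""
    parser = HiddenFieldParser()
    parser.feed(html)
    return parser.fields


def build_postback(html, event_target, event_argument=""):
    """Return the URL-encoded form body for a postback raised by event_target."""
    fields = hidden_fields(html)
    fields["__EVENTTARGET"] = event_target
    fields["__EVENTARGUMENT"] = event_argument
    return urlencode(fields)


# Stand-in for the HTML returned by the initial GET (values are made up).
sample_page = """
<form method="post" action="PendingSales.aspx">
  <input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="" />
  <input type="hidden" name="__EVENTARGUMENT" id="__EVENTARGUMENT" value="" />
  <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDwUKLTEy" />
</form>
"""

# Body to send in the POST that asks for the next page.
body = build_postback(sample_page, "ListView1$PagerTop$ctl01$ctl01")
print(body)
```

The response to that POST would then be fed back through `hidden_fields` to pick up the viewstate needed for the following page.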

If the site uses an update panel, then it's just a little trickier. The
update panel posts all the form data (there will be hidden fields to
identify it as an async postback) via XMLHttpRequest, and gets back just
the HTML (in a pretty simple format) for a subsection of the page. You will
need to parse this for your data, the new viewstate, and any form field
updates (keep track of all the form fields from before the post and merge
in the results).
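
As a rough sketch of the parse-and-merge step, the following assumes the partial-postback response is a sequence of pipe-delimited records of the form length|type|id|content|, with the length counted in characters. That layout is an assumption about the response format, and the sample response, panel id and field values are made up for illustration. The `record` helper exists only to build the sample.

```python
def parse_delta(text):
    """Split a partial-postback response into (type, id, content) tuples."""
    parts = []
    pos = 0
    while pos < len(text):
        # Each record is assumed to look like: length|type|id|content|
        end = text.index("|", pos)
        length = int(text[pos:end])
        pos = end + 1
        end = text.index("|", pos)
        kind = text[pos:end]
        pos = end + 1
        end = text.index("|", pos)
        ident = text[pos:end]
        pos = end + 1
        content = text[pos:pos + length]
        pos += length + 1  # skip the content and its trailing "|"
        parts.append((kind, ident, content))
    return parts


def record(kind, ident, content):
    """Build one sample record (only used to fabricate test input)."""
    return f"{len(content)}|{kind}|{ident}|{content}|"


# Made-up response: one HTML fragment plus a refreshed viewstate.
sample = (
    record("updatePanel", "UpdatePanel1", "<table><tr><td>row 61</td></tr></table>")
    + record("hiddenField", "__VIEWSTATE", "/wEPDwUKLTEz")
)

# Form fields remembered from before the post.
fields = {"__VIEWSTATE": "/wEPDwUKLTEy", "__EVENTTARGET": ""}

for kind, ident, content in parse_delta(sample):
    if kind == "hiddenField":
        fields[ident] = content       # merge updated form fields
    elif kind == "updatePanel":
        print(ident, "->", content)   # HTML fragment holding the data

print(fields["__VIEWSTATE"])
```

Reading the content by its declared length, rather than splitting on "|", keeps the parser from breaking when the HTML fragment itself contains a pipe character.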

-- bruce (sqlwork.com)
 
As Bruce correctly noted, look into the postback data; in most cases all
the information is there. For instance, if we take your URL as an example,
we can see that the list has pager links 1, 2, 3, etc. These links
initiate asynchronous postbacks and cause a partial-page update. Each
link has an id like 'ListView1$PagerTop$ctl01$ctlXX', where XX is 00 for
page #1, 01 for page #2, etc., and a URL like
javascript:__doPostBack('ListView1$PagerTop$ctl01$ctlXX','').
What does that mean? It means that the number of the new page is sent
via the postback as the id of the link control. Sounds simple, right?
Send a request to the remote server in which __EVENTTARGET is
ListView1%24PagerTop%24ctl01%24ctl01 (the id with each $ URL-encoded
as %24) when you want to get page #2. If the page controls rely on
viewstate, you need to copy the viewstate into the request as well.
This is probably where you were confused by the amount of data.
ViewState is used to retain the state of controls between postbacks.
Again, if we take your example, we don't change any control state,
which means that you can copy the original viewstate from the very
first page. If you need to know what the ViewState contains, you can
decode it. There are some tools to do it, for example:

http://lachlankeown.blogspot.com/2008/05/online-viewstate-viewer-decoder.html
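
For a quick look without an online tool, the __VIEWSTATE value can be Base64-decoded locally. The sketch below does only that and then lists the readable ASCII runs; it does not attempt to deserialize the structure, and it will show nothing useful if the site encrypts its viewstate. The sample value is fabricated here, and `viewstate_strings` is a hypothetical helper name.

```python
import base64
import re

# Runs of at least four printable ASCII bytes.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def viewstate_strings(viewstate):
    """Base64-decode a __VIEWSTATE value and return its readable ASCII runs."""
    raw = base64.b64decode(viewstate)
    return [match.decode("ascii") for match in PRINTABLE_RUN.findall(raw)]


# Fabricated stand-in for a real __VIEWSTATE value.
sample_viewstate = base64.b64encode(
    b"\xff\x01\x0f\x0fListView1\x05\x02Page\x00"
).decode("ascii")

print(viewstate_strings(sample_viewstate))
```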

To debug HTTP requests, use Fiddler Web Debugger.
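
The pager ids described above follow a simple pattern, so the __EVENTTARGET for any page can be generated rather than copied by hand. A minimal sketch, assuming the two-digit ctlXX numbering holds for every page (the thread only confirms 00 and 01) and using a hypothetical helper name `pager_target`:

```python
from urllib.parse import quote


def pager_target(page, prefix="ListView1$PagerTop$ctl01$ctl"):
    """Return the pager link id for a 1-based page number (page 1 -> ...ctl00)."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return f"{prefix}{page - 1:02d}"


for page in (1, 2, 3):
    target = pager_target(page)
    # quote(..., safe="") encodes "$" as "%24", as it appears in the POST body.
    print(page, target, quote(target, safe=""))
```

Comparing the generated value against what Fiddler shows for a real click on the pager is a cheap way to confirm the pattern before looping over all pages.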
 