Downloading PDF files from a website

Henry

I am trying to download some PDF files from a website.

I am using the IWebBrowser2::Navigate method

m_WebBrowser.Navigate

to get to the site. This works OK, and the requested PDF file is displayed in
the window.

Then I am trying to save the PDF file that is currently displayed to a
directory.

I am using the IWebBrowser2::ExecWB method

m_WebBrowser.ExecWB(OLECMDID_SAVE, OLECMDEXECOPT_DONTPROMPTUSER, NULL, NULL);

This does not work as far as I can tell. Any ideas?

Henry
 
I'm guessing that this won't work because the PDF is being displayed in
Acrobat, hence from this point on it's no use sending OLE commands to the
browser.

As an alternative I'd suggest checking out AutoIt's InetGet() function. With
this you could download the page, parse it for links to PDFs, and then
download the files directly.

Two ready-made web downloaders are Wget and WinHTTrack. If this is a
once-off job then either of these would probably involve less work than
coding it yourself.
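
To make the "parse for links to PDFs" step concrete, here is a minimal sketch in portable C++. It is not AutoIt and not the poster's code. It assumes the page has already been downloaded into a string, and the function name extractPdfLinks is made up for illustration. It only recognises quoted href attributes, so treat it as a starting point rather than a real HTML parser.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Return a lower-cased copy so that matching is case-insensitive.
static std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Hypothetical helper: collect the values of quoted href attributes whose
// path ends in ".pdf". Unquoted attribute values are skipped.
std::vector<std::string> extractPdfLinks(const std::string& html) {
    std::vector<std::string> links;
    const std::string lower = toLower(html);
    const std::string key = "href=";
    std::size_t pos = 0;
    while ((pos = lower.find(key, pos)) != std::string::npos) {
        pos += key.size();
        if (pos >= lower.size()) break;
        const char quote = lower[pos];
        if (quote != '"' && quote != '\'') continue;   // not a quoted value
        const std::size_t start = pos + 1;
        const std::size_t end = lower.find(quote, start);
        if (end == std::string::npos) break;           // unterminated attribute
        assert(end >= start);
        const std::string url = html.substr(start, end - start);
        // Ignore any query string or fragment when checking the extension.
        const std::string path = toLower(url.substr(0, url.find_first_of("?#")));
        if (path.size() >= 4 && path.compare(path.size() - 4, 4, ".pdf") == 0)
            links.push_back(url);                      // keep original casing
        pos = end + 1;
    }
    return links;
}
```

Each returned link may be relative, so it would still need to be resolved against the page's address before being handed to whatever download routine is used.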
 
Anteaus, you are a genius!

You are correct that the page displays in Acrobat. Nevertheless, the
OLECMDID_PRINT command does work and sends the PDF to the printer.
Unfortunately, OLECMDID_SAVE does not.

I have never heard of AutoIt. It is completely new to me. I shall obtain a
copy at once and follow your suggestion with InetGet(). From what you say it
seems to be a very powerful language. I am surprised that I have never heard
anyone mention it or read of it.

Regarding Wget and WinHTTrack, I have tried both and neither will bring down
the part of the website containing the PDF files. (I am a fully paid-up user
of the website and wish to archive the PDF files that I normally receive in
paper format.)

I think the problem is that a page pops up when initially contacting the
website. This page has 2 check boxes. One of them (the correct one) must be
selected before one can log on with a password. I don't think Wget and
WinHTTrack can handle a page like that.

Thank you again for your help.

Henry
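
A note on the check-box page described above: when an HTML form is submitted, a ticked check box is sent as an ordinary name=value pair alongside the user name and password. The sketch below (portable C++) only shows how such a request body could be assembled. The field names and values are invented for illustration, since the real ones would have to be read from the site's page source. Whether this particular site accepts a scripted login, and what cookies it expects afterwards, is not known from the thread.

```cpp
#include <cassert>
#include <cctype>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Encode one value as application/x-www-form-urlencoded:
// letters, digits and - _ . * pass through, space becomes '+',
// every other byte becomes %XX.
std::string formEncode(const std::string& in) {
    std::string out;
    for (unsigned char c : in) {
        if (std::isalnum(c) || c == '-' || c == '_' || c == '.' || c == '*') {
            out += static_cast<char>(c);
        } else if (c == ' ') {
            out += '+';
        } else {
            char buf[4];
            const int n = std::snprintf(buf, sizeof buf, "%%%02X",
                                        static_cast<unsigned>(c));
            assert(n == 3);   // always '%' plus two hex digits
            (void)n;
            out += buf;
        }
    }
    return out;
}

using Field = std::pair<std::string, std::string>;

// Join the fields into a single POST body: name=value&name=value...
std::string buildFormBody(const std::vector<Field>& fields) {
    std::string body;
    for (const Field& f : fields) {
        if (!body.empty()) body += '&';
        body += formEncode(f.first) + '=' + formEncode(f.second);
    }
    return body;
}
```

With hypothetical fields such as {"account_type", "subscriber"}, {"user", "henry"} and a password field, buildFormBody returns one string that an HTTP client could send as the body of the login POST. The session cookie returned by the server would then have to accompany each later PDF request.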
 
Anteaus,

I must ask a question. Does AutoIt allow you to manually click the webpage
check box that I refer to above? What I need to solve this problem is a
browser with the facility to perform a Wget-style download. I don't know
whether AutoIt lets you click the check box with the mouse or whether this
action has to be programmed in some fashion.

Henry