HTML screen scraper tool?

  • Thread starter: GaryDean

GaryDean

Does anyone know of HTML screen scraper software that will work with .NET
projects? We need to get data back from a government site that only provides
it on a web page.
Thanks,
Gary
 
Hi Gary,

Here's a working sample with source code. You can refer to it and write
your own project.
http://www.codeproject.com/KB/dotnet/Fast_XPath_Reader.aspx

If it's not what you need please let me know and clarify the requirement in
detail.
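
For the read-only case in the original question (data that is only published on a web page), the general pattern is to download the page bytes, decode them, and walk the markup for the cells of interest. Below is a minimal sketch in Python, used here purely for illustration rather than .NET, and it is not the approach taken by the linked CodeProject article. It assumes the page is UTF-8 and that the data sits in a plain HTML table; the sample markup and the names `TableExtractor` and `extract_rows` are hypothetical.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row being built, or None when outside a <tr>
        self._cell = None   # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only text inside a cell is kept; everything else is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # drop any cell left open by sloppy markup


def extract_rows(page_bytes, encoding="utf-8"):
    """Decode downloaded bytes and return the table rows found in them."""
    parser = TableExtractor()
    parser.feed(page_bytes.decode(encoding))
    parser.close()
    return parser.rows


# Hypothetical page content standing in for the downloaded bytes.
SAMPLE_PAGE = b"""
<html><body><table>
  <tr><th>Permit</th><th>Status</th></tr>
  <tr><td>A-100</td><td>Approved</td></tr>
  <tr><td>B-200</td><td>Pending</td></tr>
</table></body></html>
"""

rows = extract_rows(SAMPLE_PAGE)
for row in rows:
    print(row)
```

In a real project the bytes would come from an HTTP download instead of `SAMPLE_PAGE`, and pages with nested tables or content generated by script would need more than this sketch handles.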

Regards,
Allen Chen
Microsoft Online Support

Delighting our customers is our #1 priority. We welcome your comments and
suggestions about how we can improve the support we provide to you. Please
feel free to let my manager know what you think of the level of service
provided. You can send feedback directly to my manager at:
(e-mail address removed).

==================================================
Get notification to my posts through email? Please refer to
http://msdn.microsoft.com/en-us/subscriptions/aa948868.aspx#notifications.

Note: The MSDN Managed Newsgroup support offering is for non-urgent issues
where an initial response from the community or a Microsoft Support
Engineer within 1 business day is acceptable. Please note that each follow
up response may take approximately 2 business days as the support
professional working with you may need further investigation to reach the
most efficient resolution. The offering is not appropriate for situations
that require urgent, real-time or phone-based interactions or complex
project analysis and dump analysis issues. Issues of this nature are best
handled working with a dedicated Microsoft Support Engineer by contacting
Microsoft Customer Support Services (CSS) at
http://support.microsoft.com/select/default.aspx?target=assistance&ln=en-us.
==================================================
This posting is provided "AS IS" with no warranties, and confers no rights.
--------------------
| From: "GaryDean" <[email protected]>
| Subject: html screen scrapper tool?
| Date: Sun, 19 Oct 2008 16:37:06 -0700
| Lines: 7
| X-Priority: 3
| X-MSMail-Priority: Normal
| X-Newsreader: Microsoft Outlook Express 6.00.2900.3028
| X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3028
| X-RFC2646: Format=Flowed; Original
| Message-ID: <[email protected]>
| Newsgroups: microsoft.public.dotnet.framework.aspnet
| NNTP-Posting-Host: ip68-110-7-92.tc.ph.cox.net 68.110.7.92
| Path: TK2MSFTNGHUB02.phx.gbl!TK2MSFTNGP01.phx.gbl!TK2MSFTNGP03.phx.gbl
| Xref: TK2MSFTNGHUB02.phx.gbl microsoft.public.dotnet.framework.aspnet:78136
| X-Tomcat-NG: microsoft.public.dotnet.framework.aspnet
 
Hi Gary,

Is this issue solved?

Regards,
Allen Chen
Microsoft Online Support

 
Allen,
Well, not exactly. What we are trying to do is navigate a website
programmatically. Our research so far has turned up different methods...

1. Using the WebBrowser control with its properties and methods

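2. Creating an InternetExplorer application in memory, like so:
' Assumes a VBA host (DoEvents); in VBScript use WScript.Sleep instead.
Set uf1_cbutt1_click_ie = CreateObject("InternetExplorer.Application")

With uf1_cbutt1_click_ie
    .Navigate "http://www.cfmu.eurocontrol.be/chmi_public/ciahome.jsp?serv1=ifpuvs"
    .Visible = True

    ' Wait until the page has finished loading (4 = READYSTATE_COMPLETE).
    Do While .Busy Or .ReadyState <> 4
        DoEvents
    Loop

    ' Fill in the form named "ifpuvs" and click its submit button.
    With .Document.ifpuvs
        .arcid.Value = "A value here"
        .rules.Value = "Another value here etc etc"
        .cmd.Click
    End With

End With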

3. Using the XPath reader or the HTML Agility Pack (that reads pages, but I
don't know if we can fill in fields and push buttons with it).
4. Creating WebClient and UTF8Encoding objects and calling the DownloadData
method.
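
On the open question in item 3 (whether fields can be filled in and buttons pushed without driving a browser): a form submission is ordinarily just an HTTP request carrying the field names and values, so one possible approach is to read the form out of the downloaded page and build that request directly. Below is a minimal sketch in Python, used purely for illustration rather than .NET, with only the standard library. The form name `ifpuvs` and the field names `arcid`, `rules` and `cmd` come from the snippet in item 2; the sample markup, the `submit.jsp` action, the hidden `serv1` field, the example.com URL and the names `FormParser` and `build_submission` are assumptions, since the real page's markup is not shown in this thread.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class FormParser(HTMLParser):
    """Collect the action, method and named fields of one named form."""

    def __init__(self, form_name):
        super().__init__()
        self.form_name = form_name
        self.in_form = False
        self.action = ""
        self.method = "get"
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("name") == self.form_name:
            self.in_form = True
            self.action = attrs.get("action") or ""
            self.method = (attrs.get("method") or "get").lower()
        elif self.in_form and tag in ("input", "textarea") and attrs.get("name"):
            # Keep the defaults the page supplies (hidden fields, button value).
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def build_submission(page_url, html, form_name, values):
    """Return (method, target URL, encoded body) for submitting the form."""
    parser = FormParser(form_name)
    parser.feed(html)
    parser.close()
    fields = dict(parser.fields)
    fields.update(values)  # "fill in" the fields we care about
    return parser.method, urljoin(page_url, parser.action), urlencode(fields)


# Hypothetical markup; the real page may differ.
SAMPLE_HTML = """
<form name="ifpuvs" action="submit.jsp" method="post">
  <input type="hidden" name="serv1" value="ifpuvs">
  <textarea name="arcid"></textarea>
  <input type="text" name="rules" value="">
  <input type="submit" name="cmd" value="Validate">
</form>
"""

method, url, body = build_submission(
    "http://www.example.com/app/ciahome.jsp?serv1=ifpuvs",  # placeholder URL
    SAMPLE_HTML,
    "ifpuvs",
    {"arcid": "A value here", "rules": "Another value"},
)
print(method, url)
print(body)
```

The resulting body would then be sent as an ordinary POST to the target URL. In .NET the same request could presumably be sent with `WebClient.UploadValues`. This only works when the page does not depend on client-side script to build or validate the submission; if it does, driving a real browser as in items 1 and 2 is the likelier route.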

We haven't been able to find any "best practices" articles on this subject.
It seems there are several different approaches, and different people know a
little bit about each one. Which way is best?
Thanks for following up,
Gary
 
Hi Gary,

By "navigate a website programmatically", do you mean you want to create a
robot that drives the page in IE? If so, it's really tough work. If that is
all you need, you can use a third-party tool such as WatiN:

http://watin.sourceforge.net/index.html

This response contains a reference to a third party World Wide Web site.
Microsoft is providing this information as a convenience to you. Microsoft
does not control these sites and has not tested any software or information
found on these sites; therefore, Microsoft cannot make any representations
regarding the quality, safety, or suitability of any software or
information found there. There are inherent dangers in the use of any
software found on the Internet, and Microsoft cautions you to make sure
that you completely understand the risk before retrieving any software from
the Internet.

Quote from Gary==================================================
We haven't been able to find any "best practices" articles on this subject.
It seems there are several different approaches, and different people know a
little bit about each one. Which way is best?
==================================================

It's not that easy to give you an answer as to which way is the best. I
think we'd better leave this topic open because each way deserves
investigation.

Please let me know if you make any progress on this issue.

Regards,
Allen Chen
Microsoft Online Support

--------------------
| From: "GaryDean" <[email protected]>
| References: <[email protected]> <[email protected]>
| Subject: Re: html screen scrapper tool?
| Date: Thu, 23 Oct 2008 08:59:50 -0700
| Lines: 113
| X-Priority: 3
| X-MSMail-Priority: Normal
| X-Newsreader: Microsoft Outlook Express 6.00.2900.3028
| X-RFC2646: Format=Flowed; Original
| X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3028
| Message-ID: <[email protected]>
| Newsgroups: microsoft.public.dotnet.framework.aspnet
| NNTP-Posting-Host: ip68-110-7-92.tc.ph.cox.net 68.110.7.92
| Path: TK2MSFTNGHUB02.phx.gbl!TK2MSFTNGP01.phx.gbl!TK2MSFTNGP05.phx.gbl
| Xref: TK2MSFTNGHUB02.phx.gbl microsoft.public.dotnet.framework.aspnet:78482
| X-Tomcat-NG: microsoft.public.dotnet.framework.aspnet
 