IE DOM under .NET

  • Thread starter: Leszek Taratuta

Leszek Taratuta

Hello,

I am looking for a method to access the Internet Explorer DOM. The idea is
as follows:

1. Download an HTML file using System.Net.WebClient()
2. Put the file into a string.
3. Parse the string using IE DOM.
4. Extract some interesting tags (for example anchors)

Is it feasible?

Thanks for any hints,
Leszek Taratuta
 
What I would go with is using the WebRequest class (System.Net) instead,
and a regular expression to blast through the file, picking out what you want.
Regexlib.com can guide you there.
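
A minimal sketch of the regular-expression route, for illustration only. The class name RegexAnchors, the method ExtractHrefs, and the sample markup are made up for this example, and the pattern is an assumption that covers only simple anchors (quoted or unquoted href values); it is not a general HTML parser. In a real program the string would come from the download step, for example via System.Net.WebClient.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RegexAnchors
{
    // Matches <a ... href=VALUE ...> where VALUE is double-quoted,
    // single-quoted, or unquoted. The named group "url" holds the value.
    private static readonly Regex Anchor = new Regex(
        @"<a\b[^>]*?\bhref\s*=\s*(?:""(?<url>[^""]*)""|'(?<url>[^']*)'|(?<url>[^\s>]+))",
        RegexOptions.IgnoreCase);

    // Returns the href values of all anchors found, in document order.
    public static List<string> ExtractHrefs(string html)
    {
        var hrefs = new List<string>();
        foreach (Match m in Anchor.Matches(html))
        {
            hrefs.Add(m.Groups["url"].Value);
        }
        return hrefs;
    }

    public static void Main()
    {
        // Hypothetical sample markup standing in for a downloaded page.
        string html = "<a href=\"http://a.example/\">A</a> " +
                      "<A class='x' HREF='/b'>B</A> <a href=c.html>C</a>";
        foreach (string href in ExtractHrefs(html))
        {
            Console.WriteLine(href);
        }
    }
}
```

This stays quick to write, but it gives back only strings, not an object model, and it will miss or mangle anchors in unusual markup.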
 
I was considering regular expressions, but I would like a more
object-oriented approach. Something like this: Page --> Anchors
collection --> Anchor attributes

IE parses web pages and creates a DOM anyway. Maybe it exposes this
functionality to other applications, so I could access the web page's parse
tree and traverse it in an object-oriented fashion?

Any other suggestions?
Thanks,
Leszek

Alvin Bruney said:
what i would go with is using the web.request method instead
and a regular expression to blast thru the file picking out what you want.
Regexlib.com can guide you there.

--
Regards,
Alvin Bruney [ASP.NET MVP]
Got tidbits? Get it here...
http://tinyurl.com/27cok
 
John Saunders wrote "Why not let IE download the file and parse it?"

This is what I am trying to achieve: let IE parse a web page
programmatically.

Any code snippets?

Leszek Taratuta
 
Not really. It's a fair amount of tedious code. The main thing is to drag
the IE ActiveX control (the WebBrowser control) onto a designer surface (I
used Windows Forms), then hook up the events. You don't want to reference the
DOM until you receive the event that signals the page has finished loading
(DocumentComplete, if I remember right). I believe you use the Navigate
(or Navigate2) method to tell IE where to go.

I remember I once created a web browser in VB6 in less than 15 minutes, just
to prove it was possible. Should be easy enough in .NET.
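
A rough sketch of the approach described above, under stated assumptions: it uses the managed System.Windows.Forms.WebBrowser control (a wrapper around the IE ActiveX control, available from .NET Framework 2.0) rather than an imported ActiveX reference, it creates the control in code instead of on a designer surface, and the URL is a placeholder. In the managed wrapper the completion event is called DocumentCompleted. The class name IeDomAnchors is made up for this example.

```csharp
using System;
using System.Windows.Forms;

public static class IeDomAnchors
{
    [STAThread] // the hosted IE control needs a single-threaded apartment
    public static void Main()
    {
        var form = new Form { Text = "IE DOM sketch", Width = 800, Height = 600 };
        var browser = new WebBrowser
        {
            Dock = DockStyle.Fill,
            ScriptErrorsSuppressed = true // avoid script error dialogs
        };
        form.Controls.Add(browser);

        // Do not touch the DOM until the document has finished loading.
        browser.DocumentCompleted += (sender, e) =>
        {
            // The event also fires for frames; act only on the top-level page.
            if (e.Url != browser.Url) return;

            // Page --> Links collection --> attributes of each anchor.
            foreach (HtmlElement link in browser.Document.Links)
            {
                Console.WriteLine("{0} -> {1}",
                    link.InnerText, link.GetAttribute("href"));
            }
        };

        browser.Navigate("http://example.com/"); // placeholder URL (assumption)
        Application.Run(form); // a message loop is required for the control
    }
}
```

This needs Windows and a reference to System.Windows.Forms. If the HTML is already in a string, assigning it to the control's DocumentText property instead of calling Navigate should load it into the same DOM, though I have not tried that variation here.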
 
Thanks. It's a good place to start.

Leszek

John Saunders said:
Not really. It's a fair amount of tedious code. The main thing is to drag
the IE ActiveX control onto a designer surface (I used Windows Forms), then
hook up the events. You don't want to reference the DOM until you receive
the Downloaded event, if I remember right. I believe you use the Navigate
(or is it Navigate2) API to tell IE where to go.

I remember I once created a web browser in VB6 in less than 15 minutes, just
to prove it was possible. Should be easy enough in .NET.
 
Approach A:

1. Download the HTML page.
2. Run it through some code that converts HTML into XHTML (code
examples can be found at http://www.nikhilk.net or the Writer.NET workspace
at GotDotNet).
3. Parse the file using System.Xml.
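
Step 3 of Approach A might look roughly like this. It is a sketch that assumes the conversion in step 2 has already produced well-formed XHTML without undeclared entities; the class name XhtmlAnchors, the method Extract, and the sample markup are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class XhtmlAnchors
{
    // Returns (href, link text) pairs for every <a> element that has an href.
    public static List<KeyValuePair<string, string>> Extract(string xhtml)
    {
        var doc = new XmlDocument();
        doc.XmlResolver = null; // do not fetch external DTDs
        doc.LoadXml(xhtml);     // throws XmlException if not well-formed

        var anchors = new List<KeyValuePair<string, string>>();
        // local-name() matches <a> with or without the XHTML namespace.
        foreach (XmlElement a in doc.SelectNodes("//*[local-name()='a'][@href]"))
        {
            anchors.Add(new KeyValuePair<string, string>(
                a.GetAttribute("href"), a.InnerText));
        }
        return anchors;
    }

    public static void Main()
    {
        // Hypothetical, already well-formed sample document.
        string xhtml =
            "<html><body><p><a href=\"/first\">First</a> " +
            "<a name=\"top\">no href</a> " +
            "<a href=\"/second\"><b>Second</b></a></p></body></html>";
        foreach (KeyValuePair<string, string> link in Extract(xhtml))
        {
            Console.WriteLine("{0} -> {1}", link.Value, link.Key);
        }
    }
}
```

The XML DOM gives the object-oriented traversal asked for earlier in the thread, but only if the HTML-to-XHTML conversion succeeds; real-world pages are often too malformed for LoadXml to accept directly.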

Approach B:

Another option would be writing your own HTML parser :-(

Approach C:

Take a look at regular expression matching (which is supported in the .NET
Framework class library, in System.Text.RegularExpressions).
 