HtmlDocument

Le Minh · Sep 26, 2006

Hi, I want to write a program. Its input is the HTML source code of a web
page and its output is a TreeView representation of the page's structure.
I want to write it with HtmlDocument in .NET Framework 2.0. How do I write it?

Nicholas Paldino [.NET/C# MVP] · Sep 26, 2006

Le Minh,

Well, the HtmlDocument is the root. It contains references to all the
elements in the tree. It's just a matter of enumerating those elements and
performing whatever operation you want on them.

Dave Sexton · Sep 26, 2006

Hi,

In order to populate an HtmlDocument you need to load your HTML into a WebBrowser control. Then, using the WebBrowser.Document
property, which returns an HtmlDocument, you can iterate over the nodes as Nicholas suggested and populate a TreeView control.

Le Minh · Sep 26, 2006

Can you show me how? I'm confused about this. Could you write a small app
that demonstrates it?

Dave Sexton · Sep 26, 2006

Hi Le Minh,

I'm not sure where to begin. Do you have Visual Studio 2005? Are you familiar with WinForms applications and controls? Are
you familiar with the TreeView or WebBrowser controls? Are you familiar with handling events? If you don't understand some or any
of the above questions then I'm probably not going to be able to help you short of writing the entire application, which I won't do,
but I'll try to give you some guidance.

That said, here's the general idea if you're using Visual Studio 2005 (any edition, including Express):

1. Create a new Windows application project.
2. Add a WebBrowser control from the toolbox onto Form1. (You can position it however you'd like)
3. Add a TreeView control from the toolbox onto Form1. (You can position it however you'd like)
4. Set the WebBrowser.Url property to the URL of your HTML document (this is the easiest way to load the document).
5. Create an event handler for the WebBrowser.DocumentCompleted event. It should look like the following:

private void webBrowser1_DocumentCompleted(object sender,
    WebBrowserDocumentCompletedEventArgs e)
{
    // make sure that the TreeView is cleared in case the browser navigates to another url
    treeView1.Nodes.Clear();

    // create the root node for the TreeView, which will contain all other nodes
    TreeNode rootNode = treeView1.Nodes.Add("Root");

    // Fill the TreeView, recursively, starting from the root node
    FillTreeViewRecursively(rootNode, webBrowser1.Document.All);
}

6. Create the FillTreeViewRecursively method:

private void FillTreeViewRecursively(TreeNode currentNode, HtmlElementCollection elements)
{
    // loop through the specified collection of elements and create their respective nodes
    foreach (HtmlElement element in elements)
    {
        // create a new node for the current element and add it under the currentNode
        TreeNode node = currentNode.Nodes.Add(element.TagName);

        // optional: store a reference to the element that this node represents
        node.Tag = element;

        // create the nodes under this node for the elements contained by the current element
        FillTreeViewRecursively(node, element.All);
    }
}

Please be aware that I didn't try to build this code and so it will probably need some modifications.

Le Minh · Sep 26, 2006

Thanks for your help. Let me try it!

Le Minh · Sep 26, 2006

I tried it, but something is not right. It seems that values are
duplicated in the tree.
I think it may be the recursive function. Why do we need to use a recursive
function here? Is there any other way?

Dave Sexton · Sep 26, 2006

Hi Le Minh,

As I mentioned, I didn't test that code. You might have to make some modifications.

The recursive method is necessary because you can't predict how deep the tree structure is before it needs to be parsed. Your other
option is to hard-code a fixed number of nested loops, which only allows a certain depth to be parsed. Not very dynamic though.

What value is being duplicated in the tree?
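
A possible explanation for the duplicates, offered as an untested sketch rather than a confirmed diagnosis: HtmlDocument.All and HtmlElement.All return every element underneath the document or element, not only the direct children, so recursing over All adds each nested element once for every ancestor. HtmlElement.Children returns only the immediate children. The version below recurses over Children and starts from the html element found with GetElementsByTagName. The class name HtmlTreeForm, the control layout, and taking the URL from the command line are assumptions made for illustration.

```csharp
using System;
using System.Windows.Forms;

// Hypothetical form for illustration: shows a page next to a tree of its elements.
public class HtmlTreeForm : Form
{
    private readonly WebBrowser webBrowser1 = new WebBrowser();
    private readonly TreeView treeView1 = new TreeView();

    public HtmlTreeForm(string url)
    {
        // Assumed layout: tree on the left, browser filling the rest of the form.
        webBrowser1.Dock = DockStyle.Fill;
        treeView1.Dock = DockStyle.Left;
        treeView1.Width = 300;
        Controls.Add(webBrowser1);
        Controls.Add(treeView1);

        webBrowser1.DocumentCompleted += webBrowser1_DocumentCompleted;
        webBrowser1.Navigate(url);
    }

    private void webBrowser1_DocumentCompleted(object sender,
        WebBrowserDocumentCompletedEventArgs e)
    {
        // DocumentCompleted can fire more than once for pages with frames;
        // this sketch simply rebuilds the whole tree each time.
        HtmlDocument document = webBrowser1.Document;
        if (document == null)
        {
            return;
        }

        treeView1.BeginUpdate();
        treeView1.Nodes.Clear();

        TreeNode rootNode = treeView1.Nodes.Add("Root");

        // Start from the <html> element(s) only, not from Document.All,
        // which is a flat list of every element in the page.
        FillTreeViewRecursively(rootNode, document.GetElementsByTagName("HTML"));

        rootNode.ExpandAll();
        treeView1.EndUpdate();
    }

    private void FillTreeViewRecursively(TreeNode currentNode, HtmlElementCollection elements)
    {
        foreach (HtmlElement element in elements)
        {
            TreeNode node = currentNode.Nodes.Add(element.TagName);
            node.Tag = element;

            // Children holds only the direct children, so each element
            // is visited exactly once during the recursion.
            FillTreeViewRecursively(node, element.Children);
        }
    }

    [STAThread]
    private static void Main(string[] args)
    {
        // Assumption: the page URL is passed as the first command-line argument.
        string url = args.Length > 0 ? args[0] : "about:blank";
        Application.EnableVisualStyles();
        Application.Run(new HtmlTreeForm(url));
    }
}
```

If the tree still looks wrong with this change, comparing the node count against Document.All.Count may help confirm whether every element appears exactly once.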

Cor Ligthert [MVP] · Sep 27, 2006

Le Minh,

XML was created (invented) to overcome the problems with the unstructured
format of HTML.

It was and is impossible to use HTML code in a structured way. Therefore I
am a little curious about what you want to do. The way you ask this,
there can, in my opinion, be no answer.

Cor
