HtmlDocument

  • Thread starter: Le Minh

Le Minh

Hi, I want to write a program. Its input is the HTML source code of a web
page and its output is a TreeView representation of the page's structure.
I want to write it with HtmlDocument in .NET Framework 2.0. How do I write it?
 
Le Minh,

Well, the HtmlDocument is the root. It contains references to all the
elements in the tree. It's just a matter of enumerating those elements and
performing whatever operation you want on them.
 
Hi,

In order to populate an HtmlDocument you need to load your HTML into a WebBrowser control. Then, using the WebBrowser.Document
property, which returns an HtmlDocument, you can iterate over the nodes as Nicholas suggested and populate a TreeView control.
 
Can you show me the way? I'm confused about this. You could write a small app,
couldn't you?
 
Hi Le Minh,

I'm not sure where to begin. Do you have Visual Studio.NET 2005? Are you familiar with WinForms applications and Controls? Are
you familiar with the TreeView or WebBrowser controls? Are you familiar with handling events? If you don't understand some or any
of the above questions then I'm probably not going to be able to help you short of writing the entire application, which I won't do,
but I'll try to give you some guidance.

That said, here's the general idea if you're using Visual Studio.NET (any edition, including Express):

1. Create a new Windows application project.
2. Add a WebBrowser control from the toolbox onto Form1. (You can position it however you'd like)
3. Add a TreeView control from the toolbox onto Form1. (You can position it however you'd like)
4. Set the WebBrowser.Url property to the url of your html document (this is the easiest way to load the document).
5. Create an event handler for the WebBrowser.DocumentCompleted event. It should look like the following:

private void webBrowser1_DocumentCompleted(object sender,
    WebBrowserDocumentCompletedEventArgs e)
{
    // make sure that the TreeView is cleared in case the browser navigates to another url
    treeView1.Nodes.Clear();

    // create the root node for the TreeView, which will contain all other nodes
    TreeNode rootNode = treeView1.Nodes.Add("Root");

    // Fill the TreeView, recursively, starting from the root node
    FillTreeViewRecursively(rootNode, webBrowser1.Document.All);
}

6. Create the FillTreeViewRecursively method:

private void FillTreeViewRecursively(TreeNode currentNode, HtmlElementCollection elements)
{
    // loop through the specified collection of elements and create their respective nodes
    foreach (HtmlElement element in elements)
    {
        // create a new node for the current element and add it under the currentNode
        TreeNode node = currentNode.Nodes.Add(element.TagName);

        // optional: store a reference to the element that this node represents
        node.Tag = element;

        // create the nodes under this node for the elements contained by the current element
        FillTreeViewRecursively(node, element.All);
    }
}

Please be aware that I didn't try to build this code and so it will probably need some modifications.
 
I tried it, but something is not right. It seems the values are
duplicated in the tree.
I think it may be the recursive function. Why do we need to use a recursive
function here? Is there any other way?
 
Hi Le Minh,

As I mentioned, I didn't test that code. You might have to make some modifications.

A recursive method is the natural fit because you can't predict how deep the tree structure is before it is parsed. Your other
option is to hard-code a fixed number of nested loops, which only allows a certain depth to be parsed. Not very dynamic, though.
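
For completeness, recursion can also be swapped for an explicit stack, which handles any depth as well. The following is an untested sketch, not code from this thread: the class name HtmlTreeBuilder and the method name FillIteratively are made up for illustration. It walks HtmlElement.Children (the direct children only) rather than All.

```csharp
using System.Collections.Generic;
using System.Windows.Forms;

// Hypothetical helper class; the names are assumptions for illustration.
public static class HtmlTreeBuilder
{
    // Adds a TreeNode under rootNode for every element below rootElement,
    // using an explicit stack instead of recursive calls.
    public static void FillIteratively(TreeNode rootNode, HtmlElement rootElement)
    {
        Stack<KeyValuePair<HtmlElement, TreeNode>> pending =
            new Stack<KeyValuePair<HtmlElement, TreeNode>>();
        pending.Push(new KeyValuePair<HtmlElement, TreeNode>(rootElement, rootNode));

        while (pending.Count > 0)
        {
            KeyValuePair<HtmlElement, TreeNode> current = pending.Pop();

            // Children holds only the direct children, so every element
            // is visited exactly once.
            foreach (HtmlElement child in current.Key.Children)
            {
                TreeNode childNode = current.Value.Nodes.Add(child.TagName);
                childNode.Tag = child;

                // Remember the child so its own children are processed later.
                pending.Push(new KeyValuePair<HtmlElement, TreeNode>(child, childNode));
            }
        }
    }
}
```

Sibling nodes still appear in document order, because each parent adds all of its child nodes before any of them is popped from the stack.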

What value is being duplicated in the tree?
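
The duplication most likely comes from the All property: HtmlDocument.All and HtmlElement.All return every element underneath, not just the direct children, so recursing over All adds each element again under each of its ancestors. Below is a minimal sketch that recurses over HtmlElement.Children instead. It has not been built against a real project. The form class name HtmlTreeForm and the inline sample HTML are assumptions for illustration; webBrowser1, treeView1 and FillTreeViewRecursively follow the names used earlier in the thread.

```csharp
using System;
using System.Windows.Forms;

// Sketch only: a form that creates its WebBrowser and TreeView in code.
public class HtmlTreeForm : Form
{
    private readonly WebBrowser webBrowser1 = new WebBrowser();
    private readonly TreeView treeView1 = new TreeView();

    public HtmlTreeForm()
    {
        Width = 900;
        Height = 600;
        treeView1.Dock = DockStyle.Left;
        treeView1.Width = 300;
        webBrowser1.Dock = DockStyle.Fill;
        Controls.Add(webBrowser1);
        Controls.Add(treeView1);

        webBrowser1.DocumentCompleted += webBrowser1_DocumentCompleted;

        // Assumed sample input; setting the Url property works as well.
        webBrowser1.DocumentText =
            "<html><body><div><p>Hello</p><p>World</p></div></body></html>";
    }

    private void webBrowser1_DocumentCompleted(object sender,
        WebBrowserDocumentCompletedEventArgs e)
    {
        treeView1.Nodes.Clear();

        HtmlDocument document = webBrowser1.Document;
        if (document == null || document.Body == null)
        {
            return;
        }

        // Start from the parent of <body> (normally the <html> element),
        // falling back to <body> itself if there is no parent.
        HtmlElement root = document.Body.Parent;
        if (root == null)
        {
            root = document.Body;
        }

        TreeNode rootNode = treeView1.Nodes.Add(root.TagName);
        rootNode.Tag = root;
        FillTreeViewRecursively(rootNode, root.Children);
        treeView1.ExpandAll();
    }

    private void FillTreeViewRecursively(TreeNode currentNode,
        HtmlElementCollection elements)
    {
        foreach (HtmlElement element in elements)
        {
            TreeNode node = currentNode.Nodes.Add(element.TagName);
            node.Tag = element;

            // Children contains only the direct children, so each element
            // ends up in the tree exactly once.
            FillTreeViewRecursively(node, element.Children);
        }
    }

    [STAThread]
    public static void Main()
    {
        Application.EnableVisualStyles();
        Application.Run(new HtmlTreeForm());
    }
}
```

Two things changed compared with the earlier code: the walk starts from a single root element instead of Document.All, and each level passes element.Children instead of element.All.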
 
Le Minh,

XML was created (invented) to overcome the problems with the unstructured
format of HTML.

It was, and is, impossible to use HTML code in a structured way. Therefore I
am curious to know a little more about what you want to do. The way you ask this,
there can, in my opinion, be no answer.

Cor
 