Bryan D.
My C# application currently uses the WebBrowser control and the
MSHTML library to walk the HTML DOM of documents and pull out
information from certain tags that it finds. I've found this to be
extremely slow in C#, and posts in this newsgroup suggest that it is
a known issue with .NET marshalling.
This post:
http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&selm=u$FJlRgpBHA.2180@tkmsftngp07
indicates that the performance is MUCH better in C++. I'm interested
in building some unmanaged C++ that parses the HTML DOM and calls
back into my managed code after parsing.
However, the code that I have written in C++ is just as slow as the
code in C#. I was wondering, does anyone have code that proves the
point that HTML DOM parsing is "instantaneous" in C++? I would really
appreciate any tips, code snippets, or links to example code.
Thank you very much,
Bryan
PS - Below is the C++ code I'm currently using to try this out. It
takes about 3 seconds(!) on a small-to-medium-sized document.
----------------- snip -----------------------------
#include "ole2.h"
#include <tchar.h>
#include <iostream>
#import <shdocvw.dll>
#import <mshtml.tlb>

// Recursively visits every descendant of the given element.
void WalkChildElements( MSHTML::IHTMLElementPtr element )
{
    // Smart pointer, so the IDispatch reference is released on exit.
    IDispatchPtr pDisp;
    if( FAILED(element->get_children(&pDisp)) || pDisp == NULL )
        return;
    // The assignment queries the IDispatch for IHTMLElementCollection.
    MSHTML::IHTMLElementCollectionPtr children = pDisp;
    if( children == NULL )
        return;
    long length = 0;
    children->get_length(&length);
    for( long i = 0; i < length; i++ )
    {
        // item() returns an IDispatch; the assignment queries for IHTMLElement.
        MSHTML::IHTMLElementPtr child = children->item( i, i );
        if( child != NULL )
            WalkChildElements(child);
    }
}

int _tmain(int argc, _TCHAR* argv[])
{
    CoInitialize(NULL);
    {
        // Inner scope: every COM smart pointer is released before CoUninitialize().
        SHDocVw::IWebBrowser2Ptr pIE(__uuidof(SHDocVw::InternetExplorer));
        pIE->Visible = VARIANT_TRUE;
        pIE->Navigate("file://c:/tmp.html");
        // Poll until the browser reports that it is no longer busy.
        while( pIE->GetBusy() )
            Sleep(100);
        MSHTML::IHTMLDocument3Ptr pHTMLDoc3;
        pIE->GetDocument()->QueryInterface(&pHTMLDoc3);
        MSHTML::IHTMLElementPtr element;
        pHTMLDoc3->get_documentElement(&element);
        std::cout << "Begin\n";
        if( element != NULL )
            WalkChildElements( element );
        std::cout << "Done\n";
    }
    CoUninitialize();
    return 0;
}
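
In case anyone wants to reproduce the "about 3 seconds" figure rather than take my word for it, here is a rough sketch of how the walk could be timed. It deliberately uses a stand-in tree instead of MSHTML so that it compiles anywhere. `Node`, `makeTree`, `countNodes` and `timeMs` are names made up for this illustration and are not part of any library.

```cpp
#include <chrono>
#include <memory>
#include <vector>

// Hypothetical stand-in for a DOM element; not an MSHTML type.
struct Node {
    std::vector<std::unique_ptr<Node>> children;
};

// Builds a complete tree with the given depth and fan-out.
std::unique_ptr<Node> makeTree(int depth, int fanout) {
    std::unique_ptr<Node> node(new Node);
    if (depth > 0) {
        for (int i = 0; i < fanout; ++i)
            node->children.push_back(makeTree(depth - 1, fanout));
    }
    return node;
}

// Counts the node and all of its descendants, using the same
// recursive shape as WalkChildElements.
long countNodes(const Node& node) {
    long total = 1;
    for (const auto& child : node.children)
        total += countNodes(*child);
    return total;
}

// Runs `work` once and returns the elapsed wall-clock time in milliseconds.
template <typename F>
double timeMs(F&& work) {
    auto start = std::chrono::steady_clock::now();
    work();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Wrapping the real call the same way, for example `double ms = timeMs([&]{ WalkChildElements(element); });`, should give a number for the actual DOM walk that can be compared between the C# and C++ versions.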