D
DotNetShadow
Hi Guys,
I have been reading heaps of information in regards to MSHTML object
being used in .NET applications, windows and UI-less. I have read the
walkall samples and tried various techniques with MSHTML. My biggest
problem seems to be running MSHTML in asp.net application and looping
elements in the document.
Example:
a) Grab site (http://www.amazon.com) into MSHTML htmldocument
b) Loop the element collection ihtmlelementCollection (~1300 elements
for amazon)
c) Accessing properties for each of these nodes such as tagname,
currentstyle.fontsize etc
I have found the following problems:
1) When using Ctype to cast to appropiate types its slower than the
late binding method such as collection(i).tagname
2) Looping 1300 elements takes 5 - 8 seconds ... in the equivalent
VB.NET windows application same code, takes only 0.5 seconds.
The following is the code I use as a sample aspx page:
' ========[ 3 - 4 seconds ]
========================================
Dim htmldocument as New mshtml.htmldocument
Dim doc As mshtml.IHTMLDocument2 =
CType(htmldocument.createDocumentFromUrl(strString, ""),
mshtml.IHTMLDocument2)
Dim timeout As DateTime = Now
Do Until htmldocument.readyState = "complete" Or Now.Subtract
(timeout).TotalSeconds >= 4
System.Threading.Thread.CurrentThread.Sleep(100)
Loop
' ========[ 5 - 8 seconds ]
========================================
Dim mycollection As mshtml.IHTMLElementCollection =
pageDoc.body.all
Dim colLength As Int32 = mycollection.length
Dim result As New StringBuilder("")
Dim a As String
Dim b As String
Dim c As String
Dim d As String
Dim e As String
Dim myInnerText As String
For i As Integer = 0 To colLength - 1
a = mycollection.item(i).tagName
b = mycollection.item(i).currentStyle.fontfamily
c = mycollection.item(i).currentStyle.fontSize
d = mycollection.item(i).currentStyle.fontWeight
e = mycollection.item(i).currentStyle.fontStyle
' Get TAG node
myInnerText = CStr(mycollection.item(i).innerHTML)
result.Append(a)
result.Append(" - ")
result.Append(b)
result.Append(" - ")
result.Append(c)
result.Append(" - ")
result.Append(" - ")
result.Append(d)
result.Append(" - ")
result.Append(e)
Next
Any help would be greatly appreciated on how to solve this annoying
problem to speed it up, I have read marshalling is what could be
killing me but what can I do about it?
Regards DotNetShadow
I have been reading heaps of information in regards to MSHTML object
being used in .NET applications, windows and UI-less. I have read the
walkall samples and tried various techniques with MSHTML. My biggest
problem seems to be running MSHTML in asp.net application and looping
elements in the document.
Example:
a) Grab site (http://www.amazon.com) into MSHTML htmldocument
b) Loop the element collection ihtmlelementCollection (~1300 elements
for amazon)
c) Accessing properties for each of these nodes such as tagname,
currentstyle.fontsize etc
I have found the following problems:
1) When using Ctype to cast to appropiate types its slower than the
late binding method such as collection(i).tagname
2) Looping 1300 elements takes 5 - 8 seconds ... in the equivalent
VB.NET windows application same code, takes only 0.5 seconds.
The following is the code I use as a sample aspx page:
' ========[ 3 - 4 seconds ]
========================================
Dim htmldocument as New mshtml.htmldocument
Dim doc As mshtml.IHTMLDocument2 =
CType(htmldocument.createDocumentFromUrl(strString, ""),
mshtml.IHTMLDocument2)
Dim timeout As DateTime = Now
Do Until htmldocument.readyState = "complete" Or Now.Subtract
(timeout).TotalSeconds >= 4
System.Threading.Thread.CurrentThread.Sleep(100)
Loop
' ========[ 5 - 8 seconds ]
========================================
Dim mycollection As mshtml.IHTMLElementCollection =
pageDoc.body.all
Dim colLength As Int32 = mycollection.length
Dim result As New StringBuilder("")
Dim a As String
Dim b As String
Dim c As String
Dim d As String
Dim e As String
Dim myInnerText As String
For i As Integer = 0 To colLength - 1
a = mycollection.item(i).tagName
b = mycollection.item(i).currentStyle.fontfamily
c = mycollection.item(i).currentStyle.fontSize
d = mycollection.item(i).currentStyle.fontWeight
e = mycollection.item(i).currentStyle.fontStyle
' Get TAG node
myInnerText = CStr(mycollection.item(i).innerHTML)
result.Append(a)
result.Append(" - ")
result.Append(b)
result.Append(" - ")
result.Append(c)
result.Append(" - ")
result.Append(" - ")
result.Append(d)
result.Append(" - ")
result.Append(e)
Next
Any help would be greatly appreciated on how to solve this annoying
problem to speed it up, I have read marshalling is what could be
killing me but what can I do about it?
Regards DotNetShadow