MSHTML asp.net web application SLOW

DotNetShadow

Hi Guys,

I have been reading heaps of information about using the MSHTML object
in .NET applications, both Windows and UI-less. I have read the
walkall samples and tried various techniques with MSHTML. My biggest
problem seems to be running MSHTML in an ASP.NET application and looping
over the elements in the document.

Example:

a) Load a site (http://www.amazon.com) into an MSHTML HTMLDocument
b) Loop over the element collection, an IHTMLElementCollection (~1300
elements for Amazon)
c) Access properties of each of these nodes, such as tagName,
currentStyle.fontSize, etc.

I have found the following problems:

1) Using CType to cast to the appropriate types is slower than the
late-binding method, such as collection(i).tagName

2) Looping over 1300 elements takes 5 - 8 seconds, while the same code
in the equivalent VB.NET Windows application takes only 0.5 seconds.

The following is the code I use as a sample aspx page:

' ========[ 3 - 4 seconds ] ========================================
Dim htmldocument As New mshtml.HTMLDocument

Dim doc As mshtml.IHTMLDocument2 = _
    CType(htmldocument.createDocumentFromUrl(strString, ""), mshtml.IHTMLDocument2)

Dim timeout As DateTime = Now

' Poll until the document has loaded, giving up after 4 seconds
Do Until htmldocument.readyState = "complete" Or _
         Now.Subtract(timeout).TotalSeconds >= 4
    System.Threading.Thread.Sleep(100)
Loop


' ========[ 5 - 8 seconds ] ========================================
' doc is the document loaded above
Dim mycollection As mshtml.IHTMLElementCollection = doc.body.all
Dim colLength As Int32 = mycollection.length

Dim result As New StringBuilder("")

Dim a As String
Dim b As String
Dim c As String
Dim d As String
Dim e As String
Dim myInnerText As String

For i As Integer = 0 To colLength - 1

    ' Late-bound property access on each element
    a = mycollection.item(i).tagName
    b = mycollection.item(i).currentStyle.fontFamily
    c = mycollection.item(i).currentStyle.fontSize
    d = mycollection.item(i).currentStyle.fontWeight
    e = mycollection.item(i).currentStyle.fontStyle

    ' Get TAG node

    myInnerText = CStr(mycollection.item(i).innerHTML)

    result.Append(a)
    result.Append(" - ")
    result.Append(b)
    result.Append(" - ")
    result.Append(c)
    result.Append(" - ")
    result.Append(d)
    result.Append(" - ")
    result.Append(e)

Next

Any help on how to speed this up would be greatly appreciated. I have
read that marshalling could be what is killing me, but what can I do
about it?

Regards DotNetShadow
 
Hi Natty,

The major problem was solved by having ASP.NET make the call on a thread
that is STA, since MSHTML is a single-threaded component, so your
suggestion was partially right. However, I did run into this strange
problem:

Dim doc As mshtml.IHTMLDocument2 = _
    CType(htmldocument.createDocumentFromUrl(strString, ""), mshtml.IHTMLDocument2)

seems to fail under the STA model, so I had to use the WebClient
object and write the data into the document object, like this:

' Needs Imports System.IO and Imports System.Net at the top of the file
Dim client As New WebClient

' Add a user agent header in case the
' requested URI contains a query.
client.Headers.Add("user-agent", _
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)")

Dim data As Stream = client.OpenRead("http://www.microsoft.com")
Dim reader As New StreamReader(data)
Dim html As String = reader.ReadToEnd()
reader.Close()
data.Close()

' Write the downloaded markup into a fresh MSHTML document
Dim doc As New mshtml.HTMLDocument
Dim doc2 As mshtml.IHTMLDocument2 = CType(doc, mshtml.IHTMLDocument2)
doc2.open()
doc2.write(html)
doc2.close()
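
For readers who want to try the STA approach mentioned at the top of this post, here is a minimal sketch of running work on a dedicated STA thread. It is an illustration under assumptions, not the code actually used in this thread: StaRunner and ParseDocument are hypothetical names, and SetApartmentState assumes .NET 2.0 or later (on .NET 1.x the equivalent would be setting the thread's ApartmentState property before starting it).

```vbnet
Imports System
Imports System.Threading

' Hypothetical helper (not from the original post): runs a piece of work
' on a dedicated STA thread and blocks until it has finished.
Public Class StaRunner
    Private ReadOnly _work As ThreadStart
    Private _error As Exception

    Public Sub New(ByVal work As ThreadStart)
        _work = work
    End Sub

    ' Thread entry point: run the work and remember any failure
    Private Sub Guarded()
        Try
            _work.Invoke()
        Catch ex As Exception
            _error = ex
        End Try
    End Sub

    Public Sub RunAndWait()
        Dim t As New Thread(AddressOf Guarded)
        ' The apartment state must be set before the thread is started
        t.SetApartmentState(ApartmentState.STA)
        t.Start()
        t.Join()
        ' Surface a failure from the worker thread to the caller
        If Not _error Is Nothing Then
            Throw New ApplicationException("STA work failed", _error)
        End If
    End Sub
End Class
```

It could be called from the page as New StaRunner(AddressOf ParseDocument).RunAndWait(), where ParseDocument is a hypothetical Sub containing the MSHTML code. ASP.NET also has the AspCompat="true" page directive, which runs the page itself on an STA thread and may be worth comparing.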

Is there any reason why doing it the original way, with
htmldocument.createDocumentFromUrl, would cause a
NullReferenceException or even produce an empty document with the URL
about:blank?

Regards DotNetShadow
 