MSHTML asp.net web application SLOW!

  • Thread starter: DotNetShadow

DotNetShadow

Hi Guys,

I have been reading heaps of information about using the MSHTML object
in .NET applications, both Windows and UI-less. I have read the
walkall samples and tried various techniques with MSHTML. My biggest
problem is running MSHTML in an ASP.NET application and looping over
the elements in the document.

Example:

a) Load a site (http://www.amazon.com) into an MSHTML HTMLDocument
b) Loop over the IHTMLElementCollection (~1300 elements for Amazon)
c) Access properties of each node, such as tagName,
currentStyle.fontSize, etc.

I have found the following problems:

1) Using CType to cast to the appropriate types is slower than the
late binding method, such as collection(i).tagName.

2) Looping over 1300 elements takes 5 - 8 seconds, while the same code
in the equivalent VB.NET Windows application takes only 0.5 seconds.

The following is the code I use as a sample aspx page:

' ========[ 3 - 4 seconds ]========================================
Dim htmldocument As New mshtml.HTMLDocument

Dim doc As mshtml.IHTMLDocument2 = _
    CType(htmldocument.createDocumentFromUrl(strString, ""), _
    mshtml.IHTMLDocument2)

Dim timeout As DateTime = Now

' Poll until the document has loaded or 4 seconds have passed
Do Until htmldocument.readyState = "complete" Or _
    Now.Subtract(timeout).TotalSeconds >= 4
    System.Threading.Thread.Sleep(100)
Loop


' ========[ 5 - 8 seconds ]========================================
' pageDoc is the loaded document (presumably doc from the block above)
Dim mycollection As mshtml.IHTMLElementCollection = _
    pageDoc.body.all
Dim colLength As Int32 = mycollection.length

Dim result As New StringBuilder("")

Dim a As String
Dim b As String
Dim c As String
Dim d As String
Dim e As String
Dim myInnerText As String

For i As Integer = 0 To colLength - 1

    ' Late-bound property access on each element
    a = mycollection.item(i).tagName
    b = mycollection.item(i).currentStyle.fontFamily
    c = mycollection.item(i).currentStyle.fontSize
    d = mycollection.item(i).currentStyle.fontWeight
    e = mycollection.item(i).currentStyle.fontStyle

    ' Get TAG node

    myInnerText = CStr(mycollection.item(i).innerHTML)

    result.Append(a)
    result.Append(" - ")
    result.Append(b)
    result.Append(" - ")
    result.Append(c)
    result.Append(" - ")
    result.Append(d)
    result.Append(" - ")
    result.Append(e)

Next

Any help on how to speed this up would be greatly appreciated. I have
read that marshalling could be what is killing me, but what can I do
about it?

Regards DotNetShadow
 
Hi DotNetShadow

When you can use the "2" versions of the interfaces, it is sometimes faster.
I don't know if it works for you, but maybe you can give it a try?

Dim mycollection As mshtml.IHTMLElementCollection2 = _
    pageDoc.body.all

Can you tell us whether this helped? I am not sure about it.

Cor
 
Hi Cor,

What two methods are you talking about? I figured out the answer in the
end thanks to Igor, who suggested that marshalling was killing me. From
the ASP.NET page I spawned a new thread set to STA, since MSHTML is a
single-threaded component, and that dramatically improved performance.
However, I did run into one strange problem. This call

Dim doc As mshtml.IHTMLDocument2 = _
    CType(htmldocument.createDocumentFromUrl(strString, ""), _
    mshtml.IHTMLDocument2)

seems to fail in the STA model, so I had to use the WebClient object
and write the data into the document object, like this:

Dim client As New WebClient

' Add a user agent header in case the
' requested URI contains a query.
client.Headers.Add("user-agent", _
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)")

Dim data As Stream = _
    client.OpenRead("http://www.microsoft.com")
Dim reader As New StreamReader(data)
Dim html As String = reader.ReadToEnd()
data.Close()
reader.Close()

' Write the downloaded HTML into a fresh MSHTML document
Dim doc As New mshtml.HTMLDocument
Dim doc2 As mshtml.IHTMLDocument2 = doc
doc2.open()
doc2.write(html)
doc2.close()

Is there any reason why doing it the original way, with
htmldocument.createDocumentFromUrl, would cause a null reference
exception or even produce an empty document with the URL about:blank?
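
For readers who want to try the STA approach described above, here is a minimal sketch, not the exact code used in this thread. The class name StaWorker and the ParsePage placeholder are hypothetical. It assumes .NET 2.0 or later for SetApartmentState; on .NET 1.x the equivalent is setting the thread's ApartmentState property before starting it.

```vbnet
Imports System
Imports System.Threading

' Hypothetical helper: runs MSHTML work on a dedicated STA thread
' and hands the result (or the exception) back to the caller.
Public Class StaWorker
    Private _url As String
    Private _result As String
    Private _error As Exception

    Public Sub New(ByVal url As String)
        _url = url
    End Sub

    ' Blocks the calling (ASP.NET) thread until the STA thread finishes.
    Public Function Run() As String
        Dim worker As New Thread(AddressOf DoWork)
        ' The apartment state must be set before the thread starts.
        ' On .NET 1.x use: worker.ApartmentState = ApartmentState.STA
        worker.SetApartmentState(ApartmentState.STA)
        worker.Start()
        worker.Join()

        If Not _error Is Nothing Then
            Throw New ApplicationException("MSHTML worker failed", _error)
        End If
        Return _result
    End Function

    Private Sub DoWork()
        Try
            _result = ParsePage(_url)
        Catch ex As Exception
            ' Exceptions do not cross threads on their own, so keep it.
            _error = ex
        End Try
    End Sub

    ' Placeholder: create the MSHTML document and loop over its
    ' elements here, so every COM call happens on the STA thread.
    Private Function ParsePage(ByVal url As String) As String
        Return String.Empty
    End Function
End Class
```

From the page this would be called as Dim output As String = New StaWorker(strString).Run(). The point of the design is that the MSHTML object is created and used on the same STA thread, which should avoid the cross-apartment marshalling on each property access.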

Regards DotNetShadow
 
Hi DotNetShadow,

I also use the WebClient for downloading and similar things; it is very fast.

For retrieving the information from the document I also use MSHTML.

By "the 2 methods" I meant that there are a lot of members with a 2 suffix,
such as IHTMLDocument2, Navigate2, and so on.

The "2" versions are sometimes a lot faster.

I am happy you solved your problem.

Cor
 