UCOMIStream, MSHTML and WebBrowser control Persistence Problem

  • Thread starter: Charles Law

Charles Law

Sorry for the cross post, but I'm not sure who is best placed to answer this
one.

This is the most bizarre behaviour of MSHTML and streams.

I have a WebBrowser control that contains nothing but some default HTML. I
want to copy the document and modify it before saving it to disk.

So, I clone the document like this:

<code>
Private Function CloneDocument(ByVal doc As mshtml.IHTMLDocument2) As mshtml.IHTMLDocument2

Dim newdoc2 As mshtml.IHTMLDocument2
Dim ips As IPersistStreamInit

Dim strm As UCOMIStream

Dim source As String

' Create and initialise a new document object
newdoc2 = New mshtml.HTMLDocument

ips = DirectCast(newdoc2, IPersistStreamInit)

ips.InitNew()

Do Until newdoc2.readyState = "complete"
Application.DoEvents()
Loop

' Get the current document HTML as a stream
source = GetDocumentSource(AxWebBrowser1.Document) ' see below
strm = GetStream(source) ' see below

' Load the new document from the stream
ips.Load(strm)

' Wait until the new document has settled
Do Until newdoc2.readyState = "complete"
Application.DoEvents()
Loop

Return newdoc2

End Function

Private Function GetStream(ByVal s As String) As UCOMIStream

Dim iptr As IntPtr
Dim istrm As UCOMIStream

' Get a pointer to the string
iptr = Marshal.StringToHGlobalAuto(s)

' Create the stream from the pointer
CreateStreamOnHGlobal(iptr, True, istrm)

Return istrm

End Function

Private Function GetStream(ByVal size As Integer) As UCOMIStream

Dim iptr As IntPtr
Dim strm As UCOMIStream

' Create a pointer to a block of the required size
iptr = Marshal.AllocHGlobal(size)

' Create the stream from the pointer
CreateStreamOnHGlobal(iptr, True, strm)

Return strm

End Function

Private Function GetDocumentSource(ByVal doc As mshtml.IHTMLDocument2, _
ByVal resetIsDirty As Boolean) As String

Dim stream As UCOMIStream

Dim ips As IPersistStreamInit

Dim s As String

ips = DirectCast(doc, IPersistStreamInit)

If ips Is Nothing Then
s = Nothing
Else
stream = GetStream(2048)

' Save the document into the COM stream; resetIsDirty controls
' whether the IsDirty flag is cleared
ips.Save(stream, resetIsDirty)

s = StreamToString(stream)
End If

Return s

End Function

Private Function StreamToString(ByVal strm As UCOMIStream) As String

Dim iptr As IntPtr

Dim s As String

GetHGlobalFromStream(strm, iptr)

' *** THIS IS ODD TOO ***
' If the source is the WebBrowser control then Ansi must be used ***
s = Marshal.PtrToStringAnsi(iptr)

' If the source is the cloned and modified document then Auto must
' be used ***
s = Marshal.PtrToStringAuto(iptr)

' ***

Return s

End Function
</code>

Having cloned the document, I modify it by inserting some tags into the body
element. I then call GetDocumentSource() to get the HTML from the cloned
document.

However, the HTML that is returned is the original HTML, from the WebBrowser
control, and not from the cloned document. I know that the cloned document
contains the correct HTML because if I execute the following for the cloned
document

?doc.all.tags("html").Item(0).outerhtml

in the command window, I get what I expect in the body element.

Can anyone suggest why this is happening?

I have also highlighted an oddity in function StreamToString(), which I do
not understand. Why would the encoding of the HTML change between the
browser and a cloned document?
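
On the encoding oddity, one possible contributor (an assumption, not something confirmed in this thread): GetStream() above builds its buffer with Marshal.StringToHGlobalAuto, which on NT-based Windows writes UTF-16, while the observation above suggests the browser's own document is saved as single-byte text. If the clone keeps the encoding it was loaded with, its saved bytes would need the Unicode read. The sketch below does not touch MSHTML at all; it only shows what happens when a UTF-16 buffer is read with PtrToStringAnsi. The module name EncodingSketch is made up for the illustration.

```vb
Imports System
Imports System.Runtime.InteropServices

Module EncodingSketch
    Sub Main()
        Dim html As String = "<html></html>"

        ' Write the string into unmanaged memory as UTF-16
        ' (what StringToHGlobalAuto does on NT-based Windows).
        Dim p As IntPtr = Marshal.StringToHGlobalUni(html)
        Try
            ' Reading with the matching encoding returns the whole string.
            Dim asUni As String = Marshal.PtrToStringUni(p)

            ' Reading the same bytes as ANSI stops at the first zero byte.
            Dim asAnsi As String = Marshal.PtrToStringAnsi(p)

            Console.WriteLine("Uni:  " & asUni)
            Console.WriteLine("Ansi: " & asAnsi)
        Finally
            Marshal.FreeHGlobal(p)
        End Try
    End Sub
End Module
```

Read as ANSI, the UTF-16 buffer ends at the first zero byte, which on a little-endian machine is the high byte of the first character, so only "<" comes back. Whether this is what actually happens inside the cloned document is untested here.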

TIA

Charles
 
Hi Cor

I don't use that because the outerHtml of the HTML tag doesn't include the
<!DOCTYPE tag at the top, so doesn't represent the complete document.

Charles
 
Hi Charles,

Have a look at this:

mshtml.HTMLCommentElementClass

(I think you know the rest yourself, but tell me if you want more
information.)

I had still something to do for you.

Cor
 
Hi Cor

I can see that I could possibly retrieve the <!DOCTYPE tag info, but that
relies on me knowing that it is there to be retrieved. I could look for it
anyway, but then what else might there be for me to retrieve that I don't
know about, or am not expecting?

The HTML outerHTML is not designed to return the entire contents of the
document, and I don't know of a definition that states that I will get the
entire document if I retrieve the HTML outerHTML and all preceding comments.
The IPersistStreamInit interface, on the other hand, is designed to maintain
the entire document, through the Load and Save methods, so I feel that I
should persevere with this for now.

I appreciate your comments, though, but I am still perplexed why my
technique is not working. It is as though the IPersistStreamInit interface
is always going to the WebBrowser control document, even when I direct it to
my cloned document. Alternatively, could there be something wrong with the
way in which I am manipulating the stream? Sadly, I just can't see it.

Regards

Charles
 
Charles,

I have seen several threads where it appears you have been successful in
loading the WebBrowser control from a stream in the DocumentComplete event
from VB.NET. I have been struggling with this myself and was hoping you might
enlighten me.

The bottom line is that when I try to load from a stream, no error is thrown,
but I seem to go into an infinite event loop where DocumentComplete keeps
being fired.

Any help would be appreciated.

Thanks,

Ken

Here are the bits that appear to be important:

Dim TrendHTML As Stream = _
System.Reflection.Assembly.GetExecutingAssembly().GetManifestResourceStream( _
"CTCTrend.TrendGraphics.htm")

....

Private Sub WebBrowser1_DocumentComplete(ByVal eventSender As System.Object, _
ByVal eventArgs As AxSHDocVw.DWebBrowserEvents2_DocumentCompleteEvent) _
Handles WebBrowser1.DocumentComplete

Dim bResult As Boolean

Dim strm As UCOMIStream

Dim objPersistStreamInit As IPersistStreamInit

Dim hGlobal As IntPtr

Dim strHTML As String

Dim objSR As New StreamReader(TrendHTML)

Try

objDocument = WebBrowser1.Document

' objSR is a stream reader initialized during form load
strHTML = objSR.ReadToEnd()

hGlobal = Marshal.StringToHGlobalAnsi(strHTML)

CreateStreamOnHGlobal(hGlobal, True, strm)

objPersistStreamInit = DirectCast(objDocument, IPersistStreamInit)

objPersistStreamInit.InitNew()

objPersistStreamInit.Load(strm)

Catch ex As Exception

MsgBox(ex.Message)

End Try

End Sub
 
Hi Ken

It looks to me that your technique is correct. The problem, I suspect, is
where you load the document from the stream. Since documents are loaded
asynchronously, the logic of the load process is

1. Put html into stream
2. Load document from stream
3. Wait for operation to complete
4. Continue ...

Step 3 means waiting for the readyState to become 'complete', and when that
happens the DocumentComplete event fires. As you are loading the document in
the DocumentComplete event, you will get into a continuous loop.

I would suggest that you either load the document outside the
DocumentComplete event (preferred), or set a flag in the event, before you
load, to say that you are currently loading. This way you can avoid
reloading the document.
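
The flag approach might look something like the following minimal sketch. FakeBrowser is a hypothetical stand-in for the real control (it simply raises DocumentComplete whenever a load or navigate finishes), and the names loaded and GuardSketch are likewise invented for illustration. In a real form the flag would be a field on the form and the handler would do the IPersistStreamInit load.

```vb
Imports System

' Hypothetical stand-in for the WebBrowser control: every load ends by
' raising DocumentComplete, which is what causes the continuous loop.
Class FakeBrowser
    Public Event DocumentComplete As EventHandler
    Public LoadCount As Integer

    Public Sub LoadFromStream(ByVal html As String)
        LoadCount += 1
        RaiseEvent DocumentComplete(Me, EventArgs.Empty)
    End Sub

    Public Sub Navigate(ByVal url As String)
        RaiseEvent DocumentComplete(Me, EventArgs.Empty)
    End Sub
End Class

Module GuardSketch
    Friend browser As New FakeBrowser()
    Private loaded As Boolean = False

    Private Sub OnDocumentComplete(ByVal sender As Object, ByVal e As EventArgs)
        ' The flag is already set: this event was caused by our own load,
        ' so do not load again.
        If loaded Then Return

        ' Set the flag BEFORE loading, because the load raises the event again.
        loaded = True
        browser.LoadFromStream("<html><body>Hello</body></html>")
    End Sub

    Sub Main()
        AddHandler browser.DocumentComplete, AddressOf OnDocumentComplete
        browser.Navigate("about:blank")
        Console.WriteLine("Loads performed: " & browser.LoadCount.ToString())
    End Sub
End Module
```

Without the flag check, each load would raise DocumentComplete and trigger another load; with it, the handler loads the document once and ignores the event that its own load produces.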

The preferred way would be to wait for the browser initialisation to
complete (that is, it finishes navigating to about:blank, which you must
instruct it to do), and then load the document, waiting for readyState
before continuing.

HTH

Charles
 
Charles,

Thanks for the advice. Per your suggestion I moved the load-from-stream
logic to the form load event and it worked great. After looking a little
closer I saw that the C++ example that Microsoft shows avoids the infinite
event-firing loop by checking the interface pointer of the document to make
sure it is the one you want. This is difficult to do in .NET and not worth
the effort, since it works as seen below.

Unfortunately, I now have a new problem. It appears that I can only load
once from the stream. If I let the dialog form unload and release objects
accordingly, a second attempt at loading the document seems to succeed, but
objects in the document are not available as they were the first time.

Are you familiar with anything I need to explicitly clean up to avoid
confusing the COM/WebBrowser interface?

Thanks much for your help.

- Ken

In Form load:

objDocument = WebBrowser1.Document

strHTML = objSR.ReadToEnd()

hGlobal = Marshal.StringToHGlobalAnsi(strHTML)

CreateStreamOnHGlobal(hGlobal, True, strm)

objPersistStreamInit = DirectCast(objDocument, IPersistStreamInit)

objPersistStreamInit.InitNew()

objPersistStreamInit.Load(strm)

Do While WebBrowser1.ReadyState <> tagREADYSTATE.READYSTATE_COMPLETE

Application.DoEvents()

Loop

objDocument = WebBrowser1.Document

....

A subsequent call to a vbscript function in the document works once, but does
not work on subsequent document loads:

Call objDocument.Script.PopupDrawXYPlot(strXMLData, lngCurveFitStyles)
 
Ken

Although probably not the problem, it looks like you might be caching the
document object. It is best to retrieve it from the control each time you
need a reference.

When you talk about 'dialog form unload', do you mean that you close the
form that hosts the browser control and then re-open it? The browser control
needs to be disposed explicitly, so you should have a call to Dispose when
you close the form.

You say that objects are not available on subsequent occasions. Do you get
an error? When you load the form, I do not see a Navigate("about:blank").
This is required to initialise the control before you do anything else, and
then you must wait for DocumentComplete (readyState = "complete") before you
continue.

HTH

Charles
 
Charles,

Actually, I had chopped off the top of the Form Load code, which correctly
did the navigate. As it turns out, the problem does not seem to have
anything to do with loading from a stream. If I navigate to a file
containing the HTML directly instead of loading from a stream, I get the same
behaviour.

Basically, I can call into the DHTML only after the first navigate. If the
web control is disposed and the form is released (but the app still runs, of
course), and then the form and WebBrowser are re-instantiated and navigate to
the same HTML, I cannot call into the same DHTML script function. Something
must be left hanging around, but I can't figure out what it is.

If you have any thoughts please let me know. I may play around with a
simpler DHTML sample to see if it has something to do with my script;
however, in VB6 calling the same DHTML function gives me no problems. I am
beginning to suspect there is something broken in the interop regarding
support for this type of script calling.

Thanks for all your help.

- Ken

Call WebBrowser1.Script.PopupDrawXYPlot(strXMLData, lngCurveFitStyles)

This works once, on the first instance. The second time the WebBrowser is
instantiated it fails, saying it is unable to find PopupDrawXYPlot, even
though the innerHTML of the document shows it has loaded correctly.
 
Charles,

I can recreate the problem very easily:

Create a simple VB WinForms app with two forms. The first form shows the
second form with ShowDialog from a command button. The second form has the
following load event:

AxWebBrowser1.Navigate2("C:\Capstone\Applications\TestApp\WindowsApplication1\test.htm")

Do While AxWebBrowser1.ReadyState <> _
SHDocVw.tagREADYSTATE.READYSTATE_COMPLETE

Application.DoEvents()

Loop

Call AxWebBrowser1.Document.Script.evaluate(0.5)

--------------------------------

The test.htm file contains the following:

<HTML>
<HEAD>
<TITLE>Evaluate</TITLE>

<SCRIPT>
function evaluate(x)
{
alert("hello")
return eval(x)
}
</SCRIPT>
</HEAD>

<BODY>
</BODY>
</HTML>

The dialog form will work once. Close and re-open it, and the second time it
is instantiated the call to evaluate fails. I will be calling Microsoft
tomorrow.

- Ken
 
Hi Charles,

I even use a counter for this, counting starts and completes, and when it is
zero I know that the page is ready.

(And a timer to stop it when there is a strange page in it which started but
will never finish downloading.)

Cor
 
Ken

Unfortunately, I cannot get your example to run. Firstly, I get an 'object
variable not set' error when trying to access AxWebBrowser1.Document. I
would always recommend a navigate to "about:blank" before anything else,
even if you go on to navigate somewhere else afterwards. It ensures that the
control is properly initialised.

After adding the extra navigate to the form load, I get an 'Unknown error'
in mscorlib when trying to run the script. Is there any other code missed
out?

Also, just as an aside, you may prefer to turn Option Strict On, to ensure
that all calls and bindings are correct.

HTH

Charles
 