Charles Law
Sorry for the cross post, but I'm not sure who is best placed to answer this
one.
I am seeing some truly bizarre behaviour with MSHTML and streams.
I have a WebBrowser control that contains nothing but some default HTML. I
want to copy the document and modify it before saving it to disk.
So, I clone the document like this:
<code>
Private Function CloneDocument(ByVal doc As mshtml.IHTMLDocument2) _
        As mshtml.IHTMLDocument2
    Dim newdoc2 As mshtml.IHTMLDocument2
    Dim ips As IPersistStreamInit
    Dim strm As UCOMIStream
    Dim source As String
    ' Create and initialise a new document object
    newdoc2 = New mshtml.HTMLDocument
    ips = DirectCast(newdoc2, IPersistStreamInit)
    ips.InitNew()
    Do Until newdoc2.readyState = "complete"
        Application.DoEvents()
    Loop
    ' Get the current document HTML as a stream
    source = GetDocumentSource(AxWebBrowser1.Document) ' see below
    strm = GetStream(source) ' see below
    ' Load the new document from the stream
    ips.Load(strm)
    ' Wait until the new document has settled
    Do Until newdoc2.readyState = "complete"
        Application.DoEvents()
    Loop
    Return newdoc2
End Function

Private Function GetStream(ByVal s As String) As UCOMIStream
    Dim iptr As IntPtr
    Dim istrm As UCOMIStream
    ' Get a pointer to the string
    iptr = Marshal.StringToHGlobalAuto(s)
    ' Create the stream from the pointer
    CreateStreamOnHGlobal(iptr, True, istrm)
    Return istrm
End Function

Private Function GetStream(ByVal size As Integer) As UCOMIStream
    Dim iptr As IntPtr
    Dim strm As UCOMIStream
    ' Create a pointer to a block of the required size
    iptr = Marshal.AllocHGlobal(size)
    ' Create the stream from the pointer
    CreateStreamOnHGlobal(iptr, True, strm)
    Return strm
End Function

Private Function GetDocumentSource(ByVal doc As mshtml.IHTMLDocument2, _
        ByVal resetIsDirty As Boolean) As String
    Dim stream As UCOMIStream
    Dim ips As IPersistStreamInit
    Dim s As String
    ips = DirectCast(doc, IPersistStreamInit)
    If ips Is Nothing Then
        s = Nothing
    Else
        stream = GetStream(2048)
        ' Save the document into the comstream, without clearing the
        ' IsDirty flag
        ips.Save(stream, resetIsDirty)
        s = StreamToString(stream)
    End If
    Return s
End Function

Private Function StreamToString(ByVal strm As UCOMIStream) As String
    Dim iptr As IntPtr
    Dim s As String
    GetHGlobalFromStream(strm, iptr)
    ' *** THIS IS ODD TOO ***
    ' If the source is the WebBrowser control then Ansi must be used ***
    s = Marshal.PtrToStringAnsi(iptr)
    ' If the source is the cloned and modified document then Auto must
    ' be used ***
    s = Marshal.PtrToStringAuto(iptr)
    ' ***
    Return s
End Function
</code>
Having cloned the document, I modify it by inserting some tags into the body
element. I then call GetDocumentSource() to get the HTML from the cloned
document.
However, the HTML that is returned is the original HTML, from the WebBrowser
control, and not from the cloned document. I know that the cloned document
contains the correct HTML because if I execute the following for the cloned
document
?doc.all.tags("html").Item(0).outerhtml
in the command window, I get what I expect in the body element.
Can anyone suggest why this is happening?
I have also highlighted an oddity in function StreamToString(), which I do
not understand. Why would the encoding of the HTML change between the
browser and a cloned document?
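For reference on the Ansi/Auto question, here is a minimal standalone sketch of how the two Marshal read calls behave on the same unmanaged buffer. It does not touch MSHTML at all. The EncodingDemo module and the sample markup are hypothetical, and the sketch only assumes a buffer that was written as UTF-16, which is what Marshal.StringToHGlobalAuto produces on NT-based Windows. Whether that is what actually happens inside the cloned document is not something this sketch proves.

```vbnet
Imports System
Imports System.Runtime.InteropServices

' Hypothetical demo module, not part of the code in the post above.
Module EncodingDemo
    Sub Main()
        Dim html As String = "<html><body></body></html>"

        ' Write the string into unmanaged memory as UTF-16 (two bytes per character).
        Dim p As IntPtr = Marshal.StringToHGlobalUni(html)
        Try
            ' Reading the buffer back as UTF-16 returns the whole string.
            Console.WriteLine(Marshal.PtrToStringUni(p))

            ' Reading the same buffer as ANSI stops at the first zero byte,
            ' which is the high byte of the first UTF-16 character, so only
            ' the leading "<" comes back.
            Console.WriteLine(Marshal.PtrToStringAnsi(p))
        Finally
            ' Release the unmanaged buffer allocated above.
            Marshal.FreeHGlobal(p)
        End Try
    End Sub
End Module
```

If a buffer reads correctly with one call and is truncated or garbled with the other, that would suggest the bytes in the stream were written in a different encoding, rather than the read call itself misbehaving.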
TIA
Charles