heavy problem with HTMLDocument

  • Thread starter: pierre

pierre

Hi, I have a problem that may be easy to resolve, but I can't
find a solution:

I want to parse HTML files, so I first want to fetch one from a
URL, and I do it like this:

Dim objMSHTML As New mshtml.HTMLDocument()
Dim objDocument As mshtml.HTMLDocument
objDocument = objMSHTML.createDocumentFromUrl("http://www.google.fr", vbNullString)

Normally this should work and I could parse the HTML
code, but in fact I get this error:

"An unhandled exception of type
'System.NullReferenceException' occurred in
mscorlib.dll
Additional information: Object reference not set to an
instance of an object."

(The original message was in French; my VB version is French.)

Any idea?
PS: I think this code works with VB6...
 
Use the built-in .NET networking objects. See:

http://tinyurl.com/98ey
 
Thank you, Patrick.
I've just read the article,
but it doesn't seem that it can help me parse the
HTML. Using mshtml.HTMLDocument, I thought I could use the
"links" property, which is supposed to give access to the
links in the HTML...

 
Pierre,
I've never seen this method, so I am curious whether it works, but it will
not all happen in one step.

I would advise you to take a look at the WebBrowser control, with which you
can "navigate" to a URL.
(It uses Internet Explorer 6; don't ask me how.)

Then, in the "DocumentComplete" event of the WebBrowser, you can get
the documents, which conform to the DOM.

When there are frames, there is a document for every frame.
There is also a navigate-complete event, but with that you only get the last
page downloaded.

That's why I find the method you use strange, but I saw it in the
documentation too.

I hope I have pointed you in the right direction.
It is too much to give a quick example.

And the WebBrowser is only one of the methods I think you can use, but
it is the one I use for these things at the moment.

I hope it helps you a little bit.
Cor
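
For readers who want a starting point, here is a rough sketch of the WebBrowser approach described above. It is not Cor's code. It assumes the managed System.Windows.Forms.WebBrowser control that ships with later .NET Framework versions (2.0 and up) rather than the ActiveX control, so the event is named DocumentCompleted here. The class name BrowserForm is made up for the example, and the sketch has not been tested against the setup in this thread.

<code>
Imports System
Imports System.Diagnostics
Imports System.Windows.Forms

' Hypothetical form that loads a page and lists its links once loading is done.
Public Class BrowserForm
    Inherits Form

    Private WithEvents browser As New WebBrowser()

    Public Sub New()
        browser.Dock = DockStyle.Fill
        Controls.Add(browser)
        browser.Navigate("http://www.google.fr")
    End Sub

    ' Raised once for every document that finishes loading, frames included.
    Private Sub browser_DocumentCompleted(ByVal sender As Object, _
            ByVal e As WebBrowserDocumentCompletedEventArgs) _
            Handles browser.DocumentCompleted
        Debug.WriteLine("Loaded: " & e.Url.ToString())

        ' Only walk the DOM when the top-level page itself has completed.
        If Not e.Url.Equals(browser.Url) Then Return

        For Each link As HtmlElement In browser.Document.Links
            Debug.WriteLine(link.GetAttribute("href"))
        Next
    End Sub

    <STAThread()> _
    Public Shared Sub Main()
        Application.Run(New BrowserForm())
    End Sub
End Class
</code>

The check against browser.Url is one simple way to tell the main page from its frames; pages that redirect may need a different test.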
 
Sorry -- forgot about your parsing issue.

Perhaps you could get the raw HTML using the .NET WebRequest and then
feed that into the mshtml.HTMLDocument object. I've never used that
object before so I'm not sure if you can load it with your own HTML.
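
A minimal sketch of that idea, offered as an assumption rather than a confirmed answer: download the page with WebRequest, then push the text into an mshtml document through IHTMLDocument2.write and read its "links" collection. The module name FetchAndParse and the helper DownloadHtml are invented for the example, the project needs a reference to the Microsoft.mshtml interop assembly, and the code has not been run against the setup in this thread.

<code>
Imports System
Imports System.IO
Imports System.Net

Module FetchAndParse

    ' Hypothetical helper: returns the raw HTML of a page as a string.
    Function DownloadHtml(ByVal url As String) As String
        Dim request As WebRequest = WebRequest.Create(url)
        Dim response As WebResponse = request.GetResponse()
        Try
            Dim reader As New StreamReader(response.GetResponseStream())
            Return reader.ReadToEnd()
        Finally
            response.Close()
        End Try
    End Function

    <STAThread()> _
    Sub Main()
        Dim html As String = DownloadHtml("http://www.google.fr")

        ' Writing the text into a fresh document makes mshtml parse it.
        Dim doc As mshtml.IHTMLDocument2 = _
            DirectCast(New mshtml.HTMLDocument(), mshtml.IHTMLDocument2)
        doc.write(New Object() {html})
        doc.close()

        ' The "links" collection holds the anchor elements of the page.
        For Each link As mshtml.IHTMLElement In doc.links
            Console.WriteLine(link.getAttribute("href", 0))
        Next
    End Sub

End Module
</code>

Because the document is built from a string and not loaded from the site, relative links may not resolve against the original URL.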
 
Hi Pierre

The problem is that although you create a new mshtml.HTMLDocument, it is not
being initialised.

Try the following:

<code>
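' Create the document object; on its own it is not yet initialised
Dim objMSHTML As New mshtml.HTMLDocument
Dim objDocument As mshtml.IHTMLDocument2
Dim ips As IPersistStreamInit

' Initialise the document through IPersistStreamInit before using it
ips = DirectCast(objMSHTML, IPersistStreamInit)
ips.InitNew()

objDocument = objMSHTML.createDocumentFromUrl("http://www.google.fr", vbNullString)

' The page loads asynchronously: pump messages until it is complete
Do Until objDocument.readyState = "complete"
    Application.DoEvents()
Loop

Debug.WriteLine(objDocument.body.outerHTML)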
</code>

At the end of this you can access the DOM. Note that you need to define the
IPersistStreamInit interface.

HTH

Charles


 
Pierre

In case you don't have it, here is the IPersistStreamInit interface
definition

<code>
Imports System.Runtime.InteropServices

<ComVisible(True), ComImport(), _
 Guid("7FD52380-4E07-101B-AE2D-08002B2EC713"), _
 InterfaceTypeAttribute(ComInterfaceType.InterfaceIsIUnknown)> _
Public Interface IPersistStreamInit
    ' IPersist interface
    Sub GetClassID(ByRef pClassID As Guid)

    ' IPersistStreamInit methods; the declaration order must be kept
    ' (UCOMIStream is the .NET 1.x type; later versions use ComTypes.IStream)
    <PreserveSig()> Function IsDirty() As Integer
    <PreserveSig()> Function Load(ByVal pstm As UCOMIStream) As Integer
    <PreserveSig()> Function Save(ByVal pstm As UCOMIStream, _
        ByVal fClearDirty As Boolean) As Integer
    <PreserveSig()> Function GetSizeMax(<InAttribute(), Out(), _
        MarshalAs(UnmanagedType.U8)> ByRef pcbSize As Long) As Integer
    <PreserveSig()> Function InitNew() As Integer
End Interface
</code>

HTH

Charles
 
Thanks a lot, it works perfectly :)
P.