web developer 2010

  • Thread starter: jdrott1

jdrott1

I'm trying to get the table data from a website. I found this snippet
of code online to pull the meta info. How can I change this to pull
all the table info?

'reads the html into an html document to enable parsing
Dim doc As IHTMLDocument2 = New HTMLDocumentClass()
doc.write(New Object() {responseFromServer})
doc.close()

'loops through each element in the document to check if it qualifies for the attributes to be set
For Each el As IHTMLElement In DirectCast(doc.all, IHTMLElementCollection)
    ' check to see if all the desired attributes were found with the correct values
    Dim qualify As Boolean = True
    If el.tagName = "tr" Then
        Dim meta As HTMLMetaElement = DirectCast(el, HTMLMetaElement)
        Response.Write("Content " + meta.content & "<br/>")

    End If
Next
 
You want the documentation for the IE document
object model (DOM). I think you can probably use the
basics of your code. Just change it to:

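If el.tagName = "TABLE" Then   ' MSHTML reports tag names in uppercase
    Response.Write(el.outerText)   ' the table's text with the markup stripped
End If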

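You can access any valid attribute of any HTML
element, including the style.* properties. There is also a children
collection that returns the directly contained elements; for a
table that is usually a TBODY (IE inserts one automatically),
whose children are the TR rows, whose children are the TD cells.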
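Since the goal is the table data rather than one block of text per table, the table-specific interfaces can be used instead of walking the children collection by hand. The following is a minimal VB.NET sketch, assuming a project reference to the Microsoft.mshtml interop assembly and that the page HTML is in a string such as responseFromServer from the question; the TableReader module and ReadTables function are hypothetical names. It asks IHTMLDocument3.getElementsByTagName for the TABLE elements, then reads IHTMLTable.rows and IHTMLTableRow.cells so that each cell's text comes back separately.

```vbnet
Imports System.Collections.Generic
Imports mshtml   ' assumes a reference to the Microsoft.mshtml interop assembly

Public Module TableReader

    ' Hypothetical helper: parses an HTML string and returns every table
    ' as a list of rows, each row being a list of cell texts.
    Public Function ReadTables(html As String) As List(Of List(Of List(Of String)))
        ' same loading technique as the snippet in the question
        Dim doc As IHTMLDocument2 = New HTMLDocumentClass()
        doc.write(New Object() {html})
        doc.close()

        Dim tables As New List(Of List(Of List(Of String)))
        ' getElementsByTagName returns only TABLE elements, so there is
        ' no need to loop over doc.all and compare tag names
        Dim doc3 As IHTMLDocument3 = DirectCast(doc, IHTMLDocument3)
        For Each tableEl As IHTMLElement In doc3.getElementsByTagName("table")
            Dim table As IHTMLTable = DirectCast(tableEl, IHTMLTable)
            Dim rows As New List(Of List(Of String))
            ' rows covers THEAD, TBODY and TFOOT rows of this table only
            For Each rowEl As IHTMLTableRow In table.rows
                Dim cells As New List(Of String)
                For Each cellEl As IHTMLElement In rowEl.cells
                    ' innerText can be Nothing for an empty cell
                    cells.Add(If(cellEl.innerText, ""))
                Next
                rows.Add(cells)
            Next
            tables.Add(rows)
        Next
        Return tables
    End Function

End Module
```

Each row could then be written out with, for example, String.Join(" | ", row), giving one line per table row instead of one run-together string per table.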

Watch out, though, for DOM irregularities. Since
IE6, Microsoft has offered the option to specify
whether a page renders in standards mode or
"quirks mode". DOM methods that are not
standards-compliant will not work in standards mode!
(Ex.: documentElement is new with standards mode.)

They were trying to help web designers adapt
gradually to IE updates, but it only created more
problems. At this point each version of IE is still
incompatible with the last, but now there are
also 2 webpage types. See here for how to test:

http://msdn.microsoft.com/en-us/library/ms533687(v=vs.85).aspx

(As of IE8 they're even halfway to breaking that.
They've decided willy nilly to change compatMode
to documentMode.)

I don't have a link offhand to the DOM differences you
need to deal with, but you should plan to test in both
modes. (Unfortunately, the docs are all for standards
mode now, even though IE effectively has 2
separate DOMs.)

For quirks mode, omit the DOCTYPE declaration from your test
page. For standards mode, the following will work:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">



 
An update: I knew I had more info about
compatMode somewhere. The following is
an example of something that's broken between
the two IE DOMs.

If IE.document.compatMode = "CSS1Compat" Then
    MsgBox IE.document.documentElement.innerHTML
Else
    MsgBox IE.document.body.innerHTML
End If

Many of the issues are along those lines: cases
where body used to be used and documentElement
has replaced it. Also, if you might ever deal with
IE5, you need to error-trap the compatMode
check, since compatMode was only added in IE6.
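The same check can be made from VB.NET code that drives MSHTML directly. The following is a minimal sketch, assuming a reference to the Microsoft.mshtml interop assembly; the CompatHelper module and GetPageHtml function are hypothetical names. compatMode is exposed on the IHTMLDocument5 interface, so a TryCast that comes back Nothing acts as the error trap for an IE version that predates it.

```vbnet
Imports mshtml   ' assumes a reference to the Microsoft.mshtml interop assembly

Public Module CompatHelper

    ' Hypothetical helper mirroring the VBScript check above: read the markup
    ' through documentElement in standards mode and through body otherwise.
    Public Function GetPageHtml(doc As IHTMLDocument2) As String
        ' compatMode lives on IHTMLDocument5. TryCast performs a QueryInterface
        ' and yields Nothing instead of throwing when the interface is missing.
        Dim doc5 As IHTMLDocument5 = TryCast(doc, IHTMLDocument5)
        If doc5 IsNot Nothing AndAlso doc5.compatMode = "CSS1Compat" Then
            ' standards mode: documentElement is exposed on IHTMLDocument3
            Dim doc3 As IHTMLDocument3 = DirectCast(doc, IHTMLDocument3)
            Return doc3.documentElement.innerHTML
        End If
        ' quirks mode ("BackCompat"), or no compatMode support at all
        Return doc.body.innerHTML
    End Function

End Module
```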

The stunning error that Microsoft made is that
they broke backward compatibility with quirks mode,
not only in the page rendering but also in the
automation DOM.

In other words, documentElement.innerHTML does
not work in quirks mode and body.innerHTML does
not work in standards mode. So you really have to
think of it as a case of using 2 different browsers...
and you never know which browser has opened the
webpage until you check compatMode.
 
That's awesome... here's how I got it to work:

For Each el As IHTMLElement In DirectCast(doc.all, IHTMLElementCollection)
    ' check to see if all the desired attributes were found with the correct values
    Dim qualify As Boolean = True
    If el.tagName = "TABLE" Then
        Dim meta As HTMLTableClass = DirectCast(el, HTMLTableClass)
        Response.Write(el.outerText)

    End If
Next
 
|
| I guess it's time to get greedy... How can I change the format of the
| data I receive?
|

The format? You mean the HTML itself, or the
outerText string you're getting? I don't know
what you mean. An HTML page is just text, so you
can do whatever you want with it.
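Because the extracted cell text is just strings, it can be rebuilt into whatever markup is wanted. The following is a minimal VB.NET sketch that re-emits rows of cell text as a new HTML table tagged with a CSS class, so the appearance is controlled from a stylesheet; the TableFormatter module, the ToStyledTable function and the class name are hypothetical, and System.Net.WebUtility assumes .NET 4.0 or later.

```vbnet
Imports System.Collections.Generic
Imports System.Net    ' WebUtility (.NET 4.0 and later)
Imports System.Text

Public Module TableFormatter

    ' Hypothetical helper: turns rows of cell text (for example, lists built
    ' from IHTMLTableRow.cells) into a fresh HTML table carrying a CSS class.
    Public Function ToStyledTable(rows As IEnumerable(Of IEnumerable(Of String)),
                                  cssClass As String) As String
        Dim sb As New StringBuilder()
        sb.Append("<table class=""").Append(WebUtility.HtmlEncode(cssClass)).Append(""">")
        For Each row In rows
            sb.Append("<tr>")
            For Each cell In row
                ' treat Nothing as empty, then encode so stray < or & cannot break the markup
                sb.Append("<td>").Append(WebUtility.HtmlEncode(If(cell, "").Trim())).Append("</td>")
            Next
            sb.Append("</tr>")
        Next
        sb.Append("</table>")
        Return sb.ToString()
    End Function

End Module
```

A List(Of List(Of String)) can be passed straight in, since IEnumerable(Of T) is covariant in .NET 4.0.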
 

Thanks. I was able to configure things by using CSS.
 
Try this one.
It was used this week by somebody in the VB forum, and while he first thought
it did not work as it should, he replied later that it did.

http://www.vb-tips.com/MSHTML.aspx

Good luck

Cor