doc.loadxml(axwebbrowser1.document.body.outerhtml), doesn't work?

  • Thread starter: MeNotHome

MeNotHome

I am automating the navigation of a website. When I reach the end, it
changes to xml.

So my axwebbrowser1 has a bunch of xml data in it.

So here is what I am trying to do

Dim xmlText As String
Dim doc As New XmlDocument()
Dim myElem As XmlElement
xmlText = AxWebBrowser1.Document.Body.OuterHTML
doc.LoadXml(xmlText)

But the problem is I get an exception error at the
doc.loadxml(xmltext)
I found a streamreader example but I already have the xml data in the
axwebbrowser control.

How do I get it so I can parse through the nodes?

Thanks for any information
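
A minimal sketch for finding out what the parser objects to, assuming the failure is an XmlException thrown by LoadXml. The module and function names (LoadXmlDiagnostics, DescribeXmlError) are made up for illustration and are not part of the original code.

```vbnet
Imports System
Imports System.Xml

Module LoadXmlDiagnostics
    ' Returns Nothing when the text parses as XML, otherwise a short
    ' description of where and why parsing stopped.
    Function DescribeXmlError(ByVal xmlText As String) As String
        Dim doc As New XmlDocument()
        Try
            doc.LoadXml(xmlText)
            Return Nothing
        Catch ex As XmlException
            ' LineNumber and LinePosition point at the offending character.
            Return "Line " & ex.LineNumber & ", position " & ex.LinePosition & ": " & ex.Message
        End Try
    End Function
End Module
```

Calling DescribeXmlError(AxWebBrowser1.Document.Body.OuterHTML) and printing the result should show whether the string is really XML or HTML markup that only looks like it.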
 
what does the entire document look like?

either post it here, or give the url so we can see.

thx,

steve
 
I wish I could post the website but it's on a password protected site.
But here is what the page looks like when viewed in ie6.

<?xml version="1.0" encoding="ISO-8859-1" ?>
- <ShipmentForecasts>
- <ShipmentForecast>
<extract-datetime
code="CST">2003-10-15T15:33:06.000-06:00</extract-datetime>
<FileKey>ALL1*23759*20011200*ALL1</FileKey>
<TransactionIdentification>000343717</TransactionIdentification>
<ReleaseIdentification Type="Original"
DateType="Delivery">20011200</ReleaseIdentification>
<DateTime>2003-10-15T21:33:06.000Z</DateTime>
<StartDateTime>2003-10-13T21:33:06.000Z</StartDateTime>
<EndDateTime>2004-10-06T00:00:00.000Z</EndDateTime>
- <ShipFrom>
- <Company>
etc etc....

Does this help at all?

Thanks for any advice you can give.
 
you know...that looks fine to me. i don't see anything wrong with it. what
does the exception state?

here's some "quicky code" that will help you parse xml in the meantime...i
didn't have time to learn how the new xml stuff worked in .net so this was
my "5 minute problem solved" adaptation.

hth,

steve

' Needs this at the top of the file:
Imports System.Text.RegularExpressions

Private Sub xmlParsing()
    ' Sample data: two top-level <authorization> blocks, three records each.
    Dim xml As String = ""
    xml &= "<authorization>" & vbCrLf
    xml &= " <record1>" & vbCrLf
    xml &= " <id>1</id>" & vbCrLf
    xml &= " <role>1</role>" & vbCrLf
    xml &= " <description>system account</description>" & vbCrLf
    xml &= " </record1>" & vbCrLf
    xml &= " <record2>" & vbCrLf
    xml &= " <id>2</id>" & vbCrLf
    xml &= " <role>2</role>" & vbCrLf
    xml &= " <description>administrator</description>" & vbCrLf
    xml &= " </record2>" & vbCrLf
    xml &= " <record3>" & vbCrLf
    xml &= " <id>3</id>" & vbCrLf
    xml &= " <role>3</role>" & vbCrLf
    xml &= " <description>standard user</description>" & vbCrLf
    xml &= " </record3>" & vbCrLf
    xml &= "</authorization>" & vbCrLf
    xml &= "<authorization>" & vbCrLf
    xml &= " <record4>" & vbCrLf
    xml &= " <id>4</id>" & vbCrLf
    xml &= " <role>4</role>" & vbCrLf
    xml &= " <description>system knob</description>" & vbCrLf
    xml &= " </record4>" & vbCrLf
    xml &= " <record5>" & vbCrLf
    xml &= " <id>5</id>" & vbCrLf
    xml &= " <role>5</role>" & vbCrLf
    xml &= " <description>certified looser</description>" & vbCrLf
    xml &= " </record5>" & vbCrLf
    xml &= " <record6>" & vbCrLf
    xml &= " <id>6</id>" & vbCrLf
    xml &= " <role>6</role>" & vbCrLf
    xml &= " <description>emmense whiner</description>" & vbCrLf
    xml &= " </record6>" & vbCrLf
    xml &= "</authorization>" & vbCrLf

    Dim authorization() As String = getMultipleSegments("AUTHORIZATION", xml)
    If authorization Is Nothing Then Return
    Dim segment As String
    For Each segment In authorization
        ' "RECORD" also matches <record1>, <record2>, ... because the
        ' pattern allows extra characters after the segment name.
        Dim records() As String = getMultipleSegments("RECORD", segment)
        If Not records Is Nothing Then
            Dim description As String
            Dim id As Integer
            Dim record As String
            Dim role As String
            For Each record In records
                id = CInt(getSegmentValue(getSingleSegment("ID", record), "0"))
                role = getSegmentValue(getSingleSegment("ROLE", record))
                description = getSegmentValue(getSingleSegment("DESCRIPTION", record))
                Console.WriteLine("id: " & id & ", role: " & role & ", description: " & description)
            Next
        End If
    Next
    Console.ReadLine()
End Sub

' Returns every <segment...>...</segment...> block found in xml, or Nothing.
Private Function getMultipleSegments(ByVal segment As String, ByVal xml As String) As String()
    If xml Is Nothing Then Return Nothing
    Dim pattern As String = "<\s*" & segment & "[^>]*>.*?<\s*/\s*" & segment & "[^>]*?>"
    Dim regExp As New Regex(pattern, RegexOptions.ExplicitCapture Or _
        RegexOptions.Singleline Or RegexOptions.IgnoreCase)
    If Not regExp.IsMatch(xml) Then Return Nothing
    Dim match As Match
    Dim result() As String
    Dim index As Integer
    For Each match In regExp.Matches(xml)
        ReDim Preserve result(index)
        result(index) = match.Value
        index += 1
    Next
    Return result
End Function

' Returns the first <segment...>...</segment...> block found in xml, or Nothing.
Private Function getSingleSegment(ByVal segment As String, ByVal xml As String) As String
    If xml Is Nothing Then Return Nothing
    Dim pattern As String = "<\s*" & segment & "[^>]*>.*?<\s*/\s*" & segment & "[^>]*?>"
    Dim regExp As New Regex(pattern, RegexOptions.ExplicitCapture Or _
        RegexOptions.Singleline Or RegexOptions.IgnoreCase)
    If Not regExp.IsMatch(xml) Then Return Nothing
    Dim match As Match = regExp.Match(xml)
    Return match.Value
End Function

' Strips all tags and returns the remaining text, or Nothing if there were no tags.
Private Overloads Function getSegmentValue(ByVal xml As String) As String
    If xml Is Nothing Then Return Nothing
    Dim pattern As String = "<[^>]*>"
    Dim regExp As New Regex(pattern)
    If Not regExp.IsMatch(xml) Then Return Nothing
    Return regExp.Replace(xml, "")
End Function

' Same as above, but falls back to emptyDefault when the value is missing or blank.
Private Overloads Function getSegmentValue(ByVal xml As String, ByVal emptyDefault As String) As String
    Dim value As String = getSegmentValue(xml)
    If value Is Nothing OrElse value.Trim() = "" Then value = emptyDefault
    Return value
End Function
 
I am attempting to try out your code.

I am getting a couple of errors when inserting the code.

Inside getMultipleSegments():
Dim regExp As New Regex(pattern    -> Regex is undefined
Dim match As Match                 -> Match is undefined

Inside getSingleSegment():
Regex is undefined and Match is undefined

Inside the overloaded function getSegmentValue():
value = emptyDefault()             -> is undefined

thanks for any help



 
Ok, I see I forgot an import.

One more thing.....

When I set xml=axwebbrowser1.document.body.outerhtml

I get all kinds of junk in the xml string.

It is displayed perfectly as an xml document in the axwebbrowser
control but the string that is returned contains lots of extra junk in
it. And the <Node>xyz</Node> is totally missing from the xml string.

Any ideas on how to get the xml string loaded with only the real xml
data that is displayed in the axwebbrowser control?
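
One thing that may be worth trying, sketched below under assumptions: when Internet Explorer shows raw XML it renders it as an HTML tree view, so Body.OuterHTML holds that generated HTML rather than the XML. In that mode the browser document is understood to expose the parsed source through an XMLDocument property (an MSXML document whose xml property is the text). The module and function names here are hypothetical, and the property may not be present on every page or IE version, so the lookup is wrapped in a Try block.

```vbnet
Option Strict Off   ' late binding on the COM document object

Imports System
Imports System.Xml

Module BrowserXmlSketch
    ' browserDocument is expected to be AxWebBrowser1.Document.
    ' Returns Nothing when the XML source cannot be reached this way.
    Function LoadDisplayedXml(ByVal browserDocument As Object) As XmlDocument
        Try
            ' ASSUMPTION: XMLDocument is only available while IE is
            ' displaying XML with its built-in tree view.
            Dim source As Object = browserDocument.XMLDocument
            If source Is Nothing Then Return Nothing
            Dim doc As New XmlDocument()
            doc.LoadXml(CStr(source.xml))
            Return doc
        Catch ex As Exception
            ' Property missing, or the text was not well-formed XML.
            Return Nothing
        End Try
    End Function
End Module
```

If LoadDisplayedXml(AxWebBrowser1.Document) returns a document, its nodes can be walked with SelectNodes or ChildNodes as usual.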

thanks
 
Hi MeNo,

If you know when you've reached the xml stage, why not get the file
directly using an HttpRequest? Then you won't need to worry about axBrowser
messing it up - you can stick it straight into your XmlDocument.

Regards,
Fergus
 
go to the command window (ctl + g) and type this in when you have it running
and loaded:

?axwebbrowser1.document.body

what shows up? if you don't see xml anywhere, then try:

?axwebbrowser1.document

i think the whole problem is there. one way to bypass the whole use of the
browser control is to create an http request and get the results. i have an
example if you wanna go that route.

let me know what the command window outputs.

thx,

steve
 
I can't get it from the httprequest. There is no single URL that will
get me to the necessary page.

I have to step through a series of pages. Selecting links, filling in
data fields and pressing the right buttons. I have all of this
working programmatically of course.

If I enter the URL that is displayed when I get to the final page, it
redirects to the entry page where I have to fill in the information.
The website is ASP based.

Thanks for any additional information.
 
Hi MeNo,

Yes, I've followed your progress with this project. Hurdle after hurdle,
but you're getting there. :-)

If simply using the URL won't work, I'd guess it's because the request has
particular data in the headers. If you could determine what that is, you could
recreate the entire request for yourself.

One way to find out would be to substitute the website's address with your
localhost and have an asp page on a dummy website which simply reports what
the headers are.

Leaving that aside, it may be possible to take the mangled axBrowser
output and 're-xmlify' it. It depends whether it loses information or just
adds in a load of garbage.

Regards,
Fergus
 
hate to say it, but for the betterment of all concerned..."poppy-cock"

there is one page that has data to post to another page. all you have to do
is get the label/value pairs of what is to be posted and post it through an
http request...simple as that. look at the html source of the entry page,
then post the entry page data programmatically to the final url that's going
to return your xml.

btw, this is common methodology for hacking "secure" sites. explore the
hacker w/n.

;^)

steve
 
Yes, I am getting there but boy is it slow progress...LOL


I have tried to figure out if they were posting something via the URL
command line to get to the right page.

Here is the form code for the button I press

<form action="Rel.asp" method="post" name="frmReleaseList"
onsubmit="return frmReleaseList_validate();" style="margin:0">

But I cannot figure out what it wants on the URL line to produce the
right page.

If I put in the "Rel.asp" it jumps to the selection form and doesn't
display any xml data.

thanks for trying to help
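
A URL typed into the address bar is sent as a GET with no form data, while the form above declares method="post", which may be why Rel.asp falls back to the selection form. A minimal sketch of sending the same kind of POST from code follows. The field names and values are not known from this thread, so they are passed in as parameters. FormPostSketch, BuildFormBody and PostForm are hypothetical names, and Uri.EscapeDataString assumes .NET 2.0 or later.

```vbnet
Imports System
Imports System.IO
Imports System.Net
Imports System.Text

Module FormPostSketch
    ' Builds "name1=value1&name2=value2" with each part URL-escaped.
    Function BuildFormBody(ByVal names() As String, ByVal values() As String) As String
        Dim body As New StringBuilder()
        For i As Integer = 0 To names.Length - 1
            If i > 0 Then body.Append("&")
            body.Append(Uri.EscapeDataString(names(i)))
            body.Append("=")
            body.Append(Uri.EscapeDataString(values(i)))
        Next
        Return body.ToString()
    End Function

    ' Posts the body the way a browser submits a form and returns the reply text.
    ' Reuse one CookieContainer for every request so the ASP session cookie
    ' handed out by earlier pages is sent back.
    Function PostForm(ByVal url As String, ByVal body As String, _
                      ByVal cookies As CookieContainer) As String
        Dim request As HttpWebRequest = CType(WebRequest.Create(url), HttpWebRequest)
        request.Method = "POST"
        request.ContentType = "application/x-www-form-urlencoded"
        request.CookieContainer = cookies
        Dim payload() As Byte = Encoding.ASCII.GetBytes(body)
        request.ContentLength = payload.Length
        Dim requestStream As Stream = request.GetRequestStream()
        requestStream.Write(payload, 0, payload.Length)
        requestStream.Close()
        Dim response As HttpWebResponse = CType(request.GetResponse(), HttpWebResponse)
        Dim reader As New StreamReader(response.GetResponseStream())
        Dim text As String = reader.ReadToEnd()
        reader.Close()
        response.Close()
        Return text
    End Function
End Module
```

The url argument would be the full address that "Rel.asp" resolves to on the real site. Whatever frmReleaseList_validate() changes before submitting would also have to be reproduced in the posted values.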
 
my or hasn't shown for some reason...so here's my $0.02 usd.
look at it more simply...

there is one page that has data to post to another page. all you have to do
is get the label/value pairs of what is to be posted and post it through an
http request...simple as that. look at the html source of the entry page,
then post the entry page data programmatically to the final url that's going
to return your xml.

btw, this is common methodology for hacking "secure" sites. explore the
hacker w/n.

;^)

steve
 
Is this true even if the pages are asp based?

 
absolutely...

you are not presented w/ an "asp" or "php" or any other type of dynamically
generated "page". you are presented with, what you can consider, static
information...be it html, xml, etc. all the browser control is doing is
conveniently displaying the results of an http request. there's no magic
involved.

try it out...make your own asp pages that act similarly to your target site.

in the interest of time though, did you print out the document.body and
document properties? i think it may be quicker to resolve where the xml
resides in the control.

steve
 
Hi MeNo,

Right from the start, this is what I've been trying to say to you - work
out what each page is sending in its request and use HttpRequest to match it.
But I accept that working that out is not easy if the concept is new to you.
And I admire what you've done in triggering events within the AxBrowser. :-)

If you set up an asp or aspx page on your server at home or work, you can
get it to display the headers of the request that invokes it. If you can
fiddle the link html in the AxBrowser to call <your> page instead of the real
website, you can do the same - find out what the headers are. Then you can use
HttpRequest and set <its> headers to those values.

However, like Steve says, pragmatism favours following the path you're on
rather than researching a whole new area.

I'm curious and would be interested to see the html of the last couple of
web pages in your sequence. Perhaps you could zip and post them?

Regards,
Fergus
 
Can you give any help on figuring out what is being sent to each page?

Give me more info on using httpRequest. I have not used this before.

Thanks
 
here's an example i've given b4...it sends a soap message, but posting data
is done in a similar fashion. pay attention to headers.add as this is where
session information, etc. is communicated. btw, are you going to attempt to
find where the xml is hidden w/n the control?

hth,

steve

' ============================================
' Needs these at the top of the file:
Imports System.IO
Imports System.Net
Imports System.Text

Private Const shopCode As String = "XYZ"
Private Const appId As String = "SOME_GENERATED_PUBLIC_KEY"
Private Const syncServer As String = "http://somecompany.com/websvc.php"
' wow...i thought people just used asp in the real world!

' this is as basic as it gets
' the method is getData(shopcode, appid)
' this is how you'd wrap up the request in soap

Function sendWebRequest() As String
    ' named "request" so the variable does not hide the WebRequest class
    Dim request As HttpWebRequest = CType(WebRequest.Create(syncServer), HttpWebRequest)
    Dim response As HttpWebResponse
    Dim soapEnvelope As String = ""
    soapEnvelope &= "<SOAP:Envelope>" & vbCrLf
    soapEnvelope &= " <SOAP:Body>" & vbCrLf
    soapEnvelope &= " <getData>" & vbCrLf
    soapEnvelope &= " <parameters>" & vbCrLf
    soapEnvelope &= " <shopcode xsi:type=""xsd:string"">" & shopCode & "</shopcode>" & vbCrLf
    soapEnvelope &= " <appid xsi:type=""xsd:string"">" & appId & "</appid>" & vbCrLf
    soapEnvelope &= " </parameters>" & vbCrLf
    soapEnvelope &= " </getData>" & vbCrLf
    soapEnvelope &= " </SOAP:Body>" & vbCrLf
    soapEnvelope &= "</SOAP:Envelope>" & vbCrLf
    Dim payload() As Byte = Encoding.UTF8.GetBytes(soapEnvelope)
    With request
        .ContentType = "text/xml"
        .Headers.Add("SOAPMethodName", "getData")
        .ContentLength = payload.Length ' length in bytes, not characters
        .Method = "POST"
        .Timeout = 60 * 1000 ' 60 seconds, expressed in milliseconds
        Dim requestStream As Stream = .GetRequestStream()
        requestStream.Write(payload, 0, payload.Length)
        requestStream.Close()
        response = CType(.GetResponse(), HttpWebResponse)
    End With
    Dim responseStream As Stream = response.GetResponseStream()
    Dim reader As New StreamReader(responseStream)
    Dim xmlStream As String = reader.ReadToEnd()
    reader.Close()
    responseStream.Close()
    Return xmlStream
End Function
 
Thanks for the example code.

Now is there any way to know what fields and values I need to pass to
the httprequest?

Thanks
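
One rough way to see the candidate fields, sketched under assumptions: take the HTML of the entry page (for example from Document.Body.OuterHTML once that page is loaded) and list the name and value of every input tag. FormFieldSketch, GetAttribute and ListInputs are hypothetical names. The regex is deliberately simple: it ignores select and textarea elements, and it will miss values that a script fills in at submit time or that contain a ">" character.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module FormFieldSketch
    ' Returns the value of one attribute inside a single tag, or "" if absent.
    ' Handles double-quoted, single-quoted and unquoted attribute values.
    Function GetAttribute(ByVal tag As String, ByVal attribute As String) As String
        Dim pattern As String = "\s" & attribute & _
            "\s*=\s*(?:""(?<v>[^""]*)""|'(?<v>[^']*)'|(?<v>[^\s>]+))"
        Dim m As Match = Regex.Match(tag, pattern, RegexOptions.IgnoreCase)
        If m.Success Then Return m.Groups("v").Value
        Return ""
    End Function

    ' Prints "name = value" for every <input> tag found in the HTML.
    Sub ListInputs(ByVal html As String)
        For Each tag As Match In Regex.Matches(html, "<input\b[^>]*>", RegexOptions.IgnoreCase)
            Console.WriteLine(GetAttribute(tag.Value, "name") & " = " & _
                              GetAttribute(tag.Value, "value"))
        Next
    End Sub
End Module
```

The names printed this way are the label/value pairs a browser would submit with the form, so they are the starting point for the body of the POST.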
 