Relative URL in <A> tag being converted to absolute URL

  • Thread starter: John Williams

John Williams

I'm using the WebBrowser control and Microsoft HTML Object Library
(MSHTML) in a VB .Net program. I'm trying to read <P> tags and their
contents from one HTML document, test1.htm, and insert them into
another, test2.htm. The problem is the relative URL in the <A> tag in
the 3rd paragraph.

test1.htm contains:
<HTML><HEAD></HEAD>
<BODY>
<P>This is the first paragraph</P>
<P>This is the second paragraph</P>
<P>This is the third paragraph with a <A
href="../../dir1/dir2/page2.htm">link</A> included</P>
</BODY>
</HTML>

test2.htm contains:
<HTML><HEAD></HEAD>
<BODY>
<HR>
</BODY>
</HTML>

The <P> tags are read from test1.htm and saved in a listbox using:

mElements = mDoc.getElementsByTagName("P")
For Each mElement In mElements
    lstPTags.Items.Add(mElement.outerHTML)
Next

They are then inserted before the <HR> tag in test2.htm using:

For Each mElement In mDoc.all
    If mElement.tagName = "HR" Then
        For i = 0 To lstPTags.Items.Count - 1
            mElement.insertAdjacentHTML("beforeBegin", lstPTags.Items(i))
        Next
    End If
Next

This works fine, EXCEPT for the 3rd paragraph which contains an <A>
tag link. The link is converted from a relative URL to an absolute
URL in the modified test2.htm :

<P>This is the third paragraph with a <A
href="file:///C:/Documents%20and%20Settings/dir1/dir2/page2.htm">link</A>
included</P>

My question is: how can I copy and insert the <P> tag and embedded <A>
tag exactly as it appears in test1.htm into test2.htm?
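
The rewritten href is consistent with ordinary relative-URL resolution against the location of the containing document. As a minimal sketch of that resolution, using System.Uri rather than MSHTML (the module name, function name, and example.com address are hypothetical, chosen only for illustration):

```vb
Imports System

Module UrlResolution

    ' Resolves a relative link against the URL of the document that
    ' contains it and returns the resulting absolute URL.
    ' (Hypothetical helper; it does not involve MSHTML at all.)
    Public Function ResolveLink(documentUrl As String, relativeLink As String) As String
        Dim baseUri As New Uri(documentUrl)
        Return New Uri(baseUri, relativeLink).AbsoluteUri
    End Function

End Module
```

With the hypothetical document URL http://example.com/a/b/test2.htm, ResolveLink turns ../../dir1/dir2/page2.htm into http://example.com/dir1/dir2/page2.htm, which is the same kind of rewrite seen in the modified test2.htm.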
 
Hi John,

I am not sure, but are you looking for "URLUnencoded" in the DOM or
something like that?

If so, there is also a lot about this in the HttpUtility members.

I hope this helps a little.

Cor
 
Cor said:
Hi John,

I am not sure, but are you looking for "URLUnencoded" in the DOM or
something like that?

URLUnencoded returns the URL of the document containing the html, i.e.
test1.htm in my case. I need to read the <P> tag elements from one
document and write them without modification to another html document.

Perhaps I'm doing it the wrong way. I've also tried saving the <P>
tag IHTMLElement objects in an array:

Public mElementArray As mshtml.IHTMLElement()
Public iElements As Integer

mElementArray(iElements) = mElement
----- or -----
mElementArray(iElements) = mElement.cloneNode(True)

but both these statements lock up the program.
 
Hi John,

I think I misread your question; you want innerHTML.

If I were you, I would take a look at that. It sets or retrieves the HTML
between the start and end tags of the object.

This is some text I copied from MSDN:

The innerHTML property is valid for both block and inline elements. By
definition, elements that do not have both an opening and closing tag cannot
have an innerHTML property.

The innerHTML property takes a string that specifies a valid combination of
text and elements.

When the innerHTML property is set, the given string completely replaces the
existing content of the object. If the string contains HTML tags, the string
is parsed and formatted as it is placed into the document.

This property is accessible at run time, as of Microsoft® Internet Explorer
5. Removing elements at run time, before the closing tag is parsed, could
prevent other areas of the document from rendering.

I don't know if this is exactly it, but I think it is close.

Cor
 
Cor said:
Hi John,

I think I misread your question; you want innerHTML.

Thank you for your help and interest in my problem.

Yes, innerHTML is better than outerHTML for reading the <P> tag
element. However, I can't get the <A> anchor tag to appear correctly
in the output .htm document. I've tried 2 methods:

Private Function createIntroHTML(ByRef HRElement As mshtml.IHTMLElement, _
                                 ByRef mDoc As mshtml.HTMLDocument) As String

    Dim mNewElement As mshtml.IHTMLElement
    Dim mNewNode As mshtml.IHTMLDOMNode
    Dim strLines() As String = Split(txtIntro.Text, vbNewLine)
    Dim i As Integer
    Dim strHTML As String

    For i = 0 To UBound(strLines) - 1

        'Nearly, but puts anchor tag as text, i.e. <A href=.....>Link</A>

        'mNewElement = mDoc.createElement("P")
        'mNewElement.setAttribute("class", "text4")
        'mNewElement.innerText = strLines(i)
        'mNewNode = mNewElement
        'HRElement.insertAdjacentElement("beforeBegin", mNewNode)

        'Nearly, but converts relative URL in anchor tag to absolute URL

        strHTML = "<P class=""text4"">" + strLines(i) + "</P>"
        HRElement.insertAdjacentHTML("beforeBegin", strHTML)

    Next

End Function


I just need a way to prevent the relative URL being changed to an
absolute URL.
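
One possible workaround, sketched here under assumptions rather than confirmed anywhere in this thread, is to post-process the generated HTML and turn absolute hrefs back into relative ones with Uri.MakeRelativeUri. The module name, function name, regular expression, and example.com address are all hypothetical, and the pattern only handles double-quoted href values:

```vb
Imports System
Imports System.Text.RegularExpressions

Module HrefRewriter

    ' Captures the href=" prefix, the quoted value, and the closing quote.
    ' Assumption: href values are always wrapped in double quotes.
    Private ReadOnly HrefPattern As New Regex("(href\s*=\s*"")([^""]*)("")", RegexOptions.IgnoreCase)

    ' Rewrites each absolute href that shares a scheme and host with
    ' documentUri into a path relative to documentUri.
    ' All other href values are returned unchanged.
    Public Function MakeHrefsRelative(html As String, documentUri As Uri) As String
        Return HrefPattern.Replace(html,
            Function(m As Match) As String
                Dim target As Uri = Nothing

                ' Values that are not absolute URLs are left alone.
                If Not Uri.TryCreate(m.Groups(2).Value, UriKind.Absolute, target) Then
                    Return m.Value
                End If

                ' Links to a different scheme or host are left alone.
                If target.Scheme <> documentUri.Scheme OrElse
                   target.Authority <> documentUri.Authority Then
                    Return m.Value
                End If

                Dim relative As String = documentUri.MakeRelativeUri(target).ToString()
                Return m.Groups(1).Value & relative & m.Groups(3).Value
            End Function)
    End Function

End Module
```

Called with the hypothetical document URL http://example.com/a/b/test2.htm, an href of http://example.com/dir1/dir2/page2.htm comes back as ../../dir1/dir2/page2.htm. The trade-off is that this repairs the markup after MSHTML has already changed it, so it depends on knowing the URL of the document the links should be relative to.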
 
Hi John,

It goes a little too far for me to figure this out completely; MSHTML is
not the nicest thing to work with.

But I use it in a way like this (although I have never tried it with a
complete innerHTML).
Charles, the one who knows the most about this in this newsgroup (but I have
not seen him for a while), told me that IHTMLDocument2 was much faster, and
when I tried it, it was.


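'iDocument must first be assigned from the loaded document
Dim iDocument As mshtml.IHTMLDocument2
Dim myText As String
Dim i As Integer
For i = 0 To iDocument.all.length - 1
    Dim tagname As String = iDocument.all.item(i).tagName
    'tagName is normally reported in upper case, so compare case-insensitively
    If tagname.ToUpper() = "P" Then
        myText = iDocument.all.item(i).innerText.ToString
    End If
Next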

You can try it; it is not a big deal to test it yourself.

Let me know whether it worked or not; I am also struggling with some things
in MSHTML.

Cor
 