AxWebBrowser & HTML Code

Nabor Gilgalad · Jan 27, 2005

Hello Newsgroup,

I have what I think is a simple question, but after a long search I have
found no answer.
I want to get the original html source code from a page loaded in the
AxWebBrowser Control.

At the moment I use this code:

// Cast the loaded document; no need to create an HTMLDocumentClass first.
IHTMLDocument3 htmlDoc3 = (IHTMLDocument3) browser01.Document;
string inHTML = htmlDoc3.documentElement.innerHTML;

But the HTML code in inHTML is not the original source code.
It seems to be parsed and re-serialized HTML, without quotes and with
other things that look bad.

It can't be that there is no other method to get the original source
code of a web page.

Can somebody help me?

Regards Nabor

Ashish Das · Jan 27, 2005

You do not have to go that far to get the HTML of a web page. Use
HttpWebRequest; there are other ways to do the same thing as well.

Nabor Gilgalad · Jan 27, 2005

Where can I get the HttpWebRequest from the AxWebBrowser?
If this is not possible and I have to request a page twice, first for
the browser control and then to get the source code, this is not a
solution for me.
I have to display a web page in the browser control and I can't request
the same page a second time.
Please explain in a little more detail how you think this would work.

Regards Nabor

Adam · Jan 27, 2005

I have researched this question very extensively, and I'm confident
that it is impossible to get the original, unparsed HTML code out of
the AxWebBrowser control.

I think that your best solution may be to acquire the HTML via an
HttpWebRequest -- which will give you the raw HTML -- and then feed the
HTML to the web browser control by assigning it to
IHTMLDocument2.body.outerHTML.

Nabor Gilgalad · Jan 27, 2005

Really... that's not good...
Does this mean that I have to rebuild the functionality for GET and POST
requests?
Because when I click on a link in the browser control, I have to get
the next page by HttpWebRequest and not just with the browser control...

Regards Nabor

WRH · Jan 27, 2005

Hello
Here's a snippet of C++ code I use that may help
if C# .NET has equivalent methods...

(Init m_Doc first)

IHTMLElement *m_Element, *m_ParElement;
IHTMLDocument2 *m_Doc;

CString CMfcieView::GetSource()
{
    CString src;
    BSTR bstr = NULL;
    HRESULT res;

    if (m_Doc == NULL) return "Error";

    try
    {
        // Get the body element of the loaded document.
        res = m_Doc->get_body(&m_Element);
        if (res != S_OK) return "Error";

        // Step up to the parent of the body element.
        res = m_Element->get_parentElement(&m_ParElement);
        if (res != S_OK) return "Error";

        // Read the parent's markup as a BSTR.
        res = m_ParElement->get_outerHTML(&bstr);
        if (res != S_OK) return "Error";
    }
    catch (...)
    {
        return "Error";
    }

    // _bstr_t attaches to the BSTR (fCopy = FALSE) and frees it on scope exit.
    _bstr_t tmpbstr(bstr, FALSE);

    src = (LPCTSTR)tmpbstr;

    return src;
}

Nabor Gilgalad · Jan 27, 2005

Hello,

I'm not that good at C++.
Have a look at my comments.

Hello
Here's a snippet of C++ code I use that may help
if C# .NET has equivalent methods...

(Init m_Doc first)

IHTMLElement *m_Element, *m_ParElement;
IHTMLDocument2 *m_Doc;

CString CMfcieView::GetSource()
{
CString src;
BSTR bstr;

HRESULT res;

if(m_Doc == NULL)return "Error";

try
{
res = m_Doc->get_body(&m_Element);

if(res != S_OK)return "Error";

res = m_Element->get_parentElement(&m_ParElement);

if(res != S_OK)return "Error";

res = m_ParElement->get_outerHTML(&bstr);

if(res != S_OK)return "Error";
}
catch (...)
{
return "Error";
}

Up to here I'm with you. I get the outerHTML value,
but that value is still not the original source code.
And what comes next?
What did you do here?

Nabor Gilgalad · Jan 27, 2005

In the browser control there is a context menu where I get the original
source code. Why should this function not be available via C#?

WRH · Jan 27, 2005

Nabor Gilgalad said:
Hello,

I'm not that good at C++.
Have a look at my comments.

Up to here I'm with you. I get the outerHTML value,
but that value is still not the original source code.
And what comes next?
What did you do here?

This just converts a C++ BSTR (a string with a length
prefix, which may contain nulls) to a C++ CString,
the equivalent of a C# string.

Here's the BSTR definition:

.... BSTRs, which are length-prefixed strings. The length is stored as an
integer at the memory location preceding the data in the string.
A BSTR is null-terminated after the last counted character but may also
contain null characters embedded within the string. The string length is
determined by the character count, not the first null character.

Does C# .NET specify the argument type for get_outerHTML?
Perhaps it's not a BSTR...
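
To illustrate the length-prefix point in the BSTR definition above, here is a
minimal portable C++ sketch. It does not use the real BSTR API: std::wstring
stands in for a counted string, and MakeCounted, CountedLength and
TerminatedLength are hypothetical helper names chosen for this example. It
shows why code that stops at the first null character could see less of a
string than a counted type actually holds.

```cpp
#include <cstddef>
#include <cwchar>
#include <string>

// Hypothetical stand-in for a BSTR: std::wstring also stores its length
// explicitly, so it can hold embedded null characters.
std::wstring MakeCounted(const wchar_t* data, std::size_t count)
{
    return std::wstring(data, count);
}

// Length by character count, the way a counted string reports it.
std::size_t CountedLength(const std::wstring& s)
{
    return s.size();
}

// Length as null-terminated code sees it: stops at the first L'\0'.
std::size_t TerminatedLength(const std::wstring& s)
{
    return std::wcslen(s.c_str());
}
```

For the 13-character buffer L"<html>\0<body>" passed to MakeCounted with a
count of 13, CountedLength returns 13 while TerminatedLength returns 6, so
everything after the embedded null is invisible to null-terminated code.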
