AxWebBrowser & HTML Code

Nabor Gilgalad

Hello Newsgroup,

I have what I think is a simple question, but after a long search I have
found no answer.
I want to get the original html source code from a page loaded in the
AxWebBrowser Control.

At the moment I use this code:

IHTMLDocument3 htmlDoc3 = new HTMLDocumentClass();
htmlDoc3 = (IHTMLDocument3) browser01.Document;
String inHTML = htmlDoc3.documentElement.innerHTML.ToString();

But the HTML code in inHTML is not the original source code.
It seems to be parsed and re-generated HTML, without quotes and with other
things that look bad.

It can't be that there is no other method to get the original source
code of a web page.

Can somebody help me?

Regards Nabor
 
You do not have to go that far to get the HTML of a web page. Use
HttpWebRequest; there are other ways to do the same thing as well.
 
Where can I get the HttpWebRequest from the AxWebBrowser?
If this is not possible and I have to request a page twice, first for
the browser control and then to get the source code, this is not a
solution for me.
I have to display a web page in the browser control and I can't request
the same page a second time.
Please explain in a little more detail how you think this would work.

Regards Nabor
 
I have researched this question very extensively, and I'm confident
that it is impossible to get the original, unparsed HTML code out of
the AxWebBrowser control.

I think that your best solution may be to acquire the HTML via an
HttpWebRequest, which will give you the raw HTML, and then feed that
HTML to the web browser control by assigning it to
IHTMLDocument2.body.outerHTML.
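
The idea above (request the page once, keep the raw text, and hand the same text to the display) can be sketched in portable C++. This is only an illustration under assumptions: LoadedPage, Fetch, Display and loadOnce are hypothetical names, and the two callbacks stand in for the HTTP request and for the browser control's document. Nothing here is the actual API of either.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical record of one page load: the URL plus the untouched markup.
struct LoadedPage {
    std::string url;
    std::string rawHtml;
};

// Hypothetical callbacks: Fetch stands in for the HTTP request,
// Display for whatever pushes markup into the browser control.
using Fetch = std::function<std::string(const std::string& url)>;
using Display = std::function<void(const std::string& html)>;

// Requests the page exactly once, shows it, and returns the raw text
// so the caller still has the original source afterwards.
LoadedPage loadOnce(const std::string& url, const Fetch& fetch, const Display& display) {
    assert(fetch && display);          // both callbacks must be set
    LoadedPage page{url, fetch(url)};  // the single request
    display(page.rawHtml);             // same text goes to the display
    return page;
}
```

With this shape, every navigation (including link clicks and form posts) would have to go through loadOnce rather than through the control itself, which is the cost of the approach.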
 
Really... that's not good...
Does this mean that I have to rebuild the functionality for GET and POST
requests?
Because when I click on a link in the browser control, I would have to get
the next page by HttpWebRequest and not just with the browser control... :(

Regards Nabor
 
Hello
Here's a snippet of C++ code I use that may help
if C# .NET has equivalent methods...

(Init m_Doc first)


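// Members of CMfcieView
IHTMLElement *m_Element, *m_ParElement;
IHTMLDocument2 *m_Doc;

CString CMfcieView::GetSource()
{
    CString src;
    BSTR bstr = NULL;
    HRESULT res;

    if (m_Doc == NULL) return "Error";

    try
    {
        // The <body> element of the loaded document
        res = m_Doc->get_body(&m_Element);
        if (res != S_OK) return "Error";

        // The parent of <body>, i.e. the <html> element
        res = m_Element->get_parentElement(&m_ParElement);
        if (res != S_OK) return "Error";

        // Markup of the whole document as the parser serializes it
        res = m_ParElement->get_outerHTML(&bstr);
        if (res != S_OK) return "Error";
    }
    catch (...)
    {
        return "Error";
    }

    // _bstr_t takes ownership of bstr (fCopy = FALSE) and frees it on scope exit
    _bstr_t tmpbstr(bstr, FALSE);

    // Convert the BSTR to a CString
    src = (LPCTSTR)tmpbstr;

    return src;
}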
 
Hello,

I'm not that good at C++.
Have a look at my comments.

Up to the end of the try/catch block I'm with you. I've got the value of
outerHTML.
This value is not the original source code so far.
And what comes next?
What do the lines after the catch block do?
 
In the browser control there is a context menu where I get the original
source code. Why should this function not be available via C#?
 
Nabor Gilgalad said:
Hello,

I'm not that good at C++.
Have a look at my comments.


Up to here I'm with you. I've got the value of outerHTML.
This value is not the original source code so far.
And what comes next?
What did you do here?


That part just converts a C++ BSTR (a string with a length
prefix, which may contain embedded nulls) to a C++ CString,
the equivalent of a C# string.

Here's the BSTR definition:

.... BSTRs, which are length-prefixed strings. The length is stored as an
integer at the memory location preceding the data in the string.
A BSTR is null-terminated after the last counted character but may also
contain null characters embedded within the string. The string length is
determined by the character count, not the first null character.
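
To make that layout concrete, here is a small portable sketch that mimics it. It is not the real BSTR API (no SysAllocString or SysStringLen); Buffer, makeFakeBstr, countedLength, lengthToFirstNull and toString are hypothetical names, and the 4-byte byte-count prefix is an assumption taken from how BSTR is documented. It shows why the counted length and the "first null" length can differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for BSTR storage:
// [4-byte byte count][UTF-16 code units][terminating null code unit]
using Buffer = std::vector<unsigned char>;
constexpr std::size_t kPrefixSize = sizeof(std::uint32_t);

// Builds the length-prefixed buffer from a string that may contain nulls.
Buffer makeFakeBstr(const std::u16string& text) {
    const std::uint32_t byteLength =
        static_cast<std::uint32_t>(text.size() * sizeof(char16_t));
    // Zero-filled, so the trailing null code unit is already in place.
    Buffer buf(kPrefixSize + byteLength + sizeof(char16_t), 0);
    std::memcpy(buf.data(), &byteLength, kPrefixSize);
    std::memcpy(buf.data() + kPrefixSize, text.data(), byteLength);
    return buf;
}

// Length in code units as recorded in the prefix.
std::size_t countedLength(const Buffer& buf) {
    assert(buf.size() >= kPrefixSize + sizeof(char16_t));
    std::uint32_t byteLength = 0;
    std::memcpy(&byteLength, buf.data(), kPrefixSize);
    return byteLength / sizeof(char16_t);
}

// Length a C-style scan would report: it stops at the first null.
std::size_t lengthToFirstNull(const Buffer& buf) {
    const std::size_t limit = countedLength(buf);
    for (std::size_t i = 0; i < limit; ++i) {
        char16_t unit;
        std::memcpy(&unit, buf.data() + kPrefixSize + i * sizeof(char16_t), sizeof(unit));
        if (unit == u'\0') return i;
    }
    return limit;
}

// Copies out all counted code units, embedded nulls included.
std::u16string toString(const Buffer& buf) {
    std::u16string text(countedLength(buf), u'\0');
    std::memcpy(&text[0], buf.data() + kPrefixSize, text.size() * sizeof(char16_t));
    return text;
}
```

For a five-unit string with a null in the middle, countedLength reports all five units while lengthToFirstNull stops at the null. Any conversion that goes through a plain null-terminated pointer behaves like the second function.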


Does C# .NET specify the argument type for get_outerHTML?
Perhaps it's not a BSTR...
 