Why is HttpWebRequest different than IE?

Dan

Why when I retrieve the response from a URL using HttpWebRequest, do I
end up with HTML that is different than IE, even if I set the
HTTP_USER_AGENT to be the same as IE?

Here's a super-simple example. Note that in IE when you load this
URL, there are "£" symbols in front of all of the prices (verified by
View | Source), but in the HttpWebRequest response there are not...the
actual HTML is definitely different...

HttpWebRequest eRequest = (HttpWebRequest)WebRequest.Create(
    "http://cgi6.ebay.co.uk/aw-cgi/eBayISAPI.dll?ViewBids&item=2340661957");
eRequest.Headers.Add("HTTP_USER_AGENT",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705)");
eRequest.Headers.Add("HTTP_ACCEPT", "*/*");
eRequest.Headers.Add("HTTP_ACCEPT_LANGUAGE", "en-us");

WebResponse eResponse = eRequest.GetResponse();

string eContent =
    new StreamReader(eResponse.GetResponseStream()).ReadToEnd();
Debug.WriteLine(eContent);

Any help appreciated.

BTW - I thought it might be cookies, so I disabled cookies in IE to
see if that made a difference - it doesn't!
 
Dan said:
Why when I retrieve the response from a URL using HttpWebRequest, do I
end up with HTML that is different than IE, even if I set the
HTTP_USER_AGENT to be the same as IE?


This sounds like an encoding difference.

You should capture the messages sent and received by both IE and your
program with an HTTP tracing tool like ProxyTrace from http://pocketsoap.com,
then compare the headers and the raw bytes to see where they differ.
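
One thing worth comparing in those traces is the charset the server declares in its Content-Type header. A minimal sketch, assuming the header value is available as a string (for example from eResponse.ContentType); the CharsetProbe class and DeclaredCharset method are made-up names for illustration, not part of the original code:

```csharp
using System;
using System.Net.Mime;

static class CharsetProbe
{
    // Returns the charset parameter of a Content-Type header value,
    // or null when the header is empty or declares no charset.
    public static string DeclaredCharset(string contentTypeHeader)
    {
        if (string.IsNullOrEmpty(contentTypeHeader))
            return null;

        // System.Net.Mime.ContentType parses "type/subtype; param=value".
        return new ContentType(contentTypeHeader).CharSet;
    }
}
```

If this comes back empty for the eBay response, the server is not saying which encoding it uses and the client has to guess.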
 
Dan said:
Why when I retrieve the response from a URL using HttpWebRequest, do
I end up with HTML that is different than IE, even if I set the
HTTP_USER_AGENT to be the same as IE?

The console almost certainly uses a different encoding than the content
received from the web server. Thus, dumping non-ASCII web content to the
console "as is" is going to give you some garbled or missing characters.

In addition to that, you blindly decode the response assuming it's
encoded with UTF-8 (the default for the StreamReader constructor you
call). Unfortunately, www.ebay.co.uk is using an 8-bit encoding. It does
not bother to specify which one, but since it's running on IIS4,
Windows-1252 is a safe bet ;-)
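
To make the decoding point concrete, here is a minimal sketch that passes the encoding to StreamReader explicitly instead of relying on the UTF-8 default. ResponseDecoding and ReadAll are hypothetical names used only for illustration, and which encoding the server really uses is still a guess:

```csharp
using System;
using System.IO;
using System.Text;

static class ResponseDecoding
{
    // Reads the whole stream as text using the encoding the caller chooses,
    // rather than the UTF-8 default of the single-argument StreamReader.
    public static string ReadAll(Stream body, Encoding encoding)
    {
        using (var reader = new StreamReader(body, encoding))
        {
            return reader.ReadToEnd();
        }
    }
}
```

With the code from the question this would be called as ResponseDecoding.ReadAll(eResponse.GetResponseStream(), Encoding.GetEncoding("windows-1252")). On newer .NET runtimes the Windows code pages may have to be registered before that lookup succeeds. The pound sign is the single byte 0xA3 in both Windows-1252 and ISO-8859-1, which is not valid UTF-8 on its own, so a UTF-8 reader turns it into the replacement character U+FFFD.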

Cheers,
 