HTTPWebRequest with non-English Characters...

Paul W

I'm using VB.NET 2K5 and SQL Server 2K5...

I'm trying to use an HttpWebRequest to pull HTML from a given web page
and an HttpWebResponse to move it into a StreamReader, and then screen
scrape it into my database. No problem, it works great, except....

One of the URLs I'm working with has an ö. When I put the string with
this URL into the HttpWebRequest, I don't get the correct page back
from the website. It seems the HttpWebRequest translates the ö to
?? and I get the wrong page.

How do I correct this? Is this a setting in .NET or on the OS level by
changing the language?

Thanks for any help you can give...

Paul W
 
I'll check it out when I get home. I guess you are saying that the
encoding I use to 'decode' the retrieved content also determines how
the URL is sent out in the first place?

I would understand if I got the correct page and just the characters
were wrong in the stream, but I'm getting the wrong page back in the
first place, not the correct page with 'bad' data in the stream.

I think I found another solution. The word in the URL is Flöhli, but
looking again at the web page, the URL link on the page translates the
ö to %F6, so Flöhli becomes Fl%F6hli. I'll have to try that
too. Any idea why this happens?

PW
 
Hello, Paul!

PW> I would understand if I got the correct page and just the characters
PW> were wrong in the stream, but I'm getting the wrong page back in the
PW> first place, not the correct page with 'bad' data in the stream.

What do you mean by wrong page? page from wrong url?

PW> I think I found another solution. The word in the URL is Flöhli, but
PW> looking again at the web page, the URL link on the page translates the
PW> ö to %F6, so Flöhli becomes Fl%F6hli. I'll have to try that
PW> too. Any idea why this happens?

It is the percent-encoded representation of the non-ASCII symbols in the
URL: %F6 is the byte value of ö in ISO-8859-1 (Latin-1).
Take a URL with a space, which is encoded as %20:
http://my url will become http://my%20url

That is normal behavior, and is known as URL encoding.
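
The encoding described above can be sketched in a few lines. This is only an illustration in Python, not the VB.NET code from the thread; the variable names are made up for the example. It shows that the %F6 seen on the page corresponds to Latin-1, while a UTF-8 based encoder would produce a different escape for the same letter, so the escape has to match what the server expects.

```python
from urllib.parse import quote, unquote

word = "Flöhli"  # the path segment discussed in the thread

# Percent-encode the same text using two different byte encodings.
latin1_form = quote(word, encoding="latin-1")  # one byte for ö
utf8_form = quote(word, encoding="utf-8")      # two bytes for ö

# A space is escaped the same way in both encodings.
space_form = quote("my url")

print(latin1_form)  # Fl%F6hli
print(utf8_form)    # Fl%C3%B6hli
print(space_form)   # my%20url

# Decoding with the matching encoding restores the original text.
print(unquote(latin1_form, encoding="latin-1"))  # Flöhli
```

If a site links to Fl%F6hli, as the page in this thread does, sending the UTF-8 form instead may well request a different resource.
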
--
Regards, Vadym Stetsyak
www: http://vadmyst.blogspot.com
 
What do you mean by wrong page? page from wrong url?

Example (not real): I was shooting for the page http://Flöhli and I
pass that to the HTTPWebRequest. However, it reads it as
http://Fl??hli

I didn't realise that the ö would need to be translated to %F6 in the
URL. Now that you reminded me about translating spaces to %20 in a URL
it all clicks. Don't know why it didn't before.

Thanks!
 
Thus wrote Vadym,
Hello, Paul!
PW>> I would understand if I got the correct page and just the
PW>> characters were wrong in the stream, but I'm getting the wrong page
PW>> back in the first place, not the correct page with 'bad' data in
PW>> the stream.
PW>>
What do you mean by wrong page? page from wrong url?
PW>> I think I found another solution. The word in the URL is Flöhli,
PW>> but looking again at the web page, the URL link on the page
PW>> translates the ö to %F6, so Flöhli becomes Fl%F6hli. I'll have to
PW>> try that too. Any idea why this happens?
It is the percent-encoded representation of the non-ASCII symbols in the URL.
Take a URL with a space, which is encoded as %20:
http://my url will become http://my%20url
That is normal behavior, and is known as URL encoding.

But not in the hostname, which is still subject to RFC 1034, unless IDN support
is available (.NET 2.0).
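
To make the hostname/path distinction concrete, here is a small sketch, again in Python rather than VB.NET. The host name `flöhli.example` is hypothetical, and the Latin-1 choice for the path is an assumption taken from the %F6 seen earlier in the thread. The host goes through IDNA (Punycode) conversion, while the path is percent-encoded.

```python
from urllib.parse import quote

host = "flöhli.example"  # hypothetical internationalized host name
path = "/Flöhli"         # path containing the same non-ASCII letter

# Host names are not percent-encoded; they are converted label by label
# to an ASCII "xn--" form using Python's built-in IDNA codec.
ascii_host = host.encode("idna").decode("ascii")

# The path is percent-encoded; Latin-1 is assumed here to match %F6.
ascii_path = quote(path, encoding="latin-1")

url = "http://" + ascii_host + ascii_path
print(url)

# The IDNA form decodes back to the original host name.
print(ascii_host.encode("ascii").decode("idna"))
```
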

Cheers,
 