Getting the correct encoding from a WebResponse

Hi guys,

I have the following code that retrieves a web page.
My problem is getting it to use the right encoding.
I've tested it against a Danish page, but it won't show the Danish
characters. When I set it to sniff the encoding (using
sr = New System.IO.StreamReader(strm, True)
), it reports UTF-8, but when I browse the page using MSIE with
auto-selection of encoding, it uses Western European (and displays the Danish
characters correctly).

Any ideas?

The code:


' Create a new WebRequest object for the given URL.
Dim myWebRequest As System.Net.WebRequest = _
    System.Net.WebRequest.Create(txtUrl.Text)

' Set the Timeout property, in milliseconds.
myWebRequest.Timeout = 10000

Dim encode As System.Text.Encoding = _
    System.Text.Encoding.GetEncoding("Unicode")

' Assign the response of the WebRequest to a WebResponse variable.
Dim myWebResponse As System.Net.WebResponse = _
    myWebRequest.GetResponse()
Dim strm As System.IO.Stream
Dim sr As System.IO.StreamReader
Dim line As String

strm = myWebResponse.GetResponseStream()

' True = look for a byte order mark at the start of the stream.
sr = New System.IO.StreamReader(strm, True)
'sr = New System.IO.StreamReader(strm, encode)
MsgBox("Encoding: " & sr.CurrentEncoding.ToString)
txtSrc.Text = ""

' Read until ReadLine returns Nothing (end of stream), keeping line breaks.
line = sr.ReadLine()
Do While Not line Is Nothing
    txtSrc.Text &= line & vbCrLf
    line = sr.ReadLine()
Loop

' Release the reader and the underlying connection.
sr.Close()
myWebResponse.Close()
 
Jonax said:
I have the following code that retrieves a web page.
My problem is getting it to use the right encoding.
I've tested it against a Danish page, but it won't show the Danish
characters. When I set it to sniff the encoding (using
sr = New System.IO.StreamReader(strm, True)
), it reports UTF-8, but when I browse the page using MSIE with
auto-selection of encoding, it uses Western European (and displays the Danish
characters correctly).

Any ideas?

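You should look at what the HttpWebResponse reports as its CharacterSet
(the charset parameter of the Content-Type header), although the server
may not tell you. Note that the ContentEncoding property is something
else: it describes how the body was encoded for transfer (gzip, for
example), not the character set. Guessing accurately is tricky
and there's always a chance you'll get it wrong. To use "Western
European", however, call Encoding.GetEncoding(1252).

(StreamReader falls back to UTF-8 by default, as the documentation
states. Passing True only makes it look for a byte order mark at the
start of the stream; it does not inspect the HTTP headers or the page
content.)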
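
As a rough illustration of the advice above, here is a minimal sketch that
asks the response for its character set and falls back to code page 1252
when the server gives nothing usable. ResolveEncoding and DownloadText are
hypothetical helper names, not part of the original code, and the sketch
assumes the .NET Framework with VB 2005 or later syntax (on newer .NET
runtimes, code page 1252 may need the code-pages encoding provider to be
registered first). It does not look at a charset declared inside the HTML
itself.

```vbnet
Imports System
Imports System.IO
Imports System.Net
Imports System.Text

Module FetchWithCharset

    ' Hypothetical helper: map the charset name reported by the server to an
    ' Encoding, falling back to Windows-1252 ("Western European").
    Function ResolveEncoding(ByVal charset As String) As Encoding
        If Not String.IsNullOrEmpty(charset) Then
            Try
                ' Some servers wrap the name in quotes; strip them first.
                Return Encoding.GetEncoding(charset.Trim(""""c, " "c))
            Catch ex As ArgumentException
                ' Unknown or malformed charset name: use the fallback below.
            End Try
        End If
        Return Encoding.GetEncoding(1252)
    End Function

    ' Hypothetical helper: download a page and decode it with the encoding
    ' chosen by ResolveEncoding.
    Function DownloadText(ByVal url As String) As String
        Dim request As WebRequest = WebRequest.Create(url)
        request.Timeout = 10000 ' milliseconds

        Using response As WebResponse = request.GetResponse()
            Dim charset As String = Nothing

            ' CharacterSet exists only on HttpWebResponse, so cast first.
            Dim httpResponse As HttpWebResponse = TryCast(response, HttpWebResponse)
            If httpResponse IsNot Nothing Then
                charset = httpResponse.CharacterSet
            End If

            ' The final True still lets a byte order mark override the choice.
            Using reader As New StreamReader(response.GetResponseStream(), _
                                             ResolveEncoding(charset), True)
                Return reader.ReadToEnd()
            End Using
        End Using
    End Function

End Module
```

With something like this in place, the form code could shrink to
txtSrc.Text = DownloadText(txtUrl.Text). Whether 1252 is the right fallback
depends on the sites you expect to read; it is only a guess that happens to
suit Danish pages served without a charset.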
 