UTF8 Decoder

  • Thread starter: Guest

I am using the following code to decode a UTF-8 string from an XML element.
All the fonts in use are Arial Unicode MS, and the characters are not
corrupt, as they display correctly within the raw XML page. The code below
works fine with the majority of characters, with the exception of certain
characters such as U+00FC (ü) and U+00FE (þ). Any assistance would be
greatly appreciated.

Luke

Dim MessageString As String
Dim MessageBuffer() As Byte
Dim MessageChar() As Char
Dim Decoder As System.Text.Decoder
Dim UTF8Code As String
'
MessageString = CurrentMessage.Value
'
' Decode the UTF bytes that are displayed within the XML Service
MessageBuffer = System.Text.Encoding.UTF8.GetBytes(MessageString.ToCharArray)

ReDim MessageChar(MessageBuffer.Length)
Decoder = System.Text.Encoding.UTF8.GetDecoder()
Decoder.GetChars(MessageBuffer, 0, MessageBuffer.Length, MessageChar, 0)
'Loop through the Char() array
UTF8Code = String.Empty
For Each Character As Char In MessageChar
    ' Format for RTB output
    UTF8Code &= "\u" & Convert.ToUInt32(Character).ToString() & "?"
Next Character
MessageString = UTF8Code
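One likely contributor, offered as a hedged reading of the code above rather than a confirmed diagnosis: ReDim MessageChar(MessageBuffer.Length) sizes the Char array from the UTF-8 byte count, and VB's ReDim takes an upper bound, so the array gets MessageBuffer.Length + 1 elements. Characters such as U+00FC and U+00FE need two bytes each in UTF-8, so each of them adds at least one more unused Chr(0) to the end of the array, and the loop formats every one of those as \u0?. The console program below is a minimal sketch (the module and variable names are illustrative, not from the post) that shows the mismatch next to Encoding.UTF8.GetChars, which returns an exactly sized array. It also shows that encoding a String to UTF-8 and decoding it again gives back the same String, so if the characters in CurrentMessage.Value are already wrong, this step cannot repair them.

```vbnet
Imports System
Imports System.Text

Module ArraySizingDemo
    Sub Main()
        ' "a" is one byte in UTF-8; U+00FC and U+00FE are two bytes each.
        Dim text As String = "a" & ChrW(&HFC) & ChrW(&HFE)
        Dim bytes As Byte() = Encoding.UTF8.GetBytes(text)
        Console.WriteLine(bytes.Length)             ' 5 bytes for 3 characters

        ' Same sizing as the posted code: ReDim takes an upper bound,
        ' so this allocates bytes.Length + 1 = 6 elements.
        Dim tooBig() As Char
        ReDim tooBig(bytes.Length)
        Dim written As Integer = Encoding.UTF8.GetDecoder().GetChars(bytes, 0, bytes.Length, tooBig, 0)
        Console.WriteLine(written)                  ' 3 characters actually decoded
        Console.WriteLine(tooBig.Length - written)  ' 3 trailing Chr(0) elements

        ' Encoding.GetChars(Byte()) returns an array of exactly the decoded length.
        Dim exact As Char() = Encoding.UTF8.GetChars(bytes)
        Console.WriteLine(exact.Length)             ' 3

        ' The round trip is lossless: the decoded text equals the original String.
        Console.WriteLine(String.Equals(New String(exact), text))  ' True
    End Sub
End Module
```

Iterating directly over the String (For Each c As Char In MessageString) would avoid the intermediate byte and Char arrays entirely.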
 
It's not entirely clear what you're seeing that you don't expect.

Could you post a short but complete program which demonstrates the
problem?

See http://www.pobox.com/~skeet/csharp/complete.html for details of
what I mean by that.
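A short, complete program of the kind asked for might look like the sketch below, assuming the goal is RTF for a RichTextBox, as the "Format for RTB output" comment suggests. The posted loop uses Convert.ToUInt32, which is fine for U+00FC (252) and U+00FE (254) but not for UTF-16 code units above U+7FFF: the RTF \uN control word takes a signed 16-bit decimal, so those have to be written as negative numbers. ToRtfUnicode is a hypothetical helper name, not a framework method, and it walks the String directly, so no array sizing is involved.

```vbnet
Imports System
Imports System.Globalization
Imports System.Text

Module RtfUnicodeSketch
    ' Hypothetical helper: formats each UTF-16 code unit of a String as an
    ' RTF "\uN?" escape, where "?" is the fallback character that RTF readers
    ' skip when they understand \u (the default \uc1 skip count).
    Function ToRtfUnicode(ByVal text As String) As String
        Dim sb As New StringBuilder()
        For Each ch As Char In text
            ' RTF's \uN takes a signed 16-bit decimal, so code units above
            ' U+7FFF are written as (value - 65536).
            Dim code As Integer = Convert.ToInt32(ch)
            If code > 32767 Then code -= 65536
            sb.Append("\u")
            sb.Append(code.ToString(CultureInfo.InvariantCulture))
            sb.Append("?"c)
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        ' U+00FC and U+00FE, the characters mentioned in the question.
        Console.WriteLine(ToRtfUnicode(ChrW(&HFC) & ChrW(&HFE)))  ' \u252?\u254?
        ' U+AC00 is above U+7FFF, so it becomes a negative number.
        Console.WriteLine(ToRtfUnicode(ChrW(&HAC00)))             ' \u-21504?
    End Sub
End Module
```

If this prints the expected escapes but the RichTextBox still shows the wrong characters, the text was most likely already wrong in CurrentMessage.Value (for example, the XML read with the wrong encoding), which is the part a complete program would help pin down.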
 