Encoding problem on VB.Net 2005

  • Thread starter: Lorenzo Puglioli

Lorenzo Puglioli

Hi all,

In my Windows application I have to read a text file into a string array. The
file contains a û character (code 251).
On my PC everything works fine, but on the production machine the û character
disappears, leaving the resulting string one character shorter.

The code that does the reading is shown here:

Using sr As New System.IO.StreamReader(Me.File.OpenRead(), System.Text.Encoding.UTF8)
    Dim s As New List(Of String)
    ' Read the file line by line until the end of the stream.
    Do While Not sr.EndOfStream
        s.Add(sr.ReadLine())
    Loop
    Return s.ToArray()
End Using

Why is the behavior different on the two PCs, even though I force UTF-8
decoding?
My app runs on .NET Framework 2.0; the operating system is Windows XP Pro in
Italian on both PCs.
The Italian language pack was installed on my PC; I also installed it on
the production machine, but nothing changed.

Can anyone help me?

Thank you in advance.
Lorenzo Puglioli
 
Have you tried "System.Text.Encoding.Default" instead of
System.Text.Encoding.UTF8?
 
UTF-8 is an encoding scheme, so it won't store character 251 as the single
byte &HFB (by my calculation it stores it as the two bytes &HC3 &HBB; see
http://en.wikipedia.org/wiki/Utf-8).
But I'm assuming you are simply losing a character rather than seeing
complete gibberish.

Perhaps the Byte Order Mark is confusing things? What does the underlying
byte stream look like?
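
To make the byte calculation above concrete, here is a minimal sketch that prints how û is stored under two encodings. It assumes the .NET Framework, where code page 1252 is available through GetEncoding, and it assumes 1252 is the ANSI code page the file was saved with; neither is confirmed in this thread. The module name is made up for illustration.

```vbnet
Imports System
Imports System.Text

Module EncodeDemo
    Sub Main()
        Dim ch As String = ChrW(251) ' û, U+00FB

        ' UTF-8 needs two bytes for any code point above 127.
        Dim utf8Bytes() As Byte = Encoding.UTF8.GetBytes(ch)

        ' Windows-1252 (assumed here to be the file's ANSI code page)
        ' stores the same character in a single byte.
        Dim ansiBytes() As Byte = Encoding.GetEncoding(1252).GetBytes(ch)

        Console.WriteLine(BitConverter.ToString(utf8Bytes)) ' C3-BB
        Console.WriteLine(BitConverter.ToString(ansiBytes)) ' FB
    End Sub
End Module
```

If the file on disk contains a lone FB byte, it was most likely not written as UTF-8 in the first place.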
 
Lorenzo,

Probably you mean this one

\\\
' Requires Imports System.IO
Dim Str As New StreamReader(FilePath)
' Convert the text that was read into code page 437 (MS-DOS) bytes.
Dim arrInput As Byte() = _
    System.Text.Encoding.GetEncoding(437).GetBytes(Str.ReadToEnd())
Str.Close()
///

Have a look at my reply to Gino for the rest of your problem.

Cor
 
Thanks Cor,

I can try using the encoding you suggest (437, MS-DOS), but I still can't
understand what is happening.
Why do I get different results on different PCs, even if I force a specific
encoding?

In addition, I didn't mention in my first post that I don't really care how
the (251) character is read, because it is in a part of the file that I
ignore. But I read some other characters after it by position, so the
problem is that they end up shifted left.

Let me explain.
For simplicity, I reduced the input file to 5 bytes:
(32)(251)(32)(13)(10)
On my PC, the resulting string is 3 characters long: a space, a strange
character (which I don't care about) and another space.
On the other PC it is only 2 characters long: a space and another space.
The second space (and all subsequent characters, if any) is shifted left!
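
The five-byte test case can be reproduced without a file. The sketch below decodes the same bytes three ways. A byte of 251 on its own is not a valid UTF-8 sequence, so a lenient decoder has to either substitute a replacement character or drop the byte, which would be consistent with the 3-character and 2-character results described above. Whether that difference comes from the framework build on each PC is an assumption, not something established in this thread. Code page 1252 is likewise assumed, and the module name is hypothetical.

```vbnet
Imports System
Imports System.Text

Module DecodeDemo
    Sub Main()
        ' The five-byte test file from the post: space, 251, space, CR, LF.
        Dim data() As Byte = {32, 251, 32, 13, 10}

        ' Lenient UTF-8: byte 251 cannot start a valid sequence, so the
        ' decoder falls back to substituting or dropping it.
        Dim lenient As String = Encoding.UTF8.GetString(data)
        Console.WriteLine("UTF-8 (lenient): {0} chars", lenient.Length)

        ' Strict UTF-8: the second constructor argument is
        ' throwOnInvalidBytes, so the bad byte raises an exception
        ' instead of being handled silently.
        Try
            Dim strict As New UTF8Encoding(False, True)
            strict.GetString(data)
        Catch ex As ArgumentException
            Console.WriteLine("UTF-8 (strict): invalid byte sequence")
        End Try

        ' Single-byte code page (assumed 1252): every byte maps to
        ' exactly one character, so positions are preserved.
        Dim ansi As String = Encoding.GetEncoding(1252).GetString(data)
        Console.WriteLine("Windows-1252: {0} chars", ansi.Length)
    End Sub
End Module
```

For positional parsing, a single-byte code page has the advantage that byte offsets and character offsets always match.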

UPDATE:
In the meantime I have run another test: if I read the file into a Char
array (using the UTF-8 decoder), the result is the same on both PCs. The
difference appears when I convert the Char array to a string, for example
using:

Dim s As New String(chars)

(where chars is the Char array)

I get the two different results shown above, even though the Char arrays are
identical.
I think this is due to the framework's internal representation of strings.
So why are they different? Can I control this?
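
One way to check whether the Char arrays and the resulting strings really match on both PCs is to print the code of every character instead of relying on how it is displayed. This is only a diagnostic sketch; DescribeChars is a hypothetical helper, and the sample array (with U+FFFD standing in for the "strange character") is an assumption about what the decoder produced.

```vbnet
Imports System

Module DumpDemo
    ' Hypothetical helper: lists the UTF-16 code of each character.
    Function DescribeChars(ByVal s As String) As String
        Dim parts(s.Length - 1) As String
        For i As Integer = 0 To s.Length - 1
            parts(i) = "U+" & AscW(s.Chars(i)).ToString("X4")
        Next
        Return String.Join(" ", parts)
    End Function

    Sub Main()
        ' Assumed sample: space, replacement character, space.
        Dim chars() As Char = {" "c, ChrW(&HFFFD), " "c}
        Dim s As New String(chars)
        Console.WriteLine(DescribeChars(s)) ' U+0020 U+FFFD U+0020
    End Sub
End Module
```

Running the same dump on both machines shows whether the strings differ in content or only in how they are rendered.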


Thank you
Lorenzo
 
Lorenzo,


I had the same kind of problem some time ago.

(When copying text through a textbox I got unexpected results.)

I had forgotten that I had set my computer to the Polish language; after
setting it back, my problem was gone.

Cor
 
In another thread, I was having difficulty displaying and saving non-English
characters; setting the encoding to System.Text.Encoding.Default made the
problem go away. Note that if you're using Notepad, ANSI is the default
encoding and, as suggested in my first reply, setting the encoding to
Default may solve your problem.
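
As a rough sketch of that suggestion, the reading loop from the first post could be written with Encoding.Default as follows. ReadAnsiLines and its path parameter are hypothetical names. Note that on the .NET Framework Encoding.Default is the machine's ANSI code page, so the result still depends on each PC's regional settings; passing Encoding.GetEncoding(1252) instead (assuming the file is Windows-1252) would pin the behavior.

```vbnet
Imports System.Collections.Generic
Imports System.IO
Imports System.Text

Module ReadDemo
    ' Hypothetical helper: reads all lines of an ANSI text file.
    Function ReadAnsiLines(ByVal path As String) As String()
        Dim lines As New List(Of String)
        ' Encoding.Default follows the system ANSI code page.
        Using sr As New StreamReader(path, Encoding.Default)
            Do While Not sr.EndOfStream
                lines.Add(sr.ReadLine())
            Loop
        End Using
        Return lines.ToArray()
    End Function
End Module
```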

Hope this helps.
 