Characters missing when reading from file.

  • Thread starter: bart.kowalski

bart.kowalski

I'm trying to read a text file that contains international
(specifically Polish) characters line by line. I'm using the following
C# code:

using (FileStream lStream = new FileStream(pFileName, FileMode.Open))
// No Encoding argument is passed, so StreamReader assumes UTF-8.
using (StreamReader lReader = new StreamReader(lStream))
{
    string lLine;
    while ((lLine = lReader.ReadLine()) != null)
        ProcessLine(/* blah..blah */);
}

The problem is that all Polish characters are missing. It doesn't even
show them incorrectly. It just completely drops the Polish chars and
the string is shorter than expected as a result. Does anyone know how
to fix this?
 
Bart,

Just making a guess on this one. Do you know what encoding the Polish file
is in? Check out the StreamReader(Stream, Encoding) constructor. By default,
StreamReader decodes the stream as UTF-8. Switching to the other constructor
lets you specify the encoding explicitly, e.g. ASCII, Unicode, UTF7 or UTF8.
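A minimal sketch of the StreamReader(Stream, Encoding) constructor in use. It assumes the bytes are already in memory (a MemoryStream stands in for the FileStream), and EncodingDemo and ReadAll are hypothetical names. The same UTF-16 bytes are decoded twice: once with the matching encoding, and once as UTF-8, which is what StreamReader assumes when no encoding is passed.

```csharp
using System;
using System.IO;
using System.Text;

static class EncodingDemo
{
    // Hypothetical helper: decodes a byte buffer with an explicitly chosen
    // encoding via the StreamReader(Stream, Encoding) constructor.
    public static string ReadAll(byte[] data, Encoding encoding)
    {
        using (var stream = new MemoryStream(data))
        using (var reader = new StreamReader(stream, encoding))
        {
            return reader.ReadToEnd();
        }
    }

    static void Main()
    {
        string text = "Zażółć gęślą jaźń";
        // Stand-in for file contents: the text saved as UTF-16 ("Unicode"), no BOM.
        byte[] bytes = Encoding.Unicode.GetBytes(text);

        Console.WriteLine(ReadAll(bytes, Encoding.Unicode) == text); // True
        Console.WriteLine(ReadAll(bytes, Encoding.UTF8) == text);    // False
    }
}
```

The point is simply that the bytes on disk do not say how to interpret them; the reader has to be told, or has to guess.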

Michael
 
Thanks. Do you know where I can get more information about the
character encoding?

Regards,
Bart.
 
That's the real question, isn't it? :) Unfortunately, that really depends on
the source of the file. If you are unable to ask the person that created the
file, try Unicode and keep your fingers crossed!

Michael
 
I found out that the file is in ASCII using the Eastern European code
page, and that's why it doesn't work. My question was where can I get
more information about using character encodings and conversions in
.NET, so that I can make it work. I found the MSDN documentation to be
rather short.

Thanks,
Bart.
 
You mean ANSI then, right? Take a look at
System.Text.Encoding.GetEncoding().
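Encoding.GetEncoding accepts a Windows code page number for the legacy ANSI encodings. A sketch, assuming .NET Core 3.0+ / .NET 5+, where code pages such as 1250 only resolve after CodePagesEncodingProvider has been registered; on .NET Framework they are built in and the registration line can be dropped. CodePageDemo is a hypothetical name.

```csharp
using System;
using System.Text;

static class CodePageDemo
{
    // Assumption: .NET Core 3.0+ / .NET 5+. Legacy Windows code pages are only
    // available there after this provider is registered.
    static CodePageDemo()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Windows-1250: the ANSI code page for Central and Eastern European languages.
    public static Encoding Cp1250 => Encoding.GetEncoding(1250);

    static void Main()
    {
        string text = "Zażółć gęślą jaźń";
        byte[] ansi = Cp1250.GetBytes(text); // one byte per character in this code page

        Console.WriteLine(ansi.Length);                           // 17
        Console.WriteLine(Cp1250.GetString(ansi) == text);        // True
        // The same bytes interpreted as UTF-8 do not round-trip.
        Console.WriteLine(Encoding.UTF8.GetString(ansi) == text); // False
    }
}
```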

As for resources, good question. I've been fortunate: the last time I
had to deal with this was many years ago, since we have been able to ensure
that the files we needed to parse used UTF-8. Try:

Links -
overview - http://www.yoda.arachsys.com/csharp/unicode.html
MS's Global Dev Portal - http://www.microsoft.com/globaldev/default.mspx

Books (I haven't looked at any of these, so I don't know how good they are) -
.NET Internationalization: The Developer's Guide to Building Global
Windows and Web Applications - http://www.bookpool.com/sm/0321341384
Internationalization and Localization Using Microsoft .NET -
http://www.bookpool.com/sm/1590590023

Michael
 
You probably need to find out what encoding (or code page) was used to
write the file, and pass that in, e.g.

new StreamReader(lStream, Encoding.UTF8)

or, if the file has a byte order mark (BOM) at the start, you /may/ be able
to auto-detect:

new StreamReader(lStream, true)
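The two suggestions can also be combined: the StreamReader(Stream, Encoding, bool) overload uses the BOM when one is present and the given encoding otherwise. A sketch with a hypothetical ReadWithBomDetection helper. Note that ANSI files (such as Windows-1250 text) never carry a BOM, so for a file like Bart's, detection would simply fall back to the encoding you pass.

```csharp
using System;
using System.IO;
using System.Text;

static class BomDemo
{
    // Hypothetical helper: reads the whole stream. If it starts with a BOM,
    // StreamReader switches to the encoding the BOM indicates; otherwise it
    // uses `fallback`. `used` reports which encoding was actually applied.
    public static string ReadWithBomDetection(Stream stream, Encoding fallback, out Encoding used)
    {
        using (var reader = new StreamReader(stream, fallback, detectEncodingFromByteOrderMarks: true))
        {
            string text = reader.ReadToEnd();
            // CurrentEncoding only reflects BOM detection after the first read.
            used = reader.CurrentEncoding;
            return text;
        }
    }

    static void Main()
    {
        // UTF-16 LE text preceded by its BOM (FF FE).
        byte[] body = Encoding.Unicode.GetBytes("Zażółć");
        byte[] withBom = new byte[2 + body.Length];
        Encoding.Unicode.GetPreamble().CopyTo(withBom, 0);
        body.CopyTo(withBom, 2);

        string text = ReadWithBomDetection(new MemoryStream(withBom), Encoding.UTF8, out Encoding used);
        Console.WriteLine($"{text} read as code page {used.CodePage}"); // code page 1200 (UTF-16 LE)

        // Without a BOM (as with any ANSI file) there is nothing to detect,
        // so the fallback encoding is used.
        byte[] noBom = Encoding.UTF8.GetBytes("plain");
        ReadWithBomDetection(new MemoryStream(noBom), Encoding.UTF8, out used);
        Console.WriteLine(used.CodePage); // 65001 (UTF-8)
    }
}
```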

Marc
 
Michael said:
You mean ANSI then, right? Take a look at
System.Text.Encoding.GetEncoding().
<snip>

Thanks. It works with GetEncoding(1250). The link you provided contains
some useful information too.
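Putting the thread together, the original loop with the code page that worked might look like the sketch below. It assumes .NET Core 3.0+ / .NET 5+ (hence the provider registration, which .NET Framework does not need); PolishFileReader and ReadLines are hypothetical names, and collecting lines into a list stands in for the original ProcessLine call.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class PolishFileReader
{
    // Windows-1250 (Central European "ANSI"). Assumption: .NET Core 3.0+ / .NET 5+,
    // where this legacy code page needs CodePagesEncodingProvider registered first.
    static readonly Encoding Cp1250;

    static PolishFileReader()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Cp1250 = Encoding.GetEncoding(1250);
    }

    // Hypothetical helper: reads a Windows-1250 text file line by line.
    public static List<string> ReadLines(string fileName)
    {
        var lines = new List<string>();
        using (var stream = new FileStream(fileName, FileMode.Open, FileAccess.Read))
        using (var reader = new StreamReader(stream, Cp1250))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                lines.Add(line); // stands in for ProcessLine(...)
        }
        return lines;
    }

    static void Main()
    {
        // Write a small Windows-1250 file, then read it back.
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "Zażółć\r\ngęślą jaźń\r\n", Cp1250);
        foreach (string line in ReadLines(path))
            Console.WriteLine(line);
        File.Delete(path);
    }
}
```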

Regards,
Bart.
 