Characters missing when reading from file.

bart.kowalski · Aug 28, 2006

I'm trying to read a text file that contains international
(specifically Polish) characters line by line. I'm using the following
C# code:

FileStream lStream = new FileStream(pFileName, FileMode.Open);
using (StreamReader lReader = new StreamReader(lStream))
{
    string lLine;
    while ((lLine = lReader.ReadLine()) != null)
        ProcessLine(/* blah..blah */);
}

The problem is that all Polish characters are missing. It doesn't even
show them incorrectly. It just completely drops the Polish chars and
the string is shorter than expected as a result. Does anyone know how
to fix this?

Guest · Aug 28, 2006

Bart,

Just making a guess on this one. Do you know what encoding the Polish file
is in? Check out the StreamReader(Stream, Encoding) constructor. By default
the stream is read in UTF8Encoding. Changing to the other constructor allows
you to specify ASCII, Unicode, UTF7 or UTF8.

Michael

bart.kowalski · Aug 29, 2006

Michael said:
Bart,

Just making a guess on this one. Do you know what encoding the Polish file
is in? Check out the StreamReader(Stream, Encoding) constructor. By default
the stream is read in UTF8Encoding. Changing to the other constructor allows
you to specify ASCII, Unicode, UTF7 or UTF8.

Thanks. Do you know where I can get more information about the
character encoding?

Regards,
Bart.

Guest · Aug 29, 2006

That's the real question, isn't it!

Unfortunately, that really depends on the source of the file. If you are
unable to ask the person who created the file, try Unicode and keep your
fingers crossed!

Michael

bart.kowalski · Aug 29, 2006

Michael said:
That's the real question isn't it! Unfortunately, that really depends on
the source of the file. If you are unable to ask the person that created the
file, try Unicode and keep your fingers crossed!

I found out that the file is in ASCII using the Eastern European code
page, and that's why it doesn't work. My question was where can I get
more information about using character encodings and conversions in
.NET, so that I can make it work. I found the MSDN documentation to be
rather short.

Thanks,
Bart.

Guest · Aug 30, 2006

You mean ANSI then, right? Take a look at
System.Text.Encoding.GetEncoding().
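
A minimal sketch of what GetEncoding() does with such a file, assuming the
"Eastern European code page" is Windows code page 1250. The class name, the
Decode helper and the sample bytes are made up for illustration. On .NET Core
and .NET 5+ the code page encodings have to be registered first; on the .NET
Framework that line is unnecessary and may need to be removed.

```csharp
using System;
using System.Text;

class CodePageDemo
{
    // Hypothetical helper: decodes bytes with the code page given by number.
    public static string Decode(byte[] bytes, int codePage)
    {
        return Encoding.GetEncoding(codePage).GetString(bytes);
    }

    static void Main()
    {
        // Required on .NET Core / .NET 5+, where code page 1250 is not built in.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // The word "łódź" as a code page 1250 editor stores it: one byte per letter.
        byte[] raw = { 0xB3, 0xF3, 0x64, 0x9F };

        // GetEncoding accepts either the code page number or its name.
        string byNumber = Decode(raw, 1250);
        string byName = Encoding.GetEncoding("windows-1250").GetString(raw);
        Console.WriteLine(byNumber == byName);

        // The same bytes are not valid UTF-8, so the default decoder
        // cannot reproduce the Polish letters and this comparison is false.
        Console.WriteLine(Encoding.UTF8.GetString(raw) == byNumber);
    }
}
```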

As for resources to help you: good question. I've been fortunate; the last
time I had to deal with this was many years ago, as we have been able to
ensure that the files we needed to parse used UTF8. Try:

Links -
overview - http://www.yoda.arachsys.com/csharp/unicode.html
MS's Global Dev Portal - http://www.microsoft.com/globaldev/default.mspx

Books (I haven't looked at any of these, so I don't know how good they are) -
.NET Internationalization: The Developer's Guide to Building Global
Windows and Web Applications - http://www.bookpool.com/sm/0321341384
Internationalization and Localization Using Microsoft .NET -
http://www.bookpool.com/sm/1590590023

Michael

Cor Ligthert [MVP] · Aug 30, 2006

Bart,

Maybe this helps you find the right code page to convert from.

http://www.vb-tips.com/dbPages.aspx?ID=cca7e08a-9580-42b3-beff-76c81839e6c9

Just as the v is not used in Polish, the rest of the world, as far as I
know, does not use the l with a stroke through it, and therefore most people
outside Poland say Walensa.

You should see what "wauwelen" means in Dutch if you are not a fan of his

:-)

Cor

Marc Gravell · Aug 31, 2006

You probably need to find out what encoding (or codepage) was used to
write the file, and pass that in, e.g.

new StreamReader(lStream, Encoding.UTF8)

or - if the file has byte order marks at the start, you /may/ be able
to auto-detect:

new StreamReader(lStream, true)
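
A small sketch of the auto-detect case, using an in-memory stream in place of
a real file. The class and method names are hypothetical. Note that detection
only works when a byte order mark is present; a file saved in a single-byte
code page has none, so the fallback encoding is used as given.

```csharp
using System;
using System.IO;
using System.Text;

class BomDemo
{
    // Reads the first line; a byte order mark, if present, overrides the fallback.
    public static string ReadFirstLine(Stream stream, Encoding fallback, out Encoding detected)
    {
        using (StreamReader reader = new StreamReader(stream, fallback, true))
        {
            string line = reader.ReadLine();
            // CurrentEncoding is only meaningful after the first read.
            detected = reader.CurrentEncoding;
            return line;
        }
    }

    static void Main()
    {
        // Build a UTF-16 "file" in memory: byte order mark, then the text.
        MemoryStream ms = new MemoryStream();
        byte[] bom = Encoding.Unicode.GetPreamble();
        byte[] body = Encoding.Unicode.GetBytes("za\u017C\u00F3\u0142\u0107");
        ms.Write(bom, 0, bom.Length);
        ms.Write(body, 0, body.Length);
        ms.Position = 0;

        Encoding detected;
        string line = ReadFirstLine(ms, Encoding.UTF8, out detected);
        Console.WriteLine(detected.WebName);
        Console.WriteLine(line.Length);
    }
}
```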

Marc

bart.kowalski · Aug 31, 2006

Michael said:
You mean ANSI then, right? Take a look at
System.Text.Encoding.GetEncoding().

<snip>

Thanks. It works with GetEncoding(1250). The link you provided contains
some useful information too.
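
For anyone finding this later, a sketch of the reading loop with the encoding
passed in. It is not the exact production code: the class name, ReadLines and
the temporary test file are illustrative only. The RegisterProvider call is
needed on .NET Core and .NET 5+, not on the .NET Framework.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

class ReadPolishFile
{
    // Reads all lines of a file stored in the given Windows code page.
    public static List<string> ReadLines(string fileName, int codePage)
    {
        List<string> lines = new List<string>();
        Encoding encoding = Encoding.GetEncoding(codePage);
        using (FileStream stream = new FileStream(fileName, FileMode.Open, FileAccess.Read))
        using (StreamReader reader = new StreamReader(stream, encoding))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                lines.Add(line);
        }
        return lines;
    }

    static void Main()
    {
        // Required on .NET Core / .NET 5+ before asking for code page 1250.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Write a one-line sample file containing "łódź" in code page 1250.
        string path = Path.GetTempFileName();
        File.WriteAllBytes(path, new byte[] { 0xB3, 0xF3, 0x64, 0x9F, 0x0D, 0x0A });

        // Each line should keep all of its characters.
        foreach (string line in ReadLines(path, 1250))
            Console.WriteLine(line.Length);

        File.Delete(path);
    }
}
```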

Regards,
Bart.
