Accented characters

  • Thread starter: Pascal Cloup

Pascal Cloup

Hello,

The Read() methods of the StreamReader class don't read accented
characters. When an accented character is present in a file, Read() skips
it and reads the following character. Am I missing something?

Thanks for your help

P. Cloup
 
Pascal Cloup said:
The Read() methods of the StreamReader class don't read accented
characters. When an accented character is present in a file, Read() skips
it and reads the following character. Am I missing something?

Chances are you've got the wrong encoding. It certainly *does* read
accented characters when everything is correct. How are you
constructing your StreamReader, and what's your data source?
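
To illustrate the encoding point, here is a minimal C# sketch (the class and method names are hypothetical, chosen only for this example). It assumes ISO-8859-1 (Latin-1, code page 28591) as a stand-in for a Western "ANSI" file and shows that the same text has different bytes in UTF-8 and Latin-1, and that decoding Latin-1 bytes as UTF-8 loses the accented character.

```csharp
using System;
using System.Text;

// Hypothetical demo class: one string, two byte representations.
static class EncodingMismatchDemo
{
    // ISO-8859-1 (Latin-1): one byte per character, so 'é' (U+00E9) is the single byte 0xE9.
    static readonly Encoding Latin1 = Encoding.GetEncoding(28591);

    // UTF-8 encodes 'é' as two bytes (0xC3 0xA9).
    public static byte[] Utf8Bytes(string s) => Encoding.UTF8.GetBytes(s);

    public static byte[] Latin1Bytes(string s) => Latin1.GetBytes(s);

    // Encode as Latin-1 but decode as UTF-8: the lone byte 0xE9 is not valid
    // UTF-8 here, so the accented character does not survive the round trip.
    public static string MisreadLatin1AsUtf8(string s) =>
        Encoding.UTF8.GetString(Latin1.GetBytes(s));
}
```

Decoding the same bytes with Encoding.GetEncoding(28591).GetString(...) instead gives back the original text.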
 
Hello Jon


Jon Skeet said:
Chances are you've got the wrong encoding. It certainly *does* read
accented characters when everything is correct. How are you
constructing your StreamReader, and what's your data source?

In fact, I create a FileStream object and two reader objects:
itsFileStream = File.Open( itsPath , FileMode.Open );

itsBinaryReader = new BinaryReader( itsFileStream , Encoding.ASCII );
// Perhaps the problem is here, but I also tried UTF8

itsStreamReader = new StreamReader( itsFileStream );

Depending on the nature of the file (binary or text), I use one of the two
readers, but both remain open (??); I need the BinaryReader to determine
whether the file is text or not. Everything works fine except for accented
characters (é à è), which are skipped.

Any idea?

Thanks in advance,

Pascal Cloup
 
Pascal Cloup said:
In fact, I create a FileStream object and two reader objects:
itsFileStream = File.Open( itsPath , FileMode.Open );

itsBinaryReader = new BinaryReader( itsFileStream , Encoding.ASCII );
// Perhaps the problem is here, but I also tried UTF8

itsStreamReader = new StreamReader( itsFileStream );

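That sounds like a bad idea to start with. Two readers on the same
stream are bound to cause problems: StreamReader reads ahead into its own
buffer, so each reader moves the shared stream position behind the other's
back.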
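
A minimal C# sketch of the read-ahead problem, assuming an in-memory stream in place of the file (the class and method names are hypothetical). StreamReader fills an internal buffer from the underlying stream, so after reading a single character the shared stream's Position has already moved by a whole buffer's worth of bytes; a BinaryReader on the same stream would continue from there, not from the second byte.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical demo: StreamReader reads ahead from the stream it wraps.
static class SharedStreamDemo
{
    public static (char First, long PositionAfterOneChar, long Length) ReadOneChar(string text)
    {
        var stream = new MemoryStream(Encoding.UTF8.GetBytes(text));
        var reader = new StreamReader(stream, Encoding.UTF8);

        // Ask for a single character...
        char first = (char)reader.Read();

        // ...but StreamReader has already pulled a whole buffer from the stream,
        // so any other reader sharing this stream now starts from this position.
        return (first, stream.Position, stream.Length);
    }
}
```

For a short input such as "hello world", Position equals Length after the first Read(), because the whole input fits in StreamReader's buffer.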

Depending on the nature of the file (binary or text), I use one of the two
readers, but both remain open (??); I need the BinaryReader to determine
whether the file is text or not. Everything works fine except for accented
characters (é à è), which are skipped.

I thought the point was that it was definitely a text file - otherwise
why are you trying to read it? *Any* file can be a text file, but what
that file means depends on which encoding is being used.

You need to know what encoding the file is in, and specify that to the
StreamReader - ignore the BinaryReader.
 
Hi Pascal,

Thanks for posting in the community.

From your description, I understand that you want to read accented
characters (in a .txt file) with StreamReader/BinaryReader.
Please correct me if I have misunderstood.

First, you should have a valid data source, as Jon mentioned; for example,
you can save the data with UTF-8 or Unicode encoding in Notepad.
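
A minimal C# sketch of why a file saved that way works with a plain StreamReader, assuming it was saved with Notepad's "Unicode" option, which writes UTF-16 little-endian with a byte order mark (BOM). The StreamReader(Stream) constructor looks for a BOM by default, so it switches from its UTF-8 default to UTF-16 when the data starts with FF FE. The byte array stands in for the file, and the names are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical demo: StreamReader(Stream) detects a UTF-16 byte order mark.
static class BomDetectionDemo
{
    // Bytes as a UTF-16 LE file with a BOM would contain them: FF FE, then the text.
    public static byte[] Utf16LeWithBom(string text)
    {
        byte[] bom = Encoding.Unicode.GetPreamble();
        byte[] body = Encoding.Unicode.GetBytes(text);
        var all = new byte[bom.Length + body.Length];
        Buffer.BlockCopy(bom, 0, all, 0, bom.Length);
        Buffer.BlockCopy(body, 0, all, bom.Length, body.Length);
        return all;
    }

    // One-argument constructor: UTF-8 by default, but BOM detection is switched on.
    public static (string Text, string DetectedEncoding) ReadWithDefaultReader(byte[] bytes)
    {
        using (var reader = new StreamReader(new MemoryStream(bytes)))
        {
            string text = reader.ReadToEnd();
            return (text, reader.CurrentEncoding.WebName);
        }
    }
}
```

A file with no BOM at all is decoded as UTF-8 by this constructor.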

Then, because an accented character is encoded as two bytes, I suggest
using the BinaryReader to read the accented characters:
itsBinaryReader = new BinaryReader(itsFileStream, Encoding.UTF8); // or Encoding.Unicode

Now you can use a byte[] to read the accented characters from the stream
object.


Please apply my suggestion above and let me know if it helps resolve your
problem.


Thanks!

Best regards,

Gary Chang
Microsoft Online Partner Support

Get Secure! - www.microsoft.com/security
This posting is provided "AS IS" with no warranties, and confers no rights.
--------------------
 
Gary Chang said:
Hi Pascal,

Thanks for posting in the community.

From your description, I understand that you want to read accented
characters (in a .txt file) with StreamReader/BinaryReader.
Please correct me if I have misunderstood.

First, you should have a valid data source, as Jon mentioned; for example,
you can save the data with UTF-8 or Unicode encoding in Notepad.

Then, because an accented character is encoded as two bytes, I suggest
using the BinaryReader to read the accented characters:
itsBinaryReader = new BinaryReader(itsFileStream, Encoding.UTF8); // or Encoding.Unicode

Now you can use a byte[] to read the accented characters from the stream
object.

Using BinaryReader is a bad idea - StreamReader is designed for exactly
the job required.
 
Hello Jon & Gary

You need to know what encoding the file is in, and specify that to the
StreamReader - ignore the BinaryReader.

OK, now I create only a StreamReader object for text files.

When the file is created with Encoding.UTF8, the StreamReader object reads
the accented characters correctly.
But if a file is created with Encoding.Default (or ASCII), the StreamReader
object skips the accented characters.

I understand that my problem comes down to encoding constraints, but:

How can I know the encoding of a file before creating a reader?
How do I specify the encoding of a StreamReader?

Thanks for your help,

Pascal Cloup
 
Pascal Cloup said:
OK, now I create only a StreamReader object for text files.

When the file is created with Encoding.UTF8, the StreamReader object reads
the accented characters correctly.
But if a file is created with Encoding.Default (or ASCII), the StreamReader
object skips the accented characters.

I understand that my problem comes down to encoding constraints, but:

How can I know the encoding of a file before creating a reader?

You should just know it - there's no absolutely accurate way of
determining an encoding from just the binary data. There are ways you
can guess it heuristically, but there's nothing in the framework which
will do this for you. (A byte order mark at the start of a file can
identify some encodings, including their endianness, but that's not the
kind of issue you're looking at here.)
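
Purely as an illustration of what "guessing heuristically" might look like, here is a hypothetical C# heuristic. It is not a framework API and is not reliable in general: it trusts a byte order mark if there is one, otherwise accepts UTF-8 if the bytes decode as strict UTF-8, and otherwise falls back to a single-byte code page chosen by the caller (Latin-1 in the usage below is an assumption).

```csharp
using System;
using System.Text;

// Hypothetical heuristic, NOT a framework feature: a guess, never a guarantee.
static class EncodingGuesser
{
    public static Encoding Guess(byte[] data, Encoding fallback)
    {
        // 1. A byte order mark is strong evidence.
        if (data.Length >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
            return Encoding.UTF8;
        if (data.Length >= 2 && data[0] == 0xFF && data[1] == 0xFE)
            return Encoding.Unicode;            // UTF-16 LE (UTF-32 LE ignored here)
        if (data.Length >= 2 && data[0] == 0xFE && data[1] == 0xFF)
            return Encoding.BigEndianUnicode;   // UTF-16 BE

        // 2. No BOM: if the bytes are valid UTF-8, guess UTF-8
        //    (pure ASCII data also lands here, which is harmless).
        var strictUtf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);
        try
        {
            strictUtf8.GetString(data);
            return Encoding.UTF8;
        }
        catch (DecoderFallbackException)
        {
            // 3. Otherwise fall back to the caller's single-byte code page.
            return fallback;
        }
    }
}
```

Step 2 is where the guess can go wrong: Latin-1 text that happens to form valid UTF-8 (such as "Ã©") would be misread, which is why knowing the encoding beats guessing it.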

How do I specify the encoding of a StreamReader?

StreamReader reader = new StreamReader(stream, encoding);
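
The same call, filled out into a small C# helper (the class and method names are hypothetical). This sketch turns BOM detection off so that the encoding the caller passes in is always the one used.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: read a whole stream with an encoding chosen by the caller.
static class ExplicitEncodingReader
{
    public static string ReadAll(Stream stream, Encoding encoding)
    {
        // detectEncodingFromByteOrderMarks: false means a BOM in the data
        // cannot override the encoding passed in.
        using (var reader = new StreamReader(stream, encoding, detectEncodingFromByteOrderMarks: false))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Usage, assuming a Latin-1 file at some path: ExplicitEncodingReader.ReadAll(File.OpenRead(path), Encoding.GetEncoding(28591)). Leaving detectEncodingFromByteOrderMarks at its default of true would instead let a BOM in the file take precedence over the encoding argument.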
 
Hello,

Thank you both,

Finally, I use Encoding.Default (which corresponds to the ANSI character
set) for both the StreamWriter and the StreamReader. I cannot use UTF-8,
UTF-7 or Unicode for compatibility reasons: the files are also created by
older software or on other platforms (Mac).
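
A minimal C# round-trip sketch of "same encoding on both sides", assuming Latin-1 (code page 28591) as the shared single-byte encoding; the names are hypothetical. One caveat: Encoding.Default is the system ANSI code page only on .NET Framework, while on .NET Core and .NET 5 or later it is always UTF-8, so naming the code page explicitly keeps writer and reader in agreement across runtimes. (Windows-1252 itself needs the System.Text.Encoding.CodePages provider registered on .NET Core, which is why this sketch uses Latin-1.)

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical demo: write and read with the same explicitly named encoding.
static class RoundTripDemo
{
    // Assumption: Latin-1 stands in for the "ANSI" code page in this sketch.
    static readonly Encoding FileEncoding = Encoding.GetEncoding(28591);

    public static (string Text, long ByteCount) WriteThenRead(string text)
    {
        var stream = new MemoryStream();

        // leaveOpen: true keeps the MemoryStream usable after the writer is disposed.
        using (var writer = new StreamWriter(stream, FileEncoding, 1024, leaveOpen: true))
        {
            writer.Write(text);
        }
        long byteCount = stream.Length; // Latin-1: one byte per character, no BOM

        stream.Position = 0;
        using (var reader = new StreamReader(stream, FileEncoding))
        {
            return (reader.ReadToEnd(), byteCount);
        }
    }
}
```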

Nevertheless, it seems that StreamReader doesn't use
Encoding.Default by default.

With kind regards,

Pascal Cloup
 
Pascal Cloup said:
Nevertheless, it seems that StreamReader doesn't use
Encoding.Default by default.

Indeed it doesn't. As the docs for the StreamReader(Stream) constructor
say:

<quote>
This constructor initializes the encoding to UTF8Encoding
</quote>
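
A quick C# check of that default, sketched with an in-memory stream standing in for an ANSI/Latin-1 file (names hypothetical): the reader reports UTF-8, and the Latin-1 byte 0xE9 does not come back as 'é'. On current .NET the invalid byte is replaced with U+FFFD rather than skipped; the skipping Pascal describes may reflect the older framework version he was using.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical demo: StreamReader(Stream) decodes as UTF-8 unless it finds a BOM.
static class DefaultReaderDemo
{
    public static (string EncodingName, string Text) ReadLatin1WithDefaultReader(string original)
    {
        byte[] latin1Bytes = Encoding.GetEncoding(28591).GetBytes(original);
        using (var reader = new StreamReader(new MemoryStream(latin1Bytes)))
        {
            // Checked before reading, this is the constructor's default encoding.
            string encodingName = reader.CurrentEncoding.WebName;
            string text = reader.ReadToEnd();
            return (encodingName, text);
        }
    }
}
```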
 
I had a similar problem while reading a text file with
Unicode and UTF8: neither returned characters that look unusual, such as ÿ.
I had to use Encoding.ASCII, which read all the characters but gave ?????????
for unknown (non-ASCII) characters.
I did not want to lose the bufferRead position, so I stayed with ASCII.
 
GG said:
I had a similar problem while reading a text file with
Unicode and UTF8: neither returned characters that look unusual, such as ÿ.
I had to use Encoding.ASCII, which read all the characters but gave ?????????
for unknown (non-ASCII) characters.
I did not want to lose the bufferRead position, so I stayed with ASCII.

That suggests you were using the wrong encoding then - perhaps you
should have used Encoding.Default instead? It's hard to know without
knowing what you were trying to read - but you should know what
encoding your file is in rather than guessing until something works.
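
The "?" characters are what Encoding.ASCII produces on current .NET for any byte above 0x7F, since ASCII only defines 0x00 to 0x7F. A minimal C# sketch, assuming the file was actually Latin-1 (in Windows-1252 the byte 0xFF is also 'ÿ'); the names are hypothetical.

```csharp
using System;
using System.Text;

// Hypothetical demo: the same byte decoded as ASCII and as Latin-1.
static class QuestionMarkDemo
{
    // 0xFF is 'ÿ' (U+00FF) in Latin-1, but has no meaning in 7-bit ASCII.
    static readonly byte[] Data = { 0xFF };

    // ASCII decoding replaces any byte above 0x7F with '?'.
    public static string AsAscii() => Encoding.ASCII.GetString(Data);

    public static string AsLatin1() => Encoding.GetEncoding(28591).GetString(Data);
}
```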
 
Hi Pascal,

Thanks for your response!

Skeet said:
Using BinaryReader is a bad idea,
StreamReader is designed for exactly the job required.

Yes, I agree: StreamReader is designed for text files.

However, the first time I tried to read the accented characters with
StreamReader, I got null for them, while the BinaryReader actually retrieved
the correct characters, so I thought the BinaryReader might be better.

Today I tested that program again and found that StreamReader.Read() works
fine this time, so I think I had missed something.


Thanks!

Best regards,

Gary Chang
Microsoft Online Partner Support

 
Gary Chang said:
Skeet said:

Yes, I agree: StreamReader is designed for text files.

However, the first time I tried to read the accented characters with
StreamReader, I got null for them, while the BinaryReader actually retrieved
the correct characters, so I thought the BinaryReader might be better.

If BinaryReader was reading the correct characters, then you must have
been giving it the correct encoding, while giving StreamReader the
wrong encoding. BinaryReader isn't capable of guessing an encoding any
better than StreamReader is.
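
A minimal C# sketch of that point (names hypothetical): given the same bytes and the same encoding, BinaryReader.ReadChars and StreamReader.ReadToEnd return the same text. What matters is the encoding argument, not which reader class is used.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical demo: both readers decode identically given the same encoding.
static class SameEncodingDemo
{
    public static (string ViaBinaryReader, string ViaStreamReader) Decode(byte[] bytes, Encoding encoding, int charCount)
    {
        string viaBinary;
        using (var br = new BinaryReader(new MemoryStream(bytes), encoding))
        {
            viaBinary = new string(br.ReadChars(charCount));
        }

        string viaStream;
        using (var sr = new StreamReader(new MemoryStream(bytes), encoding))
        {
            viaStream = sr.ReadToEnd();
        }
        return (viaBinary, viaStream);
    }
}
```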
 