File.OpenText and accented characters

  • Thread starter: Pedro Jose

Pedro Jose

Hi,

I am wondering why File.OpenText can't read accented
characters.
I am using it to read a text file with words
like "horário" and "não", but when it finishes reading, my
variables contain "horrio" and "no"; the accented characters
disappear.

Why is that? And how can I fix it?

If I use this combination, it works:
Encoding fileEncoding = Encoding.Default;
FileStream fsIn = new FileStream(infile, FileMode.Open, FileAccess.Read, FileShare.Read);
StreamReader fileReader = new StreamReader(fsIn, fileEncoding, true);

But File.OpenText is faster.

What is the fastest way to read a big file in CF?

Thanks

Pedro Jose
 
Note that the documentation for File.OpenText indicates that it assumes that
the file was encoded as UTF-8. If you wrote the file as Unicode, this could
easily cause each accented character to be returned as two characters when
it's read as UTF-8.

Paul T.
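
The mismatch described above can be reproduced without any file at all. The following is only a minimal sketch: the class and method names are made up for illustration, and ISO-8859-1 is assumed as a stand-in for whatever ANSI code page the original file was actually saved in.

```csharp
using System;
using System.Text;

public static class EncodingMismatchDemo
{
    // Encodes text with one encoding and decodes the resulting bytes with
    // another, mimicking a reader that assumes the wrong file encoding.
    public static string Reinterpret(string text, Encoding writtenAs, Encoding readAs)
    {
        byte[] bytes = writtenAs.GetBytes(text);
        return readAs.GetString(bytes);
    }

    public static void Main()
    {
        // Assumption: ISO-8859-1 stands in for the system's ANSI code page.
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        // "horário" and "não", written with escapes so the source file's
        // own encoding cannot interfere with the demonstration.
        string[] words = { "hor\u00E1rio", "n\u00E3o" };

        foreach (string word in words)
        {
            string wrong = Reinterpret(word, latin1, Encoding.UTF8);
            string right = Reinterpret(word, latin1, latin1);
            Console.WriteLine("{0}: read as UTF-8 = {1}, read as Latin-1 = {2}",
                              word, wrong, right);
        }
    }
}
```

How the damaged characters appear depends on the runtime: the thread reports them vanishing entirely, while other runtimes may substitute a replacement character instead. Either way, the text no longer matches what was written.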

 
Pedro Jose said:
> I am wondering why File.OpenText can't read accented
> characters.

It can - just not using the encoding you're using. File.OpenText uses
UTF-8, and it looks like your text is encoded in the default code page
for your system.

> I am using it to read a text file with words
> like "horário" and "não", but when it finishes reading, my
> variables contain "horrio" and "no"; the accented characters
> disappear.
>
> Why is that? And how can I fix it?
>
> If I use this combination, it works:
> Encoding fileEncoding = Encoding.Default;

There's no need for a separate variable here.

> FileStream fsIn = new FileStream(infile, FileMode.Open, FileAccess.Read, FileShare.Read);
> StreamReader fileReader = new StreamReader(fsIn, fileEncoding, true);
>
> But File.OpenText is faster.

How much faster? I wouldn't have thought it would make a significant
difference.

> What is the fastest way to read a big file in CF?

Using the code above, or just using:

StreamReader fileReader = new StreamReader(infile, Encoding.Default);
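
As a rough illustration of the one-line approach, here is a sketch that reads a file line by line with an explicitly named encoding. The helper name ReadAllLines and the temporary file are hypothetical, and ISO-8859-1 is an assumption standing in for the file's real code page. Naming the encoding rather than relying on Encoding.Default also avoids surprises on runtimes where the default is not an ANSI code page (on .NET Core and later, Encoding.Default is UTF-8).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class ExplicitEncodingReader
{
    // Reads every line of the file, decoding its bytes with the given encoding.
    public static List<string> ReadAllLines(string path, Encoding encoding)
    {
        List<string> lines = new List<string>();
        using (StreamReader reader = new StreamReader(path, encoding))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }
        return lines;
    }

    public static void Main()
    {
        // Assumption: ISO-8859-1 stands in for the file's actual code page.
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        string path = Path.GetTempFileName();
        try
        {
            // Write a small sample file containing "horário" and "não".
            File.WriteAllBytes(path, latin1.GetBytes("hor\u00E1rio\r\nn\u00E3o\r\n"));

            foreach (string line in ReadAllLines(path, latin1))
            {
                Console.WriteLine(line);
            }
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```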
 
-----Original Message-----

> It can - just not using the encoding you're using. File.OpenText uses
> UTF-8, and it looks like your text is encoded in the default code page
> for your system.

Yes, this was true. Thank you both.

> There's no need for a separate variable here.

This is correct; fixed.

> How much faster? I wouldn't have thought it would make a significant
> difference.

The difference was 30 seconds for the File.OpenText method versus 45
for the second, but I think that is because it didn't
read the accented characters, so it was faster.

The new method, StreamReader fileReader = new StreamReader(infile, Encoding.Default);,
takes 55 to 60 seconds for a 152 KB ANSI file.
 
Pedro Jose said:
> The difference was 30 seconds for the File.OpenText method versus 45
> for the second, but I think that is because it didn't
> read the accented characters, so it was faster.

Actually I'd expect it to be slower - it had to detect the invalid byte
sequences etc. Ah well...

> The new method, StreamReader fileReader = new StreamReader(infile, Encoding.Default);,
> takes 55 to 60 seconds for a 152 KB ANSI file.

That sounds pretty slow - I'm sure I've read files faster than that.
How exactly are you reading it after opening?
 
Pedro Jose said:
> Line by line with ReadLine()

Hmm... should be okay. Could you post a short but complete program
which demonstrates the problem, and maybe mail me the file in question?
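
For reference, a short but complete timing program might look something like the sketch below. It is not the poster's code: the class name, the command-line argument, and the ISO-8859-1 encoding are all assumptions, and Stopwatch may not be available on older Compact Framework versions (Environment.TickCount would be a substitute there). The loop does nothing except call ReadLine(), so it separates the cost of reading and decoding from any per-line processing.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Text;

public static class ReadLineTiming
{
    // Counts lines using ReadLine() only, so the measured time covers
    // file I/O and decoding and nothing else.
    public static int CountLines(string path, Encoding encoding)
    {
        int count = 0;
        using (StreamReader reader = new StreamReader(path, encoding))
        {
            while (reader.ReadLine() != null)
            {
                count++;
            }
        }
        return count;
    }

    public static void Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.WriteLine("usage: ReadLineTiming <file>");
            return;
        }

        // Assumption: ISO-8859-1 stands in for the file's actual code page.
        Encoding encoding = Encoding.GetEncoding("iso-8859-1");

        Stopwatch watch = Stopwatch.StartNew();
        int lines = CountLines(args[0], encoding);
        watch.Stop();

        Console.WriteLine("{0} lines in {1} ms", lines, watch.ElapsedMilliseconds);
    }
}
```

If a bare loop like this finishes quickly on the same file, the time is probably being spent in whatever is done with each line rather than in the reader itself.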
 