ASCII files

  • Thread starter: Alex Leduc

Alex Leduc

I'm trying to load ASCII files that contain characters from the French
language in a way that is independent of whatever Locale the machine is
configured to use.

So if I have a machine whose default Locale is "en-US" and I open some
French text like this:

[C# example that has the same behaviour in any .NET language]

StreamReader sr = new StreamReader("C:\\someFrenchFile.txt");
string strInput = sr.ReadToEnd();

Suppose the file contains this:
"Le Québec en été."
the characters that I get in strInput are:
"Le Qu?bec en ?t?."

If I change the default Locale in the Control Panel and use
Encoding.Default in the StreamReader's constructor parameters, I get the
right characters in strInput:
"Le Québec en été."

What I'd like to be able to do is load the French string with the right
characters regardless of the machine's default Locale. How can I decide
programmatically which Locale to use with all ASCII strings?

Alexandre Leduc
 
Your stream reader is missing something important: for the second parameter,
use System.Text.Encoding.ASCII;
otherwise it should be Unicode, I believe.

This should help you.
 
Alex Leduc said:
I'm trying to load ASCII files that contain characters from the French
language in a way that is independent of whatever Locale the machine is
configured to use. [snip]
What I'd like to be able to do is load the French string with the right
characters regardless of the machine's default Locale. How can I decide
programmatically which Locale to use with all ASCII strings?

If you know the file's code page, you can try setting
StreamReader's CurrentEncoding property to an ASCIIEncoding with the CodePage
set to the file's code page. [Warning! I haven't tried it myself :)]

OTOH, if you want to read an arbitrary file in an arbitrary language, I'm afraid
it's not possible... (or, at least, I don't know the way...)
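
A note on the suggestion above: StreamReader.CurrentEncoding is read-only, so the encoding has to be passed to the constructor rather than set afterwards. Below is a minimal sketch of that idea, assuming the file's code page is already known. CodePageReader and ReadAllText are made-up names for illustration, not framework APIs.

```csharp
using System.IO;
using System.Text;

public static class CodePageReader
{
    // Hypothetical helper: decodes the whole file using the given code page
    // number (for example 28591, which is ISO-8859-1).
    public static string ReadAllText(string path, int codePage)
    {
        Encoding encoding = Encoding.GetEncoding(codePage);

        // false: do not look for a byte order mark, trust the stated code page.
        using (StreamReader reader = new StreamReader(path, encoding, false))
        {
            return reader.ReadToEnd();
        }
    }
}
```

A call would look like CodePageReader.ReadAllText("C:\\someFrenchFile.txt", 28591). On .NET Core and later, legacy code pages such as 850 or 1252 may first need Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).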
 
Dave said:
Your stream reader is missing something important: for the second parameter,
use System.Text.Encoding.ASCII;
otherwise it should be Unicode, I believe.

I forgot to mention that I've tried that and the result I get is:

"Le Qubec en t."

It removes all accented characters from the string.
 
Bruno said:
ASCII is a 7-bit codeset and it does not cover accented characters.

What you want is probably ISO-Latin1, also known as ISO-8859-1, which
contains the French accented characters. So, you should specify this
encoding when you open the StreamReader.

Bruno.

Could you tell me how to do that in code? I find the SDK documentation
on this topic to be a bit confusing.
 
Alex Leduc said:
I'm trying to load ASCII files that contain characters from the French
language in a way that is independent of whatever Locale the machine is
configured to use.

If it contains anything non-English (such as accented letters), it's not
ASCII.

What you have is some kind of extension of ASCII, and there are many such.
 
Try:

StreamReader sr = new StreamReader("C:\\someFrenchFile.txt",
System.Text.Encoding.GetEncoding("ISO-8859-1") );
string strInput = sr.ReadToEnd();
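
To see why this works where Encoding.ASCII does not, the same bytes can be decoded both ways in memory. This is only a sketch: Latin1Demo and its members are hypothetical names, and the byte array assumes the sample sentence was saved as ISO-8859-1, where é is the single byte 0xE9.

```csharp
using System.Text;

public static class Latin1Demo
{
    // "Le Québec en été." as it would be stored in an ISO-8859-1 file.
    public static readonly byte[] Sample =
    {
        0x4C, 0x65, 0x20, 0x51, 0x75, 0xE9, 0x62, 0x65, 0x63, 0x20,
        0x65, 0x6E, 0x20, 0xE9, 0x74, 0xE9, 0x2E
    };

    // ISO-8859-1 maps every byte value to a character, so 0xE9 becomes é.
    public static string DecodeLatin1(byte[] bytes)
    {
        return Encoding.GetEncoding("ISO-8859-1").GetString(bytes);
    }

    // 0xE9 lies outside the 7-bit ASCII range, so the accents cannot survive.
    public static string DecodeAscii(byte[] bytes)
    {
        return Encoding.ASCII.GetString(bytes);
    }
}
```

Latin1Demo.DecodeLatin1(Latin1Demo.Sample) gives back the original sentence, while DecodeAscii loses every é. What exactly takes their place depends on the framework version.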
 
Your stream reader is missing something important for the second parameter
I forgot to mention that I've tried that and the result I get is:
"Le Qubec en t."
It removes all accented characters from the string.

Is it really ASCII (as in DOS / OEM), or is it ANSI (as in a regular
Windows file)?

If it's ANSI / Windows, try using System.Text.Encoding.Default. Works
for German umlauts for me :-)

Marc

================================================================
Marc Scheuner May The Source Be With You!
Bern, Switzerland m.scheuner(at)inova.ch
 
Yeah I think what I was talking about is ANSI. I never understood the
difference between the two so I assumed they were two different names
for the same thing.
 
Thanks a lot. That worked fine.

Now what I'd like to know is if there's a way to tell my application to
always use this encoding for whatever string related methods/types it
has to use.

Kind of like in C

char *loc = setlocale(LC_ALL, "French_Canada.1252");

which can set the application's locale at a global scope.

Alexandre Leduc
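
As far as I know, .NET has no direct counterpart to setlocale for file encodings: strings are always Unicode in memory, and an encoding only comes into play where bytes are read or written. One way to get a similar effect is to keep the choice in a single place and open every file through it. A minimal sketch, assuming all of the application's text files are ISO-8859-1 as suggested earlier in the thread; AppText, FileEncoding, OpenReader and OpenWriter are hypothetical names.

```csharp
using System.IO;
using System.Text;

public static class AppText
{
    // Assumption: every text file the application touches is ISO-8859-1.
    // Change this one line to switch the whole application.
    public static readonly Encoding FileEncoding = Encoding.GetEncoding("ISO-8859-1");

    // Opens a file for reading with the application-wide encoding.
    public static StreamReader OpenReader(string path)
    {
        return new StreamReader(path, FileEncoding);
    }

    // Opens a file for writing (overwriting it) with the same encoding.
    public static StreamWriter OpenWriter(string path)
    {
        return new StreamWriter(path, false, FileEncoding);
    }
}
```

The original snippet would then become StreamReader sr = AppText.OpenReader("C:\\someFrenchFile.txt"); and would no longer depend on the machine's default Locale.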
 
Yeah I think what I was talking about is ANSI. I never understood the
difference between the two so I assumed they were two different names
for the same thing.

No, not really - the ASCII stuff is "old" DOS age thingies - the ASCII
character set is standardized up to ASCII 127 and country-specific
above that - it usually contains things like French accented
characters, German Umlauts (ö ä ü) and so forth, plus line drawing
characters and a few mathematical symbols.

ANSI is the Windows base character set, which tossed out the
line-drawing characters and math stuff, and added extra characters.

Marc
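
The practical consequence of the OEM / ANSI split is that the same byte decodes to different characters. For instance, é is stored as 0xE9 in Windows-1252 but as 0x82 in DOS code page 850. A small sketch that makes this visible, assuming .NET Core 3.0 or later, where legacy code pages have to be registered first; OemVersusAnsi and Decode are made-up names.

```csharp
using System.Text;

public static class OemVersusAnsi
{
    // Decodes a single byte using the given Windows or DOS code page number.
    public static string Decode(byte value, int codePage)
    {
        // Legacy code pages are opt-in on .NET Core / .NET 5+.
        // Registering the same provider more than once is harmless.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(codePage).GetString(new[] { value });
    }
}
```

Comparing OemVersusAnsi.Decode(0xE9, 1252) with OemVersusAnsi.Decode(0xE9, 850) shows two different characters for one and the same byte, which is why the reader has to be told which code page the file was written in.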
 
Alex Leduc said:
Assuming you mean accented characters, that's impossible. ASCII doesn't
contain any accented characters.

8-bit ASCII (e.g. codepage 850) does contain accented chars and German
umlauts etc - ASCII doesn't always stop at 7 bit, you know! There's a
whole wide world outside of English-speaking 7 bits! :-)

Marc
 