c# - Encoding in 8 bits!!

  • Thread starter: Duncan M

Duncan M

Hi all,

I have seen a similar post to this in the past but no resolution. I
will explain fully my problem:

I am writing a text editor that will be used in several regions around
the world, but my testing will be done in Turkey (I am in GB). The
output of this editor will be used in a DOS environment, so any hint of
it outputting Unicode is out of the question.

Before writing the application I requested a file from a Turkish
client, to be generated in Notepad with all the letters of the alphabet
and any special Turkish characters. I then generated a screen font for
the DOS environment which mapped the special characters' values (all
above 128) to their graphical representations. So far, so good.

I then moved on to create the application. Originally for IO I used a
pair of StreamReaders / Writers with encoding set to Encoding.ASCII.
Obviously this scheme only handles ASCII values within the 7-bit
range - no good, as 129+ spilled over into 2 bytes.

I next tried Encoding.Default, which behaves very strangely – it saves
single-byte values on my machine and 2-byte values on the Turkish
machine. Still not good enough then.

I am desperate to find a solution to this. I would simply like to
output all characters in the 8-bit character mapping scheme that is
used by NOTEPAD!! Surely this is easy. I know I can get to the ANSI
codepage as follows:

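// requires: using System.Globalization;
TextInfo ti = CultureInfo.CurrentCulture.TextInfo;
int ansiCodePage = ti.ANSICodePage; // the current culture's ANSI code page number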

But what now!!!

I would appreciate any advice anybody can give me.
 
Duncan M said:
I have seen a similar post to this in the past but no resolution. I
will explain fully my problem:

I am writing a text editor that will be used in several regions around
the world, but my testing will be done in Turkey (I am in GB). The
output of this editor will be used in a DOS environment, so any hint of
it outputting Unicode is out of the question.

Before writing the application I requested a file from a Turkish
client, to be generated in Notepad with all the letters of the alphabet
and any special Turkish characters. I then generated a screen font for
the DOS environment which mapped the special characters' values (all
above 128) to their graphical representations. So far, so good.

I then moved on to create the application. Originally for IO I used a
pair of StreamReaders / Writers with encoding set to Encoding.ASCII.
Obviously this scheme only handles ASCII values within the 7-bit
range - no good, as 129+ spilled over into 2 bytes.

I would expect Unicode 128+ to come out as rubbish using an ASCII
encoding, but still a single byte - probably (unicodeValue & 0x7f).
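
For anyone wanting to check this on their own machine, here is a minimal sketch. It assumes a reasonably current .NET runtime, where (as far as I know) Encoding.ASCII substitutes '?' for anything it cannot map rather than masking off the top bit; older runtimes may have behaved differently. The sample string is just an illustration.

```csharp
// Top-level program (C# 9 or later).
using System;
using System.Text;

// 'ş' (U+015F) lies outside ASCII's 7-bit range; 'a' does not.
string text = "a\u015F";
byte[] bytes = Encoding.ASCII.GetBytes(text);

// Still one byte per character, but the unmappable one is replaced by '?' (0x3F).
Console.WriteLine(BitConverter.ToString(bytes)); // 61-3F
```

Either way the character is lost, so ASCII cannot carry the Turkish letters.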

I next tried Encoding.Default, which behaves very strangely – it saves
single-byte values on my machine and 2-byte values on the Turkish
machine. Still not good enough then.

Encoding.Default uses whatever the system default encoding is - I
suspect the Turkish machine has a different default encoding,
presumably a multibyte one.
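
A quick way to see what Encoding.Default actually is on each machine is to print it, roughly as sketched below. One caveat that I am fairly sure of: on .NET Framework this reports the system ANSI code page, whereas on .NET Core and .NET 5+ Encoding.Default is always UTF-8, so the result depends on the runtime as well as the machine.

```csharp
// Top-level program (C# 9 or later).
using System;
using System.Text;

Encoding def = Encoding.Default;

// WebName and CodePage identify the encoding; IsSingleByte tells you
// whether every character is written as exactly one byte.
string summary = $"{def.WebName} (code page {def.CodePage}), single-byte: {def.IsSingleByte}";
Console.WriteLine(summary);
```

Running this on both the GB machine and the Turkish machine would show whether the two defaults really differ.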

I am desperate to find a solution to this. I would simply like to
output all characters in the 8-bit character mapping scheme that is
used by NOTEPAD!!

Used by Notepad on which machine though? It will vary...

Surely this is easy. I know I can get to the ANSI
codepage as follows:

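// requires: using System.Globalization;
TextInfo ti = CultureInfo.CurrentCulture.TextInfo;
int ansiCodePage = ti.ANSICodePage; // the current culture's ANSI code page number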

But what now!!!

I would appreciate any advice anybody can give me.

I would suggest finding out *exactly* what encoding you're really after
(not just using Encoding.Default) and specify that for your
StreamWriter.
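
As a rough sketch of what that could look like: the code below assumes the Turkish Notepad file was saved in Windows code page 1254, which is a guess on my part and should be checked against the real file. The file name is hypothetical. The RegisterProvider line is only needed on .NET Core / .NET 5+, where legacy code pages are not available by default; on .NET Framework it can be dropped.

```csharp
// Top-level program (C# 9 or later).
using System;
using System.IO;
using System.Text;

// Makes legacy code pages available on .NET Core / .NET 5+.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// ASSUMPTION: 1254 (Windows Turkish) is the encoding the DOS font was built for.
const int TargetCodePage = 1254;
Encoding target = Encoding.GetEncoding(TargetCodePage);

// Hypothetical output file in the temp directory.
string path = Path.Combine(Path.GetTempPath(), "turkish-sample.txt");
string text = "\u011F\u0131\u015F"; // "ğış"

// Name the encoding explicitly instead of relying on Encoding.Default.
using (var writer = new StreamWriter(path, false, target))
{
    writer.Write(text);
}

// Each character is written as exactly one byte, with no byte-order mark.
byte[] written = File.ReadAllBytes(path);
Console.WriteLine(BitConverter.ToString(written)); // F0-FD-FE
```

Because the code page is fixed in the code, the output is the same on every machine, whatever its regional settings.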
 
Cor said:
You can save yourself a lot of work.

dotNet programs do not run on DOS.

The OP never suggested they would be - just that the *output* of the
.NET program would be used in DOS.
 
Hi Duncan,

Jon pointed out that I misunderstood your question, and I think he may be
right.

But to add something for you both: ASCII is a 7-bit encoding. That is not
what is used on a DOS computer.

On a DOS computer it is, as far as I know, UTF8

http://msdn.microsoft.com/library/d...tml/frlrfsystemtextutf8encodingclasstopic.asp

I think that for that you need the right code page for codes above 127
(the ones mostly used in Europe and the US are, as far as I remember, 850
and 437).

I do not know how you can use that, but maybe you can find it in the class
information I have linked above.

Cor
 
Cor said:
Jon pointed out that I misunderstood your question, and I think he may be
right.

But to add something for you both: ASCII is a 7-bit encoding. That is not
what is used on a DOS computer.

Well, all the encodings I've seen used in DOS are ASCII-*compatible*,
i.e. they're "extensions" of ASCII. Given the encodings problem, it's
also often safest just to restrict yourself to ASCII if you can :)

On a DOS computer it is, as far as I know, UTF8

Nope, it's an individual code page, usually (as you say) 850 or 437.
Usually single-byte encodings though, as far as I've seen.
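
To illustrate the single-byte point, here is a small sketch that asks .NET about a few DOS code pages. 437 and 850 are the ones named above; 857 (DOS Turkish) is my own addition as a plausible candidate for the original question, not something confirmed in this thread. As before, the RegisterProvider call is only needed on .NET Core / .NET 5+.

```csharp
// Top-level program (C# 9 or later).
using System;
using System.Text;

// Makes legacy code pages available on .NET Core / .NET 5+.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// 437 and 850 come from the thread; 857 is an assumption added for illustration.
int[] codePages = { 437, 850, 857 };
string sample = "\u00E9"; // 'é', a character above 127 in all three

foreach (int cp in codePages)
{
    Encoding enc = Encoding.GetEncoding(cp);
    byte[] bytes = enc.GetBytes(sample);

    // Report whether the code page is single-byte and which byte it uses.
    Console.WriteLine($"{cp}: single-byte={enc.IsSingleByte}, bytes={BitConverter.ToString(bytes)}");
}
```

Each of these encodes the character as one byte, which is the behaviour the DOS side expects.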
 