StreamReader mutilates '+' characters in UTF-7 files

  • Thread starter: Hans

Hans

Hi,

I need to process files that are created in UTF-7 format.
This works fine up to the point where a '+' character
(0x2B/43) appears in the line. The string is mutilated...

The reader appears to have a bug - or am I doing
something wrong here???
The code:
StreamReader Reader = new StreamReader(
    @"MyLocalFile.txt", System.Text.Encoding.UTF7);
string sLine;
while ((sLine = Reader.ReadLine()) != null)
{
    // Process the line
}

The text:
#(6+sections)

Can anybody give me a clue to what is happening here?

Thanks, Hans
 
Are you absolutely sure it's UTF-7? In UTF-7, the "+" character
signifies a shift into a modified Base64 mode. See
http://www.faqs.org/rfcs/rfc2152.html for more details.
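
As a rough sketch of why the line gets mangled: a UTF-7 decoder treats everything after the '+' as modified Base64 for as long as the characters are valid Base64, and "sections" happens to be eight valid Base64 characters. The snippet below only mimics that step by hand with Convert.FromBase64String and big-endian UTF-16 (the layout RFC 2152 describes); it is an illustration, not a copy of what StreamReader does internally. The class name is made up for the example.

```csharp
using System;
using System.Text;

class Utf7PlusDemo
{
    static void Main()
    {
        // In "#(6+sections)" the '+' opens a Base64 run. "sections" contains
        // only Base64 characters, so a UTF-7 decoder consumes all of it.
        string run = "sections";

        // Eight Base64 characters carry 48 bits, i.e. three UTF-16 code units.
        byte[] raw = Convert.FromBase64String(run);

        // RFC 2152 stores the UTF-16 code units big-endian inside the run.
        string decoded = Encoding.BigEndianUnicode.GetString(raw);

        // Print the code units that replace the text "+sections".
        foreach (char c in decoded)
        {
            Console.WriteLine("U+{0:X4}", (int)c);
        }
    }
}
```

By my working this prints U+B1E7, U+2D8A and U+89EC, three characters that have nothing to do with the original text. In real UTF-7 a literal '+' has to be written as "+-", so a genuine UTF-7 file would contain "#(6+-sections)".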

Where do you get this text file from? UTF-7 is not a very common
character encoding at all - UTF-8 is rather more likely.
 
Every older piece of Windows software, as well as the Unix
world, produces UTF-7 type files. And that's where the file
comes from.

Please explain why notepad/wordpad/visual studio/... CAN
display the contents of the file without a problem!

Regards, Hans
 
I just found out that
StreamReader Reader = new StreamReader(
    @"MyLocalFile.txt", System.Text.Encoding.Default);

produces the desired result.

One of my books, however, says that "using the Default
property is discouraged".

Can anybody tell me why???

Thanks, Hans
 
Hans said:
I just found out that
StreamReader Reader = new StreamReader(
    @"MyLocalFile.txt", System.Text.Encoding.Default);

produces the desired result.

In which case, as I suspected, it *wasn't* UTF-7.

Hans said:
One of my books, however, says that "using the Default
property is discouraged".

Can anybody tell me why???

It means that only people with the same default will get the same
results - and the default depends on things like the operating system,
regional settings, etc.
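
To make that concrete, here is a small sketch showing the same four bytes decoded under two different encodings. The byte values and the class name are just picked for the example; ISO-8859-1 stands in for "some ANSI-style code page" because it should be available on every runtime.

```csharp
using System;
using System.Text;

class SameBytesDifferentText
{
    static void Main()
    {
        // 0xE9 is 'é' in ISO-8859-1, but on its own it is not valid UTF-8.
        byte[] bytes = { 0x63, 0x61, 0x66, 0xE9 };

        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        Encoding utf8 = Encoding.UTF8;

        // The same bytes give different strings depending on the encoding.
        Console.WriteLine(latin1.GetString(bytes) == "caf\u00E9");  // True
        Console.WriteLine(utf8.GetString(bytes) == "caf\u00E9");    // False

        // Whatever this prints is machine-specific, which is the whole problem.
        Console.WriteLine("Default here: " + Encoding.Default.WebName);
    }
}
```

The last line is the one to watch: run it on machines with different regional settings or runtimes and the name may change, and with it the meaning of every byte above 0x7F.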
 
Hans said:
Every older piece of Windows software, as well as the Unix
world, produces UTF-7 type files. And that's where the file
comes from.

Please explain why notepad/wordpad/visual studio/... CAN
display the contents of the file without a problem!

I don't think UTF-7 means what you think it means. UTF-7 is a way of
encoding Unicode characters using only 7-bit ASCII characters.
 
Sun and Notepad (at least by default) produce ANSI encoded
files. 'System.Text.Encoding.Default' (like GetACP())
encodes according to...... the system's current ANSI code
page.

If there's another way to get ANSI-encoding do tell me!!!

Hans.
 
It's a case of *which* ANSI-encoding to use though. If you always use
the default one for the computer, it means that if you transfer files
to/from another computer with a different default, you're in trouble.
If you let the user specify the encoding (using Encoding.Default as the
default, but not relying on it) you give a lot more flexibility - and
if you also give the option of reading/writing in UTF-8, you end up
with the full flexibility of Unicode in a fairly compact form.

(Certainly if you don't need an older tool to understand the file
you're writing, I'd go with UTF-8 virtually every time.)
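
Something along these lines is what I have in mind, as a minimal sketch only. ResolveEncoding and the command-line layout are invented for the example; the point is just that Encoding.Default is a fallback and the caller can name any other encoding, including "utf-8".

```csharp
using System;
using System.IO;
using System.Text;

class ReadWithChosenEncoding
{
    // Hypothetical helper: maps an optional user-supplied name to an Encoding.
    static Encoding ResolveEncoding(string name)
    {
        if (string.IsNullOrEmpty(name))
        {
            // Fallback only; the result depends on the machine it runs on.
            return Encoding.Default;
        }
        // Throws ArgumentException if the name is not recognised on this runtime.
        return Encoding.GetEncoding(name);
    }

    static void Main(string[] args)
    {
        // Assumed usage: ReadWithChosenEncoding <path> [encoding-name]
        if (args.Length < 1)
        {
            Console.Error.WriteLine("usage: ReadWithChosenEncoding <path> [encoding-name]");
            return;
        }

        Encoding encoding = ResolveEncoding(args.Length > 1 ? args[1] : null);

        // The using block closes the file even if processing throws.
        using (StreamReader reader = new StreamReader(args[0], encoding))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                Console.WriteLine(line);
            }
        }
    }
}
```

Passing "utf-8" or "iso-8859-1" as the second argument should work everywhere; which ANSI code page names are accepted may depend on the runtime and platform, so treat that part as an assumption to check.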
 