Using Stream objects with encoding

  • Thread starter Gaia C via .NET 247

Gaia C via .NET 247

Hi,
I want to open a file, read and edit its text, and then write the text to another file.
I am using StreamReader and StreamWriter for this purpose.
The problem is that I don't know the encoding of the file, because it can be saved in any encoding.
By default the stream objects use UTF-8, and therefore after editing the file, some characters are omitted.
How can I find out the file's encoding? Or is there any other solution?

The code I am using:

// Reading from file
string str;
using (StreamReader sReader = new StreamReader(path))
{
    str = sReader.ReadToEnd();
}

[..... editing the string .....]

// Writing to file; disposing the writer flushes the buffered text to disk
using (StreamWriter sStr = new StreamWriter(newpath))
{
    sStr.Write(str);
}

Regards,
Gaia.
 

Morten Wennevik

Hi Gaia,

Unless you know the encoding the file was written in you have to try likely code pages.
The StreamReader assumes UTF-8 by default, but files are written as 8-bit text by default, so any extended ASCII characters would be lost.

In my case, using æ ø å in any file and reading it as UTF-8 would only show the regular ASCII characters. To be able to read these characters properly I would have to use something like this:

StreamReader sr = new StreamReader("D:\\test.txt", Encoding.GetEncoding("iso-8859-15"));

Use the most likely encoding in your area.
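
As a minimal sketch of this advice (not code from the thread), the reader and the writer can be given the same explicit encoding so that the characters survive the round trip. The names EncodingRoundTrip, Rewrite and the edit callback are hypothetical, chosen only for illustration.

```csharp
using System;
using System.IO;
using System.Text;

static class EncodingRoundTrip
{
    // Reads the file with the given encoding, applies the edit,
    // and writes the result to newPath with the same encoding.
    public static void Rewrite(string path, string newPath, Encoding encoding, Func<string, string> edit)
    {
        string text;
        using (var reader = new StreamReader(path, encoding))
        {
            text = reader.ReadToEnd();
        }

        // append: false overwrites any existing file at newPath
        using (var writer = new StreamWriter(newPath, false, encoding))
        {
            writer.Write(edit(text));
        }
    }
}
```

It might be called as EncodingRoundTrip.Rewrite(path, newpath, Encoding.GetEncoding("iso-8859-1"), s => s.Replace("old", "new")). The usage line assumes iso-8859-1 rather than iso-8859-15 only because newer .NET runtimes may need a code page provider registered before GetEncoding accepts the latter.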
 

Jon Skeet [C# MVP]

Morten Wennevik said:
Unless you know the encoding the file was written in you have to try
likely code pages.

Indeed - and be aware that you are guessing and could easily be wrong.
IMO it's rarely a good idea to read a text file without knowing the
encoding.

Morten Wennevik said:
The StreamReader assumes UTF-8 by default, but files are written as
8-bit text by default so any extended ASCII characters would be lost.

What do you mean by "files are written as 8-bit text by default"? It
entirely depends on what's being used to write the files. If you use
StreamWriter, it'll use UTF-8 by default.

Morten Wennevik said:
In my case, using æ ø å in any file and reading it as UTF-8 would
only show the regular ASCII characters. To be able to read these
characters properly I would have to use something like this:

StreamReader sr = new StreamReader("D:\\test.txt",
Encoding.GetEncoding("iso-8859-15"));

Use the most likely encoding in your area.

Or use Encoding.Default, which is the system's current ANSI code page.
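
One case where no guess is needed is a file that starts with a byte order mark (BOM). The sketch below, which is not from the thread, asks StreamReader to honour a BOM if one is present and otherwise use a fallback such as Encoding.Default. BomAwareReader and ReadAll are hypothetical names. Note that on .NET Core and .NET 5+ Encoding.Default is UTF-8 rather than the system ANSI code page.

```csharp
using System.IO;
using System.Text;

static class BomAwareReader
{
    // Returns the file's text and reports which encoding the reader ended up using:
    // the one indicated by a BOM if present, otherwise the supplied fallback.
    public static string ReadAll(string path, Encoding fallback, out Encoding used)
    {
        // third argument: detectEncodingFromByteOrderMarks
        using (var reader = new StreamReader(path, fallback, true))
        {
            string text = reader.ReadToEnd();
            // CurrentEncoding is only reliable after the first read
            used = reader.CurrentEncoding;
            return text;
        }
    }
}
```

The encoding returned in used can then be passed to the StreamWriter so that the output file is written the same way the input was read. A file without a BOM is still a guess, as noted above.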
 

Morten Wennevik

Jon Skeet [C# MVP] said:
Indeed - and be aware that you are guessing and could easily be wrong.
IMO it's rarely a good idea to read a text file without knowing the
encoding.

What do you mean by "files are written as 8-bit text by default"? It
entirely depends on what's being used to write the files. If you use
StreamWriter, it'll use UTF-8 by default.

I was thinking of pure text files created by simple programs like Edit, Notepad or similar, using the code pages of the current system. A bad case of thinking of specific scenarios but writing vaguely :(

Jon Skeet [C# MVP] said:
Or use Encoding.Default, which is the system's current ANSI code page.

That would probably be the most likely encoding :)
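
The "try likely encodings" approach discussed in this thread could be sketched roughly as follows: attempt a strict UTF-8 decode first, and fall back to the most likely code page only if the bytes are not valid UTF-8. This is a heuristic rather than real detection, since bytes in a legacy code page can occasionally happen to form valid UTF-8. EncodingGuess and Decode are hypothetical names.

```csharp
using System.Text;

static class EncodingGuess
{
    // Decodes the bytes as strict UTF-8; if they are not valid UTF-8,
    // decodes them with the supplied fallback encoding instead.
    public static string Decode(byte[] bytes, Encoding fallback)
    {
        try
        {
            // second argument (throwOnInvalidBytes: true) makes invalid
            // sequences raise an exception instead of being replaced
            var strictUtf8 = new UTF8Encoding(false, true);
            return strictUtf8.GetString(bytes);
        }
        catch (DecoderFallbackException)
        {
            return fallback.GetString(bytes);
        }
    }
}
```

For example, EncodingGuess.Decode(File.ReadAllBytes(path), Encoding.Default) would treat the file as UTF-8 when it validates and as the default encoding otherwise. GetString does not strip a leading BOM, so a file that has one keeps it as the first character of the result.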
 
