Odd characters being written weird

  • Thread starter: Jon Davis

Jon Davis

I have a problem with encoding, I think (??). I have the following code:

StreamWriter sw = File.CreateText(l);
sw.Write(data);
sw.Close();

Now, "data" is a string that contains text, but sometimes contains special characters, like "§".

When I upload this file to a web server, what I get back has a junk character in front of each special character. For instance, for the character "§", when I browse the file in IE I get "Â§".

If I open the file in Notepad, the character is "§", but when I view it in a command prompt with the "type" command, I get "┬º". If I type "§" in Notepad and save it, the command prompt shows it as "º", so the junk character in the command prompt is the preceding "┬".

What is going on and how do I fix this?

Jon
 
Now, "data" is a string that contains text, but sometimes contains special characters, like "§".

I should also note that data's value comes from an XmlNode's InnerText property.

Any help would be appreciated.

Jon
 
Hi

As Jon Skeet points out, you are likely to have an encoding problem.

Your code is:

StreamWriter sw = File.CreateText(l);
sw.Write(data);
sw.Close();

Instead try:

StreamWriter sw = new StreamWriter(l, System.Text.Encoding.UTF8); // or whatever encoding you want to use
sw.Write(data);
sw.Close();

When you create the sw object this way, you control the encoding used.

But it is also possible that the error comes from the way you read the
XML. You need to read the XML with the right encoding (set it explicitly,
the same way as when you created the sw).
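
To make the mismatch concrete, here is a minimal sketch of what happens when text is written as UTF-8 and then read with a single-byte encoding. ISO-8859-1 is only an assumed stand-in for whatever the reading side actually uses, and the class name is hypothetical. Code points are printed rather than the characters themselves, so the console's own code page does not distort the result.

```csharp
using System;
using System.Text;

class EncodingMismatchDemo
{
    static void Main()
    {
        string data = "\u00A7"; // the section sign

        // UTF-8 stores U+00A7 as two bytes.
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(data);
        Console.WriteLine(BitConverter.ToString(utf8Bytes)); // C2-A7

        // Assumption: the reader decodes the bytes as ISO-8859-1, not UTF-8.
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        string misread = latin1.GetString(utf8Bytes);

        // Each byte becomes its own character, so one extra character
        // (U+00C2) appears in front of the section sign (U+00A7).
        foreach (char c in misread)
            Console.WriteLine("U+{0:X4}", (int)c);

        // Decoding with the encoding used for writing round-trips cleanly.
        string roundTrip = Encoding.UTF8.GetString(utf8Bytes);
        Console.WriteLine(roundTrip == data); // True
    }
}
```

The extra leading character matches the symptom described above: the file itself is valid UTF-8, but the program displaying it assumes a different encoding.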

Hope it helps.

Lars
 
Thanks Lars!!

Jon


Lars Hansen said:
Sorry, it's like this instead (the second param, append, is false or true as you need):

StreamWriter sw = new StreamWriter(l, false, System.Text.Encoding.UTF8);

Lars
 
Well crap, Unicode "fixes" the problem, but only as a workaround. On another
web app, an ASP classic app, now I get:

Active Server Pages error 'ASP 0239'
Cannot process file
/blog/default.htm, line 1
UNICODE ASP files are not supported.


So apparently Unicode files are the exception and not the norm.
Unacceptable, then... So how on earth do I fix this? UTF8 encoding for the
file output didn't do a thing for me, as I get those junk characters when I
do.

FYI, the encoding of the XML file is UTF-8.

I hope, Lars, that you are still following this thread, otherwise I need to
start another one.

Jon
 
The thing to do is work out which encoding you actually *do* want. I
would suggest that if the files are going to be web pages, you
stick to ASCII for the contents, using &#1234;-type entities
for non-ASCII characters.
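
One way to apply that suggestion is sketched below. The helper name `ToAsciiWithEntities` is hypothetical, and the sketch assumes the input is already HTML, so it leaves ASCII characters such as `<` and `&` untouched and only rewrites characters above U+007F as decimal numeric character references.

```csharp
using System;
using System.Globalization;
using System.Text;

static class EntityEscaper
{
    // Hypothetical helper: replaces every non-ASCII character with a
    // decimal numeric character reference such as &#167;.
    public static string ToAsciiWithEntities(string text)
    {
        StringBuilder sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] < 128)
            {
                sb.Append(text[i]); // plain ASCII passes through unchanged
                continue;
            }

            // ConvertToUtf32 combines a surrogate pair into one code point.
            // It throws on an unpaired surrogate, i.e. on malformed input.
            int codePoint = char.ConvertToUtf32(text, i);
            if (char.IsHighSurrogate(text[i]))
                i++; // the low surrogate was consumed as part of the pair

            sb.Append("&#")
              .Append(codePoint.ToString(CultureInfo.InvariantCulture))
              .Append(';');
        }
        return sb.ToString();
    }

    static void Main()
    {
        Console.WriteLine(ToAsciiWithEntities("Section \u00A7 1")); // Section &#167; 1
    }
}
```

The result contains only ASCII bytes, so it reads the same whether the server or browser assumes ASCII, Windows-1252, ISO-8859-1, or UTF-8.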
 
Go figure. This works. Now I'm ticked ...

XmlDocument xDoc = new XmlDocument();
xDoc.LoadXml("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\r\n"
    + "<test />");
XmlNode testNode = xDoc.SelectSingleNode("/test");
testNode.InnerText = "§";
StreamWriter sw = new StreamWriter(@"C:\__test.htm", false,
    Encoding.UTF8);
sw.Write(testNode.InnerText);
sw.Close();
StreamReader sr = File.OpenText(@"C:\__test.htm");
string s = sr.ReadToEnd();
sr.Close();
//File.Delete(@"C:\__test.htm");
MessageBox.Show(s);
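
This round trip succeeds because File.OpenText also decodes the file as UTF-8, so it says little about how another program will read the file. A sketch along these lines shows the raw bytes that end up on disk instead. The temp-file path and the class name are assumptions for illustration only.

```csharp
using System;
using System.IO;
using System.Text;

class BomCheck
{
    static void Main()
    {
        string path = Path.GetTempFileName(); // hypothetical scratch file

        // Encoding.UTF8 emits a byte order mark (EF BB BF) at the start
        // of a newly written file, followed by the encoded text.
        using (StreamWriter sw = new StreamWriter(path, false, Encoding.UTF8))
            sw.Write("\u00A7");
        Console.WriteLine(BitConverter.ToString(File.ReadAllBytes(path))); // EF-BB-BF-C2-A7

        // new UTF8Encoding(false) writes the same text without the mark.
        using (StreamWriter sw = new StreamWriter(path, false, new UTF8Encoding(false)))
            sw.Write("\u00A7");
        Console.WriteLine(BitConverter.ToString(File.ReadAllBytes(path))); // C2-A7

        File.Delete(path);
    }
}
```

Either way the section sign is stored as the two bytes C2 A7, so whatever consumes the file still has to be told, or has to guess correctly, that the content is UTF-8.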


Jon
 
OK, right now, looking at the uploaded file as a flat file from within IE, it
looks fine. This file is embedded as an <!--#include...--> in an ASP classic
page with no specified encoding, and then I see the junk characters. So the
problem seems to be either:

* ASP classic's #INCLUDE feature not handling UTF-8 files properly,
* ASP classic's dispensation of UTF-8 encoded files, or
* IE's presentation of a downloaded HTML resource with a UTF-8 feature.

I will have to move this to an ASP Classic or IE newsgroup. Alternatively, I
can change the chars to &#[###]; but I really don't want to do that, as that
changes the content, which may have unintended repercussions for the users
of my software (weblogging / blogging software).

Jon
 