Odd characters being written weird

Jon Davis

I have a problem with encoding, I think (??). I have the following code:

StreamWriter sw = File.CreateText(l);
sw.Write(data);
sw.Close();

Now, "data" is a string that contains text, but sometimes contains special characters, like "§".

When I upload this file to a web server, what I get back has a junk character in front of each special character. For instance, for the character "§", when I browse the file in IE I get "Â§".

If I open the file in Notepad, the character is "§", but when I view it in a command prompt with the "type" command, I get "┬º". If I instead type "§" in Notepad and save it, the command prompt shows it as "º", so the junk character in the command prompt is the leading "┬".

What is going on and how do I fix this?

Jon
 
Jon Davis

Now, "data" is a string that contains text, but sometimes contains special characters, like "§".

I should also note that data's value comes from an XmlNode's InnerText property.

Any help would be appreciated.

Jon
 
Lars Hansen

Hi

As Jon Skeet points out, you are likely to have an encoding problem.

Your code is:

StreamWriter sw = File.CreateText(l);
sw.Write(data);
sw.Close();

Instead try:

StreamWriter sw = new StreamWriter(l, System.Text.Encoding.UTF8); // or whatever encoding you want to use
sw.Write(data);
sw.Close();

When you create the sw object this way, you will be able to control the
encoding used.

But it is also possible that the error comes from the way you read the
XML. You need to read the XML with the right encoding (set it explicitly, the
same way as when you created sw).
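
To see what "controlling the encoding" means at the byte level, here is a minimal sketch. The class and method names (EncodingDemo, ToHex, MisreadUtf8AsLatin1) are made up for illustration and are not from the original code. It assumes the viewer falls back to a single-byte encoding such as ISO-8859-1; the real fallback in IE or the command prompt may be a different code page. Note that File.CreateText is documented as writing UTF-8 as well, so the original file may already have been UTF-8 and simply been read back as something else.

```csharp
using System;
using System.Text;

static class EncodingDemo
{
    // Hex dump of the bytes that `encoding` produces for `text`.
    public static string ToHex(string text, Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetBytes(text));
    }

    // Encode as UTF-8, then decode the same bytes as ISO-8859-1.
    // This imitates a viewer that guesses the wrong encoding (an assumption
    // about the viewer, not something confirmed for IE or cmd.exe).
    public static string MisreadUtf8AsLatin1(string text)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        return Encoding.GetEncoding("iso-8859-1").GetString(utf8Bytes);
    }
}
```

With this, EncodingDemo.ToHex("\u00A7", Encoding.UTF8) gives "C2-A7" (two bytes for "§"), and EncodingDemo.MisreadUtf8AsLatin1("\u00A7") gives "Â§", which matches the extra leading character described in the question.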

Hope it helps.

Lars
 
Jon Davis

Thanks Lars!!

Jon


Lars Hansen said:
Sorry, it's like this instead (second param, append, false or true as you need):

StreamWriter sw = new StreamWriter(l, false, System.Text.Encoding.UTF8);

Lars
 
Jon Davis

Well crap, Unicode "fixes" the problem, but only as a workaround. On another
web app, an ASP classic app, now I get:

Active Server Pages error 'ASP 0239'
Cannot process file
/blog/default.htm, line 1
UNICODE ASP files are not supported.


So apparently Unicode files are the exception and not the norm.
Unacceptable, then... So how on earth do I fix this? UTF8 encoding for the
file output didn't do a thing for me, as I get those junk characters when I
do.

FYI, the encoding of the XML file is UTF-8.

I hope, Lars, that you are still following this thread, otherwise I need to
start another one.

Jon
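
One thing that may explain the difference between the two attempts: in .NET, Encoding.Unicode means UTF-16 (little-endian), not UTF-8, and each encoding can put a different byte-order mark at the start of the file. The sketch below only prints those marks. PreambleDemo and PreambleHex are hypothetical names, and whether the mark is what triggers ASP 0239 on a given server is an assumption, not something verified here.

```csharp
using System;
using System.Text;

static class PreambleDemo
{
    // Hex dump of the byte-order mark (preamble) that an encoding reports.
    // StreamWriter writes these bytes at the start of a newly created file.
    public static string PreambleHex(Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetPreamble());
    }
}
```

PreambleDemo.PreambleHex(Encoding.Unicode) gives "FF-FE", PreambleDemo.PreambleHex(Encoding.UTF8) gives "EF-BB-BF", and PreambleDemo.PreambleHex(new UTF8Encoding(false)) gives an empty string, so passing new UTF8Encoding(false) to StreamWriter is one way to get UTF-8 output with no mark at all.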
 
Jon Skeet

The thing to do is work out what encoding you actually *do* want. I
would suggest that if the files are going to be web pages, you
actually stick to ASCII for the contents, using &#x1234;-type entities
for non-ASCII characters.
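
A rough sketch of that idea, assuming decimal numeric character references are acceptable in the pages. EntityEncoder and ToAsciiWithEntities are hypothetical names, not a framework API, and the method deliberately leaves "&", "<" and ">" alone, so it is not a general HTML escaper.

```csharp
using System;
using System.Globalization;
using System.Text;

static class EntityEncoder
{
    // Replace every non-ASCII character with a decimal numeric character
    // reference such as &#167; and leave ASCII characters untouched.
    public static string ToAsciiWithEntities(string text)
    {
        StringBuilder sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c < 128)
            {
                sb.Append(c);
                continue;
            }

            int codePoint = c;
            // Combine a surrogate pair into a single code point.
            if (char.IsHighSurrogate(c) && i + 1 < text.Length
                && char.IsLowSurrogate(text[i + 1]))
            {
                codePoint = char.ConvertToUtf32(c, text[i + 1]);
                i++;
            }

            sb.Append("&#");
            sb.Append(codePoint.ToString(CultureInfo.InvariantCulture));
            sb.Append(';');
        }
        return sb.ToString();
    }
}
```

For example, EntityEncoder.ToAsciiWithEntities("a\u00A7b") returns "a&#167;b". The result contains only ASCII, so it can be written with Encoding.ASCII and should look the same whichever encoding the server or browser assumes.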
 
Jon Davis

Go figure. This works. Now I'm ticked ...

XmlDocument xDoc = new XmlDocument();
xDoc.LoadXml("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\r\n"
+ "<test />");
XmlNode testNode = xDoc.SelectSingleNode("/test");
testNode.InnerText = "§";
StreamWriter sw = new StreamWriter(@"C:\__test.htm", false,
Encoding.UTF8);
sw.Write(testNode.InnerText);
sw.Close();
StreamReader sr = File.OpenText(@"C:\__test.htm");
string s = sr.ReadToEnd();
sr.Close();
//File.Delete(@"C:\__test.htm");
MessageBox.Show(s);


Jon
 
Jon Davis

OK, right now, looking at the uploaded file as a flat file from within IE, it
looks fine. This file is embedded as an <!--#include...--> in an ASP classic
page with no specified encoding, and that is where I see the junk characters. So the
problem seems to be one of:

* ASP classic's #INCLUDE feature not handling UTF-8 files properly,
* ASP classic's serving of UTF-8 encoded files, or
* IE's presentation of a downloaded HTML resource that is UTF-8 encoded.

I will have to move this to an ASP Classic or IE newsgroup. Alternatively, I
can change the chars to &#[###]; but I really don't want to do that, as that
changes the content, which may have unintended repercussions for the users
of my software (weblogging / blogging software).

Jon
 
