XML file and UTF8

  • Thread starter: Peter Holschbach

Peter Holschbach

Hi,

I have a UTF-8 encoded XML file in which I have to translate some text and
save the result under another file name. The result should also be UTF-8 encoded.
Here is what I did:

XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load("xml_input.xml");
.....
XmlNode node = ..... // find the right node
node.InnerText = "°C"; // simplified: the translated text is actually read from an Excel file
.....
StreamWriter sw = new StreamWriter("xml_output.xml", false,Encoding.UTF8);
xmlDoc.Save(sw);
sw.Close();

The result is an XML file in which the "°" symbol is stored as a 3-byte ANSI
coded value (EF BF BD), not as the 2-byte UTF-8 value (C2 B0) used in the
original file.

What can I do to store the XML file in UTF-8?
 
Peter said:
I have a UTF-8 encoded XML file in which I have to translate some text and
save the result under another file name. The result should also be UTF-8 encoded.
Here is what I did:

XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load("xml_input.xml");
....
XmlNode node = ..... // find the right node
node.InnerText = "°C"; // simplified: the translated text is actually read from an Excel file

If the original XML is UTF-8 encoded, why don't you simply call
xmlDoc.Save("xml_output.xml");
? That way the encoding is certainly not changed.
StreamWriter sw = new StreamWriter("xml_output.xml", false,Encoding.UTF8);
xmlDoc.Save(sw);
sw.Close();

That should also save as UTF-8.
The result is an XML file in which the "°" symbol is stored as a 3-byte ANSI
coded value (EF BF BD), not as the 2-byte UTF-8 value (C2 B0) used in the
original file.

ANSI coded? Which ANSI code page would encode "°" with those three bytes?
Are you sure when you set the InnerText that you insert the character
'°'? Maybe when you read from Excel somehow decoding already does not do
what you want.
 
xmlDoc.Save("xml_output.xml");

The default is UTF8
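
For reference, here is a minimal sketch of writing the document with an explicitly chosen UTF-8 encoding and then looking at the bytes that come out. The names SaveAsUtf8 and ToUtf8Bytes and the <unit/> stand-in document are made up for illustration; the real input would be xml_input.xml. The degree sign is written as \u00B0 so the example does not depend on how the source file itself is saved.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class SaveAsUtf8
{
    // Serialize the document to UTF-8 bytes without a byte order mark.
    public static byte[] ToUtf8Bytes(XmlDocument doc)
    {
        var settings = new XmlWriterSettings
        {
            Encoding = new UTF8Encoding(false), // false = do not emit a BOM
            Indent = true
        };
        using (var ms = new MemoryStream())
        {
            // Dispose the writer first so everything is flushed into the stream.
            using (var writer = XmlWriter.Create(ms, settings))
            {
                doc.Save(writer);
            }
            return ms.ToArray();
        }
    }

    public static void Main()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<unit/>");                     // hypothetical stand-in for xml_input.xml
        doc.DocumentElement.InnerText = "\u00B0C";  // degree sign followed by 'C'

        byte[] bytes = ToUtf8Bytes(doc);
        Console.WriteLine(BitConverter.ToString(bytes)); // hex dump of the output
        File.WriteAllBytes("xml_output.xml", bytes);
    }
}
```

If the hex dump contains C2-B0 for the degree sign, the saving side is fine, and any EF-BF-BD in the real output must already be in the string before it is assigned to InnerText.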
 
Hi Martin,
If the original XML is UTF-8 encoded, why don't you simply call
xmlDoc.Save("xml_output.xml");
? That way the encoding is certainly not changed.

That is what I did first :-). It gives the same result as the StreamWriter version.
That should also save as UTF-8.

Yes, this is what I expected.
ANSI coded? Which ANSI code page would encode "°" with those three bytes?
Are you sure when you set the InnerText that you insert the character '°'?
Maybe when you read from Excel somehow decoding already does not do what
you want.

The text in Excel is certainly not in UTF-8.
I have also tried it with the example code (using the string literal "°"). As
far as I understand, a string in C# is not UTF-8 encoded.

thx
Peter
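
On the last point: a .NET string is held in memory as UTF-16 code units, and an encoding such as UTF-8 only comes into play when the string is written out or read in. One possible cause of the symptom (an assumption, nothing in the thread confirms it) is that the "°" is already damaged before it reaches the XmlDocument, for example because the .cs source file or the Excel text is decoded with the wrong encoding. The sketch below shows one way to check this; StringCheck, IsDamaged and CodeUnits are hypothetical helper names.

```csharp
using System;

public static class StringCheck
{
    // True if the text already contains U+FFFD, i.e. it was damaged
    // before it ever reached the XML writer.
    public static bool IsDamaged(string text)
    {
        return text.IndexOf('\uFFFD') >= 0;
    }

    // The UTF-16 code units of the string as hex, separated by spaces.
    public static string CodeUnits(string text)
    {
        var parts = new string[text.Length];
        for (int i = 0; i < text.Length; i++)
        {
            parts[i] = ((int)text[i]).ToString("X4");
        }
        return string.Join(" ", parts);
    }

    public static void Main()
    {
        // The \u escape does not depend on how the source file is saved.
        string translated = "\u00B0C";
        Console.WriteLine(CodeUnits(translated));   // 00B0 0043
        Console.WriteLine(IsDamaged(translated));   // False
    }
}
```

Running the same check on the text as it arrives from Excel, just before node.InnerText is set, would show whether the character is still U+00B0 at that point.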
 
Peter Holschbach wrote:
Hi,
....

The result is an XML file in which the "°" symbol is stored as a 3-byte ANSI
coded value (EF BF BD), not as the 2-byte UTF-8 value (C2 B0) used in the
original file.
Isn't "EF BF BD" the UTF-8 encoding of U+FFFD, the Unicode REPLACEMENT
CHARACTER (displayed as a question mark in a black diamond), which some
systems insert to signal that they could not decode multibyte-encoded text
correctly or that the text is damaged?

It sounds more as if the tool you use to check the result is wrong or is
configured to expect another encoding.
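
The byte values mentioned in this thread can be checked with a few lines of C#. This is only a sketch; DegreeBytes, Utf8Hex and DecodeAsUtf8 are made-up names. It shows the UTF-8 bytes of the degree sign and of U+FFFD, and what happens when the single byte B0 (the degree sign in Latin-1 and Windows-1252) is decoded as if it were UTF-8.

```csharp
using System;
using System.Text;

public static class DegreeBytes
{
    // Hex dump of a string's UTF-8 encoding, bytes separated by spaces.
    public static string Utf8Hex(string s)
    {
        return BitConverter.ToString(Encoding.UTF8.GetBytes(s)).Replace("-", " ");
    }

    // Decode raw bytes as UTF-8; Encoding.UTF8 replaces invalid sequences with U+FFFD.
    public static string DecodeAsUtf8(byte[] raw)
    {
        return Encoding.UTF8.GetString(raw);
    }

    public static void Main()
    {
        Console.WriteLine(Utf8Hex("\u00B0"));   // C2 B0     (degree sign)
        Console.WriteLine(Utf8Hex("\uFFFD"));   // EF BF BD  (replacement character)

        // A lone 0xB0 is not valid UTF-8, so it decodes to U+FFFD.
        string damaged = DecodeAsUtf8(new byte[] { 0xB0, 0x43 });
        Console.WriteLine(Utf8Hex(damaged));    // EF BF BD 43
    }
}
```

So EF BF BD in the output file is consistent with the file being valid UTF-8 that contains a replacement character, which points to a decoding step that failed somewhere earlier.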
 