Incomplete Escaping Functionality??

  • Thread starter: Arthur Dent

Arthur Dent

Hello all,

I am working on an app that needs to write out an XML document for
transmittal to an outside organization. All good and fine: I create the
XmlDocument object, append all my nodes and values, and it all works.

Now I go to save the file. I tried two methods:
MyXmlDocument.Save(filename) and
My.Computer.FileSystem.WriteAllText(filename, MyXmlDocument.OuterXml, False)

The problem comes in with XmlDocument.OuterXml. According to XML, there are
5 characters which need to be escaped: ampersand, less-than, greater-than,
apostrophe and double quote. XmlDocument.OuterXml escapes only three of
these (&, < and >). Apostrophe and double quote do not get escaped. This
is a problem, because the third party we need to deal with *must* have them
escaped, even inside a node's InnerText.
So I figured, okay, I'll just escape them myself. But when I try to do that,
the ampersand in my own escape sequence gets escaped (for example in
"&quot;"), so that the file ends up containing "&amp;quot;".

How in the world can I tell it that it needs to escape ALL FIVE CHARACTERS?
Thanks in advance,
- Arthur Dent.
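
One possible way to get all five characters escaped is to escape the text yourself and write the markup as a plain string, so that no DOM serializer gets a chance to re-escape the ampersands. The sketch below is not .NET code: it uses Python's standard library as a stand-in, and the helper name escape_all_five and the sample text are made up for illustration.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

# Extra replacements applied after the default ones for &, < and >.
QUOTES = {'"': "&quot;", "'": "&apos;"}


def escape_all_five(text):
    """Hypothetical helper: escape &, <, >, double quote and apostrophe."""
    return escape(text, QUOTES)


original = 'He said "hi" & it\'s <ok>'

# Build the element by hand so nothing escapes the text a second time.
markup = "<note>" + escape_all_five(original) + "</note>"
print(markup)

# A conforming parser turns the escapes back into the original text.
round_tripped = ET.fromstring(markup).text
print(round_tripped == original)
```

Writing markup by hand gives up the well-formedness guarantees a DOM provides, so this only makes sense for small, controlled pieces of text.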
 
Exactly what are you trying to escape? Do you have these characters within
text nodes? If so, you need to escape them when you create the text node.

John
 
Arthur said:
The problem comes in with XmlDocument.OuterXml. According to XML, there
are 5 characters which need to be escaped... Ampersand, LessThan,
GreaterThan, Apostrophe and DoubleQuote. XmlDocument.OuterXml, escapes
only three of these ( & , < and > ). Apostrophe and DoubleQuote do not
get escaped. This is a problem, because the third party we need to deal
with *must* have them escaped, even inside of a Nodes InnerText.

XML spec says this:

"The ampersand character (&) and the left angle bracket (<) MUST NOT
appear in their literal form, except when used as markup delimiters, or
within a comment, a processing instruction, or a CDATA section. If they
are needed elsewhere, they MUST be escaped using either numeric
character references or the strings "&amp;" and "&lt;" respectively. The
right angle bracket (>) may be represented using the string "&gt;", and
MUST, for compatibility, be escaped using either "&gt;" or a character
reference when it appears in the string "]]>" in content, when that
string is not marking the end of a CDATA section.

To allow attribute values to contain both single and double quotes, the
apostrophe or single-quote character (') may be represented as "&apos;",
and the double-quote character (") as "&quot;"."

So & and < MUST always be escaped, while >, ' and " must be escaped only
under certain circumstances; otherwise they MAY be escaped.

But actually you shouldn't have to care about XML syntax; the XML API takes
care of it.
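
The rules quoted from the spec can be checked against any conforming parser. The following is a minimal sketch using Python's standard library rather than .NET (the element and attribute names are invented): literal and escaped quotes in element content parse to the same text, an escaped quote is needed only inside an attribute value delimited by that quote, and a bare & in content is rejected.

```python
import xml.etree.ElementTree as ET

# In element content, literal and escaped quotes are equivalent.
literal = ET.fromstring('<msg>say "hi" and \'bye\'</msg>')
escaped = ET.fromstring('<msg>say &quot;hi&quot; and &apos;bye&apos;</msg>')
print(literal.text == escaped.text)

# Inside a double-quoted attribute value the double quote must be escaped.
attr = ET.fromstring('<msg title="a &quot;quoted&quot; word"/>')
print(attr.get("title"))

# A literal ampersand in content is not well-formed XML.
try:
    ET.fromstring("<msg>fish & chips</msg>")
    rejected = False
except ET.ParseError:
    rejected = True
print(rejected)
```

A consumer that cannot read a literal double quote in element content is therefore stricter than the spec requires.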
 
I have an XmlNode whose InnerText property contains a double quote.
This causes a problem with the 3rd party, because their software cannot
handle a double quote in the InnerText.
When I tried to escape it manually using "&quot;", the XmlDocument escaped
my "&" and saved it to the file as "&amp;quot;", effectively making it
impossible for me to escape the double quote manually.
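
The double escaping described here is what a DOM-style serializer normally does: the text you assign is treated as data, so an "&" you typed is itself escaped. A small sketch of the same effect, using Python's ElementTree as a stand-in for XmlDocument (an analogy, not the .NET API):

```python
import xml.etree.ElementTree as ET

node = ET.Element("note")

# Raw text: the serializer escapes & but leaves the quotes alone.
node.text = 'He said "hi" & left'
raw_out = ET.tostring(node, encoding="unicode")
print(raw_out)

# Pre-escaped text: the & of each "&quot;" is escaped again.
node.text = "He said &quot;hi&quot;"
pre_escaped_out = ET.tostring(node, encoding="unicode")
print(pre_escaped_out)
```

The second result reads back as the literal characters &quot; rather than as a double quote, which is why escaping by hand before handing text to the DOM does not help.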

Ultimately, I wound up adding the text inside a CDATA section. This
worked for the 3rd party.
From looking around online, though, it seems that CDATA is considered a
holdover and not the recommended way of doing things.
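
For reference, the CDATA workaround looks roughly like this. The sketch uses Python's xml.dom.minidom as a stand-in (in .NET the corresponding call would presumably be XmlDocument.CreateCDataSection); the element name and text are invented.

```python
from xml.dom import minidom

doc = minidom.Document()
note = doc.createElement("note")
doc.appendChild(note)

# Text inside a CDATA section is written verbatim, with no escaping.
note.appendChild(doc.createCDATASection('He said "hi" & left'))
xml_text = note.toxml()
print(xml_text)

# Reading it back yields the original characters.
parsed = minidom.parseString(xml_text)
recovered = parsed.documentElement.firstChild.data
print(recovered)
```

Note that the double quote is still literal in the output; CDATA only avoids escaping, so it helps here only because the receiving software happens to accept quotes inside a CDATA section. The text must also not contain the sequence "]]>".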




Oleg Tkachenko said:
So & and < MUST always be escaped, while >, ' and " only must be escaped
under certain circumstances, otherwise they MAY be escaped.

But actually you shouldn't care about XML syntax, XML takes care of it.
 