LoadXml() problem - invalid characters

  • Thread starter: Ken

When calling LoadXml(), I get an exception with the
message, "'', hexadecimal value 0x1B, is an invalid
character. Line 4, position 245." Apparently my XML
document has an invalid character in it. I can write code
to strip it out, but my question is: which characters
are "valid" in an XML document?

I have seen the above error for 0x1B and 0x0F, so I know
there are at least two invalid characters. Are there
others?
 
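As with many XML questions, the XML recommendation is the best place to look
here. See http://www.w3.org/TR/REC-xml#charsets, which defines:

[2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
           | [#x10000-#x10FFFF]
    /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */

In other words, every control character below #x20 is illegal except tab
(#x9), line feed (#xA), and carriage return (#xD), which is why both 0x1B
and 0x0F are rejected.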
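
The Char production quoted here translates fairly directly into a filter. What follows is only a minimal C# sketch, not anything from the framework: the names XmlCharFilter, IsLegalXmlCodePoint, and StripIllegalXmlChars are made up for illustration, and it assumes the document is already in memory as a string.

```csharp
using System.Text;

// Hypothetical helper, not part of the .NET class library.
public static class XmlCharFilter
{
    // True if the code point matches the Char production of XML 1.0:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    public static bool IsLegalXmlCodePoint(int cp)
    {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns a copy of the input with every illegal character removed.
    public static string StripIllegalXmlChars(string input)
    {
        var sb = new StringBuilder(input.Length);
        for (int i = 0; i < input.Length; i++)
        {
            if (char.IsSurrogatePair(input, i))
            {
                // A well-formed surrogate pair encodes a code point in
                // #x10000-#x10FFFF, which is always legal: keep both halves.
                sb.Append(input[i]).Append(input[i + 1]);
                i++;
            }
            else if (IsLegalXmlCodePoint(input[i]))
            {
                // Lone surrogates (0xD800-0xDFFF) fail the range test
                // above, so they are dropped along with the controls.
                sb.Append(input[i]);
            }
        }
        return sb.ToString();
    }
}
```

With a helper like this, the failing call could become doc.LoadXml(XmlCharFilter.StripIllegalXmlChars(rawXml)), where rawXml stands for whatever string was being passed before. Later versions of the framework also ship XmlConvert.IsXmlChar, which may make the hand-written range test unnecessary.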


Note that even some excellent XML tools -- XMLSpy, for instance -- don't
respect this little piece of the XML recommendation. (I don't know of any
XML editor that warns you if you create a name that begins with "xml",
either.)

Also, just to be an XML geek for a moment, "valid" has a very specific
meaning in the context of XML -- you're better off using "legal" when you're
talking about whether or not a character is allowed.

Hope this helps.

Bob Rossney
(e-mail address removed)
 
Hello Ken,

Thanks for posting in the group.

Robert is right. We can refer to the XML spec online to check it.

By the way, if you need to represent arbitrary data in XML, including
characters that are not legal there, the most common solution is to
Base64-encode the data, which yields text made up only of legal XML
characters. There are already methods in .NET to do this encoding for you.

Hope that helps.

Best regards,
Yanhong Huang
Microsoft Online Partner Support

Get Secure! - www.microsoft.com/security
This posting is provided "AS IS" with no warranties, and confers no rights.

--------------------
!Date: Wed, 3 Sep 2003 13:12:00 -0700
!Newsgroups: microsoft.public.dotnet.framework
 
Hi Ken,

I have the same problem.

How can I Base64-encode data in .NET?

Thanks,
Rafael
 
RTFM. How about the obvious Convert.ToBase64String and
Convert.ToBase64CharArray?
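
For anyone who wants to see it spelled out, here is a rough C# sketch of a round trip through Convert.ToBase64String and its counterpart Convert.FromBase64String. The class name Base64XmlDemo, the method names, and the element name "payload" are all invented for this example.

```csharp
using System;
using System.Xml;

// Hypothetical demo class; only Convert and XmlDocument are real framework types.
public static class Base64XmlDemo
{
    // Wraps arbitrary bytes in an element as Base64 text.
    public static string WrapBytes(byte[] data)
    {
        // The Base64 alphabet is A-Z, a-z, 0-9, '+', '/' and '=',
        // all of which are legal XML characters and need no escaping.
        return "<payload>" + Convert.ToBase64String(data) + "</payload>";
    }

    // Parses the XML produced by WrapBytes and recovers the original bytes.
    public static byte[] UnwrapBytes(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return Convert.FromBase64String(doc.DocumentElement.InnerText);
    }
}
```

Passing new byte[] { 0x1B, 0x0F } to WrapBytes should produce a string that LoadXml() accepts, and UnwrapBytes should hand the same two bytes back. The cost is that the encoded text is roughly a third larger than the raw data and is no longer human-readable.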

Jerry

 
Thanks very much for the help - all of you!

I don't actually need to represent these characters. I
think they are superfluous and sneaking into the document
and I need to filter them out so I can load the document
into the DOM.
 
Hello Ken,

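If so, I think we need to write a small program or function that reads the
XML data as a stream, byte by byte (or character by character if the
encoding is multi-byte), and strips out the illegal characters. After that,
we can save the result back to a file, which XmlDocument.Load() can then
open. (LoadXml() takes the XML as a string rather than a file name.)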
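
One way that could look in C#, offered purely as a sketch: XmlFileCleaner, CopyLegalChars, and CleanFile are hypothetical names, and the caller is assumed to know the file's encoding. It works character by character rather than byte by byte so that multi-byte encodings are not corrupted.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper, not part of the .NET class library.
public static class XmlFileCleaner
{
    // Copies text from reader to writer, dropping every character that
    // falls outside the Char production of the XML 1.0 recommendation.
    public static void CopyLegalChars(TextReader reader, TextWriter writer)
    {
        int pending = -1; // a high surrogate waiting for its low half
        int c;
        while ((c = reader.Read()) != -1)
        {
            char ch = (char)c;
            if (pending != -1)
            {
                if (char.IsLowSurrogate(ch))
                {
                    // Complete pair: a legal code point above #xFFFF.
                    writer.Write((char)pending);
                    writer.Write(ch);
                    pending = -1;
                    continue;
                }
                pending = -1; // lone high surrogate: drop it, then test ch
            }
            if (char.IsHighSurrogate(ch))
            {
                pending = c;
            }
            else if (ch == 0x9 || ch == 0xA || ch == 0xD
                  || (ch >= 0x20 && ch <= 0xD7FF)
                  || (ch >= 0xE000 && ch <= 0xFFFD))
            {
                writer.Write(ch);
            }
            // Anything else (control characters, lone low surrogates,
            // #xFFFE, #xFFFF) is silently skipped.
        }
    }

    // Writes a cleaned copy of inputPath to outputPath.
    public static void CleanFile(string inputPath, string outputPath, Encoding encoding)
    {
        using (var reader = new StreamReader(inputPath, encoding))
        using (var writer = new StreamWriter(outputPath, false, encoding))
        {
            CopyLegalChars(reader, writer);
        }
    }
}
```

After CleanFile has run, the output file can be handed to XmlDocument.Load(). If the document is small, passing a StringReader and StringWriter to CopyLegalChars avoids the temporary file altogether.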

Hope that helps.

Best regards,
Yanhong Huang
Microsoft Online Partner Support


--------------------
!Date: Thu, 4 Sep 2003 06:02:31 -0700
 