XmlTextReader.ReadString doesn't parse \x0095 character correctly

I have an xml file that, for example, contains the following element: -

<data>\x0095 blah1 \x0095 blah2 \x0095 blah3 \x0095 blah4</data>

If I use XmlTextReader.ReadString() to read this data into a string, the \x0095 are interpreted literally.

However, the following code works fine: -

string a = "\x0095 blah1 \x0095 blah2 \x0095 blah3 \x0095 blah4";

Can anyone please explain what I'm doing wrong? The point of all this is that \x0095 is a delimiter and I want to use the Split function to break the string up into an array, like this: -

string[] b = a.Split(new char[] {'\x0095'});

Thanks in advance.

David.
 
Sure - you're assuming that C# source code escape sequences and XML
escape sequences are the same. They're not.

If you want to include Unicode character 0x95 in your XML, use &#x95;
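
To make the difference concrete, here is a minimal sketch that reads the same element both ways. The class name EscapeDemo and the helper ReadData are made up for illustration, and the XML is parsed from an in-memory string rather than from the real file.

```csharp
using System;
using System.IO;
using System.Xml;

public static class EscapeDemo
{
    // Hypothetical helper: returns the text of the first <data> element.
    public static string ReadData(string xml)
    {
        using (XmlTextReader reader = new XmlTextReader(new StringReader(xml)))
        {
            reader.ReadToFollowing("data");   // position on <data>
            return reader.ReadString();       // read its text content
        }
    }

    public static void Main()
    {
        // "\x0095" means nothing to an XML parser: it stays six ordinary characters.
        string literal = ReadData(@"<data>\x0095 blah1</data>");

        // A numeric character reference is decoded by the parser into one character.
        string decoded = ReadData("<data>&#x95; blah1</data>");

        Console.WriteLine(literal.StartsWith(@"\x0095", StringComparison.Ordinal));
        Console.WriteLine(decoded[0] == '\x0095');
    }
}
```

Only the second form gives you a string that a.Split(new char[] {'\x0095'}) can work on directly.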
 
The "\x" hex designator is C# syntax, not XML syntax.

To place hex 95 into XML, you will need to use an XML character reference:
&#x95;

However, I don't know if that character is valid in XML. You may need to
encode the entire string in Base64.

--- Nick

 
Thanks Jon/Nick - that's very helpful. Unfortunately the xml file is being passed to me from another system, so I can't control what's in it. All I want to do is break this string into its designated parts. Unless either of you has any other suggestions, I'll probably look at Regex.Split to see if that will allow me to split on a multi-character delimiter (although I've a feeling the backslash might complicate matters...).

Thanks again.

David.

 
Backslash will complicate it, but shouldn't do so *that* much.

One option would be to convert the file first, turning \x0095 into
&#x95; everywhere, and *then* load it in.
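
A rough sketch of that convert-then-load idea, assuming the whole document fits in memory as a string and that the root element is the one holding the delimited text. The names PreConvert, ConvertMarkers and LoadAndSplit are invented for this example.

```csharp
using System;
using System.Xml;

public static class PreConvert
{
    // Hypothetical helper: swaps the literal six-character marker
    // for an XML numeric character reference.
    public static string ConvertMarkers(string rawXml)
    {
        return rawXml.Replace(@"\x0095", "&#x95;");
    }

    // Loads the converted XML and splits the root element's text
    // on the single decoded character.
    public static string[] LoadAndSplit(string rawXml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(ConvertMarkers(rawXml));

        // Assumption: the root element is the <data> element from the example.
        string text = doc.DocumentElement.InnerText;
        return text.Split(new[] { '\x0095' }, StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        string raw = @"<data>\x0095 blah1 \x0095 blah2 \x0095 blah3 \x0095 blah4</data>";
        foreach (string part in LoadAndSplit(raw))
        {
            Console.WriteLine(part.Trim());
        }
    }
}
```

RemoveEmptyEntries drops the empty piece in front of the leading delimiter. For a real file you would read the text from disk first and pick out the right element rather than relying on the root.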
 
I went for the Regex.Split option in the end - it was quite straightforward, I just needed to escape the backslash, so the pattern was @"\\x0095".
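
For anyone finding this later, a small sketch of that split on the literal text as ReadString returns it. The class and method names are made up for the example. The second method is an alternative not mentioned above: String.Split with a string[] delimiter, which avoids the regex escaping altogether.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // In a verbatim string, \\ is a regex-escaped backslash,
    // so this pattern matches the six literal characters \x0095.
    public static string[] SplitWithRegex(string value)
    {
        return Regex.Split(value, @"\\x0095");
    }

    // Alternative without a regex: split on a multi-character string delimiter.
    public static string[] SplitWithString(string value)
    {
        return value.Split(new[] { @"\x0095" }, StringSplitOptions.None);
    }

    public static void Main()
    {
        // The text exactly as the reader hands it back: backslash, x, 0, 0, 9, 5.
        string a = @"\x0095 blah1 \x0095 blah2 \x0095 blah3 \x0095 blah4";

        foreach (string part in SplitWithRegex(a))
        {
            // The first piece is empty because the string starts with the delimiter.
            if (part.Length > 0)
            {
                Console.WriteLine(part.Trim());
            }
        }
    }
}
```

If the delimiter ever changes, Regex.Escape(@"\x0095") builds the escaped pattern for you.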
 