Regex bug?? Insufficient hexadecimal digits

Guest

I have a string that contains the \", \t, \r, \n. I need to get the xml.

sample below:
"<?xml version=\"1.0\"?>\r\n<USERS
xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\"
xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"
xmlns=\"http://www.slcorp.com\\xml\\slcorp_dtd_schema.xml\">\r\n\t<ACCT>GameTek</ACCT>\r\n\t<USER>\r\n\t\t<USER_ID>Mike</USER_ID></USER>\r\n\t</USERS>\r\n"

I have tried replacing as follows so I can get the xml. I have tried 2
approaches
(1)
str = str.Replace("\n", "").Replace("\t","").Replace("\r","").Replace("\"", """);
This code segment (Replace("\"", """);) does not compile, the rest is okay.
-------------------------------------------------------------------------
(2)
I have also tried using Regex as follows

string str = Regex.Unescape(str);

This time the exception is "Insufficient hexadecimal digits".


Any ideas?
 
""" isn't a valid string literal. Did you mean ""?

However, I'm not entirely sure what you mean by needing to "get the
XML" - the string *is* the XML. The \r, \n etc are only escapes as far
as C# is concerned.

See http://www.pobox.com/~skeet/csharp/strings.html
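
To illustrate the point about escapes, here is a minimal sketch. The class and method names and the shortened XML are assumptions made for the example, not part of the original post. It shows that the backslash sequences exist only in the C# source (or in the debugger's display); at run time the string holds real quote, CR, LF and tab characters and can be handed straight to an XML parser.

```csharp
using System;
using System.Xml;

class EscapeDemo
{
    // Hypothetical helper: parses the text and returns the root element name.
    public static string RootName(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.DocumentElement.Name;
    }

    static void Main()
    {
        // Shortened version of the sample. The escapes are C# source syntax only.
        string xml = "<?xml version=\"1.0\"?>\r\n<USERS>\r\n\t<ACCT>GameTek</ACCT>\r\n</USERS>\r\n";

        // The run-time string contains no backslash characters at all.
        Console.WriteLine(xml.Contains("\\"));   // False

        // It is already well-formed XML, so it loads without any replacing.
        Console.WriteLine(RootName(xml));        // USERS
    }
}
```

If the debugger shows the backslashes but this kind of check prints False, nothing needs to be stripped.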
 
In addition to what Jon said, I understand you want to strip the escape
sequences from the XML string by replacing \r, \n and \t by nothing, but
replace \" by ". Right?

In that case, you need to make sure that the escape sequences aren't
recognized as such in the strings you are trying to use in your
replacement. The easiest way to do that is to use verbatim literals, like
this:

str = str.Replace(@"\n", "").Replace(@"\t", "").Replace(@"\r", "").Replace(@"\""", @"""");

Without verbatim literals, it would have to look like this:

str = str.Replace("\\n", "").Replace("\\t", "").Replace("\\r", "").Replace("\\\"", "\"");
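
As a rough, self-contained sketch of the replacement chain (the class name, the Clean helper and the sample input are made up for illustration), assuming the string really does contain literal backslash characters, for example because it was read from a file that stores the text in escaped form:

```csharp
using System;

class StripEscapes
{
    // Removes literal backslash-n, backslash-t and backslash-r sequences
    // and turns a literal backslash-quote into a plain quote.
    public static string Clean(string s)
    {
        return s.Replace(@"\n", "")
                .Replace(@"\t", "")
                .Replace(@"\r", "")
                .Replace(@"\""", @"""");   // in a verbatim literal, "" is one quote
    }

    static void Main()
    {
        // Verbatim literal: the backslashes here are real characters in the string.
        string raw = @"<A b=\""1\"">\r\n\t<C/></A>";
        Console.WriteLine(Clean(raw));     // <A b="1"><C/></A>
    }
}
```

Note that string.Replace returns a new string, so the result has to be assigned or returned; the original string is left unchanged.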

Using regular expressions is probably not the most performant way to do
this, because you'd have to do two replacements - the only advantage is
that you could replace \r, \n and \t in one go:

str = Regex.Replace(str, @"\\[rnt]", "");
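
A small sketch of the two-step regex variant, again assuming the input holds literal backslash sequences. The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

class RegexStrip
{
    public static string Clean(string s)
    {
        // Step 1: drop literal \r, \n and \t sequences in a single pass.
        string stripped = Regex.Replace(s, @"\\[rnt]", "");

        // Step 2: turn a literal backslash-quote into a plain quote.
        return stripped.Replace("\\\"", "\"");
    }

    static void Main()
    {
        string raw = @"<A b=\""1\"">\r\n\t<C/></A>";
        Console.WriteLine(Clean(raw));   // <A b="1"><C/></A>
    }
}
```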

Using Regex.Unescape doesn't make any sense here; it has a completely
different purpose: it undoes the escaping that Regex.Escape applies to
regular expression patterns.
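
For illustration, a short sketch of what Regex.Unescape is meant for, plus one plausible explanation of the exception. The explanation is an assumption: it supposes the real string contains a single backslash followed by "xml" in the namespace URL, which the regex parser would read as the start of a \xNN hexadecimal escape. The class name and sample strings are made up.

```csharp
using System;
using System.Text.RegularExpressions;

class UnescapeDemo
{
    // Hypothetical helper: reports whether Regex.Unescape rejects the input.
    public static bool UnescapeFails(string s)
    {
        try
        {
            Regex.Unescape(s);
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }

    static void Main()
    {
        // Intended use: round-tripping text that Regex.Escape has escaped.
        string escaped = Regex.Escape("a.b*c");
        Console.WriteLine(escaped);                   // a\.b\*c
        Console.WriteLine(Regex.Unescape(escaped));   // a.b*c

        // A backslash followed by 'x' must be followed by two hex digits,
        // so "\xml" is rejected (assumed to be the cause of the error).
        Console.WriteLine(UnescapeFails(@"www.slcorp.com\xml"));   // True
    }
}
```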

Now, I'm sure I made a mistake somewhere with all this escaping -
someone's going to tell me :-)


Oliver Sturm
 