String with UTF8 in it?

  • Thread starter: Jeff

Jeff

Hello,

I have a string returned from a third party library.

It is of the form "Here is some text but there is \u0026#39; in the middle
of it".

Is there an easy way for me to convert this \u part into unicode? I've tried
using the encoding classes but \u0026#39; remains intact.

Many thanks in advance!

Jeff.

Jeff said:
I have a string returned from a third party library.

It is of the form "Here is some text but there is \u0026#39; in the
middle of it".

Is there an easy way for me to convert this \u part into unicode? I've
tried using the encoding classes but \u0026#39; remains intact.
\u0026#39; is intended to correspond with &#39; (the \u0026 is an & escaped
according to C# string literal conventions, apparently), and &#39; in turn
corresponds with ' (apostrophe) in HTML and XML escaping. It makes very
little sense to escape strings this way, so I'm guessing a few wires got
crossed. It has nothing to do with UTF-8, in any case.

As far as I know, there is no single framework class that undoes both of
these escapings in one go. The pieces exist separately: Regex.Unescape
converts \uXXXX escapes and System.Web.HttpUtility.HtmlDecode converts
character references such as &#39;. The XML parser and the C# compiler also
handle them indirectly, but using those would be overkill.

If you're certain your library consistently escapes strings this way, you
can undo it in two regex passes: first replace each "\\u[0-9a-fA-F]{4}"
match with the character it encodes, then do the same for each "&#[0-9]+;"
(and possibly "&#x[0-9a-fA-F]+;") match. The order matters, because the &
only appears once the \u0026 has been converted. However, I would first look
into the mechanisms that are causing these escapings, if possible. They
don't appear to make sense in the first place.
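
The two regex passes described above could look roughly like this. It is only a sketch: the class name EscapeUndo and the method name Unescape are made up for illustration, and it assumes the library only ever emits four-digit \uXXXX escapes and numeric character references (no named entities such as &amp;).

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the framework or the library.
static class EscapeUndo
{
    // Pass 1: \uXXXX (exactly four hex digits).
    static readonly Regex UnicodeEscape = new Regex(@"\\u([0-9a-fA-F]{4})");

    // Pass 2: &#xHHH; (hex, group 1) or &#NNN; (decimal, group 2).
    static readonly Regex CharReference =
        new Regex(@"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));");

    public static string Unescape(string input)
    {
        // Turn each \uXXXX into the UTF-16 code unit it names.
        string step1 = UnicodeEscape.Replace(input, m =>
            ((char)int.Parse(m.Groups[1].Value, NumberStyles.HexNumber)).ToString());

        // Turn each numeric character reference into its character.
        return CharReference.Replace(step1, m =>
        {
            int codePoint;
            bool parsed = m.Groups[1].Success
                ? int.TryParse(m.Groups[1].Value, NumberStyles.HexNumber,
                               CultureInfo.InvariantCulture, out codePoint)
                : int.TryParse(m.Groups[2].Value, NumberStyles.Integer,
                               CultureInfo.InvariantCulture, out codePoint);

            // Leave the text untouched if the number is not a valid code point.
            bool valid = parsed && codePoint >= 0 && codePoint <= 0x10FFFF
                         && (codePoint < 0xD800 || codePoint > 0xDFFF);
            return valid ? char.ConvertFromUtf32(codePoint) : m.Value;
        });
    }

    static void Main()
    {
        // Verbatim string, so the backslash is a literal character here.
        string raw = @"Here is some text but there is \u0026#39; in the middle of it";
        Console.WriteLine(Unescape(raw));
        // prints: Here is some text but there is ' in the middle of it
    }
}
```

Malformed or out-of-range references are left as they are rather than throwing, which seemed the safer default for text coming from a library you don't control.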