String with UTF8 in it?

Jeff · Mar 25, 2009

Hello,

I have a string returned from a third party library.

It is of the form "Here is some text but there is \u0026#39; in the middle
of it".

Is there an easy way for me to convert this \u part into unicode? I've tried
using the encoding classes but \u0026#39; remains intact.

Many thanks in advance!

Jeff.

Jeroen Mostert · Mar 25, 2009

Jeff said:
I have a string returned from a third party library.

It is of the form "Here is some text but there is \u0026#39; in the
middle of it".

Is there an easy way for me to convert this \u part into unicode? I've
tried using the encoding classes but \u0026#39; remains intact.

\u0026#39; is intended to correspond with &#39; (the \u0026 is an & escaped
according to C# string literal conventions, apparently), and &#39; in turn
corresponds with ' (apostrophe) in HTML and XML escaping. It makes very
little sense to escape strings this way, so I'm guessing a few wires got
crossed. It has nothing to do with UTF-8, in any case.

As far as I know, there are no standard framework classes for unescaping
either of these mechanisms, except indirectly (the XML parser can unescape
XML encoding, obviously, and the C# compiler knows about Unicode escapes,
but using these would be overkill).

If you're certain your library consistently escapes strings this way, you
can undo it by first replacing "\u[0-9a-f]{4}" sequences and then replacing
"&#[0-9]+;" (and possibly "&#x[0-9a-f]+;") using regexes. However, I would
first look into the mechanisms that are causing these escapings if possible.
They don't appear to make sense in the first place.
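
A minimal sketch of that two-step replacement, assuming the library only ever
emits \uXXXX escapes and numeric character references. The class name
EscapeFixer and the method name Unescape are made up for illustration, and
named entities such as &amp; are not handled.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the framework or of the third-party library.
public static class EscapeFixer
{
    // C#-style escapes such as \u0026: a backslash, 'u', exactly four hex digits.
    private static readonly Regex UnicodeEscape =
        new Regex(@"\\u([0-9a-fA-F]{4})");

    // Numeric character references: hex (&#x27;) in group 1, decimal (&#39;) in group 2.
    private static readonly Regex NumericEntity =
        new Regex(@"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));");

    public static string Unescape(string input)
    {
        // Step 1: turn \u0026 into & (and any other \uXXXX into its character).
        string step1 = UnicodeEscape.Replace(input, m =>
            ((char)int.Parse(m.Groups[1].Value, NumberStyles.HexNumber)).ToString());

        // Step 2: turn &#39; or &#x27; into the character it names.
        return NumericEntity.Replace(step1, m =>
        {
            bool isHex = m.Groups[1].Success;
            string digits = isHex ? m.Groups[1].Value : m.Groups[2].Value;
            NumberStyles style = isHex ? NumberStyles.HexNumber : NumberStyles.None;

            int codePoint;
            bool valid = int.TryParse(digits, style, CultureInfo.InvariantCulture, out codePoint)
                && codePoint >= 0 && codePoint <= 0x10FFFF
                && (codePoint < 0xD800 || codePoint > 0xDFFF);

            // Leave anything that is not a valid code point untouched.
            return valid ? char.ConvertFromUtf32(codePoint) : m.Value;
        });
    }
}
```

Calling EscapeFixer.Unescape on the string from the question should give
"Here is some text but there is ' in the middle of it". The order matters:
the \u pass has to run first so that the & exists before the &#...; pass
looks for it. If referencing System.Web is an option,
System.Web.HttpUtility.HtmlDecode can take the place of the second step.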
