C# does not support Unicode characters

  • Thread starter: Guest
Guest

Is it correct that Unicode characters with code points above 0x10FFFF are not supported by C#?

I have a hard time believing this since it would eliminate some Asian languages. If it is true, is there a workaround? Do other .NET languages support code points > 0x10FFFF?

I appreciate any comments.
Thanks,
Johannes
 
And, yes, C# (natively) supports Unicode.

"The string type represents a string of Unicode characters. string is an alias
for System.String in the .NET Framework."
 
Is it correct that Unicode characters with code points above 0x10FFFF
are not supported by C#?

There are no Unicode characters above 0x10FFFF.
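One way to see this limit from C# is to ask the framework to encode a code point. The sketch below uses a hypothetical helper class `CodePointRange` with a `TryEncode` method (neither is part of .NET); it relies only on `char.ConvertFromUtf32`, which throws `ArgumentOutOfRangeException` for values above 0x10FFFF, for negative values, and for the surrogate range 0xD800 to 0xDFFF.

```csharp
using System;

// Hypothetical helper (not part of .NET): wraps char.ConvertFromUtf32 so that
// out-of-range input yields false instead of an exception.
public static class CodePointRange
{
    // Returns true and the UTF-16 encoding when codePoint is a valid Unicode
    // scalar value (0..0x10FFFF, excluding surrogates 0xD800..0xDFFF).
    public static bool TryEncode(int codePoint, out string utf16)
    {
        try
        {
            utf16 = char.ConvertFromUtf32(codePoint);
            return true;
        }
        catch (ArgumentOutOfRangeException)
        {
            // Thrown for negative values, values above 0x10FFFF and surrogates.
            utf16 = string.Empty;
            return false;
        }
    }
}
```

For example, `TryEncode(0x10FFFF, out s)` succeeds and yields a two-char string, while `TryEncode(0x110000, out s)` returns false. On .NET Core 3.0 and later, `System.Text.Rune.IsValid(int)` performs the same validity check without an exception.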

C# may have a problem with characters above 0xFFFF, since the internal
representation is UTF-16. Characters from 0x10000 to 0x10FFFF are
represented as surrogate pairs (two 16-bit chars), and some .NET APIs may
give misleading results for them (string length, iterating "by char", and
others in the same class).
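A minimal sketch of the length pitfall: `string.Length` counts UTF-16 code units, so a character such as U+1D11E (MUSICAL SYMBOL G CLEF) counts as 2. The hypothetical `SurrogateCounting` class below walks the string with `char.IsSurrogatePair` and `char.ConvertToUtf32` to count and decode code points instead. Treating an unpaired surrogate as a single code point is an assumption of this sketch, not something .NET mandates.

```csharp
using System;

// Hypothetical helper contrasting string.Length (UTF-16 code units)
// with the number of Unicode code points in the same string.
public static class SurrogateCounting
{
    // Counts code points; a valid surrogate pair counts once.
    // An unpaired surrogate is counted as one code point (sketch assumption).
    public static int CountCodePoints(string s)
    {
        int count = 0;
        for (int i = 0; i < s.Length; i++)
        {
            if (char.IsSurrogatePair(s, i))
            {
                i++; // skip the low surrogate of the pair
            }
            count++;
        }
        return count;
    }

    // Decodes s into code points, combining each surrogate pair into one value.
    public static int[] ToCodePoints(string s)
    {
        var result = new int[CountCodePoints(s)];
        int j = 0;
        for (int i = 0; i < s.Length; i++)
        {
            if (char.IsSurrogatePair(s, i))
            {
                result[j++] = char.ConvertToUtf32(s[i], s[i + 1]);
                i++; // the low surrogate has been consumed
            }
            else
            {
                result[j++] = s[i]; // BMP character or unpaired surrogate
            }
        }
        return result;
    }
}
```

On .NET Core 3.0 and later, `string.EnumerateRunes()` provides the same code-point iteration. Note that `StringInfo.LengthInTextElements` counts text elements (user-perceived characters), which can differ again when combining marks are involved.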
 
In light of the Unicode v3 changes that I just became aware of, I retract all
that I've said on the subject in this thread.
 
Thanks for all your responses. It's all clear to me now.

UTF-16, the internal representation of Unicode strings in the .NET Framework, permits code points up to 0x10FFFF, which does cover all languages, including Asian languages.
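The 0x10FFFF ceiling follows from how UTF-16 encodes code points above 0xFFFF: the code point minus 0x10000 is a 20-bit value split across a high surrogate (0xD800 to 0xDBFF) and a low surrogate (0xDC00 to 0xDFFF), and 0x10000 + 2^20 - 1 = 0x10FFFF. The hypothetical `Utf16Math.Encode` below spells out that formula for illustration only; real code would simply call `char.ConvertFromUtf32`.

```csharp
using System;

// Hypothetical illustration of the UTF-16 surrogate-pair formula.
// Real code should call char.ConvertFromUtf32 instead.
public static class Utf16Math
{
    public static string Encode(int codePoint)
    {
        // Reject values outside Unicode and the reserved surrogate range.
        if (codePoint < 0 || codePoint > 0x10FFFF || (codePoint >= 0xD800 && codePoint <= 0xDFFF))
            throw new ArgumentOutOfRangeException(nameof(codePoint));

        // Basic Multilingual Plane: a single 16-bit code unit.
        if (codePoint < 0x10000)
            return ((char)codePoint).ToString();

        int offset = codePoint - 0x10000;             // 20-bit value, 0..0xFFFFF
        char high = (char)(0xD800 + (offset >> 10));  // top 10 bits
        char low = (char)(0xDC00 + (offset & 0x3FF)); // bottom 10 bits
        return new string(new[] { high, low });
    }
}
```

With this formula the largest possible pair, U+DBFF U+DFFF, decodes to exactly 0x10FFFF, which is why no larger code point can be represented.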

The misunderstanding was caused by a syntax error in my code. I was using [\u000000-\u10FFFF] to indicate a range in a regular expression character class, which is simply the wrong notation: \u takes exactly four hex digits, so \u000000 is read as U+0000 followed by "00" and \u10FFFF as U+10FF followed by "FF", not as the intended range. The correct notation uses an upper-case U with eight hex digits, as in [\U00000000-\U0010FFFF]. The C# Language Specification is very clear about this (section Grammar, C1.5). Maybe I will read it after all...
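To see what the compiler actually produces for each escape, the hypothetical `EscapeDemo.DescribeCodeUnits` helper below lists the UTF-16 code units of a string. It shows that `\u` consumes exactly four hex digits, while `\U0010FFFF` becomes the surrogate pair U+DBFF U+DFFF. Because .NET strings, and the `Regex` engine that reads them, work in UTF-16 code units, even the corrected pattern reaches the engine as a surrogate pair rather than as a single code point above 0xFFFF, so it is worth testing such a pattern against supplementary characters before relying on it.

```csharp
using System;

// Hypothetical helper: shows how C# reads \u and \U escapes in regular
// (non-verbatim) string literals by listing the resulting code units.
public static class EscapeDemo
{
    // \u takes exactly four hex digits: this is U+0000 followed by "00".
    public const string ShortEscapeLow = "\u000000";

    // Likewise: U+10FF followed by the letters "FF".
    public const string ShortEscapeHigh = "\u10FFFF";

    // \U takes exactly eight hex digits; a code point above 0xFFFF
    // becomes a surrogate pair, i.e. two chars.
    public const string LongEscape = "\U0010FFFF";

    // Formats each UTF-16 code unit of s as U+XXXX, separated by spaces.
    public static string DescribeCodeUnits(string s)
    {
        var parts = new string[s.Length];
        for (int i = 0; i < s.Length; i++)
            parts[i] = "U+" + ((int)s[i]).ToString("X4");
        return string.Join(" ", parts);
    }
}
```

For instance, `DescribeCodeUnits("[\u000000-\u10FFFF]")` yields `U+005B U+0000 U+0030 U+0030 U+002D U+10FF U+0046 U+0046 U+005D`, so a regex engine sees a class containing U+0000 plus the range from "0" (U+0030) to U+10FF, not the range 0 to 0x10FFFF.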

Johannes
 