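// Requires: using System.Linq; using System.Text;
// Note: despite the name, this only strips accents; it does not lowercase.
public String ToLowerASCII(String s)
{
    // FormD splits each accented letter into a base letter plus combining
    // marks; the Where() clause then drops the combining marks.
    return new String(
        s.Normalize(NormalizationForm.FormD).ToCharArray()
         .Where(c =>
             System.Globalization.CharUnicodeInfo.GetUnicodeCategory(c) !=
             System.Globalization.UnicodeCategory.NonSpacingMark)
         .ToArray());
}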
There is one small issue with this, depending on whether your
application receives non-letter Unicode input: because we're not using
an ASCII encoding directly, the system still lets through characters
with values greater than 127, which is a no-no if you want pure ASCII.
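To make the problem concrete, here is a minimal sketch. The class name
LeakDemo, the helper HasNonAscii and the sample string are made up for
illustration; StripAccents is the same approach as the function above.
It shows that characters with no accent to strip, such as ß or ©, come
through untouched:

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

public static class LeakDemo {
    // Same accent-stripping approach: decompose, then drop combining marks.
    public static string StripAccents(string s) {
        return new string(
            s.Normalize(NormalizationForm.FormD)
             .Where(c => CharUnicodeInfo.GetUnicodeCategory(c) !=
                         UnicodeCategory.NonSpacingMark)
             .ToArray());
    }

    // True if any character falls outside the ASCII range 0..127.
    public static bool HasNonAscii(string s) {
        return s.Any(c => c > 127);
    }

    public static void Main() {
        string stripped = StripAccents("caf\u00E9 \u00DF \u00A9"); // "café ß ©"
        Console.WriteLine(stripped);              // the é loses its accent
        Console.WriteLine(HasNonAscii(stripped)); // ß and © are still there
    }
}
```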
Now, if .NET provides a general-purpose Unicode transliteration
mechanism, cool. It doesn't seem to, though, so if your input includes
anything other than characters that can be stripped of accents, you're
still sending non-ASCII output. To catch that, we need to filter the
allowed characters a bit. Here's a way to do it without using LINQ
(requires "using System.Collections.Generic; using System.Globalization;
using System.Text;"):
===============
public static string StringToAscii(string str) {
    List<char> normalized = new List<char>();
    // FormD splits accented letters into base letter + combining marks.
    foreach(char c in str.Normalize(NormalizationForm.FormD)) {
        // Drop the combining marks and anything outside ASCII (0..127).
        if(CharUnicodeInfo.GetUnicodeCategory(c) !=
           UnicodeCategory.NonSpacingMark && c < 128)
            normalized.Add(c);
    }
    return new String(normalized.ToArray());
}
===============
This function now strips accents from characters, and doesn't pass any
non-ASCII character through. If your input has null characters or
control characters, I would expect that those would be preserved.
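A quick way to check that expectation is a sketch like the following.
It repeats the function so the block compiles on its own; the class
name ControlCharCheck and the sample input are made up for
illustration, and the filter keeps every value in the ASCII range
0..127. It only exercises a tab and a bell character, so treat it as a
spot check rather than proof for every control character:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class ControlCharCheck {
    public static string StringToAscii(string str) {
        List<char> normalized = new List<char>();
        foreach (char c in str.Normalize(NormalizationForm.FormD)) {
            // Keep only non-combining characters in the ASCII range 0..127.
            if (CharUnicodeInfo.GetUnicodeCategory(c) !=
                UnicodeCategory.NonSpacingMark && c < 128)
                normalized.Add(c);
        }
        return new string(normalized.ToArray());
    }

    public static void Main() {
        // A tab and a bell mixed in with accented and non-ASCII text.
        string input = "a\tb\u0007 n\u00E9 \u00A9";
        string output = StringToAscii(input);
        // Print code points, since control characters are invisible.
        foreach (char c in output)
            Console.Write("{0:X2} ", (int)c);
        Console.WriteLine();
        // prints: 61 09 62 07 20 6E 65 20
    }
}
```

The 09 and 07 in the output are the tab and the bell, so both survive
the filter, while the accent and the © are gone.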
If you want to preserve ALL Unicode input, though, you will still have
to create a translation table method of some sort, such that you can do
things like:
© = (c)
® = (R)
™ = (tm)
ß = ss
And so forth.
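A translation table along those lines might look like the sketch
below. The class name Transliterator, the method name ToAscii and the
sample string are made up for illustration, and the table holds only
the four mappings listed above; it is one possible arrangement, not a
complete transliteration scheme:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class Transliterator {
    // Hypothetical lookup table; extend it with whatever your input needs.
    private static readonly Dictionary<char, string> Table =
        new Dictionary<char, string> {
            { '\u00A9', "(c)" },   // copyright sign
            { '\u00AE', "(R)" },   // registered sign
            { '\u2122', "(tm)" },  // trade mark sign
            { '\u00DF', "ss" },    // sharp s
        };

    public static string ToAscii(string str) {
        StringBuilder result = new StringBuilder();
        foreach (char c in str.Normalize(NormalizationForm.FormD)) {
            string replacement;
            if (Table.TryGetValue(c, out replacement))
                result.Append(replacement);   // mapped symbol
            else if (CharUnicodeInfo.GetUnicodeCategory(c) !=
                     UnicodeCategory.NonSpacingMark && c < 128)
                result.Append(c);             // plain ASCII, accents removed
            // anything else is dropped
        }
        return result.ToString();
    }

    public static void Main() {
        // Input is "© 2024 Straße™ café", written with escapes.
        Console.WriteLine(
            ToAscii("\u00A9 2024 Stra\u00DFe\u2122 caf\u00E9"));
        // prints "(c) 2024 Strasse(tm) cafe"
    }
}
```

The table is consulted before the accent and range checks, so a mapped
symbol is replaced rather than dropped, and anything not in the table
falls back to the same filtering as before.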
Below is a full test program that you can examine, too. Its output
is:
test> ./ascii.exe
Orig: áéíóú
New: aeiou
Orig: äåé®þüúíóö«áßðfgjhg'¶øœæ©xvbbñ
New: aaeuuiooafgjhg'xvbbn
Orig: ¡²³¤¼¼½¾½‘¾
New:
Orig: ¿©µvæ¢ÆÁVÐÄÉÞÖ
New: vAVAEO
--- Mike
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class EntryPoint {
    public static int Main() {
        string[] tests = {
            "áéíóú",
            "äåé®þüúíóö«áßðfgjhg'¶øœæ©xvbbñ",
            "¡²³¤¼¼½¾½‘¾",
            "¿©µvæ¢ÆÁVÐÄÉÞÖ" };
        foreach(string t in tests) {
            Console.WriteLine("Orig:\t{0}", t);
            Console.WriteLine("New:\t{0}", StringToAscii(t));
            Console.WriteLine();
        }
        return(0);
    }

    public static string StringToAscii(string str) {
        List<char> normalized = new List<char>();
        // FormD splits accented letters into base letter + combining marks.
        foreach(char c in str.Normalize(NormalizationForm.FormD)) {
            // Drop the combining marks and anything outside ASCII (0..127).
            if(CharUnicodeInfo.GetUnicodeCategory(c) !=
               UnicodeCategory.NonSpacingMark && c < 128)
                normalized.Add(c);
        }
        return new String(normalized.ToArray());
    }
}