Morten Wennevik said:
There is no generic way to do this. There is a hack that works in
most cases involving switching Encoding the string and reading it in
a different encoding, but this is by no means ensured to work for
you. Your best bet is to create a lookup table and manually translate
each character. If you anticipate a wide variety of characters, maybe
Unicode or UTF-8 support is best.
Actually, as of .NET 2.0 there *is* a way of doing this using
System.Text.NormalizationForm.
Look at
http://groups.google.com/group/microsoft.public.dotnet.general/tree/bro
wse_frm/thread/78a09bd184351bc5/99f090af662c126c?rnum=11
(the last response, from Chris Mullins).
Here's the code posted, which does some upper-casing which isn't needed
in this case - but it should be okay aside from that.
Original code:
Encoding ascii = Encoding.GetEncoding(
"us-ascii",
new EncoderReplacementFallback(string.Empty),
new DecoderReplacementFallback(string.Empty));
byte[] encodedBytes = new byte[ascii.GetByteCount(normalized)];
int numberOfEncodedBytes = ascii.GetBytes(normalized, 0,
normalized.Length,
encodedBytes, 0);
string s = "áäåãòä:usdBDlGXHHA";
string normalized = s.Normalize(NormalizationForm.FormKD);
Encoding ascii = Encoding.GetEncoding(
"us-ascii",
new EncoderReplacementFallback(string.Empty),
new DecoderReplacementFallback(string.Empty));
byte[] encodedBytes = new byte[ascii.GetByteCount(normalized)];
int numberOfEncodedBytes = ascii.GetBytes(normalized, 0,
normalized.Length,
encodedBytes, 0);
string newString = ascii.GetString(encodedBytes).ToUpper();
MessageBox.Show(newString);
End of original code.
Here's a slightly simpler (IMO) version:
static string RemoveAccents (string input)
{
string normalized = input.Normalize(NormalizationForm.FormKD);
Encoding removal = Encoding.GetEncoding
(Encoding.ASCII.CodePage,
new EncoderReplacementFallback(""),
new DecoderReplacementFallback(""));
byte[] bytes = removal.GetBytes(normalized);
return Encoding.ASCII.GetString(bytes);
}
Or an alternative:
static string RemoveAccents (string input)
{
string normalized = input.Normalize(NormalizationForm.FormKD);
StringBuilder builder = new StringBuilder();
foreach (char c in normalized)
{
if (char.GetUnicodeCategory(c) !=
UnicodeCategory.NonSpacingMark)
{
builder.Append(c);
}
}
return builder.ToString();
}