Translate accented characters

  • Thread starter: JezB

JezB

Is there anything in the framework which will help translate accented
characters in strings to their standard counterparts?

e.g. "Gráda" to "Grada"
 
Interestingly enough, I was looking for exactly the same thing recently and
was unable to find anything native to the Framework, so I ended up writing
my own mapping function. It is easy enough for the Latin languages (e.g.
French, Spanish, Italian, Portuguese etc.), fairly simple for German (e.g. any
vowel with an umlaut is replaced by the unmodified vowel + 'e'), a little
messier for the Scandinavian languages, even worse for Greek and Cyrillic, and
almost impossible for the Eastern European languages with diacritics.

What is the business purpose behind your need to do this, as a matter of
interest?
 
I'm passing artist/album names stored within mp3 files through Amazon's web
service, to look up album details. Many of the artist names have accented
characters, since I am interested in world/Celtic music, but Amazon's search
criteria seem to be based on normalized unaccented strings. A real pain to
edit all my ID3 tags!
 
Hi,

There is nothing like this in the framework. What you can do is use
String.Replace; it will be slower, but there are only 5 vocals after all :)
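A minimal sketch of what the String.Replace suggestion might look like, assuming only the five lower-case vowels with an acute accent need handling. The class and method names are made up for illustration, and the characters are written as \u escapes so the source file encoding does not matter.

```csharp
using System;

static class AccentReplaceSketch
{
    // Hypothetical helper: one String.Replace call per accented vowel.
    // Only the acute-accented lower-case vowels are covered (an assumption).
    public static string ReplaceAccentedVowels(string input)
    {
        return input
            .Replace('\u00E1', 'a')  // á
            .Replace('\u00E9', 'e')  // é
            .Replace('\u00ED', 'i')  // í
            .Replace('\u00F3', 'o')  // ó
            .Replace('\u00FA', 'u'); // ú
    }
}
```

With this, AccentReplaceSketch.ReplaceAccentedVowels("Gráda") gives "Grada". Each call scans the whole string and allocates a new one, so the cost grows with every accent variant added to the chain.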

Cheers,
 
Hi Jez,

There is nothing pre-made in .NET that will do what you want. You need to create a translation table and translate each character as necessary.

There is a method that seems to work in most cases involving translation between different encodings, but I cannot guarantee that it works in all cases.

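using System.Text;

// Code page 1251 (Cyrillic) has no accented Latin letters, so the encoder
// appears to substitute the closest unaccented letter for each one.
string s = "áàäãâåéèëêíìïîóòöõôøúùüûýÿ";
byte[] b = Encoding.GetEncoding(1251).GetBytes(s);
string t = Encoding.ASCII.GetString(b);

// t == "aaaaaaeeeeiiiioooooouuuuyy"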
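For comparison, here is a sketch of a different technique from the encoding round-trip above, assuming .NET 2.0 or later (which this thread may predate): decompose the string with String.Normalize and drop the combining marks. The class and method names are hypothetical.

```csharp
using System.Globalization;
using System.Text;

static class DiacriticsSketch
{
    // Hypothetical helper: strips combining accents via Unicode decomposition.
    public static string StripDiacritics(string input)
    {
        // FormD splits "á" into "a" followed by a combining acute accent.
        string decomposed = input.Normalize(NormalizationForm.FormD);
        StringBuilder sb = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            // Keep everything except the combining (non-spacing) marks.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(c);
            }
        }
        // Recompose whatever is left.
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

DiacriticsSketch.StripDiacritics("Gráda") gives "Grada". Letters that have no Unicode decomposition, such as ø and ß, pass through unchanged, so they would still need a table entry.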
 
No use at all for German:

ä = ae
ö = oe
ü = ue
ß = ss
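The four German mappings above could be handled with a small translation table, roughly as sketched here. The class and method names are hypothetical, and only the lower-case forms listed above are included; upper-case Ä, Ö and Ü would need their own entries.

```csharp
using System.Collections.Generic;
using System.Text;

static class GermanTransliterationSketch
{
    // Table holds only the four mappings listed above (lower-case forms).
    static readonly Dictionary<char, string> Table = new Dictionary<char, string>
    {
        { '\u00E4', "ae" }, // ä
        { '\u00F6', "oe" }, // ö
        { '\u00FC', "ue" }, // ü
        { '\u00DF', "ss" }  // ß
    };

    // Hypothetical helper: replaces each mapped character, copies the rest.
    public static string Transliterate(string input)
    {
        StringBuilder sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            string replacement;
            if (Table.TryGetValue(c, out replacement))
            {
                sb.Append(replacement);
            }
            else
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

For example, Transliterate("Müller") gives "Mueller" and Transliterate("Straße") gives "Strasse". A char-to-string table is used because one input character can become two output characters, which a byte-level encoding conversion cannot express.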
 
There is nothing like this in the framework. What you can do is use
String.Replace; it will be slower, but there are only 5 vocals after all
:)

If by "vocals" you mean "vowels", then that just isn't the case in many
languages...
 
I'm passing artist/album names stored within mp3 files through Amazon's
web service, to look up album details. Many of the artist names have
accented characters, since I am interested in world/Celtic music, but
Amazon's search criteria seem to be based on normalized unaccented
strings. A real pain to edit all my ID3 tags!

Then the translation table approach is what you need here...
 
That's good enough for me! This is just a hobby program, so it doesn't need
to be foolproof.
Many thanks, Morten.
 
Hi,

Yes I meant vowels :)

It was a bit of "Spanglish": vocales = vowels in Spanish :)

You are right, but in any particular language there are not that many. Judging
by the OP's description, he almost certainly has the mp3 tags in one language.

cheers,
 