International strings/chars

  • Thread starter: Dobieslaw Wroblewski

Dobieslaw Wroblewski

Hello,

Please help me. I think I found this once but cannot find it again :-(

Is there any .NET framework class/method that returns the international
counterpart of a given string or character? For example - for 'ó' it returns
'o'.

DW.
 
Hi Dobieslaw,

There isn't an easy way to do this, but this hack seems to work for most
characters.

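using System.Text;

// Code page 1251 (Cyrillic) has no accented Latin letters, so GetBytes
// appears to substitute the closest plain letter for each character.
string s = "áàäãâåéèëêíìïîóòöõôøúùüûýÿ";
byte[] b = Encoding.GetEncoding(1251).GetBytes(s);
string t = Encoding.ASCII.GetString(b);

// t == "aaaaaaeeeeiiiioooooouuuuyy"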
 
> string s = "áàäãâåéèëêíìïîóòöõôøúùüûýÿ";
> byte[] b = Encoding.GetEncoding(1251).GetBytes(s);
> string t = Encoding.ASCII.GetString(b);

Thanks, it works!

I believe I used Encoding.ASCII for writing to text files, but in such cases
I got question marks instead of the national characters - so this trick is
quite mysterious for me, but - goodie - it works :-).

DW.
 
Dobieslaw Wroblewski wrote:
> string s = "áàäãâåéèëêíìïîóòöõôøúùüûýÿ";
> byte[] b = Encoding.GetEncoding(1251).GetBytes(s);
> string t = Encoding.ASCII.GetString(b);
>
> Thanks, it works!
>
> I believe I used Encoding.ASCII for writing to text files, but in
> such cases I got question marks instead of the national characters -

Yes, you would - ASCII doesn't contain any accented characters.

> so this trick is
> quite mysterious for me, but - goodie - it works :-).

Unfortunately, the above isn't really guaranteed to work. The Encoding
class doesn't specify any behaviour for when GetString is presented
with bytes which aren't a valid encoded form for that encoding. It may
work now, but there's no guarantee it'll work in the future at all.

I suspect you want something like "normalization form D" of the Unicode
string, followed by a removal of all accents etc - but I don't know
enough about normalization to be sure, and I don't know of any
implementations of that for .NET :(
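
For what it's worth, the "normalization form D, then remove the accents" idea can be sketched with `string.Normalize`, which was added in .NET Framework 2.0. This is only a sketch under stated assumptions: the names `AccentStripper` and `RemoveDiacritics` are made up for illustration, and treating every non-spacing mark as an accent to drop is an assumption that may not suit every script.

```csharp
using System.Globalization;
using System.Text;

public static class AccentStripper
{
    // Hypothetical helper: decompose to form D, drop the combining marks,
    // then recompose whatever is left to form C.
    public static string RemoveDiacritics(string input)
    {
        // Form D splits 'ó' into 'o' followed by U+0301 (combining acute accent).
        string decomposed = input.Normalize(NormalizationForm.FormD);

        var sb = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            // Assumption: every non-spacing mark is an accent we want to remove.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(c);
            }
        }

        return sb.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

With this sketch, `AccentStripper.RemoveDiacritics("ó")` gives `"o"`. Letters that have no canonical decomposition, such as 'ø' or 'ł', pass through unchanged, so they would still need a mapping of their own.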

If you know that all the characters you care about are within a certain
range, I'd suggest a hand-crafted mapping.
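
A hand-crafted mapping along those lines might look like the sketch below. The table contents are purely illustrative (a few Polish letters plus 'ø' are assumed here, not taken from the thread), and the names `AccentMap` and `Fold` are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text;

public static class AccentMap
{
    // Illustrative table only: extend it with whatever characters you care about.
    private static readonly Dictionary<char, char> Map = new Dictionary<char, char>
    {
        { 'ą', 'a' }, { 'ć', 'c' }, { 'ę', 'e' }, { 'ł', 'l' }, { 'ń', 'n' },
        { 'ó', 'o' }, { 'ś', 's' }, { 'ź', 'z' }, { 'ż', 'z' }, { 'ø', 'o' },
    };

    // Replaces each character found in the table; everything else is copied as-is.
    public static string Fold(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            char mapped;
            sb.Append(Map.TryGetValue(c, out mapped) ? mapped : c);
        }
        return sb.ToString();
    }
}
```

Here `AccentMap.Fold("łódź")` gives `"lodz"`. The advantage over the encoding trick is that the result depends only on the table, not on how a particular code page happens to convert characters.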
 
> If you know that all the characters you care about are within a certain
> range, I'd suggest a hand-crafted mapping.

Well, that always could be done ;-). I just thought of something smarter.
And I think I saw some function for that, but now I cannot remember for sure
if this was .NET, Win32 API or... maybe even Java ;-).

DW.
 
Dobieslaw Wroblewski wrote:
> Well, that always could be done ;-). I just thought of something smarter.
> And I think I saw some function for that, but now I cannot remember for sure
> if this was .NET, Win32 API or... maybe even Java ;-).

I wouldn't be surprised if there were a call in Win32, and you may be
able to use P/Invoke to use that call. You could try asking in a
Windows internationalization newsgroup...
 