International strings/chars

Dobieslaw Wroblewski · Dec 31, 2004

Hello,

Please help me. I think I found this once but cannot find again :-(

Is there any .NET framework class/method that returns the international
counterpart of a given string or character? For example - for 'ó' it returns
'o'.

DW.

Morten Wennevik · Dec 31, 2004

Hi Dobieslaw,

There isn't an easy way to do this, but this hack seems to work for most
characters.

string s = "áàäãâåéèëêíìïîóòöõôøúùüûýÿ";
byte[] b = Encoding.GetEncoding(1251).GetBytes(s);
string t = Encoding.ASCII.GetString(b);

t == aaaaaaeeeeiiiioooooouuuuyy

Dobieslaw Wroblewski · Jan 1, 2005

Morten Wennevik wrote:
> string s = "áàäãâåéèëêíìïîóòöõôøúùüûýÿ";
> byte[] b = Encoding.GetEncoding(1251).GetBytes(s);
> string t = Encoding.ASCII.GetString(b);

Thanks, it works!

I believe I used Encoding.ASCII for writing to text files, but in such cases
I got question marks instead of the national characters - so this trick is
quite mysterious for me, but - goodie - it works :-)

DW.

Jon Skeet [C# MVP] · Jan 1, 2005

Dobieslaw Wroblewski wrote:
> string s = "áàäãâåéèëêíìïîóòöõôøúùüûýÿ";
> byte[] b = Encoding.GetEncoding(1251).GetBytes(s);
> string t = Encoding.ASCII.GetString(b);
>
> Thanks, it works!
>
> I believe I used Encoding.ASCII for writing to text files, but in
> such cases I got question marks instead of the national characters -

Yes, you would - ASCII doesn't contain any accented characters.

> so this trick is
> quite mysterious for me, but - goodie - it works :-)

Unfortunately, the above isn't really guaranteed to work. The Encoding
class doesn't specify any behaviour for when GetString is presented
with bytes which aren't a valid encoded form for that encoding. It may
work now, but there's no guarantee it'll work in the future at all.

I suspect you want something like "normalization form D" of the Unicode
string, followed by a removal of all accents etc - but I don't know
enough about normalization to be sure, and I don't know of any
implementations of that for .NET.
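
For readers who want to try the normalization idea, here is a minimal sketch. It assumes a framework version where string.Normalize is available (it was added in .NET 2.0). The class and method names, AccentStripper and RemoveDiacritics, are hypothetical and not part of the framework.

```csharp
using System;
using System.Globalization;
using System.Text;

static class AccentStripper
{
    // Hypothetical helper name; not part of the .NET framework.
    public static string RemoveDiacritics(string input)
    {
        // Form D splits a character such as U+00F3 into its base letter "o"
        // followed by U+0301 (combining acute accent).
        string decomposed = input.Normalize(NormalizationForm.FormD);
        StringBuilder sb = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            // Drop the combining marks and keep everything else.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(c);
            }
        }
        // Recompose whatever is left.
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

With this sketch, RemoveDiacritics("ó") gives "o". Letters that have no canonical decomposition, such as 'ø' or the Polish 'ł', pass through unchanged, so they would still need a separate mapping.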

If you know that all the characters you care about are within a certain
range, I'd suggest a hand-crafted mapping.
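
A hand-crafted mapping could look roughly like the sketch below. The table is only an illustration: the entries, and the names AccentMap and Replace, are assumptions, and the table would need to be extended to cover whatever characters actually occur in your data.

```csharp
using System.Collections.Generic;
using System.Text;

static class AccentMap
{
    // Illustrative table only; add the characters your data contains.
    private static readonly Dictionary<char, string> Map = new Dictionary<char, string>
    {
        { 'ó', "o" }, { 'ø', "o" }, { 'ł', "l" }, { 'ą', "a" }, { 'ß', "ss" }
    };

    public static string Replace(string input)
    {
        StringBuilder sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            string replacement;
            // Characters missing from the table are copied through unchanged.
            if (Map.TryGetValue(c, out replacement))
            {
                sb.Append(replacement);
            }
            else
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

Mapping to strings rather than single characters leaves room for replacements such as "ss" for 'ß', which a char-to-char table cannot express.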

Dobieslaw Wroblewski · Jan 2, 2005

Jon Skeet wrote:
> If you know that all the characters you care about are within a certain
> range, I'd suggest a hand-crafted mapping.

Well, that always could be done ;-). I just thought of something smarter.
And I think I saw some function for that, but now I cannot remember for sure
if this was .NET, Win32 API or... maybe even Java ;-).

DW.

Jon Skeet [C# MVP] · Jan 2, 2005

Dobieslaw Wroblewski wrote:
> Well, that always could be done ;-). I just thought of something smarter.
> And I think I saw some function for that, but now I cannot remember for sure
> if this was .NET, Win32 API or... maybe even Java ;-).

I wouldn't be surprised if there were a call in Win32, and you may be
able to use P/Invoke to use that call. You could try asking in a
Windows internationalization newsgroup...
