1252 to utf-8

  • Thread starter: Hans Ruck

Hans Ruck

I am writing a MIME parser and I need to convert Windows-1252-encoded
strings to UTF-8. For example, the text
"=?Windows-1252?Q?une_beaut=E9?=" should become "une beauté".

Do you know what the conversion algorithm is? Is there a shortcut method
for this in the framework?

Hans.
 
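The =E9 syntax is not Windows-1252 but the "Quoted-printable" encoding used
by MIME (the other one is Base64). More precisely, the "Q" in the encoded
word selects the RFC 2047 variant of Quoted-printable, in which "_" stands
for a space. See
http://en.wikipedia.org/wiki/Quoted-printable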

The quoted-printable format only expresses how to encode 8-bit bytes as
7-bit text. In this case, E9 is a hexadecimal byte value:
E9 = (14 * 16) + 9 = 233.
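To make the arithmetic concrete, here is a minimal Python sketch that decodes a single Q-encoded word by hand. It assumes the word is well formed and written with the closing "?=" delimiter that RFC 2047 requires. The function name decode_q_word is hypothetical, and Base64 ("B") words, line folding and several encoded words in one header are deliberately not handled.

```python
import re

# Hypothetical helper: matches one RFC 2047 encoded-word of the form
# =?charset?Q?encoded-text?=  (only the "Q" encoding is handled here).
ENCODED_WORD = re.compile(r"=\?([^?]+)\?([Qq])\?([^?]*)\?=")


def decode_q_word(word):
    """Return (charset, raw_bytes) for a single Q-encoded word."""
    m = ENCODED_WORD.fullmatch(word)
    if m is None:
        raise ValueError(f"not a Q-encoded word: {word!r}")
    charset, _, text = m.groups()
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "_":
            # In the Q encoding, "_" always stands for a space (0x20).
            out.append(0x20)
            i += 1
        elif ch == "=":
            # "=XX": two hex digits give one byte, e.g. E9 -> 14*16 + 9 = 233.
            out.append(int(text[i + 1:i + 3], 16))
            i += 3
        else:
            # Any other printable ASCII character stands for itself.
            out.append(ord(ch))
            i += 1
    return charset, bytes(out)


charset, raw = decode_q_word("=?Windows-1252?Q?une_beaut=E9?=")
print(charset, raw)  # Windows-1252 b'une beaut\xe9'
```

Note that the result is still raw bytes: the charset name only tells you which code page table to use in the next step.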

Now, in the Windows-1252 code page, 233 is the value of "é", which, luckily,
has exactly the same value in Unicode (U+00E9). However, the value 233 may
represent many other characters in code pages other than Windows-1252, so
you cannot always translate a code page value directly to its Unicode
counterpart. (In fact, you rarely can.)
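A short Python sketch of the point above, using Python's built-in codecs (cp1252, cp1251, cp437) as stand-ins for the Windows code page tables. It shows that byte 0xE9 happens to equal its Unicode code point in Windows-1252 but not in other code pages, that even Windows-1252 breaks the "byte equals code point" shortcut for bytes 0x80 to 0x9F, and it ends with the UTF-8 conversion the question asks for.

```python
# Bytes obtained by Q-decoding "une_beaut=E9" from the question's example.
raw = b"une beaut\xe9"

# Windows-1252 bytes -> Unicode text; 0xE9 becomes U+00E9 ("é").
text = raw.decode("cp1252")
print(f"U+{ord(text[-1]):04X}")  # U+00E9

# The same byte 0xE9 maps to different characters in other code pages.
for codepage in ("cp1252", "cp1251", "cp437"):
    ch = bytes([0xE9]).decode(codepage)
    print(codepage, f"U+{ord(ch):04X}")
# cp1252 U+00E9  (é, Western European)
# cp1251 U+0439  (й, Cyrillic)
# cp437 U+0398   (Θ, original IBM PC / DOS)

# Even in Windows-1252 the byte == code point shortcut fails for 0x80-0x9F.
euro = bytes([0x80]).decode("cp1252")
print(f"U+{ord(euro):04X}")  # U+20AC (€), not U+0080

# Unicode text -> UTF-8 bytes: "é" is encoded as the two bytes C3 A9.
utf8 = text.encode("utf-8")
print(utf8)  # b'une beaut\xc3\xa9'
```

The decode step is where the code page matters; the final encode to UTF-8 is the same whatever the source code page was.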

See, for example, http://www.microsoft.com/typography/unicode/cscp.htm

As for a direct method in the framework, I don't know for sure, but I think
there is one. For example, the following article mentions the statement
message.HTMLBodyPart.ContentTransferEncoding = "quoted-printable"

http://wiki.ittoolbox.com/index.php...Net_Framework_1.0_or_.Net_Framework_1.1?sp=CM
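I can't confirm from this thread which .NET class the article relies on. Purely as a point of comparison (this is not the .NET API), Python's standard library ships an RFC 2047 decoder in the email.header module; a minimal sketch using it:

```python
from email.header import decode_header, make_header

# A header value containing one RFC 2047 encoded-word.
header_value = "=?Windows-1252?Q?une_beaut=E9?="

# decode_header() splits the value into (raw bytes, charset) pairs.
parts = decode_header(header_value)
print(parts)  # [(b'une beaut\xe9', 'windows-1252')]

# make_header() decodes each part with its charset and joins the results.
text = str(make_header(parts))

# Re-encode as UTF-8 for storage or output.
utf8 = text.encode("utf-8")
print(utf8)  # b'une beaut\xc3\xa9'
```

decode_header() returns the raw bytes together with the declared charset, so the code page lookup discussed above still happens; it is simply done inside make_header().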

See also:
http://bugs.php.net/bug.php?id=7531

http://www.phpcs.com/codes/DECODAGE-QUOTED-PRINTABLE-POUR-MAILS_33621.aspx

--
Sylvain Lafontaine, ing.
MVP - Technologies Virtual-PC
E-mail: http://cerbermail.com/?QugbLEWINF

