How many bytes is each Japanese character encoded with? From my
understanding, depending on the character, they are encoded using 1 or 2
bytes, with precise rules on the valid ranges for the leading and trailing
bytes of double-byte characters (dbch). Could you please confirm?
True.
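Those rules can be checked directly. The sketch below (Python, using the standard `cp932` codec, which is Windows' Shift-JIS variant) encodes one ASCII character and three Japanese characters, then verifies the lead/trail byte ranges from the Shift-JIS definition; the range constants are from the spec, not from this thread:

```python
# Encode mixed text with code page 932 (Shift-JIS): ASCII stays 1 byte,
# Japanese characters become 2 bytes each.
text = "A日本語"
data = text.encode("cp932")
print(data.hex(" "))  # 41 93 fa 96 7b 8c ea

# Per Shift-JIS: lead bytes are 0x81-0x9F or 0xE0-0xFC,
# trail bytes are 0x40-0x7E (minus 0x7F) or 0x80-0xFC.
# Here byte 0 is ASCII, the rest are three lead/trail pairs.
for lead, trail in zip(data[1::2], data[2::2]):
    assert 0x81 <= lead <= 0x9F or 0xE0 <= lead <= 0xFC
    assert 0x40 <= trail <= 0x7E or 0x80 <= trail <= 0xFC
```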
In the first example that you provide (i.e.
\u26085\'93\'fa\u26412\'96\'7b\u-30050\'8c\'ea), it looks like you are
encoding dbch (double-byte chars) using the \u control word and single-byte
chars using the \' syntax. Did I get you right?
True again.
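That mixed form is mechanical to produce. Here is a sketch in Python (the function name `rtf_dbch` is mine, for illustration): each character is emitted as \uN with its cp932 bytes following as \'hh fallback, and code points above 32767 are wrapped to the signed 16-bit value that \u requires:

```python
def rtf_dbch(text: str) -> str:
    """Emit each char as \\uN followed by its cp932 bytes as \\'hh fallback."""
    out = []
    for ch in text:
        n = ord(ch)
        if n > 32767:          # \u takes a signed 16-bit decimal value
            n -= 65536
        fallback = "".join("\\'%02x" % b for b in ch.encode("cp932"))
        out.append("\\u%d%s" % (n, fallback))
    return "".join(out)

print(rtf_dbch("日本語"))
# \u26085\'93\'fa\u26412\'96\'7b\u-30050\'8c\'ea
```

Note how 語 (U+8A9E, 35486) comes out as \u-30050 because 35486 − 65536 = −30050.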
My doubts arise from the fact that I have no way of telling the .NET
Framework that the text in the string is Japanese.
Well, you have to know. You need to carry that info together with the string.
Where are the strings coming from? Is that info available at some point?
I still don't understand exactly what you need.
Back to my question: "is there any good reason not to use a standard RTF
control?"
At what level do you want to work?
Do you want code that produces the RTF "from scratch", with no RTF control
involved?
I find this a bit tough and probably not worth the effort.
Why not use the standard RTF control? Then you do not need to care
about the internal representation (but you still have to care about the right
fonts).
The same Unicode code point can look different in Japanese, Traditional
Chinese, and Simplified Chinese, so you need the proper font for the proper
language.
The font also gives the RTF control a hint about which encoding to use.
See my example:
{\fonttbl
{\f1\fcharset128 MS PGothic;}
}
\f1\'93\'fa\'96\'7b\'8c\'ea
This reads: font number 1, using charset 128, is "MS PGothic".
Then \f1 says that the text following uses font 1.
Charset 128 is SHIFTJIS_CHARSET (WinGDI.h), which means Japanese,
which means code page 932 is used to interpret the bytes.
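For this old-style form, the run after \f1 is nothing but the cp932 bytes escaped as \'hh. A minimal sketch (helper name `rtf_shiftjis_run` is mine) that reproduces the run in the example above:

```python
def rtf_shiftjis_run(text: str) -> str:
    """Old-style RTF run: just the cp932 bytes escaped as \\'hh, no \\u."""
    return "".join("\\'%02x" % b for b in text.encode("cp932"))

# Pair it with a font declared as \fcharset128 so the control knows
# the bytes are Shift-JIS / code page 932.
header = r"{\fonttbl{\f1\fcharset128 MS PGothic;}}"
print(r"\f1" + rtf_shiftjis_run("日本語"))
# \f1\'93\'fa\'96\'7b\'8c\'ea
```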
On the other hand, the "\'hh bytes" part is only used by old RTF controls.
For new controls you can even use this:
\u26085\'3f\'3f\u26412\'3f\'3f\u-30050\'3f\'3f
(\'3f is a question mark: readers that understand \u skip it, old readers
display the question marks instead.)
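Generating that new-style form is the same loop with the real bytes replaced by \'3f placeholders, one per fallback byte (a sketch; `rtf_unicode_run` is my name for it). In a full document you would also declare \uc2 so readers know to skip two fallback bytes per \u:

```python
def rtf_unicode_run(text: str) -> str:
    """New-style RTF: \\uN plus one \\'3f ('?') placeholder per cp932 byte."""
    out = []
    for ch in text:
        n = ord(ch)
        if n > 32767:          # wrap to the signed 16-bit value \u expects
            n -= 65536
        nbytes = len(ch.encode("cp932"))  # fallback width the reader must skip
        out.append("\\u%d%s" % (n, "\\'3f" * nbytes))
    return "".join(out)

print(rtf_unicode_run("日本語"))
# \u26085\'3f\'3f\u26412\'3f\'3f\u-30050\'3f\'3f
```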