hi
Probably this is the problem, right there.
Was the file encoded as UTF-8 or UTF-16?
I tried both encodings, and when using UTF-8, I also called GetPreamble()
to write the BOM at the beginning of the file (0xEF 0xBB 0xBF).
Anyway, Notepad still displayed empty squares.
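For reference, a minimal sketch of what GetPreamble() returns. It only hands back the BOM bytes and does not write anything to the file by itself; a StreamWriter constructed with Encoding.UTF8 on a new file normally writes them for you. The class name PreambleDemo and the helper PreambleHex are made up for this illustration.

```csharp
using System;
using System.Text;

static class PreambleDemo
{
    // Hypothetical helper: formats an encoding's BOM bytes as a hex string.
    public static string PreambleHex(Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetPreamble());
    }

    static void Main()
    {
        Console.WriteLine(PreambleHex(Encoding.UTF8));    // EF-BB-BF
        Console.WriteLine(PreambleHex(Encoding.Unicode)); // FF-FE (UTF-16 little-endian)
    }
}
```

Comparing these bytes with the first bytes shown in a hex editor is a quick way to confirm which BOM, if any, actually ended up in the file.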
Did you pick character codes that actually are valid code points? The
Unicode codespace contains more than 1.1 million possible code points,
but only about 100,000 of them are used...
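One way to check this from code, as a rough sketch: ask the runtime for each code point's Unicode category and flag the ones reported as unassigned. The answer depends on the Unicode data shipped with the .NET runtime in use, and the cast to char limits this to the Basic Multilingual Plane. CodePointCheck and IsAssigned are names invented for the example; the range 383 to 449 is the one used in the code posted in this thread.

```csharp
using System;
using System.Globalization;

static class CodePointCheck
{
    // Hypothetical helper: true if the runtime's Unicode data assigns a
    // character to this BMP code point (0..0xFFFF).
    public static bool IsAssigned(int codePoint)
    {
        UnicodeCategory category = CharUnicodeInfo.GetUnicodeCategory((char)codePoint);
        return category != UnicodeCategory.OtherNotAssigned;
    }

    static void Main()
    {
        // Report any unassigned code points in the range written to the file.
        for (int i = 383; i < 450; i++)
        {
            if (!IsAssigned(i))
            {
                Console.WriteLine("U+{0:X4} is not assigned", i);
            }
        }
    }
}
```

If nothing is reported, the code points themselves are fine and the problem lies elsewhere, for example with the font.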
I think I did, since I checked at the following URL which character
a particular code point should represent:
http://www.columbia.edu/kermit/utf8-t1.html
What do you mean by "unknown character"? Do you mean a codepoint that
doesn't represent any valid character in Unicode, or do you mean an
invalid UTF-8 sequence? For the former, the sensible thing to do is just
to try to get the glyph for that codepoint from the selected font as
usual - if the font cannot render it, then it will end up being a
"question mark" anyway. For the latter, it depends on what you're
doing; for example, whenever Notepad encounters an invalid UTF-8
sequence, it seems to render the individual bytes constituting it as
blank squares.
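To rule out the second case (an invalid UTF-8 sequence), the bytes can be decoded with a strict decoder that throws instead of silently substituting a replacement character. This is only a sketch: Utf8Check and IsValidUtf8 are hypothetical names, and the byte arrays are hand-made samples, not taken from the file in question.

```csharp
using System;
using System.Text;

static class Utf8Check
{
    // Hypothetical helper: true if the bytes form well-formed UTF-8.
    public static bool IsValidUtf8(byte[] bytes)
    {
        // Second argument (throwOnInvalidBytes) makes decoding fail loudly.
        var strict = new UTF8Encoding(false, true);
        try
        {
            strict.GetString(bytes);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }

    static void Main()
    {
        byte[] valid = { 0xC5, 0xBF };  // U+017F encoded as UTF-8
        byte[] truncated = { 0xC5 };    // lead byte with no continuation byte

        Console.WriteLine(IsValidUtf8(valid));     // True
        Console.WriteLine(IsValidUtf8(truncated)); // False
    }
}
```

Passing the result of File.ReadAllBytes for the real file to IsValidUtf8 would show whether the file itself is well-formed.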
Notepad displayed blank squares for the characters it couldn’t render,
so I assume the problem is not that the fonts lack glyphs for
certain characters.
BTW, I checked the file with a hex editor and everything was OK, so
the problem is with Notepad, not the file. Here is the code:
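using System.IO;
using System.Text;

class Program
{
    static void Main()
    {
        // With Encoding.UTF8, StreamWriter writes the BOM (EF BB BF) itself.
        using (var fs = new FileStream(@"D:\test.txt", FileMode.Create))
        using (var sw = new StreamWriter(fs, Encoding.UTF8))
        {
            // U+017F through U+01C1; Notepad shows just empty squares.
            for (int i = 383; i < 450; i++)
            {
                sw.Write((char)i);
            }
        }
    }
}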
A question mark in Windows doesn't mean "unknown character". It means
"a character for which this font doesn't have a glyph". This has
nothing to do with codepages, and everything to do with the font that
you're using. Try changing the Notepad font to something like
"Arial Unicode MS", and see what it does then.
I checked which fonts Notepad offers, and while it has several Arial
variants, none of them contains the word "Unicode".
thank you