Determining unsupported characters in a font

Jeff Johnson

I wrote a little utility which views how code pages map characters to their
Unicode code points, and in the process of viewing non-Latin character sets
I see that several fonts do not support scripts like Cyrillic, for example.
However, Windows (or .NET, or something) substitutes a font which does
support those characters (Arial, it looks like), so I always see 256
characters on-screen*. This has me wondering: is there any way to determine
if a font provides a given Unicode character (i.e., code point)?


*In other words, with code page 1251 (Cyrillic) active, I see 0 - 127 in
Curlz MT, for example, but characters 128 - 255, which for the most part
differ from what most people would call "high-ASCII," appear in Arial,
because Curlz MT doesn't have glyphs for U+0400 - U+04FF.

The code for drawing these glyphs doesn't change; I simply iterate over the
256 characters in my string and print the current one using a font set up
earlier:

g.DrawString(current.ToString(), displayFont, characterBrush,
    new PointF((float)x * 32 + 24, (float)y * 36 + 52));
 
This has me wondering: is there any way to determine
if a font provides a given Unicode character (i.e., code point)?

Several ways:
1. Parse the cmap OpenType table
http://www.microsoft.com/typography/otspec/cmap.htm
You can read it from the font file (.ttf or .otf) or retrieve it with GetFontData
2. GetGlyphIndices (but it does not handle Unicode characters above U+FFFF)
http://msdn.microsoft.com/en-us/library/dd144890(VS.85).aspx
3. ScriptGetCMap (Uniscribe)
http://msdn.microsoft.com/en-us/library/dd319122(VS.85).aspx

Nothing pure .NET (except maybe the cmap parsing)
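
To make option 1 a bit more concrete, here is a minimal sketch in C of looking up a code point in a cmap table that has already been read into memory (for example with GetFontData, as mentioned above). It assumes the font carries a Windows Unicode BMP subtable (platform 3, encoding 1) in format 4, which is common but not guaranteed. It ignores every other subtable format, so it cannot answer for characters above U+FFFF. The names cmap_glyph_index, rd16 and rd32 are made up for this example and are not part of any API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* OpenType tables store all integers big-endian. */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)((p[0] << 8) | p[1]); }
static uint32_t rd32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical helper: looks up BMP code point `ch` in a raw cmap table.
 * Returns the glyph index, 0 if the font has no glyph for `ch`,
 * or -1 if no usable format 4 subtable was found.
 * Assumption: only the (platform 3, encoding 1) format 4 subtable is handled. */
int cmap_glyph_index(const uint8_t *cmap, size_t len, uint16_t ch)
{
    assert(cmap != NULL);
    if (len < 4) return -1;
    size_t num_tables = rd16(cmap + 2);
    if (len < 4 + 8 * num_tables) return -1;

    /* Find the Windows (3), Unicode BMP (1) encoding record. */
    size_t sub = 0;
    for (size_t i = 0; i < num_tables; i++) {
        const uint8_t *rec = cmap + 4 + 8 * i;
        if (rd16(rec) == 3 && rd16(rec + 2) == 1) { sub = rd32(rec + 4); break; }
    }
    if (sub == 0 || sub > len || len - sub < 14) return -1;
    const uint8_t *st = cmap + sub;
    if (rd16(st) != 4) return -1;                 /* not format 4 */

    size_t seg2 = rd16(st + 6);                   /* segCountX2 */
    if (len - sub < 16 + 4 * seg2) return -1;     /* 4 arrays + reservedPad */
    const uint8_t *end_code   = st + 14;
    const uint8_t *start_code = end_code + seg2 + 2;   /* skip reservedPad */
    const uint8_t *id_delta   = start_code + seg2;
    const uint8_t *id_range   = id_delta + seg2;

    for (size_t i = 0; i < seg2; i += 2) {
        if (ch > rd16(end_code + i)) continue;    /* not this segment yet */
        uint16_t start = rd16(start_code + i);
        if (ch < start) return 0;                 /* falls in a gap */
        uint16_t delta = rd16(id_delta + i);
        uint16_t ro = rd16(id_range + i);
        if (ro == 0) return (uint16_t)(ch + delta);   /* modulo 65536 */

        /* idRangeOffset is relative to its own position in the table. */
        size_t off = (size_t)(id_range + i - cmap) + ro + 2 * (size_t)(ch - start);
        if (off + 2 > len) return -1;
        uint16_t glyph = rd16(cmap + off);
        return glyph ? (uint16_t)(glyph + delta) : 0;
    }
    return 0;
}
```

A result of 0 means the character maps to the font's missing-glyph slot, so the font itself cannot draw it. Going by the original post, that is what one would expect for U+0410 in a font like Curlz MT. Characters above U+FFFF would need a format 12 subtable, which this sketch does not read.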
 
Thanks for these links. This stuff is complicated! In some sample source I
saw something I'd never seen in my (admittedly limited) C[++] experience: a
pointer to an array of pointers, declared like this:

SOME_STRUCT*** blah;

THREE asterisks!! Holy crap! That's some indirection and a half right there!
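
For what it's worth, one common way a triple pointer turns up in C is as an output parameter: a function allocates an array of pointers and hands it back through a pointer to that array. Which sample was being read isn't stated, so the following is only an illustration of that pattern. SOME_STRUCT here is a placeholder with a made-up field, and make_items and free_items are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int id; } SOME_STRUCT;   /* placeholder struct for illustration */

/* Allocates `count` structs and returns the array of pointers through `out`.
 * Three levels: out -> the caller's array variable -> each element -> a struct.
 * Returns 0 on success, -1 on failure (including count == 0). */
int make_items(SOME_STRUCT ***out, size_t count)
{
    assert(out != NULL);
    if (count == 0) return -1;
    SOME_STRUCT **items = calloc(count, sizeof *items);
    if (!items) return -1;
    for (size_t i = 0; i < count; i++) {
        items[i] = malloc(sizeof **items);
        if (!items[i]) {                  /* undo the partial allocation */
            while (i--) free(items[i]);
            free(items);
            return -1;
        }
        items[i]->id = (int)i;
    }
    *out = items;                         /* write through the extra level */
    return 0;
}

/* Releases everything make_items allocated. */
void free_items(SOME_STRUCT **items, size_t count)
{
    if (!items) return;
    for (size_t i = 0; i < count; i++) free(items[i]);
    free(items);
}
```

A caller would declare SOME_STRUCT **items, pass &items to make_items, and later release the result with free_items(items, count). The third asterisk only exists because C passes arguments by value, so the function needs the address of the caller's variable to fill it in.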
 