_vsnwprintf_s seems to be broken

Norman Diamond

I think the current version of _vsnwprintf_s is broken, in ordinary Windows.

I'm not completely sure yet but it looks like this breakage is worse than
previously known Windows CE breakage of StringCchPrintf. For Windows CE
breakage of StringCchPrintf, since the %S format died instead of converting
ANSI to Unicode, a workaround was to call MultiByteToWideChar and then use
the %s format.

For ordinary Windows breakage of _vsnwprintf_s, the %s format is broken, as
far as I can tell.

The compilation environment is not internationalized. It's Visual Studio
2005 SP1 + hotfix for Vista, and SDK for Vista, all running on Vista, all in
Japanese, no foreign software involved in this environment. The project
setting for character set says to use Unicode not ANSI. Function name
_vsntprintf_s maps to _vsnwprintf_s, _T("") maps to L"", etc., and
everything except _vsnwprintf_s seems to perform properly at execution time.
MFC and ATL are not used. The CRT is used as a DLL.

The runtime environment where failure was observed is internationalized.
The Chinese MUI pack was downloaded. The user's locale (viewable format or
something like that), the user's display language, and the system locale
(viewable format for non-Unicode programs) are all set to Chinese
traditional Hong Kong. The settings were copied to all reserved and default
accounts. The execution PC was rebooted several times. The logon screen
and nearly everything else are displayed properly in Chinese. However, the
CRT DLL is from Vista RTM, not from Visual Studio 2005 SP1.

The user's username is "中文2" (without the quotes). The user can log on
perfectly. The Start menu shows the user's name at the top. Windows
Explorer shows the user's name correctly. No renaming or anything else has
been done with this user. Ordinary Windows operations work. Execution of
my program works, except for calls to _vsnwprintf_s.

Code:
static TCHAR szBuf[2048];
_vsntprintf_s(szBuf, sizeof(szBuf) / sizeof (TCHAR),
_T("Username=\"%s\"\n"), userName);

Result:
Username="

_vsnwprintf_s dies as soon as the %s format hits a perfectly valid simple
Unicode character.
 
Hello,
Code:
static TCHAR szBuf[2048];
_vsntprintf_s(szBuf, sizeof(szBuf) / sizeof (TCHAR),
_T("Username=\"%s\"\n"), userName);

Result:
Username="
_vsnwprintf_s dies as soon as the %s format hits a perfectly valid simple
Unicode character.

What is the type of userName? Is it va_list or TCHAR *? v* functions take va_list
params and not TCHAR * ones. Maybe you should use _sntprintf_s in place of
_vsntprintf_s?

-- best regards

Cezary Noweta
 
Ouch, I mis-summarized the source code when making this posting. No wonder it
looks like the source code was at fault. Here, I'll summarize it more
accurately.

_TCHAR userName[48];
DebugLog(_T("Other string=\"%s\"\n"), _T("Hello foreign language"));
DebugLog(_T("Username=\"%s\"\n"), userName);
[...]

void DebugLog(TCHAR* szForm, ...)
{
va_list args;
va_start(args, szForm); // init valiable length argument list
static TCHAR szBuf[2048]; // same size for HexDump
_vsntprintf_s(szBuf, sizeof(szBuf) / sizeof (TCHAR), szForm, args);
}

Result:
Other string="Hello foreign language"
Username="

_vsnwprintf_s dies as soon as the %s format hits a perfectly valid simple
Unicode character. In the Japanese version of Vista, in the Japanese
version of the CRT, the Japanese version of _vsnwprintf_s can't handle
Japanese characters (the Japanese user's username) in Unicode.


 
I have just determined that _vsnwprintf_s is broken in Chinese Vista too,
with no internationalization involved in the execution system.

As posted in my other message a few hours ago, here is a corrected summary
of the source code:

_TCHAR userName[48];
DebugLog(_T("Other string=\"%s\"\n"), _T("Hello foreign language"));
DebugLog(_T("Username=\"%s\"\n"), userName);
[...]

void DebugLog(TCHAR* szForm, ...)
{
va_list args;
va_start(args, szForm); // init valiable length argument list
static TCHAR szBuf[2048]; // same size for HexDump
_vsntprintf_s(szBuf, sizeof(szBuf) / sizeof (TCHAR), szForm, args);
}

Result:
Other string="Hello foreign language"
Username="

_vsnwprintf_s dies as soon as the %s format hits a perfectly valid simple
Unicode character. In the Chinese version of Vista, in the Chinese version
of the CRT, the Chinese version of _vsnwprintf_s can't handle Chinese
characters (the Chinese user's username) in Unicode.

The rest of the program works, all except the calls to _vsnwprintf_s.

(By the way the valiable spelling in comments was there in the original. I
don't know who the original coder was, only that it was coded in Japan.
Today I copied a bit too much source code when using the mouse, but I did
copy it correctly today.)


 
Hi Norman!
I have just determined that _vsnwprintf_s is broken in Chinese Vista too,
with no internationalization involved in the execution system.

Can you please provide a *full* working example?

And please do not use non-ASCII chars in the source-code,
so that it can be compiled on other systems with the same result.


_TCHAR userName[48];
DebugLog(_T("Other string=\"%s\"\n"), _T("Hello foreign language"));
DebugLog(_T("Username=\"%s\"\n"), userName);

"userName" is not initialized...

Greetings
Jochen
 
Norman Diamond said:
void DebugLog(TCHAR* szForm, ...)
{
va_list args;
va_start(args, szForm); // init valiable length argument list
static TCHAR szBuf[2048]; // same size for HexDump
_vsntprintf_s(szBuf, sizeof(szBuf) / sizeof (TCHAR), szForm, args);
}

It seems va_end and output of szBuf[] are missing from this function.
_vsnwprintf_s dies as soon as the %s format hits a perfectly
valid simple Unicode character.

Does _vsnwprintf_s crash or call the invalid parameter handler,
or does it return some value (which one)?
In the Chinese version of Vista, in the Chinese version of the
CRT, the Chinese version of _vsnwprintf_s can't handle Chinese
characters (the Chinese user's username) in Unicode.

So presumably you are initializing userName[] in some way.
It would be interesting to know the wchar_t values therein.
(You posted a string earlier but please give the numbers too.)
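
The two points raised here, the missing va_end and the unknown return value, can be sketched in portable C. This is only an illustration, not the original DebugLog: the name format_log_line is hypothetical, and standard vswprintf stands in for _vsnwprintf_s, so the truncation behaviour is that of the standard function rather than the _s variant.

```c
#include <stdarg.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: formats into buf and returns the number of wide
   characters written, or a negative value if formatting failed or the
   result did not fit into buf_chars characters. */
static int format_log_line(wchar_t *buf, size_t buf_chars,
                           const wchar_t *format, ...)
{
    va_list args;
    int written;

    va_start(args, format);
    written = vswprintf(buf, buf_chars, format, args);
    va_end(args);               /* every va_start needs a matching va_end */

    if (written < 0 && buf_chars > 0)
        buf[0] = L'\0';         /* leave a defined, empty string on failure */
    return written;
}
```

Logging the returned value next to the formatted text would show whether the formatting step itself failed or whether the text was lost later, when it was written out.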
 
Can you please provide a *full* working example?

You mean that I should show the assignment of the value of userName? I
don't know if I can or not, because you proceed to say this:
And please do not use non-ASCII chars in the source-code,

The user name was "中文2", without the quotes. I mentioned that part of it
correctly yesterday.
"userName" is not initialized...

It was not. It was retrieved from some decryption code which I will not
quote. Before being encrypted, it was originally retrieved from an API
which I think is one of the NetWksta____ APIs. The userName value was
retrieved correctly. The userName value was passed to other APIs for
authentication and succeeded. To repeat again, everything worked except for
calls to _vsnwprintf_s.
And please do not use non-ASCII chars in the source-code, so that it can
be compiled on other systems with the same result.

Hahahaha. Did I not show enough times that the Japanese and Chinese
versions of _vsnwprintf_s worked OK on ASCII characters? They only fail
when presented with strings in their own languages.


 
It seems va_end and output of szBuf[] are missing from this function.

FILE* pf;
_tfopen_s(&pf, LOG_FILE_NAME, _T("a"));
if (pf) _ftprintf_s(pf, szBuf);
if (pf) fclose(pf);
va_end(args);

Do you also need a transcript of actions in Windows Explorer to open the log
file in Notepad and show the contents which my previous messages
transcribed?

Do you think that maybe the CRT's _vsnwprintf_s could handle the language of
its own version of Windows but the CRT's _ftprintf_s failed because it had
harder work to do? I don't quite think so.
Does _vsnwprintf_s crash or call the invalid parameter handler,
or does it return some value (which one)?

If it called the invalid parameter handler then I think the rest of the code
(the caller of DebugLog) would not proceed to get everything else working
properly with other Windows APIs, I think the rest of the code would abort.

Your question about the return value is a good one. I will add a meta debug
log of that information. I probably won't have time this week though
because higher priority work has just come in.
So presumably you are initializing userName[] in some way.
It would be interesting to know the wchar_t values therein.
(You posted a string earlier but please give the numbers too.)

The string is L"中文2" (without the quotes). If you really need the
numbers, you can look them up as easily as I can. (The third character is
number U+0032.)


 
Hi Norman!
You mean that I should show the assignment of the value of userName? I
don't know if I can or not, because you proceed to say this:

Maybe you can write:
TCHAR szUserName[] = {0x1234, 0x2345, 0x789A, 0x0000};
?????

Hahahaha.

Maybe you can write:
TCHAR szUserName[] = {0x1234, 0x2345, 0x789A, 0x0000};
?????

Hahahahaha....


So... please provide a small, full working example with ASCII chars in the
source code!

Greetings
Jochen
 
Hello,
FILE* pf;
_tfopen_s(&pf, LOG_FILE_NAME, _T("a"));
if (pf) _ftprintf_s(pf, szBuf);
if (pf) fclose(pf);
va_end(args);

Do you also need a transcript of actions in Windows Explorer to open the log
file in Notepad and show the contents which my previous messages
transcribed?

It would be nice but not necessary ;)
Do you think that maybe the CRT's _vsnwprintf_s could handle the language of
its own version of Windows but the CRT's _ftprintf_s failed because it had
harder work to do? I don't quite think so.

Yes - I think so. Wide printf foos stop output when they cannot convert from wide
char to mbcs (current locale CP or console CP). This occurs when writing to the
console, text file and so on. Open the log file in UTF16 mode (i.e. _T("ab") instead
of _T("a")), or use the following code:

======
FILE* pf;
_tfopen_s(&pf, LOG_FILE_NAME, _T("a"));
if ( pf ) {
int outchars;
outchars = _ftprintf_s(pf, szBuf);
_ftprintf_s(pf, _T("STRLEN: %u; OUTCHARS: %i\n"),
_tcslen(szBuf),
outchars);
fclose(pf);
}
va_end(args);
======

or try to set locale (,,_tsetlocale(LC_CTYPE, _T(".932"))'') to CP 932 before you are
calling ftprintf and compare the results.
If it called the invalid parameter handler then I think the rest of the code
(the caller of DebugLog) would not proceed to get everything else working
properly with other Windows APIs, I think the rest of the code would abort.

It called wctomb() which convert to the current locale (at the beginning it is "C"
which means that all chars >= U+0100 are not converted). After it failed fwprintf_s
has failed too and the foo returned number chars output so far. The rest of the code
runs fine.
The string is L"$BCfJ8(B2" (without the quotes). If you really need the
numbers, you can look them up as easily as I can. (The third character is
number U+0032.)

Oooo... ,,92 86 95 B6 32'' - 14 chars of text. At the beginning I thought that the
first two char codes are confidential and you can not disclose it explicitly ;)
Really could not you enumerate codes even at the price of a solution of your problem?

-- best regards

Cezary Noweta
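
A quick way to check the wctomb() explanation is to ask the C library directly whether a given wide character converts in the locale currently in effect. The sketch below assumes only standard C, and the helper name is made up for illustration. Where exactly the "C" locale stops converting is implementation-specific, but U+4E2D is not expected to convert there.

```c
#include <limits.h>
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: returns 1 if wc can be converted to the multibyte
   encoding of the current LC_CTYPE locale, 0 if wctomb rejects it. */
static int convertible_in_current_locale(wchar_t wc)
{
    char bytes[MB_LEN_MAX];

    wctomb(NULL, 0);                 /* reset any shift state first */
    return wctomb(bytes, wc) != -1;  /* -1 means "no representation" */
}
```

Calling this for each character of the user name, once at startup and once after any setlocale call, would show whether the locale in effect can represent the name at all.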
 
Cezary Noweta said:
Yes - I think so. Wide printf foos stop output when they cannot convert from wide
char to mbcs (current locale CP or console CP). This occurs when writing to the
console, text file and so on.

Yes, that could cause the problem. (I expected to see an
OutputDebugString call.)
Open the log file in UTF16 mode (i.e. _T("ab") instead of _T("a")),
or use the following code:

From the documentation of fopen_s and _wfopen_s, it appears that
the "b" flag only affects control characters, and creating a
UTF-16 file requires _T("a, ccs=UTF-16LE") in Visual C++ 2005.

http://msdn2.microsoft.com/library/z5hh6ee9(VS.80).aspx
FILE* pf;
_tfopen_s(&pf, LOG_FILE_NAME, _T("a"));
if ( pf ) {

The documentation of fopen_s and _wfopen_s does not promise that
they reset the pointer to NULL on error. On the contrary, they
are specified to leave the contents of pFile unchanged in at
least some error situations. So I think it would be best to
initialize the pointer to NULL, _and_ check the return value
rather than the pointer.
int outchars;
outchars = _ftprintf_s(pf, szBuf);

Outputting szBuf as a format string without arguments is likely
to crash the program as soon as percent signs appear.
_fputts(szBuf, pf) would be safer.
Oooo... ,,92 86 95 B6 32'' - 14 chars of text. At the beginning I thought that the
first two char codes are confidential and you can not disclose it explicitly ;)

I got the characters U+4E2D U+6587 U+0032, although I cannot be
certain they weren't corrupted by the software I am using.
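
Both suggestions fit in a few lines. The sketch below uses standard fopen and fputws as stand-ins for _tfopen_s and _fputts (an assumption made for portability; append_log_text is a hypothetical name). fopen reports failure through its return value, so there is no stale pointer to test; with fopen_s the equivalent would be to initialize the pointer to NULL and check the returned error code.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: appends text to the file at path.
   Returns 0 on success, -1 if the file could not be opened or written. */
static int append_log_text(const char *path, const wchar_t *text)
{
    FILE *pf = fopen(path, "a");
    int ok;

    if (pf == NULL)
        return -1;

    /* The text is passed as data, never as a format string, so percent
       signs in it are written out literally. */
    ok = (fputws(text, pf) >= 0);

    if (fclose(pf) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```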
 
Jochen Kalmbach said:
You mean that I should show the assignment of the value of userName? I
don't know if I can or not, because you proceed to say this:

Maybe you can write:
TCHAR szUserName[] = {0x1234, 0x2345, 0x789A, 0x0000};
?????

中 = U+4E2D
文 = U+6587
2 = U+0032
Hahahaha.

Maybe you can write:
TCHAR szUserName[] = {0x1234, 0x2345, 0x789A, 0x0000};
?????

Hahahahaha....

Well, the user name isn't intended to be constant. The user name is
intended to be the actual user name of some actual user, and the DLL
receives it by decrypting information that was previously encrypted by some
other DLL that was running under control of the actual user.
So... please provide a small, full working example with ASCII chars in the
source code!

TCHAR szUserName[48] = {0x4E2D, 0x6587, 0x0032, 0x0000};

Not tested. I might have time to test it later today.
 
[Norman Diamond:]
[Quotation of additional parts of program not originally quoted:]
Yes - I think so. Wide printf foos stop output when they cannot convert
from wide char to mbcs (current locale CP or console CP). This occurs when
writing to the console, text file and so on.

That would be enormously odd. This problem was reproduced in Chinese Vista
with no internationalization whatsoever. At the moment I don't recall what
the code page number is, but it is only one code page number, used in
China - Hong Kong, with no customization of the system locale or user
locale. Language packs can't even be installed on that one because it's
Vista Business not Ultimate. I did add the Japanese keyboard layout though
because the laptop has a Japanese keyboard built in, not a Chinese keyboard.

Nonetheless, if wide printf foos stop output because they are too stupid to
understand their own native default built-in code page after not being
customized at all, then I understand your suggestion that maybe the breakage
occurs in _ftprintf_s instead of _vsnwprintf_s. I might have time to
investigate this later today, maybe.
Open the log file in UTF16 mode (i.e. _T("ab") instead of _T("a")), or use
the following code:

======
FILE* pf;
_tfopen_s(&pf, LOG_FILE_NAME, _T("a"));
if ( pf ) {
int outchars;
outchars = _ftprintf_s(pf, szBuf);
_ftprintf_s(pf, _T("STRLEN: %u; OUTCHARS: %i\n"),
_tcslen(szBuf),
outchars);
fclose(pf);
}
va_end(args);
======

That meta-debugging code looks like a good suggestion, and I hope to have
time to try it later today.
or try to set locale (,,_tsetlocale(LC_CTYPE, _T(".932"))'') to CP 932
before you are calling ftprintf and compare the results.

That would be expected to cause problems. In both environments where the
problem has been observed, the actual code page was a Chinese code page not
Japanese:

(1) Japanese Vista Ultimate with system locale and user locale and MUI
language all set to Chinese (Hong Kong) and rebooted several times;

(2) Chinese (Hong Kong) Vista Business with default system locale and user
locale, and no MUI.
It called wctomb() which convert to the current locale (at the beginning
it is "C" which means that all chars >= U+0100 are not converted).

Wait a minute. I understand the possibility that the CRT might have
initialized the locale to the "C" locale, and I should try to figure out if
that happened. But if it did, then the point where it breaks and stops
converting characters shouldn't be at U+0100, it should be at U+0080. And
it should happen no matter what the system locale and user locale are.
After it failed fwprintf_s has failed too and the foo returned number
chars output so far. The rest of the code runs fine.

WTF, Outlook Express and every other Microsoft tool involved in these
newsgroup postings, WTF.

I put the cursor after "quotes)." and before " If". I hit the Enter key to
put in a line break so I can type this next stuff. Outlook Express puts the
line break after "quotes" and before "). If". More incredible editing
capabilities from Microsoft.

OK, end of second digression, back to first digression.

In my previous posting, I didn't type a raw JIS string with escape sequences
for shift-in and shift-out, I typed the actual characters. The encoding
format going over the wire was in raw JIS, ISO-2022-JP. Reading my own
previous message in Outlook Express, the message survived the round trip,
with the characters 中 and 文 and 2. But when reading your message which
quotes my previous message, Outlook Express is showing raw JIS with escape
sequences and 7-bit byte values. Oh I see, it's because your message format
is Central European. I think Central European encoding can't handle these
Chinese characters. Japanese encoding can handle them because these are
among the characters that were copied from China to Japan during recent
millennia.

Hmm, I guess I should set this current message to use UTF-8 encoding...
Done.

OK, where were we.
Oooo... ,,92 86 95 B6 32'' - 14 chars of text.

No, you're getting garbage because you're missing fonts and you couldn't
even display the original characters correctly. I looked them up this
morning so here they are:

中 = U+4E2D
文 = U+6587
2 = U+0032
At the beginning I thought that the first two char codes are confidential
and you can not disclose it
explicitly ;)
Really could not you enumerate codes even at the price of a solution of
your problem?

Well, a high-priority task came in two days ago and yes it was higher
priority than meta-debugging of debugging routines that look like they're
depending on broken library routines. (The actual working code of this DLL
had already been successfully debugged.) But this morning I had time to
look up the codes.
 
Kalle Olavi Niemitalo said:
Yes, that could cause the problem. (I expected to see an
OutputDebugString call.)

The target computer has no serial port, but it has an i1394 port, so maybe I
can try using Windbag over i1394, if I find a cable and ... hmm, and install
Windbag onto some other host that has an i1394 port...
From the documentation of fopen_s and _wfopen_s, it appears that the "b"
flag only affects control characters, and creating a UTF-16 file requires
_T("a, ccs=UTF-16LE") in Visual C++ 2005.

http://msdn2.microsoft.com/library/z5hh6ee9(VS.80).aspx

Looks like I need to do more reading and experimenting, when I get the time
to try Cezary Noweta's suggestion.
The documentation of fopen_s and _wfopen_s does not promise that they
reset the pointer to NULL on error.

Um, so what? The problem isn't in _wfopen_s. New text is being appended to
the existing debug log exactly as hoped. The problem comes in with either
_vsnwprintf_s or _ftprintf_s.
Outputting szBuf as a format string without arguments is likely to crash
the program as soon as percent signs appear. _fputts(szBuf, pf) would be
safer.

Hmm, yes, thank you. Luckily this week's user name has no percent signs,
but I'd better not add any potentially risky metadebugging code to
production code ^_^
I got the characters U+4E2D U+6587 U+0032, although I cannot be certain
they weren't corrupted by the software I am using.

Those match the values that I found this morning, looking them up.
 
Here is my test program:

#include <tchar.h>

#include <cstdio>
#include <cstdarg>

void DebugLog(TCHAR* szForm, ...)
{
va_list args;
va_start(args, szForm);
static TCHAR szBuf[2048];
_vsntprintf_s(szBuf, sizeof(szBuf) / sizeof (TCHAR), szForm, args);
_vstprintf_s(szBuf, szForm, args);
vwprintf(szForm, args);
va_end(args);
}

int __cdecl _tmain(int argc, _TCHAR* argv[])
{
_TCHAR userName[48] = _T("\u6211\u662f\u4e2d\u570b\u4eba");
DebugLog(_T("Username=%s\n"), userName);
return 0;
}

Tested on Windows XP (SysLocale 0x411), VS 2005 Express (SP0), and it
works like a charm (minus the question marks on the console, but this was
expected). Cannot test on WiVi.
 
Thank you for suggesting a test program, but it doesn't look like you ran a
useful test.

To repeat for the nth time, the environments where this failed have a
Chinese system locale and user locale, not Japanese. Only the development
environment was Japanese. Your test used the Japanese system locale and
unstated user locale.

You said you didn't try Vista, so I think we agree that you didn't observe
if you have a repro on Vista. But later today I will try your program on
Vista. (I'll have to see what your characters are though, since we might
perhaps expect failure if they're non-Chinese characters such as kana or
Greek or Cyrillic or accented Italian or whatever.)
 
OK, I ran approximately this test. The log file contains a lot of lines.
After every line of ordinary debugging information, there is a line with
STRLEN and OUTCHARS exactly as defined by Cezary Noweta.

After every line that doesn't contain a username, the values of STRLEN and
OUTCHARS are equal.

After every line that does contain a username, the value of STRLEN is what
it should be if the value of szBuf includes the entire formatted string,
i.e. some constant text before the username, the username itself, and some
constant text after the username. However, the value of OUTCHARS is -1.
The value of OUTCHARS isn't even the number of characters that _ftprintf_s
wrote before aborting, the value is -1.

So _vsnwprintf_s isn't broken, but at the moment _ftprintf_s seems to be
broken. _ftprintf_s might not be broken though, if the thing is executing
in the "C" locale as someone guessed. I'll have to figure that out next.

I had to use the unsafe code
outchars = _ftprintf_s(pf, szBuf);
as suggested by Cezary Noweta instead of the safer code
_fputts(szBuf, pf);
as recommended by Kalle Olavi Niemitalo because when _fputts succeeds it
returns a nonzero value which doesn't have to match the number of
characters.
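
One way to get both a character count and protection against percent signs, assuming standard C semantics, is to keep the format string constant and pass the buffer as an argument. The helper name write_counted is hypothetical. The sketch uses %ls, which means a wide string in both standard C and the Microsoft CRT.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: writes text verbatim and returns the number of
   wide characters written, or a negative value if the call failed. */
static int write_counted(FILE *pf, const wchar_t *text)
{
    /* The format string is a constant, so percent signs inside text
       are treated as ordinary characters. */
    return fwprintf(pf, L"%ls", text);
}
```

Whether a conversion failure shows up in this return value or only when the stream is flushed depends on the C library, so the result of fclose is worth logging as well.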

After the above experiment, I tried another one. Using Notepad, I created
the log file in Unicode with no text. But _tfopen_s with _T("a") did not
inspect the existing file to decide whether to keep Unicode as Unicode, it
barged ahead and converted Unicode to ANSI and wrote the ANSI. Then opening
the result in Notepad, since the BOM was still there, Notepad faithfully
tried to display garbage ^_^

Now I have to add some calls to find out what locale the thing is executing
in at the time, is it the Chinese Hong Kong locale (matching the system
locale and user locale) or is it the "C" locale.
 
It does get worse.

I deleted the file and then ran the program with this code:
_tfopen_s(&pf, LOG_FILE_NAME, _T("a, ccs=UNICODE"));

http://msdn2.microsoft.com/en-us/library/z5hh6ee9(VS.80).aspx
* The flag is only used when no BOM is present or if the file is a new
* file.

That is a lie. _tfopen_s created a new file and it created the thing with
ANSI encoding not Unicode.

I deleted the file again, created a file in Notepad containing only an empty
line (CR-LF pair), saved it in Unicode, and then again ran the program with
this code:
_tfopen_s(&pf, LOG_FILE_NAME, _T("a, ccs=UNICODE"));

http://msdn2.microsoft.com/en-us/library/z5hh6ee9(VS.80).aspx
* If mode is "a, ccs=<encoding>", fopen_s will first try to open the file
* with both read and write access. If it succeeds, it will read the BOM to
* determine the encoding for this file;

This time _tfopen_s seems to have performed correctly. Now let's continue.

http://msdn2.microsoft.com/en-us/library/z5hh6ee9(VS.80).aspx
* When a Unicode stream-I/O function operates in text mode (the default),
* the source or destination stream is assumed to be a sequence of multibyte
* characters. Therefore, the Unicode stream-input functions convert
* multibyte characters to wide characters (as if by a call to the mbtowc
* function). For the same reason, the Unicode stream-output functions
* convert wide characters to multibyte characters (as if by a call to the
* wctomb function).

In other words, it doesn't matter if _tfopen_s performed correctly because
_ftprintf_s is still going to screw it up. Let's look for confirmation of
this screw-up.

http://msdn2.microsoft.com/en-us/library/c4cy2b8e(VS.80).aspx
* For the same reason, the Unicode stream-output functions convert wide
* characters to multibyte characters (as if by a call to the wctomb
* function).

Yup, no provision at all for keeping Unicode as Unicode.

However, both of those are half-lies. Half the time, _ftprintf_s violated
MSDN and it kept Unicode as Unicode in the spirit (but not the letter) of
http://msdn2.microsoft.com/en-us/library/z5hh6ee9(VS.80).aspx.
The other half of the time, _ftprintf_s screwed up worse.

Notepad opened the file in Unicode. The display alternates, a bunch of
readable lines, a few lines of garbage, a bunch of readable lines, a few
lines of garbage, etc.

It seems that ccs=UNICODE is unusable. It changes the result from being
mostly readable (with a little bit of lossage) to being half readable (with
half garbage).
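
If the CRT's text-mode conversion cannot be relied on for this log, one alternative (not something tried in this thread) is to open the file in binary mode and write the UTF-16LE bytes by hand, so that no wctomb-style conversion is involved. The sketch below is an assumption-laden illustration: both function names are hypothetical, and the input is taken as an array of 16-bit code units.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: writes count UTF-16 code units to a stream that
   was opened in binary mode, low byte first.
   Returns 0 on success, -1 on a write error. */
static int write_utf16le(FILE *pf, const uint16_t *units, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++) {
        unsigned char pair[2];
        pair[0] = (unsigned char)(units[i] & 0xFF);  /* low byte  */
        pair[1] = (unsigned char)(units[i] >> 8);    /* high byte */
        if (fwrite(pair, 1, 2, pf) != 2)
            return -1;
    }
    return 0;
}

/* Hypothetical helper: writes the UTF-16LE byte order mark (FF FE).
   Intended to be called once, on a new and empty file. */
static int write_utf16le_bom(FILE *pf)
{
    const uint16_t bom = 0xFEFF;
    return write_utf16le(pf, &bom, 1);
}
```

On Windows, where wchar_t is 16 bits wide, a Unicode-build TCHAR buffer could presumably be passed with a cast. Line endings would then have to be written explicitly as CR LF, because binary mode does no newline translation.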
 
It gets even more worse.

I added this call:
_ftprintf_s(pf, _T("%s\n"), _tsetlocale(LC_CTYPE, _T("")));
The output was:
Chinese_Hong Kong S.A.R..950

So there is absolutely no excuse for _ftprintf_s to screw up on Chinese
characters. The DLL is not running in the C locale, it's running in the
Chinese Hong Kong locale, code page 950, exactly as it should be.

Here's more MSDN stuff too.
http://msdn2.microsoft.com/en-us/library/x99tb11d(VS.80).aspx
* LC_CTYPE
* The character-handling functions (except isdigit, isxdigit, mbstowcs, and
* mbtowc, which are unaffected).

So mbtowc is one of the exceptions, it wouldn't have been affected even if
the C locale were in use, and presumably it would always use code page 950
and screw up because it's miscoded -- however, wctomb isn't one of the
exceptions, so it would have been affected if the C locale were in use, and
it would screw up differently from the way it actually screws up.

Anyway, thank you whoever it was who said that _vsnwprintf_s isn't broken
and _ftprintf_s is. Sorry I found it hard to believe you. You're absolutely
right. _ftprintf_s is broken.
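
A note on the locale check used above: in standard C, setlocale called with an empty string selects the user-default locale and returns its name, so it changes the locale rather than only reporting it. Passing NULL queries the locale in effect without modifying it. A minimal sketch, with hypothetical helper names and using the narrow setlocale rather than _tsetlocale:

```c
#include <locale.h>
#include <string.h>

/* Hypothetical helper: returns the name of the LC_CTYPE locale that is
   currently in effect. A NULL locale argument only queries. */
static const char *current_ctype_locale(void)
{
    return setlocale(LC_CTYPE, NULL);
}

/* Hypothetical helper: returns 1 if LC_CTYPE is the "C" locale, which is
   what every C program starts in until setlocale is called. */
static int in_c_locale(void)
{
    const char *name = current_ctype_locale();
    return name != NULL && strcmp(name, "C") == 0;
}
```

Logging the result of the NULL query before any other setlocale call would show which locale the earlier _ftprintf_s calls actually ran under.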
 