_vsnwprintf_s seems to be broken

  • Thread starter Thread starter Norman Diamond
  • Start date Start date
OMFG.

When I added this call:
_ftprintf_s(pf, _T("%s\n"), _tsetlocale(LC_CTYPE, _T("")));
it didn't query the DLL's current locale the way MSDN says it will. It SET
the current locale, and returned it:
Chinese_Hong Kong S.A.R..950

And, the result of this setting activity did affect the way wctomb operates.
And the result of that setting activity did affect the way _ftprintf_s
operates.

The result was that _ftprintf_s wrote the user name correctly.

In ANSI.

中文2

The good news is that there's a workaround for the breakage in _ftprintf_s.
The bad news is that I haven't finished learning how bad Windows can be.
 
Oh, I screwed up this last test. MSDN says my call to _tsetlocale() does
set the locale instead of querying. A null pointer does a query but a null
string sets it to the default from the OS.

OK, good news anyway, when setting to the default from the OS, it worked.

So I still don't know if the DLL actually started up in the C locale
instead, but I'm not in the mood to test it again.
 
Hello,
No, you're getting garbage because you're missing fonts and you couldn't
even display the original characters correctly. I looked them up this
morning so here they are:

No - this is good. This code sequence is Japanes ANSI (932) You have written that
your dev environment is Japanese and you coded your posts in ISO-2022-JP, so I have
sent JAP ANSI codes and not CHS ones.
This problem was reproduced in Chinese Vista
with no internationalization whatsoever.
Nonetheless, if wide printf foos stop output because they are too stupid to
understand their own native default built-in code page after not being
customized at all, then I understand your suggestion that maybe the breakage
occurs in _ftprintf_s instead of _vsnwprintf_s.

It does not matter what is your system ANSI CP. Also wide printf is not so stupid.
ISO states that:

"At program startup, the equivalent of
setlocale(LC_ALL, "C");
is executed."

That's all. Whan you want to play with mixed international language streams, and
especially with Unicde or double-byte character sets, then you must use setlocale().
I deleted the file and then ran the program with this code:
_tfopen_s(&pf, LOG_FILE_NAME, _T("a, ccs=UNICODE"));
http://msdn2.microsoft.com/en-us/library/z5hh6ee9(VS.80).aspx
* The flag is only used when no BOM is present or if the file is a new
* file.
That is a lie. _tfopen_s created a new file and it created the thing with
ANSI encoding not Unicode.
I deleted the file again, created a file in Notepad containing only an empty
line (CR-LF pair), saved it in Unicode, and then again ran the program with
this code:
_tfopen_s(&pf, LOG_FILE_NAME, _T("a, ccs=UNICODE"));

Everything is OK - look at the table below. When you are creating a new file you
should use "ccs=UTF-16LE" and not "ccs=UNICODE" as "UNICODE" creates ANSI files.
I added this call:
_ftprintf_s(pf, _T("%s\n"), _tsetlocale(LC_CTYPE, _T("")));
The output was:
Chinese_Hong Kong S.A.R..950

So now you know that you have set locale to your system codepage. setlocale(LC_CTYPE,
NULL) returns your current locale.
_ftprintf_s is broken.
The good news is that there's a workaround for the breakage in _ftprintf_s.

fwprintf is not broken. This is not workaround. This is normal way to achieve an
effect you wanted. Even according to the ISO standerd and not to MSDN or MS at all.

ISO states:

"The wide character output functions convert wide characters to multibyte characters
and write them to the stream as if they were written by successive calls to the
fputwc function. Each conversion occurs as if by a call to the wcrtomb function, with
the conversion state described by the stream’s own mbstate_t object. The byte output
functions write characters to the stream as if by successive calls to the fputc
function."

and

"An encoding error occurs if the character sequence presented to the underlying
mbrtowc function does not form a valid (generalized) multibyte character, or if the
code value passed to the underlying wcrtomb does not correspond to a valid
(generalized) multibyte character. The wide character input/output functions and the
byte input/output functions store the value of the macro EILSEQ in errno if and only
if an encoding error occurs."

After your _ftprintf() has been executed within "C" locale, errno contains EILSEQ.
After you have called setlocale(LC_CTYPE, "") your locale is set to your system
codepage and _ftprintf() works OK. Everything is OK.
The bad news is that I haven't finished learning how bad Windows can be.

Yea, but not this time ;)

-- best regards

Cezary Noweta
 
Norman Diamond said:
Um, so what?

So the program can dereference an uninitialized pointer and crash
if it cannot open the log file. I understand it has been able to
open the file in your experiments; but the possible failure
should be properly handled.
 
Norman Diamond said:
I had to use the unsafe code
outchars = _ftprintf_s(pf, szBuf);

Surely you could fix the bug with _ftprintf_s(pf, _T("%s"), szBuf).

(In standard C, L"%s" in a format string takes a char * argument,
just like "%s". In Microsoft C, it instead takes a wchar_t *
argument. I don't know if there is a way for Microsoft to
correct this violation without breaking countless programs.)
 
Kalle Olavi Niemitalo said:
So the program can dereference an uninitialized pointer and crash if it
cannot open the log file. I understand it has been able to open the file
in your experiments; but the possible failure should be properly handled.

You mean the "if ( pf ) {" line which you have correctly quoted several
times?

If pf were NULL then the thing wouldn't stop outputting when it hits the
username, the thing wouldn't have output anything at all.
 
Cezary Noweta said:
No - this is good. This code sequence is Japanes ANSI (932) You have
written that your dev environment is Japanese and you coded your posts in
ISO-2022-JP, so I have sent JAP ANSI codes and not CHS ones.

The default coding for new posts is ISO-2022-JP. The default for followup
posts is to use the encoding of the post that is being quoted. But you set
the encoding of your posts to Central European.

The program was running in a Chinese environment where the system and user
code page was 950.

14 bytes of text is not 14 chars of text.

Internal to the program, the coding was Unicode not ANSI. Ordinarily it
wouldn't convert to ANSI until _ftprintf_s writes to a file. From this
discussion I learned that even _ftprintf_s won't convert it to ANSI unless
the program does a call to use the system's code page instead of C locale.
It does not matter what is your system ANSI CP. Also wide printf is not so
stupid. ISO states that:

"At program startup, the equivalent of
setlocale(LC_ALL, "C");
is executed."

That's all.

I understand that now. Thank you.
Whan you want to play with mixed international language streams, and
especially with Unicde or double-byte character sets, then you must use
setlocale().

You still don't understand this part of it though. There was no mixing of
international language streams. The execution environment was 100% Chinese.
And the need to use setlocale() applies to single-byte character sets as
much as it does to double-byte character sets. _ftprintf_s should fail on
several characters in your language when using your code page, and
_ftprintf_s should fail on the English character £ when using Western
Europe's code page, just as quickly as it failed on Chinese characters when
using a Chinese code page.

Everyone has to call setlocale() and tell the CRT to use the system's code
page. Hopefully we both understand this now.
Everything is OK - look at the table below. When you are creating a new
file you should use "ccs=UTF-16LE" and not "ccs=UNICODE" as "UNICODE"
creates ANSI files.

Oh, you are right. The flag UNICODE means ANSI. I wonder why the flag ANSI
doesn't mean UNICODE. Who allowed some programmer to develop a CRT without
a flag named ANSI and meaning UNICODE? At least it's a relief to see that I
wasn't the least competent programmer in this discussion ^_^

Anyway now I understand to put this at the beginning of every program:
_tsetlocale(LC_CTYPE, _T(""));
Now I wonder how to find out if it's safe to call this from DllMain.
 
Kalle Olavi Niemitalo said:
Surely you could fix the bug with _ftprintf_s(pf, _T("%s"), szBuf).

No, the reason this entire discussion started was because _T("%s") fails.
Finally we understand the reason why _T("%s") fails (it was my fault for not
adding a call to setlocale). Nonetheless changing _T("%s") to _T("%s")
wouldn't fix it.
 
14 bytes of text is not 14 chars of text.

Ooops, internally I read "char" as "character" instead of "char". I get a
C- today.

In C, 14 bytes of text is 14 chars of text. It isn't 14 characters and you
didn't say that it is. Sorry.
 
Norman Diamond said:
You mean the "if ( pf ) {" line which you have correctly quoted
several times?

The scenario is this:
FILE* pf;

This defines a local variable pf but does not initialize it. The
initial value is indeterminate.
_tfopen_s(&pf, LOG_FILE_NAME, _T("a"));

Now suppose this call fails due to lack of permissions.
_tfopen_s returns a nonzero error code, but the documentation
does not promise that it would set pf = NULL in this case.
So the value of pf can remain indeterminate.
if ( pf ) {

This tests pf. The indeterminate value can very well appear
non-NULL, in which case, execution enters the block in the
if-statement.
int outchars;
outchars = _ftprintf_s(pf, szBuf);

Here the indeterminate value of pf goes to _ftprintf_s, which is
likely to dereference the pointer and crash. (There is also the
format string problem.)
If pf were NULL then the thing wouldn't stop outputting when it hits
the username, the thing wouldn't have output anything at all.

Yes, as I wrote, the _tfopen_s succeeded in your experiments.
What I'm concerned with is the error handling for cases that you
did not test. Or were you going to remove this logging code from
the final version?
 
Norman Diamond said:
Anyway now I understand to put this at the beginning of every program:
_tsetlocale(LC_CTYPE, _T(""));
Now I wonder how to find out if it's safe to call this from DllMain.

setlocale() and _wsetlocale() are not exported from Kernel32.dll,
so you should assume that they are not safe to call from DllMain.
If you link the C runtime statically into your DLL and audit its
source code, then it may be safe, until the next library upgrade.

If you link to the C runtime DLL, then setlocale() can also
affect other modules of the program, thereby increasing
dependencies between them. With _create_locale() and
_ftprintf_s_l(), you could better isolate DLLs from each other.
 
Kalle Olavi Niemitalo said:
The scenario is this:


This defines a local variable pf but does not initialize it. The
initial value is indeterminate.


Now suppose this call fails due to lack of permissions.
_tfopen_s returns a nonzero error code, but the documentation
does not promise that it would set pf = NULL in this case.
So the value of pf can remain indeterminate.

Oh, where your previous message said "reset the pointer", it looked like you
were talking about the file pointer, i.e. what would be called a cursor in a
database system, i.e. the location that gets seeked to or sensed. Opening
for append mode, the file pointer would be reset to the end of the file,
unless the open fails.

But you meant the program's pointer variable, pf. You are right. The
standard fopen function returns a FILE* so there is no way to omit receiving
a null pointer when it fails, but _tfopen_s can omit that effect.

Lesson learned yet again: When reading existing code, the first step is NOT
to assume the existing code is correct. If existing code calls a variation
of an API that you haven't called yourself, look up the API, don't assume
the code is correct. Sigh.

Thank you.
Or were you going to remove this logging code from the final version?

The release version has calls to logging functions #define'd out of
existence, just like calls to assert. So it doesn't crash on this
particular code. Two wrongs made an accidental right.
 
Back
Top