You're right, it's not as simple as just -ZI. -Zi also seems to be
sufficient, as well as -O2. In any case, having at least one concrete
repro will allow someone to triage the bug. Watch it on the Product Feedback
Center; you should see something within a few days.
Just stepped through the assembly and found something strange after I
trimmed the code down to the following (I also turned off the security check
so it would not get in the way). My comments are inline.
__declspec(align(16)) BYTE buff[1234];
#pragma omp parallel private(buff) num_threads(2)
{
    #pragma omp for
    for(int i = 0; i < 2; i++)
    {
        _mm_store_si128((__m128i*)&buff[16], _mm_setzero_si128());
    }
}
When the debugger reached the multi-threaded code, I saw this:
#pragma omp for
for(int i = 0; i < 2; i++)
00401000 push ebp
00401001 mov ebp,esp
00401003 and esp,0FFFFFFF0h
00401006 sub esp,4E0h
buff was just aligned to 16 bytes, fine!
0040100C lea eax,[esp]
0040100F push eax
00401010 lea ecx,[esp+8]
00401014 push ecx
00401015 push 1
00401017 push 1
00401019 push 1
0040101B push 0
0040101D call _vcomp_for_static_simple_init (40710Ah)
It pushed 6 arguments onto the stack (esp -= 0x18) and called
_vcomp_for_static_simple_init.
00401022 mov ecx,dword ptr [esp+1Ch]
00401026 mov eax,dword ptr [esp+18h]
0040102A add esp,18h
Looks like it is cleaning up the stack, esp += 0x18
0040102D cmp ecx,eax
0040102F jg wmain$omp$1+4Bh (40104Bh)
00401031 sub eax,ecx
00401033 pxor xmm0,xmm0
00401037 add eax,1
0040103A lea ebx,[ebx]
00401040 sub eax,1
#include <windows.h>
#include <omp.h>
#include <xmmintrin.h>
#include <emmintrin.h>
int _tmain(int argc, _TCHAR* argv[])
{
__declspec(align(16)) BYTE buff[1234];
buff[0] = 0;
#pragma omp parallel private(buff) num_threads(2)
{
#pragma omp for
for(int i = 0; i < 2; i++)
{
_mm_store_si128((__m128i*)&buff[16], _mm_setzero_si128());
00401043 movdqa xmmword ptr [esp+18h],xmm0
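Storing at "esp+18h"? esp is aligned to 16 bytes right now
(esp=0x0012fa00), so esp+18h is 8 bytes off a 16-byte boundary, and movdqa
needs a 16-byte aligned operand. What's that +18h doing there? Also, this
18h looks familiar (it is the same 0x18 that was just added back to esp
after the call); could it be a coincidence?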
00401049 jne wmain$omp$1+40h (401040h)
{
#pragma omp for
for(int i = 0; i < 2; i++)
0040104B call _vcomp_for_static_end (407104h)
00401050 call _vcomp_barrier (4070FEh)
}
00401055 mov esp,ebp
00401057 pop ebp
00401058 ret
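To double-check the arithmetic in the comments above, here is a small sketch that redoes it in C++. The constants are read straight off the listing; the names (kPushCount, pushed_bytes, is_aligned16 and so on) are made up for illustration and do not come from the compiler or the OpenMP runtime. It only confirms the numbers. Whether the +18h really is a leftover from the argument pushes is still my guess, not something this proves.

```cpp
#include <cstdint>

// Values read off the disassembly in this post (32-bit x86 build).
// All names here are hypothetical, chosen only for this sketch.
constexpr std::uint32_t kPushSize   = 4;          // each `push` moves esp by 4 bytes
constexpr std::uint32_t kPushCount  = 6;          // arguments pushed before the call
constexpr std::uint32_t kFrameSize  = 0x4E0;      // from `sub esp,4E0h`
constexpr std::uint32_t kEspAtStore = 0x0012fa00; // esp observed at the movdqa
constexpr std::uint32_t kStoreDisp  = 0x18;       // displacement in [esp+18h]

// Bytes by which esp drops while `count` arguments are pushed.
constexpr std::uint32_t pushed_bytes(std::uint32_t count) {
    return count * kPushSize;
}

// movdqa faults unless its memory operand is 16-byte aligned.
constexpr bool is_aligned16(std::uint32_t address) {
    return (address & 0xFu) == 0;
}

// Six pushes account exactly for the `add esp,18h` cleanup.
static_assert(pushed_bytes(kPushCount) == 0x18, "6 pushes == 0x18 bytes");
// The frame size keeps esp on a 16-byte boundary after `and esp,0FFFFFFF0h`.
static_assert(is_aligned16(kFrameSize), "0x4E0 is a multiple of 16");
// esp itself is aligned at the store...
static_assert(is_aligned16(kEspAtStore), "esp is 16-byte aligned");
// ...but esp+18h is not, so the movdqa target is misaligned.
static_assert(!is_aligned16(kEspAtStore + kStoreDisp), "esp+18h is misaligned");
```

If this compiles, the numbers hold: the store address is 0x0012fa18, which is 8 bytes past a 16-byte boundary, and the displacement equals the number of bytes taken by the six pushed arguments.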