/CLR Compilation: 125x slower when calling virtual function from DLL?

  • Thread starter: Jochen Kalmbach
Greetings! I've been porting an application from Builder to VS .NET 2003 and
searching for possible bottlenecks (the application is currently running
slowly).

I found one scenario in which a test function takes 125x longer to
execute when the DLL is compiled with the /CLR option than when it is
compiled without it.

Average timings:
-----------------------------
Compiled with /CLR: 18879
Compiled without /CLR: 151
Builder: 420

This *huge* difference appears when I call the function
"CallTheFunctionAndBenchmark()" from this DLL (there is more below):

.h:
#ifdef DRAWDLL_EXPORTS
#define DRAWDLL_API __declspec(dllexport)
#else
#define DRAWDLL_API __declspec(dllimport)
#endif

class DRAWDLL_API CDrawDll {
public:
    CDrawDll(void);
    void CallTheFunctionAndBenchmark(void);
    virtual int* gimmeAnInt(void);
    int number;
};

.cpp:

#include "stdafx.h"
#include "DrawDll.h"
#include <stdio.h>

BOOL APIENTRY DllMain( HANDLE hModule,
                       DWORD ul_reason_for_call,
                       LPVOID lpReserved
                     )
{
    return TRUE;
}

CDrawDll::CDrawDll()
{
    number = 7;
}

// The name matches the declaration in the header
void CDrawDll::CallTheFunctionAndBenchmark(void)
{
    int* ptrReceive = NULL;

    __int64 m_ElapsedTime;
    DWORD m_Retval;
    LARGE_INTEGER m_StartCounter; // start time
    LARGE_INTEGER m_EndCounter;   // finish time

    // Start profile
    m_Retval = QueryPerformanceCounter (&m_StartCounter);
    //------------------------------------------------------------------

    // 10,000 calls to the virtual function...
    for(int i=0;i<10000;i++)
    {
        ptrReceive = gimmeAnInt();
    }

    //------------------------------------------------------------------
    // End profile
    m_Retval = QueryPerformanceCounter (&m_EndCounter);

    // get and store finishing time and calc elapsed time (ticks)
    m_ElapsedTime = (m_EndCounter.QuadPart - m_StartCounter.QuadPart );

    char buff[255];
    sprintf(buff, "%I64d\n", m_ElapsedTime);
    OutputDebugString(buff);

    // To keep the compiler from optimizing away the profiled code, read the data
    sprintf(buff, "nr: %d\n", *ptrReceive);
    OutputDebugString(buff);
}

int* CDrawDll::gimmeAnInt(void)
{
    return &number;
}


One curious fact is that if the virtual function being called is in the same
module (for instance, a function defined in the Form itself), this does not
happen... The thing goes fast! And the Windows Forms executable is /CLR by
definition. This only happens when the function is inside a DLL.

I know CLR compilation is supposed to take longer, but *this* longer?

My program makes heavy use of properties (__declspec( property( get =
GetXX...))), and most of them are based on virtual function calls. And the
program is fully modularized (DLLs).

Why does this happen? And is there a way I can avoid this bottleneck?

Thanks for any help/clarification on this issue!
 
What I meant with
I know CLR compilation is supposed to take longer, but *this* longer?
is actually
I know CLR compiled code is supposed to take longer, but *this* longer?

Fabro
 
Thanks for your reply. I have verified that this is an issue with the Everett
compiler, but it is fixed in Whidbey Beta 1. You can download Beta 1 from
msdn.microsoft.com/visualc.
Thanks,
Kapil
 
Please let me know if you need help getting the Whidbey Beta 1 compiler.
Thanks,
Kapil
 
Hi kkhosla,
Thanks for your reply. I have verified that this is an issue with the
Everett compiler but this is fixed in beta1 whidbey. You can download
beta1 from msdn.microsoft.com/visualc.

This is nice to know but does not help!
We have been shipping our product since the release of VS2003.
And we cannot switch to a beta product or to a later release, because we
*must* maintain our shipped environment (.NET 1.1)!

--
Greetings
Jochen

My blog about Win32 and .NET
http://blog.kalmbachnet.de/
 
Thanks for the comments. I've downloaded Whidbey Beta 1 (Visual C++ 2005
Express Edition Beta) and run the same tests. After a few tweaks to compile
the test properly, things indeed worked much better:

Timings:

Visual C++ 2005 Beta 1
----------------------------
With /CLR : 495
Without /CLR: 382

Visual C++ .NET 2003
----------------------------
With /CLR: 18879
Without /CLR: 151


Borland Builder 5.0 (for comparison)
----------------------------
Builder: 420

Any clues as to when the final version of VS 2005 will be available?

And as Jochen well said, unfortunately this doesn't solve the problems of
current projects...
Any chance of a service pack to fix this issue in 2003, or is it a big change
that will only be available in Whidbey?

Thanks

Gustavo Fabro
gustavo_fabro%removethis%@hotmail.com
 
And as Jochen well said, unfortunately this doesn't solve the problems of
current projects...
Any chance of a service pack to fix this issue in 2003, or is it a big
change that will only be available in Whidbey?

Sure. I think I can give you a workaround for it in Everett.

From your code

#include <time.h>
#include <stdio.h>
/*******Note the pragma unmanaged *******/
#pragma unmanaged
class MyTest1
{
private:
    int s_i;
    virtual void Test1(int i)
    {
        s_i += i;
    }

public:
    MyTest1() : s_i(0) {}   // initialize the accumulator before it is used

    void MakeTest()
    {
        clock_t start, finish;
        double duration;

        start = clock();
        // 10,000,000 virtual calls
        for(int i = 0; i < 10000000; i++)
        {
            Test1(i);
        }
        finish = clock();
        duration = (double)(finish - start) / (CLOCKS_PER_SEC);
        printf( "%2.1f seconds\n", duration );
    }
};

int main()
{
    MyTest1 obj;
    obj.MakeTest();
    return 0;
}

When you compile your native class with /clr, it causes what is called
double thunking. A call is made from a managed call site to a managed
function definition via a native vtable.

Yes, that is right. The vtable is native in this case, so there is a
performance penalty when a call makes the transition from managed to native
and from native back to managed. When I applied #pragma unmanaged above the
function definition, I reduced the thunking from native to managed and thus
avoided the performance penalty. On my Everett compiler this code took 0.4
seconds, versus 13.5 seconds if I don't use #pragma unmanaged.



Please let me know if this fixes your problem.

Thanks, Kapil
 
Kapil,
Sure. I think I can give you a workaround for it in Everett.
/*******Note the pragma unmanaged *******/
#pragma unmanaged
class MyTest1
When you compile your native class with /clr, it causes what is called
double thunking. A call is made from a managed call site to a managed
function definition via a native vtable.

Yes, that is right. The vtable is native in this case, so there is a
performance penalty when a call makes the transition from managed to native
and from native back to managed. When I applied #pragma unmanaged above the
function definition, I reduced the thunking from native to managed and thus
avoided the performance penalty. On my Everett compiler this code took 0.4
seconds, versus 13.5 seconds if I don't use #pragma unmanaged.

Hmmmm that's why!
Please let me know if this fixes your problem.

In the testing scenario it indeed solved the issue. It will require some
work to add the #pragma directive to the 1,119 .cpp files and corresponding
.h files but, heck, what are trainees for, anyway? ;)

In the background I'm compiling my code in VS 2005 to see if that alone was
the big performance issue or if anything new comes up. In any case I will
post the results later...

Thanx!

Gustavo Fabro
gustavo_fabro%removethis%@hotmail.com
 
If all your .cpp files start with #include "stdafx.h", you could use Visual
Studio's Replace in Files to add "#pragma unmanaged" before it automatically,
rather than editing a thousand files.
 