Steve McLellan
Hi,
We've found some pretty serious performance hits that we didn't expect in a
mixed mode C++ application. The number crunching bits of our algorithms are
compiled with #pragma unmanaged. They call a number of inline functions
elsewhere, and from the documentation my understanding was that inlined
function calls are supposed to be effectively replaced by their
implementation at compile time. This would mean that inline functions used
in a #pragma unmanaged function would also be compiled unmanaged.
However, having written a short test, this appears not to be the case -
putting #pragmas in the header affects how the inlined functions are
compiled (and affects running time enormously).
So given the code at the bottom of this post, my understanding would be the
following:
- Calling from a managed function, the first function
(NumberCrunching_InlineFunction_InNoGCClass_LeftToItsOwnDevices) should be as
quick as NumberCrunching_InlineFunction_InNoGCClass_Managed, which is compiled
as managed.
- Calling from an unmanaged function, it should be as fast as
NumberCrunching_InlineFunction_InNoGCClass_Unmanaged, which is specified to
be unmanaged.
Given that it is apparently possible to influence how inlined functions are
compiled in the header, and that our inlined functions may be called from
both managed and unmanaged functions, what are we supposed to do? And
equally, how does all this apply to templated functions (which should be
compiled as they're used)? We're pretty sure that instead of the inlining
giving us good performance, it's causing masses of transitions between
managed and unmanaged code. Any help would really be appreciated, as
would any links to info.
Thanks!
Steve
### CODE ###
#include <math.h>  // sqrt, log, exp

__nogc class UnmanagedClass
{
public:
    // No #pragma managed/unmanaged is specified for this function
    __forceinline void NumberCrunching_InlineFunction_InNoGCClass_LeftToItsOwnDevices()
    {
        for ( int i = 0 ; i < 1000000 ; i++ )
        {
            double d = sqrt( 69765.43556 ) * log( 9032425.543535 ) / sqrt( log( exp( 3.65464 ) ) );
        }
    }

#pragma unmanaged
    // Explicitly marked unmanaged
    __forceinline void NumberCrunching_InlineFunction_InNoGCClass_Unmanaged()
    {
        for ( int i = 0 ; i < 1000000 ; i++ )
        {
            double d = sqrt( 69765.43556 ) * log( 9032425.543535 ) / sqrt( log( exp( 3.65464 ) ) );
        }
    }

#pragma managed
    // Explicitly marked managed
    __forceinline void NumberCrunching_InlineFunction_InNoGCClass_Managed()
    {
        for ( int i = 0 ; i < 1000000 ; i++ )
        {
            double d = sqrt( 69765.43556 ) * log( 9032425.543535 ) / sqrt( log( exp( 3.65464 ) ) );
        }
    }
};
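
For anyone wanting to reproduce the timings, a harness along the following lines could be used. This is only a minimal sketch and not the test referred to in the post: TimeSeconds and NumberCrunching are hypothetical names introduced here, and the code uses only standard C++ (std::clock), so it says nothing by itself about whether a function ends up compiled as managed or unmanaged.

```cpp
#include <cmath>
#include <ctime>

// Hypothetical helper (not from the post): calls fn once and returns
// the elapsed processor time in seconds, as reported by std::clock.
template <typename Fn>
double TimeSeconds(Fn fn)
{
    const std::clock_t start = std::clock();
    fn();
    const std::clock_t stop = std::clock();
    return static_cast<double>(stop - start) / CLOCKS_PER_SEC;
}

// Hypothetical stand-in using the same arithmetic as the loops in the post.
// Writing to a volatile variable keeps the one million stores in place;
// the compiler may still fold the constant math into a single value.
inline void NumberCrunching()
{
    volatile double sink = 0.0;
    for (int i = 0; i < 1000000; i++)
    {
        sink = std::sqrt(69765.43556) * std::log(9032425.543535)
             / std::sqrt(std::log(std::exp(3.65464)));
    }
}
```

Calling TimeSeconds(NumberCrunching) once from a managed caller and once from an unmanaged caller, and comparing the two results, would be one way to check whether the call site or the header decides the cost. Note that the loops in the post discard their result, so an optimizing build may remove them entirely and report near-zero times.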