Excel performance through COM

  • Thread starter Thread starter Fabz
  • Start date Start date
F

Fabz

Hi there,

I am writing COMAddIns and Automation AddIns for Excel 2007 (with C#).
However, except a few sites, I cannot find any very helpful resources on
performance.

1. Does anyone know where I can find a precise description what accessing
the Excel Object Model through a COM interface really does? Am I accessing
"proxy objects" thus in fact marshalling calls from my C# code to Excel?

2. When calling myRange.Value2 do I in fact marshal a call through the COM
interface or not?

3. Question 2 leads to another question: Is is correct that using
Worksheetfunction.sum(myRange, ...) (1 marshalled call for all cells in the
range together) is much faster than using a loop and summing the cell values
in my own C# code (1 marshalled call for every single cell in the range)?

4. When providing an Excel Range to a Worksheetfunction in my C# code, does
Excel actually remarshal the Excel Range and send it back to the Excel
application (e.g. copying all the values in the range) or does Excel only
send the cell reference (e.g. "A1:B10") to Excel and let Excel then access
the Range? Is there any way of doing something in my C# code like
Worksheetfunction.sum("A1:B10", System.Reflection.Missing.Value, ...)?

5. Does shimming my COMAddIn/Automation AddIn actually increase or decrease
performance (or doesn't it matter)?

Thanks for your answers.
Fabz
 
I can only comment on your Q.3

Most of Excel's worksheet functions are extremely efficient when called in
cell formulas. However in code the call to the worksheetfunction library is
relatively slow, possibly slower with your C# addin even than in VBA.
If processing a small to medium qty of cells it's probably faster if you
roll your own function. With a large number though even with delay of
calling the function overall speed might be faster. Best way is to test with
a simple Sum routine.

Read and particularly Write to individual cells is slow. Generally faster to
read the entire range to an array, do your stuff, and write back the result

myArray = myRange.Value2 ' 2d array
loop myArray


For questions about Excel's object model you have come to the right place.
You may also find it simpler to devise routines in Excel VBA before adapting
to C#. For the rest of your questions, if you don't get answers here it
might be worth asking in a forum more dedicated to C# addins for Excel.

Regards,
Peter T
 
Fabz,

Unfortunately the performance of c# addins is extremely poor with Excel.
Much but not all of this is due to the COM Interop Layer.

I am not a .NET expert but here are the results of my investigations so far

As with VBA/VB6 you need to transfer data into and out of Excel in as large
a block as possible,
this line is the fastest .NET data transfer method I have found so far:

vArr = DirectCast(vntTheList.Value2, Object(,))

Last Nonzero function timing comparison
VBA 2.2
VB6 1.6
c# 13.6
Vb.net 8.6

here is the Last non-zero VB .NET function

Public Function LNZVBNet(ByVal vntTheList As XL.Range) As Double
Dim lCounter As Long
Dim vArr As Object(,)
LNZVBNet = 0

vArr = DirectCast(vntTheList.Value2, Object(,))

For lCounter = UBound(vArr, 1) To LBound(vArr, 1) Step -1
If vArr(lCounter, 1) <> 0.0 Then
LNZVBNet = vArr(lCounter, 1)
Exit For
Else
End If
Next lCounter

End Function

You will get much better performance (but still a lot slower than VBA/VB6)
if you use one of the .Net Excel interface tools that use the C API/XLL
interface
(Addin Express, ExcelDNA ...)

AverageTol function timing
VBA 109
VB6 63
C XLL 37
Addin Express Automation VB.net 170
Addin Express XLL VB.net 100
ExcelDNA XLL CVB.Net 81

In VBA if you want to use Excel native worksheet functions it is much faster
to keep the range as a range object rather than transfer it into an array.
I would imagine that the same is true with .Net (but more so with .Net
because the transfer is slower)


Charles
___________________________________
The Excel Calculation Site
http://www.decisionmodels.com
 
Okay, thanks for your help. I think I will first go the casual route and see
whether there will be performance problems occurring at all.

But I am still wondering whether there is a COM specification for Excel.
There surely must be something like this somewhere (otherwise, how are you
supposed to build efficient AddIns?), but I cannot find it.
 
What is very interesting is the comparison of the following two UDFs
(written as Automation AddIn in C#), iterating through a cell range of
10'000 cells:

public double LoopExample1(Excel.Range range)
{
DateTime before = DateTime.Now;
foreach (Excel.Range r in range)
{
double d = (double)r.Value2 + 1;
}
DateTime after = DateTime.Now;
TimeSpan duration = after - before;
return duration.TotalMilliseconds;
}

public double LoopExample2(Excel.Range range)
{
DateTime before = DateTime.Now;
Object[,] rng = range.get_Value(System.Reflection.Missing.Value) as
Object[,];
foreach (Object obj in rng)
{
double d = (double)obj + 1;
}
DateTime after = DateTime.Now;
TimeSpan duration = after - before;
return duration.TotalMilliseconds;
}

public double LoopExample3(Excel.Range range)
{
DateTime before = DateTime.Now;
foreach (Excel.Range r in range.Cells)
{
double d = (double)r.Value2 + 1;
}
DateTime after = DateTime.Now;
TimeSpan duration = after - before;
return duration.TotalMilliseconds;
}

On my machine, the first UDF (LoopExample1) requires around 375 milliseconds
to finish, the (LoopExample2) around 16 milliseconds and the third
(LoopExample3) around 359 milliseconds! Thus, it seems to be much faster to
actually use a 2D-Object array instead of going through the implicit
enumerator inside the foreach loop. (This website seems to confirm these
assumptions: http://dotnetperls.com/excel-interop.)

Be aware that range.get_Value does not belong to the official API specs
published for the Range interface by MS.
 
Yes, as we said, getting the data out of excel into an array in as large a
block a possible is the way to go.

But .Net is still very slow compared to VBA and VB6 and C++.
Although you can speed up the .Net performance by going through the XLL/C
API interface

Charles
___________________________________
The Excel Calculation Site
http://www.decisionmodels.com
 
I did some more tests, summing up 20'000 doubles:

public double SumExample1(Excel.Range range)
{
DateTime before = DateTime.Now;
Excel.WorksheetFunction wsFunc = range.Application.WorksheetFunction;
double d = wsFunc.Sum(range,
System.Reflection.Missing.Value, System.Reflection.Missing.Value,
System.Reflection.Missing.Value, System.Reflection.Missing.Value,
System.Reflection.Missing.Value, System.Reflection.Missing.Value,
System.Reflection.Missing.Value, System.Reflection.Missing.Value,
System.Reflection.Missing.Value, System.Reflection.Missing.Value,
System.Reflection.Missing.Value, System.Reflection.Missing.Value,
System.Reflection.Missing.Value, System.Reflection.Missing.Value,
System.Reflection.Missing.Value, System.Reflection.Missing.Value,
System.Reflection.Missing.Value, System.Reflection.Missing.Value,
System.Reflection.Missing.Value, System.Reflection.Missing.Value,
System.Reflection.Missing.Value, System.Reflection.Missing.Value,
System.Reflection.Missing.Value, System.Reflection.Missing.Value,
System.Reflection.Missing.Value, System.Reflection.Missing.Value,
System.Reflection.Missing.Value, System.Reflection.Missing.Value,
System.Reflection.Missing.Value);
DateTime after = DateTime.Now;
TimeSpan duration = after - before;
return duration.TotalMilliseconds;
}

public double SumExample2(Excel.Range range)
{
DateTime before = DateTime.Now;
Object[,] rng = range.get_Value(System.Reflection.Missing.Value) as
Object[,];
double d = 0;
foreach (Object obj in rng)
{
d += (double)obj;
}
DateTime after = DateTime.Now;
TimeSpan duration = after - before;
return duration.TotalMilliseconds;
}


The first eaample finished in 47 milliseconds, the second in 16
milliseconds. However, one should be careful with interpreting these
results: Worksheetfunctions that compute complicated mathematical results
might possibly end up being computed faster than computing the same results
in C# code (because of various internal optimizations). However, I have not
tried this myself.

Fabz


Fabz said:
What is very interesting is the comparison of the following two UDFs
(written as Automation AddIn in C#), iterating through a cell range of
10'000 cells:

public double LoopExample1(Excel.Range range)
{
DateTime before = DateTime.Now;
foreach (Excel.Range r in range)
{
double d = (double)r.Value2 + 1;
}
DateTime after = DateTime.Now;
TimeSpan duration = after - before;
return duration.TotalMilliseconds;
}

public double LoopExample2(Excel.Range range)
{
DateTime before = DateTime.Now;
Object[,] rng = range.get_Value(System.Reflection.Missing.Value) as
Object[,];
foreach (Object obj in rng)
{
double d = (double)obj + 1;
}
DateTime after = DateTime.Now;
TimeSpan duration = after - before;
return duration.TotalMilliseconds;
}

public double LoopExample3(Excel.Range range)
{
DateTime before = DateTime.Now;
foreach (Excel.Range r in range.Cells)
{
double d = (double)r.Value2 + 1;
}
DateTime after = DateTime.Now;
TimeSpan duration = after - before;
return duration.TotalMilliseconds;
}

On my machine, the first UDF (LoopExample1) requires around 375
milliseconds to finish, the (LoopExample2) around 16 milliseconds and the
third (LoopExample3) around 359 milliseconds! Thus, it seems to be much
faster to actually use a 2D-Object array instead of going through the
implicit enumerator inside the foreach loop. (This website seems to
confirm these assumptions: http://dotnetperls.com/excel-interop.)

Be aware that range.get_Value does not belong to the official API specs
published for the Range interface by MS.
 
Back
Top