Lessons from optimizing a slow spreadsheet

  • Thread starter Thread starter Robbro
  • Start date Start date
R

Robbro

I've been working on a report that contains YTD sales and cost info. Its
about 4 megs, took a little over 3 minutes to fully recalculate (I know, many
around here have larger/slower ones, but thats not the point).
The primary data tab has about 4k rows and 50 or so columns, there are 2
reports (top 10 customers and "others) which are about 400 rows and 3000 rows
respectively and 37 columns, which to begin with were had about 8 sum
products and 7 or so sumifs, one vlookup and the rest were sum or
multiplication. There are 2 date fields up top to enter beg and ending to
pull specific month's or quarters out of the data.
At first each sumifs or sumproduct had 4 conditions, it matched customer,
lot #, date after beg date and date before end date. My first step was to
add 2 calculation columns called include to the data tab (1 for top 10 report
1 for other report) that was as follows if(and(transaction date>=beg
date,transaction date<=end date),1,0) and reduced the sumifs and sumproducts
to 3 conditions, customer, lot #, include=1. That alone reduced my calc time
by about 50% to just over 1.5 min.
Next I added calculation columns to the data tab to eliminate the need for
sumproducts and replaced all sumproducts with sumifs, this reduced my calc
time to under 1 min.
Finally I had about 15 dynamic named ranges, I calculated how many rows down
my data went just once in a cell using counta(a:a) then referred to that cell
in all my ranges. I had used this to limit the # of rows sumproduct had to
deal with, but since I had replaced all sumproducts with sumifs, and I had
read sumifs can intelligently handle whole column references and not waste
time on empty cells, I changed my sumifs to use entire columns and deleted
all named ranges, this saved me another 30 seconds, and got me to 28 sec for
a full recalc, which seems much more reasonable to me.

It seems with a lot of sumifs or sumproducts each extra condition increases
recalc time nearly exponentially.

My final idea which I have not tried yet and dont really know if it will
help would be to sort my data by date, calculate the rows included in each
month just once and have my sumifs use that range only, eliminating the need
the third condition, so then they would only have to match customer and lot
#. Again this seems the most complicated and I dont know if I would gain or
lose from it.
 
Check out this link on optomizing speed, calculations and memory...

http://www.decisionmodels.com/index.htm

Dynamic named ranges using offset are volatile meaning that the sumproducts
using them are always dirty and require calculation. Generally I use dynamic
named ranges for vlidation lists or pivot table ranges. If you use them with
calculations the calculations had better be very light weight or you will see
performance drop.

Based on your descripiton have you considered using pivot tables. That would
take your calculation overhead down to just about zero.
 
Yes, I have considered, but gave up fairly quickly, we have a number of
issues that I'm not sure can be easily incorporated into pivot tables. We
have multiple plants so we have customer by product with summary by plant
then customer summary with returns and allowances figured into some numbers,
some rebates included at times, manufacturing and material variances and just
a whole host of issues in the summary area that at least with my limited
knowledge of pivot tables would exclude them from use. I've actually started
the project over in access with some nice results, its a bit of a pain to set
up but once its done it seems to be what we are looking for. The final
layout and information though are still constantly being changed, and once
all that is finalized it will probably be an access report instead of a
spreadsheet if those above will allow.
 
Back
Top