Multiple (as in many) regressions

  • Thread starter: lk

lk

Hi all

Does anyone have a quick routine for running multiple regressions over a
range of files in Excel?

I have a whole load of .xls files. Each contains data in columns D and E.
D is always the independent (x axis) and E is always the dependent (y axis)
but each file will have a different number of rows. I've never been too
fond of .xlDown in VBA and a clumsy routine to count until the first blank
row would take a while with over 5,000 files!

I need a routine to open up each file, feed the ranges in D and E into the
Regression routine in the Data Analysis toolpak, create the outputs but
(crucially) dump the output somewhere. With 5,000 files, this means 5,000
regressions and therefore 5,000 worksheets in the results workbook. I'm
not sure what the limit is nowadays - used to be 256 sheets per book when I
last got anywhere near it.

What I really need is a table showing (for each regression) what the
"important" results were i.e. the intercept, the coefficient and the
associated goodness of fit.

If anyone can help, that would be great.

Thx
lk
 
lk has brought this to us:

Firstly, to find the last row of data you should start at the last cell
of the column and use .End(xlUp).Row; that finds the last entry in the
column regardless of whether the data is contiguous.
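A minimal sketch of that approach (assuming the data sits in column D
of the active sheet):

```vba
' Find the last used row in column D by starting at the very bottom
' of the sheet and jumping up to the last non-empty cell.
Sub ShowLastRow()
    Dim lastRow As Long
    With ActiveSheet
        lastRow = .Cells(.Rows.Count, "D").End(xlUp).Row
    End With
    MsgBox "Last row of data in column D: " & lastRow
End Sub
```

Unlike .End(xlDown) from the top, this is not fooled by blank cells in
the middle of the column.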

Secondly, I don't recommend you open over 5000 Excel workbooks to just
grab the contents of Cols D:E. You might want to consider using ADODB
to grab the data (from closed workbooks) into a recordset that you can
manipulate however you like. Examples of how to do this can be
downloaded here...

http://www.appspro.com/conference/DatabaseProgramming.zip
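For the flavour of it, a connection to a closed .xls via the Jet
provider looks something like this (the sheet name "Sheet1" is an
assumption; adjust to suit your files):

```vba
' Pull columns D:E from a closed .xls into an ADODB recordset.
' Uses late binding, so no ADO reference is required.
' Sheet name "Sheet1" is an assumption.
Sub GetDataFromClosedBook(ByVal fullPath As String)
    Dim cn As Object, rs As Object
    Set cn = CreateObject("ADODB.Connection")
    cn.Open "Provider=Microsoft.Jet.OLEDB.4.0;" & _
            "Data Source=" & fullPath & ";" & _
            "Extended Properties=""Excel 8.0;HDR=No"""
    Set rs = CreateObject("ADODB.Recordset")
    rs.Open "SELECT * FROM [Sheet1$D:E]", cn
    ' ...walk the recordset: rs.Fields(0) is x, rs.Fields(1) is y...
    rs.Close: cn.Close
End Sub
```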

I'm not sure why anyone would use an Excel file as a storage container
for output data. Normally, outputs are written to DAT, TXT, or CSV
files so the data can be utilized by any program via its normal file
I/O functions. Excel files take up far more storage space than plain
text files, so they are not an efficient format for any program to
output raw data to. I'm not saying exporting to Excel is bad practice
for analysis purposes, just that it's not an efficient way to store
data for use by other software. So, if the files actually are CSVs,
then the specific data you need to pass to Analysis Toolpak functions
can be manipulated fairly easily using VB's normal file I/O functions
to load the data into an array, where it can also be easily and
quickly parsed as needed for further use. Since all of this happens in
memory (as opposed to physically opening each file), it's blazingly
fast compared to working with Excel workbooks or using ADODB to grab
data from closed workbooks.
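A sketch of that file-I/O approach, assuming one "x,y" pair per line
(the column order and the initial array size are assumptions):

```vba
' Read a CSV of x,y pairs into two arrays using VB's native file I/O,
' entirely in memory -- no workbook is ever opened.
Sub LoadCsvIntoArrays(ByVal fullPath As String)
    Dim f As Integer, lineText As String, parts() As String
    Dim x() As Double, y() As Double, n As Long
    ReDim x(1 To 100000): ReDim y(1 To 100000)  ' generous guess
    f = FreeFile
    Open fullPath For Input As #f
    Do While Not EOF(f)
        Line Input #f, lineText
        parts = Split(lineText, ",")
        n = n + 1
        x(n) = CDbl(parts(0)): y(n) = CDbl(parts(1))
    Loop
    Close #f
    ReDim Preserve x(1 To n): ReDim Preserve y(1 To n)
    ' x() and y() now hold the data, ready to feed to a regression.
End Sub
```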
 
"GS" wrote in message


If it's any help - I have the whole dataset in a 15,000,000 row CSV file.
I'm quite happy to manipulate it from there, but I couldn't see how to
get Excel to read chunks of it, analyse each chunk, and then move on to
the next chunk.
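One way to avoid loading all 15,000,000 rows at once is to stream the
file and keep running sums: slope, intercept and R-squared only need
n, Σx, Σy, Σxy, Σx² and Σy². A sketch, assuming each row looks like
"fileID,x,y" and that rows belonging to one file are contiguous (both
assumptions about the CSV layout):

```vba
' Stream a huge CSV once, fitting a regression per contiguous ID block.
Sub RegressByChunk(ByVal csvPath As String)
    Dim f As Integer, s As String, p() As String
    Dim curID As String, x As Double, y As Double
    Dim n As Double, sx As Double, sy As Double
    Dim sxy As Double, sxx As Double, syy As Double
    f = FreeFile
    Open csvPath For Input As #f
    Do While Not EOF(f)
        Line Input #f, s
        p = Split(s, ",")
        If p(0) <> curID And n > 0 Then
            ReportFit curID, n, sx, sy, sxy, sxx, syy
            n = 0: sx = 0: sy = 0: sxy = 0: sxx = 0: syy = 0
        End If
        curID = p(0)
        x = CDbl(p(1)): y = CDbl(p(2))
        n = n + 1: sx = sx + x: sy = sy + y
        sxy = sxy + x * y: sxx = sxx + x * x: syy = syy + y * y
    Loop
    If n > 0 Then ReportFit curID, n, sx, sy, sxy, sxx, syy
    Close #f
End Sub

' Standard least-squares formulas from the accumulated sums.
Sub ReportFit(ByVal id As String, ByVal n As Double, ByVal sx As Double, _
              ByVal sy As Double, ByVal sxy As Double, _
              ByVal sxx As Double, ByVal syy As Double)
    Dim slope As Double, intercept As Double, r2 As Double
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    r2 = (n * sxy - sx * sy) ^ 2 / _
         ((n * sxx - sx * sx) * (n * syy - sy * sy))
    Debug.Print id, slope, intercept, r2
End Sub
```

Each result line (ID, slope, intercept, R²) could just as easily be
written to the results table lk described instead of the Immediate
window.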
 

If it makes things easier, you don't need to use the Data Analysis
Toolpak. You can use the inbuilt Excel functions SLOPE and INTERCEPT
over whole columns.

In a spare spot you can put something like

    Range("Z1").Formula = "=SLOPE(E:E,D:D)"
    Range("Z2").Formula = "=INTERCEPT(E:E,D:D)"

in your VBA. Then when the file is opened, it copies the formulas into
Z1:Z2 and the slope and intercept appear, ready to be copied off
somewhere else. It's probably quicker too.
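A variation on the same idea: call the functions directly from VBA via
WorksheetFunction, which skips the worksheet detour and also gives the
goodness of fit (R²) asked for. The sheet and column layout here are
assumptions:

```vba
' Compute slope, intercept and R-squared for D (x) vs E (y)
' without writing any formulas onto the sheet.
Sub FitColumns()
    Dim ws As Worksheet, lastRow As Long
    Dim xs As Range, ys As Range
    Set ws = ActiveSheet
    lastRow = ws.Cells(ws.Rows.Count, "D").End(xlUp).Row
    Set xs = ws.Range("D1:D" & lastRow)
    Set ys = ws.Range("E1:E" & lastRow)
    With Application.WorksheetFunction
        Debug.Print "slope:", .Slope(ys, xs)
        Debug.Print "intercept:", .Intercept(ys, xs)
        Debug.Print "r-squared:", .RSq(ys, xs)
    End With
End Sub
```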

xt
 