Regression with more than 16 variables

Guest · Jun 23, 2005

I noticed linest and the regrssion tool doesn't allow more than 16 variables.
Does anyone know the most convenient way to do this? I currently don't have
SAS. Is it possible in Access or anything else?

Guest · Jun 23, 2005

Without being pessimistic I think you'll struggle to do this mutlivariate
regression in excel. There may be a way but it will be long-winded I am sure.

As a suggestion, stats websites often allow you to input data and see the
output running off Javascript as an example. Try and find a website that you
can cut and paste your data to and crank the analysis...

SAS could do it easily. The R project (google it to find it) offers a free
stats package. It is the update of S and is pretty much as good as it gets.
Upside is that it is free, downside is that it is not user friendly in that
you would have to put some hours in before you got around to doing your
multivariate regression.

Sorry to be a naysayer....

Alex

Jerry W. Lewis · Jun 27, 2005

You could directly implement the equations that were used in Excel prior
to 2003
http://groups-beta.google.com/group...worksheetfunctions/msg/17656937595e596d?hl=en

Also, the address of the R project is www.r-project.org, and there is an
supplemental package for an R-Excel interface

Jerry

MrShorty · Jun 27, 2005

How many rows/columns will the built in matrix functions (mmult
minverse, etc.) handle? Least squares regression "reduces" to som
fairly simple matrix algebra which can be performed using the built i
matrix functions. Most of the LSQ regression I do, I prefer to use th
matrix notation rather than LINEST or LOGEST, but I've never had t
regress on more than 5 or 6 parameters. I haven't heard of any limi
to the size of matrix these equations can handle, so, in theory anyway
you should be able to perform the regression this way

Mike Middleton · Jun 27, 2005

MrShorty -

How many rows/columns will the built in matrix functions (mmult, minverse,
etc.) handle? Least squares regression "reduces" to some fairly simple
matrix algebra which can be performed using the built in matrix functions.
Most of the LSQ regression I do, I prefer to use the matrix notation
rather than LINEST or LOGEST, but I've never had to regress on more than 5
or 6 parameters. I haven't heard of any limit to the size of matrix these
equations can handle, so, in theory anyway, you should be able to perform
the regression this way. <

Some of the limitations, e.g., maximum 52 columns by 52 rows for MINVERSE,
are described at the end of the following Knowledge Base article:

http://support.microsoft.com/default.aspx?scid=kb;en-us;166342

- Mike
www.mikemiddleton.com

MrShorty · Jun 27, 2005

Based on the information in that article, once could (assuming enoug
RAM, processor time, and such) theoretically do a least square
regression for up 52 parameters, then

Jerry W. Lewis · Jun 28, 2005

MrShorty said:
Based on the information in that article, once could (assuming enough
RAM, processor time, and such) theoretically do a least squares
regression for up 52 parameters, then.

Theoretically yes, but forming the normal equations is a numerically
poor way to do least squares, since numerical singularity is then
governed by the ratio of largest to smallest eigenvalues in X'X instead
of that ratio in X. MINVERSE will often give a numerically inaccurate
solution instead of warning of approximate numerical singularity.
Consequently I would tend distrust the output.

This issue in 1 dimension is the difference between calculating
=SUMSQ(data)-SUM(data)^2/COUNT(data)
(as was done by VAR, STDEV, etc prior to Excel 2003) and calculating
=DEVSQ(data)
The first form becomes more and more inaccurate as the CV of the data
decreases. For example, STDEV(x,x+0.1) should be 0.0707106781186548
=SQRT(0.005) regardless of x. But for x=100000 (CV=0.00007%) the first
form (and STDEV in Excel 2002 and earlier) is inaccurate in the 4th
significant figure of the standard deviation calculation. Between
1000000 and 10000000, all significant figures are lost in the first
calculation (summing squared x's has pushed the interesting part of the
calculation beyond the accuracy if IEEE double precision). While the
second form continues to work well.

Personally, I can't remember ever being asked to analyze data with a CV
that small, (and if I did, I would have used coding (linear
transformation to a more practical range) to avoid numerical problems,
but things are not so clear in the multiple dimensions of the general
linear model. For instance, if you fit a 6th degree polynomial where
the x data is 20.0, 20.5, 21.0, 21.5, ..., 27.0, the condition number of
X'X is around 10^30, suggesting that direct solution of the normal
equations would require quadruple precision to calculate it
http://groups-beta.google.com/group/microsoft.public.excel/msg/969a2bb33e6cdbb8
Pre-2003 LINEST (or the direct MINVERSE() formula linked from my
previous post) will try, but will get no figures correct without
warning. SAS PROC GLM and PROC REG will complain of numerical
singularity (SAS's use of the sweep operator necessarily requires it to
form the normal equations). That is why books on statistical computing
recommend other approaches, such as Givens rotations, as implemented in
http://lib.stat.cmu.edu/apstat/274
http://lib.stat.cmu.edu/apstat/75
the lm() function in S-PLUS and R will get 8 figures correct for each
coefficient in the preceding example despite the ill-conditioning.
Excel's trendline will get 9 figures correct, but will not go beyond a
6th degree polynomial. There is a contributed package for R that would
allow you to pass data from Excel to R, analyze it in R, and pass the
result back to Excel. To learn more about R, go to
http://www.r-project.org

Jerry

Regression with more than 16 variables

Guest

Guest

Jerry W. Lewis

MrShorty

Mike Middleton

MrShorty

Jerry W. Lewis