R-squared formula

  • Thread starter Thread starter Elizabeth Brown
  • Start date Start date
E

Elizabeth Brown

I was just wondering if anybody knows the formula for the
r-squared value for a power curve. Excel states that it
uses a transformed r-squared value. I am trying to
calculate the value by doing the mathematical calculations
in the spreadsheet. I have gotten the correct equation
that the chart shows, but I can't seem to get the correct
r-squared value. The formulas that they give are:

R^2 = 1-(SSE/SST)

SSE = E(Yi-Yi^)^2

SST = (EYi^2)-(EYi)^2/n

E is the best I can get to a sigma in this. Sigma is the
sign to sum up all the indicated values.

Yi is the original Y values.

Yi^ means Yi(hat). Can't place the carrot over the Y. Y
(hat) is the value you get when putting original x values
into the determined equation.

To transform the R-squared value, I took the natural log
transformation of and performed the operations on those
values. The equations that I have used are as follows:

R^2 = 1-SSE/SST

SSE = E(ln(Yi)-ln(Yi^))^2

SST = E(ln(Yi^2))-(E(ln(Yi)))^2/n

ln refers to the natural logarithm.

My problem lies within these 3 equations, I am sure of it.

Thanks for any help you can give,
Elizabeth
 
Not sure where you are having problems, but here are a few random
thoughts that may help.

Two ways to calculate SST are
=SUMSQ(y_data)-SUM(y_data)^2/COUNT(y_data)
which corresponds to your formula. A numerically better way to
calculate the same quantity is
=DEVSQ(y_data)
Both are mathematically equivalent, but the first version can introduce
finite precision rounding errors if the CV of the data is small.

You can take logs in the same step by
=DEVSQ(LN(y_data))
or if there is not data in every cell, then use
=DEVSQ(IF(ISNUMBER(y_data),LN(y_data)))
Both of these are array formulas. You use enter with Ctrl+Shift+Enter.
If you do it right, the formula bar will display the equation wrapped
in curly brackets
{=DEVSQ(LN(y_data))}
even though you did not type curly brackets.

To calculate SSE, use
=SUMPRODUCT((y_data-y_hats)^2)
or the array formula
=SUMSQ(y_data-y_hats)
Numerically these are equivalent, but the second must be array entered.
Again, you can have these take logs where needed.

An alternate expression for r-squared is
R^2 = SSM/SST
where SSM=SST-SSE is the model sum of squares, which can be calculated by
=DEVSQ(y_hats)

Note that LINEST and LOGEST will calculate r-squared if you request
statistics via the optional fourth parameter.

Jerry
 
Jerry, can you confirm that these are quite general solutions that would
apply to, say, polynomial relationships.
I know that correlation and regression are different things, but there are
quite different regression formulae for dealing with linear and non-linear
relationships. Same for r? PEARSON and CORREL seem to produce the same (r)
result and both are designed for linear relationships (?).
The Spearman Rank correlation coefficient is supposed (according to my stats
book) to be the one for non-linear data, but Excel doesn't seem to have
heard of Spearman.

One final related point. Is possible to force Excel to best-fit a line of
the form: y = a.x + bx^2 + bx^3.
That is, find the coefficients a,b,c, ....plus the r value.
Using the Trendline (type = power) option, Excel typically fits a curve of
the form: y = a.x^p, where p could be, say, 1.935.
Would use of the formulae described in your reply to Elizabeth produce the
same value or r (r squared actually) as that displayed with the trendline?

Phil
 
For polynomial relationships, create a column for each power
that occurs in the equation.
So, if you want
y = a.x + bx^2 + bx^3
and you already have x and y columns, create 2 columns for
x^2 and x^3, thee need to be adjacent to bthe x columns.
Then use Regression from the Analysis Toolpack addin
(you might have to install and/or acticate it).
There, enter the y column and the range with the 3 x columns for the
x and y ranges.
This gives you polynomial regression with r and everything you need.
 
Yes. These equations apply generally for ordinary least squares
fitting, and have better numerical properties than the ATP routines, or
LINEST/LOGEST in Excel versions prior to 2003. The particular model
details come into play in how the yhats are computed, which I did not
address because you did not specify a model.

Jerry
 
Back
Top