Correlation coefficient for log plot..

  • Thread starter Thread starter Phil C
  • Start date Start date
P

Phil C

Hi All

I posted on 3 August ("Growth function falls over..") about LOGEST's
inability to handle missing data.
My data [x=120, 150, 200, 235; y=100, 75.5, 61.2, 50.1) plots on a log/lin
plot (x-y scattergraph).

Jerry Lewis came up with a neat solution using SLOPE and INTERCEPT to
replace =INDEX(LOGEST, etc..), which works fine, but I now have a similar
problem with the correlation coefficient, r. The only way I have found for
(presumed) accurate calculation of r is to use LOGEST entered as an array
formula to get the extra stats, including r squared. This agrees exactly
with the value that you can display (Format trendline|options) along with
the (exponential) trendline (r squared = 0.9754), but I need an alternative
to LOGEST! The CORREL function seems to get the wrong answer for my data
(r=-0.9688, so r squared = 0.9405) . but seems to work OK for ordinary
linear data. The discepancy between LOGEST and CORREL varies as you play
with the data but the difference is always significant. Do a long-hand
formula for 'r'?

Any ideas?

Thanks, Phil
 
Phil -
I posted on 3 August ("Growth function falls over..") about LOGEST's
inability to handle missing data. My data [x=120, 150, 200, 235; y=100,
75.5, 61.2, 50.1) plots on a log/lin plot (x-y scattergraph).[QUOTE]
Jerry Lewis came up with a neat solution using SLOPE and INTERCEPT to[/QUOTE]
replace =INDEX(LOGEST, etc..), which works fine, but I now have a similar
problem with the correlation coefficient, r. The only way I have found for
(presumed) accurate calculation of r is to use LOGEST entered as an array
formula to get the extra stats, including r squared. This agrees exactly
with the value that you can display (Format trendline|options) along with
the (exponential) trendline (r squared = 0.9754), but I need an alternative
to LOGEST! The CORREL function seems to get the wrong answer for my data
(r=-0.9688, so r squared = 0.9405) . but seems to work OK for ordinary
linear data. The discepancy between LOGEST and CORREL varies as you play
with the data but the difference is always significant. Do a long-hand
formula for 'r'?
Any ideas?

Use array-entered CORREL(LN(y-range),x-range).

- Mike

www.mikemiddleton.com

P.S. Here are some relevant excerpts from my book "Data Analysis Using
Microsoft Excel: Updated for Office XP":

The exponential model creates a trendline using the equation

y = c*(e^(b*x)).

Excel uses a log transformation of the original y data to determine fitted
values, so the values of the dependent variable in your data set must be
positive. If any y values are zero or negative, the Exponential icon on the
Add Trendline Type tab will be grayed out. (As a workaround, you can add a
constant to each y value.)

The exponential trendline feature does not find values of b and c that
minimize the sum of squared deviations between actual y and predicted y
(=c*(e^(b*x))). Instead, Excel's method takes the logarithm of both sides of
the exponential formula, which then can be written as

Ln(y) = Ln(c) + b * x

and uses standard linear regression with Ln(y) as the dependent variable and
x as the explanatory variable. That is, Excel finds the intercept and slope
that minimize the sum of squared deviations between actual Ln(y) and
predicted Ln(y), using the formula

Ln(y) = Intercept + Slope * x.

Therefore, the Intercept value corresponds to Ln(c), and c in the
exponential formula is equal to Exp(Intercept). The Slope value corresponds
to b in the exponential formula.
 
Mike

Many thanks. I may even buy the book!

Phil


Michael R Middleton said:
Phil -
I posted on 3 August ("Growth function falls over..") about LOGEST's
inability to handle missing data. My data [x=120, 150, 200, 235; y=100,
75.5, 61.2, 50.1) plots on a log/lin plot (x-y scattergraph).[QUOTE]
Jerry Lewis came up with a neat solution using SLOPE and INTERCEPT to
replace =INDEX(LOGEST, etc..), which works fine, but I now have a similar
problem with the correlation coefficient, r. The only way I have found for
(presumed) accurate calculation of r is to use LOGEST entered as an array
formula to get the extra stats, including r squared. This agrees exactly
with the value that you can display (Format trendline|options) along with
the (exponential) trendline (r squared = 0.9754), but I need an alternative
to LOGEST! The CORREL function seems to get the wrong answer for my data
(r=-0.9688, so r squared = 0.9405) . but seems to work OK for ordinary
linear data. The discepancy between LOGEST and CORREL varies as you play
with the data but the difference is always significant. Do a long-hand
formula for 'r'?
Any ideas?

Use array-entered CORREL(LN(y-range),x-range).

- Mike

www.mikemiddleton.com

P.S. Here are some relevant excerpts from my book "Data Analysis Using
Microsoft Excel: Updated for Office XP":

The exponential model creates a trendline using the equation

y = c*(e^(b*x)).

Excel uses a log transformation of the original y data to determine fitted
values, so the values of the dependent variable in your data set must be
positive. If any y values are zero or negative, the Exponential icon on the
Add Trendline Type tab will be grayed out. (As a workaround, you can add a
constant to each y value.)

The exponential trendline feature does not find values of b and c that
minimize the sum of squared deviations between actual y and predicted y
(=c*(e^(b*x))). Instead, Excel's method takes the logarithm of both sides of
the exponential formula, which then can be written as

Ln(y) = Ln(c) + b * x

and uses standard linear regression with Ln(y) as the dependent variable and
x as the explanatory variable. That is, Excel finds the intercept and slope
that minimize the sum of squared deviations between actual Ln(y) and
predicted Ln(y), using the formula

Ln(y) = Intercept + Slope * x.

Therefore, the Intercept value corresponds to Ln(c), and c in the
exponential formula is equal to Exp(Intercept). The Slope value corresponds
to b in the exponential formula.
[/QUOTE]
 
Back
Top