Trendlines, best fit and projections

  • Thread starter Thread starter Dellie
  • Start date Start date
D

Dellie

I'm a newbie to this group, so please excuse me if this isn't the right
place.

I have a number of column charts with data representing 12 months. Most
often, these data are not linear. I have been fitting trendlines (ok by
experimentation) and can get very high R2's. But, when I go to project
out only 1 month, it seems that the better the fit, the more extreme
the projection. If I back off a polynomial model by 1, the projection
may reverse itself. Back off by another 1, and the projection looks
(yes I'm just looking and not getting into the stats behind the
regressions) more "reasonable". The behavior of the projections seems
to be erratic and the best regression fit makes it appear the a steep
increase or decrease is coming. Anyone care to comment? Thank you so
much for your time.
 
R2 will continue to increase, even if you are overfitting the data. You
might be better served by adjusted R2
http://en.wikipedia.org/wiki/Coefficient_of_determination

If you have n data points, you can always perfectly fit (R2=1) a polynomial
of degree n-1 to that data, but it will be chasing the noise in the data
instead of the signal and be totally useless for extrapolation and frequently
useless for interpolation. With less extreme polynomials, you may still
overfit the data.

A famous qoute (attributed to various persons from Poincare on) says that
"With four parameters I can fit an elephant; with five I can make it wag its
tail."

Jerry
 
Jerry said:
R2 will continue to increase, even if you are overfitting the data. You
might be better served by adjusted R2
http://en.wikipedia.org/wiki/Coefficient_of_determination

If you have n data points, you can always perfectly fit (R2=1) a polynomial
of degree n-1 to that data, but it will be chasing the noise in the data
instead of the signal and be totally useless for extrapolation and frequently
useless for interpolation. With less extreme polynomials, you may still
overfit the data.

A famous qoute (attributed to various persons from Poincare on) says that
"With four parameters I can fit an elephant; with five I can make it wag its
tail."

Jerry
 
Jerry said:
R2 will continue to increase, even if you are overfitting the data. You
might be better served by adjusted R2
http://en.wikipedia.org/wiki/Coefficient_of_determination

If you have n data points, you can always perfectly fit (R2=1) a polynomial
of degree n-1 to that data, but it will be chasing the noise in the data
instead of the signal and be totally useless for extrapolation and frequently
useless for interpolation. With less extreme polynomials, you may still
overfit the data.

A famous qoute (attributed to various persons from Poincare on) says that
"With four parameters I can fit an elephant; with five I can make it wag its
tail."

Jerry
 
You're welcome.

The Regression tool in the Anlysis ToolPak includes Adjusted R2 in its
output. Note, however, that Adjusted R2 (and much of the other output) will
be wrong if you check the "Constant is Zero" option.

Alternately, you can calculate directly from the formula in
http://en.wikipedia.org/wiki/Coefficient_of_determination
LINEST(ydata,xdata,,TRUE) gives R2 in the 3rd row, 1st column of output and
df in the 4th row, 2nd column. You can use COUNT(ydata) to get n. The
wikipedia formula then is simply 1-(1-R2)*(n-1)/df. Again, this formula is
wrong if you use the option to force the intercept to be zero.

Jerry
 
Dellie,

The whole point of Jerry's perfect response was that you are basically perfectly
fitting a curve to the past. However, the future rarely looks exactly like the
past. So, the better you model the past, the more likely you are to get a bad
forecast of the future.
 
Back
Top