Forecast with two unknown variables

  • Thread starter Thread starter Crutch
  • Start date Start date
C

Crutch

I'm hoping this is an easy question but I can't figure it out for the life of
me.

I've simplified my scenario down to two variables, order quantity and order
frequency (represented by the number of days between orders). I want to
predict a customer's next order date and quantity based on past order
history. So I have two columns with historic information, one for the order
quantity the other showing the number of days since their last order.

If I try to use the forecast function I end up with a circular reference
because I'm trying to predict both variables. (ie I would have two forecast
functions in the last cell of each column, each function would want to know
the value of the other cell's forecast to make its own prediction).

Anyone know how I can do this? Here's sample data (note that because I'm
looking at the number of days since the last order, the first order does not
have a value for # of days, if this screws up your method feel free to drop
the first order's quantity from the analysis)

# of Days Quantity
2
14 2
14 2
21 2
14 2
28 2
21 2
35 2
14 2
14 2
14 2
14 2
14 2
14 2
28 2
14 2
4 2
10 2
7 4
21 2
14 2
6 2
15 2
10 6
32 10
145 2
1 2
15 2
21 2
7 2
11 2
15 2
9 4
13 2
15 2
14 2
14 2
14 2
7 2
26 2
20 3
10 2
7 1
11 2
17 2
14 2
14 2
21 2
21 2
20 2
36 2
14 2
14 2
21 2
10 2
11 2
14 2
14 2
14 2
14 2
21 2
14 2
7 2
14 2
21 2
11 2
3 2
4 2
17 2
14 2
14 2
10 2
32 2
41 4
27 4
9 8
21 4
6 4
15 4
19 4
21 4
43 4
35 4
40 4
24 4
63 4
x y

Need to predict both x and y...

If you know of any excel add in or third party software that would be a good
choice for this type of analysis I'm all ears (especially if it addresses
seasonality, variability and a bunch of other factors that throw predictions
off).

Thanks!!!!!
 
Crutch said:
I'm hoping this is an easy question but I can't
figure it out for the life of me.

Because predicting trends based on historical data is __not__ an "easy"
question to answer -- at least, not correctly. Anyone who suggests an "easy"
solution probably does not know what he/she is talking about, especially
given the sample data that you provide.

If I try to use the forecast function I end up with a
circular reference because I'm trying to predict both
variables.

It is difficult to offer any constructive comments because you neglect to
show us how you structured the FORECAST function usage. (Klunk!) But note
that the columns you label x and y should __not__ be used as the known_X and
known_Y arrays in the same FORECAST function call. That might be your
mistake.

I want to predict a customer's next order date and
quantity based on past order history.

This a subjective process, to some degree. If you are not familiar with the
parable about the blind men and the elephant, see
http://en.wikipedia.org/wiki/Blind_men_and_an_elephant . ;-)

When done correctly, it can also be a very complex statistical process, the
details of which go far beyond the scope of this response and of this
"discussion group".

Normally, the first step might be to look for correlations with seasons,
market and industry factors, etc. It is impossible to even begin such an
analysis here due to the dearth of details in your posting.

It might be interesting to note that the "typical" order size doubled after
the first the first 1218 days in your sample data. But whether or not that
indicates a trend, and what trend that might indicate, depends on what the
sample data represents. Again, the dearth of details makes any conclusions
purely speculative ("wild-ass guesses").

Taking a purely numercal approach (wrongfully)....

To analyze the order-size data ("y") in relation to the order-interval
("x"), you might start by creating a third column with formulas of the form
=B1/A1 (order per day). Then use FREQUENCY to look at the distribution.

If the distribution were bell-shaped ("normal" distribution), you might
compute the AVERAGE and STDEV; those statistics could be used to express an
"expected" order/day ration with a specified degree of confidence.

But clearly, your sample data does not have a "normal" distribution for the
size-to-interval ratio. So the STDEV and perhaps even the AVERAGE is useless.

You might also use FREQUENCY to analyze the order interval ("x"). Again, if
the distribution were bell-shaped, you might compute the AVERAGE and STDEV.

But again, your sample data does not have a "normal" distribution for order
interval.

You might look for a correlation between order interval and order size.
Off-hand, I do not see any in the sample data. But that is based on
eyeballing the data, not a rigorous analysis. Moreover, we do not know how
well your sample data represents the larger set of data, if any.

So I would fall back to using the results of the FREQUENCY bins to develop
conditional probabilities, which can be used to derived an "expected" order
interval and order size.

Again, I think the details go beyond the scope of this response and of this
"discussion group". If you would like to see what I have done, send me email
at joeu2004 "at" hotmail.com.

But note that I am not a statistician, and my approach is superficial. The
details can be used to satisfy your curiosity about the process. But if
there are significant financial consequences, I would suggest that you find a
professional satistician who specializes in this kind of analysis.

I must admit: I don't know how to do that. But Mike Middleton might. ;-)


----- original message -----
 
Errata....
To analyze the order-size data ("y") in relation to the
order-interval ("x"), you might start by creating a third
column with formulas of the form =B1/A1 (order per day).
Then use FREQUENCY to look at the distribution.

But that presumes a (direct) correlation between order size and order
interval, which I noted later does not appear to exist, based on a
non-rigorous eyeballing of the data. Klunk!

So scratch that idea. Just create a frequency distribution of order size.


----- original message -----
 
Errata#2....
So I would fall back to using the results of the
FREQUENCY bins to develop conditional probabilities,
which can be used to derived an "expected" order
interval and order size.

That, too, presumes that there is a correlation between order size and order
interval. But again, my non-rigorous eyeball assessment is that there is
none.

I strayed a little too far from the main points that I wanted to make, namely:

- You should not use the "x" and "y" data as the known_X and known_Y arrays
in FORECAST, if that is what you were doing.

- FORECAST might not even be the right tool for you to use, given your data,
although by coincidence it might not hurt in this case, if used correctly.

- Predicting the future based on historical data is an art, not a science.
Although some statisticians might take exception to that, the point is: it
depends on how you interpret the data.

- In any case, it is not an "easy question" to answer intelligently.

That said....

Given the lack of discernible patterns in the data (IMHO), I think the
"expected" order size and interval are indeed simply the AVERAGE of each
column of data.

However, you could use a FREQUENCY break-down of each column of data to
determine, for example, that there is about a 78% chance that the order size
is 2 (unless you believe the predominance of 4 after the first 1218 days
establishes a new level), and that there is about a 64% chance that the order
interval is less than 16 days.

That can be determined much more simply; for example,
COUNTIF(Y1:Y85,2)/COUNT(Y1:Y85) and COUNTIF(X1:X85,"<16")/COUNT(X1:X85). But
I think the FREQUENCY analysis provides greater insight overall.


----- original message -----
 
Sorry for the late reply, I thought I had e-mail notifications turned on but
I must have been mistaken. You bring up some good points about whether or
not the two columns have a cause and effect relationship. After scratching
my head a bit I realized I omitted a key fact, the number of days between
orders is always going to lag behind the order quantity since there's an
infinity number of days before the first order. So I was eventually able to
write the equation using the last order quanttiy as x and then had a matching
number of x and y data points. This all assumes that the customer has
regular demand. Thanks for the help!
 
Back
Top