F-Test without population

Ion Chalmers Freeman

Hi!
Can I run an F-Test if I know just the population count, average and
population standard deviation? There are a lot of numbers in my Access
database, and I'd like to leave them there.
ion
 
The F statistic is the ratio of squared sample standard deviations.
It is used to draw inferences about the ratio of squared population
standard deviations. If you already know the population standard
deviations, then it is unclear what you are trying to do.

Jerry
 
Jerry,
I want to know with what confidence I can say that two populations are
different. If I have n1 dots with a mean mu1 and a population standard
deviation sigma1, how sure can I be that they do not belong to the
same distribution as n2 dots with a mean mu2 and a standard deviation
sigma2?
Thanks for getting back to me.
ion
 
How are you using the term "population"? In statistics, the "population
standard deviation" is the true (usually unknowable) standard deviation
of the distribution, not an estimate from a sample. Also, Greek letters
are usually reserved for population parameters rather than sample
statistics. If you really know the means and standard deviations of the
populations, then the populations (distributions) are obviously
different if either the mus or the sigmas are not identical.

I am also not sure what you mean by a "dot", but if you have two samples
of size n1 and n2, with sample (rather than population) means and
standard deviations (calculated with STDEV or STDEVP?) then the
appropriate analysis would depend on what you are willing to assume
about the populations from which these samples are drawn. Are you
willing to assume that they follow a normal distribution? Are you
willing to assume that either the population means are equal or that the
population standard deviations are equal? ...

Jerry
 
Jerry,
I have lists of numbers. I'm willing to assume they're normally
distributed. I get the population variance with VARP in Jet SQL.
Thanks!
ion
 
VARP does not give you the population variance unless you have observed
the entire population. Since you are assuming normality, it gives you a
biased estimate of the population variance.
VARP*n/(n-1)
is the unbiased estimate (that would be computed directly by VAR)
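
To make that conversion concrete, here is a minimal sketch in Python (my
choice of language for illustration; the helper name and the sample
numbers are made up and not from this thread). The standard library's
statistics.pvariance divides by n like VARP, and statistics.variance
divides by n-1 like VAR, so the rescaled value should match the latter.

```python
import statistics


def unbiased_from_varp(varp, n):
    """Rescale an n-divisor variance (what VARP returns) to the n-1 divisor used by VAR."""
    if n < 2:
        raise ValueError("need at least two observations")
    return varp * n / (n - 1)


# Made-up sample, purely for illustration.
sample = [4.1, 5.3, 6.0, 4.8, 5.5]
n = len(sample)

varp = statistics.pvariance(sample)   # divides by n, like VARP
var = statistics.variance(sample)     # divides by n-1, like VAR
converted = unbiased_from_varp(varp, n)

print(varp, var, converted)           # the last two values should agree
```

So if only the count and VARP come out of the database, nothing more is
needed to recover the VAR-style estimate.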

Var1/Var2 (unbiased estimates) should follow the F distribution with
n1-1 and n2-1 degrees of freedom if the true (unobserved) population
variances are equal. Otherwise the ratio will tend to be too large (if
the unobserved sigma1^2 > sigma2^2) or too small (if the unobserved
sigma2^2 > sigma1^2). This is assessed by evaluating whether the
right-tail probability FDIST(Var1/Var2,n1-1,n2-1) is too close (often
taken to mean within 0.025) to zero or one. If you knew a priori (before
looking at the data) that if there were a difference then population 1
would be more variable, then you would just be concerned with whether
FDIST is too close (say within 0.05) to zero.
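
For anyone who wants to run this check outside Excel from just the
counts and VARP values, the following is a rough Python sketch, not a
definitive implementation. It assumes the right-tail F probability (the
quantity FDIST returns) can be obtained from the regularized incomplete
beta function, evaluated here with a standard continued fraction. All
function names and the example numbers are hypothetical.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_right_tail(f, df1, df2):
    """P(F > f) for an F(df1, df2) variable, i.e. what FDIST(f, df1, df2) reports."""
    if f <= 0.0:
        return 1.0
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def variance_ratio_test(n1, varp1, n2, varp2):
    """Return (Var1/Var2, right-tail probability) from counts and VARP values."""
    var1 = varp1 * n1 / (n1 - 1)   # unbiased estimate for sample 1
    var2 = varp2 * n2 / (n2 - 1)   # unbiased estimate for sample 2
    ratio = var1 / var2
    return ratio, f_right_tail(ratio, n1 - 1, n2 - 1)


# Made-up summary statistics, purely for illustration.
ratio, p_right = variance_ratio_test(21, 9.0, 16, 4.0)
# Two-sided check at the 0.025 cutoffs described above.
variances_look_different = p_right < 0.025 or p_right > 0.975
print(ratio, p_right, variances_look_different)
```

The two-sided rule is the one described above: the variances are called
different only when the right-tail probability falls within 0.025 of
zero or of one.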

If you conclude that the population variances are different, then the
populations are different.

If you conclude that the population variances are the same, then you
would pool sample variances to get
Var = ((n1-1)*Var1 + (n2-1)*Var2)/(n1+n2-2)
(again Var1 and Var2 refer to unbiased estimates; (n1-1)*Var1 =
n1*VarP1). The sample means could then be compared by calculating
t = (Xbar1 - Xbar2)/SQRT((1/n1+1/n2)*Var)
where Xbar refers to the sample mean. You would conclude that
population means were different if ABS(t) is too large. This is
assessed by whether TDIST(ABS(t),n1+n2-2,2) is too close to zero (say
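within 0.05). If you knew a priori that if population means were
different then mu1 would exceed mu2, you would use TDIST(t,n1+n2-2,1)
instead. (TDIST does not accept a negative first argument; a negative t
gives no support to mu1 exceeding mu2 in any case.)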
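
The pooled variance and t statistic above need only the counts, means
and VARP values, so they can be computed without pulling the raw numbers
out of the database. Here is a small Python sketch under that
assumption; the function name and the example figures are made up. It
stops at t and the degrees of freedom, which would then go into TDIST as
described.

```python
import math


def pooled_t(n1, mean1, varp1, n2, mean2, varp2):
    """Return (t, degrees of freedom, pooled variance) from summary statistics.

    varp1 and varp2 are n-divisor variances, as returned by VARP.
    """
    df = n1 + n2 - 2
    # n*VarP equals (n-1)*Var, so the pooled variance can use VARP directly.
    pooled = (n1 * varp1 + n2 * varp2) / df
    t = (mean1 - mean2) / math.sqrt((1.0 / n1 + 1.0 / n2) * pooled)
    return t, df, pooled


# Made-up summary statistics, purely for illustration.
t, df, pooled = pooled_t(21, 10.0, 9.0, 16, 8.0, 4.0)
print(t, df, pooled)
```

This is only meaningful if you have already decided to treat the two
population variances as equal.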

Not concluding that the population variances are different does not
necessarily imply that they are the same (your sample sizes may be too
small to prove a difference that exists). If you are unwilling to
conclude that the population variances are the same, then comparing the
means becomes much more complicated.

A good statistics book or class will help you understand and properly
address these issues.

Jerry
 
VARP does not give you the population variance unless you have observed
the entire population. Since you are assuming normality, it gives you a
biased estimate of the population variance.
VARP*n/(n-1)
is the unbiased estimate (that would be computed directly by VAR)
...

Life would be easier if Excel online help termed VARP/STDEVP maximum likelihood
variance and standard deviation and VAR/STDEV least squares variance and
standard deviation rather than using sample and population qualifiers without
providing a warning that this terminology is ambiguous. But that's not the only
revision to Excel's online help that'd make many people's lives easier, is it?
 