Dear Cynthia:

Thanks for your email sent on Monday.

Using the F-statistic to compare means is usually classified under ANOVA (Analysis of Variance). Excel has ANOVA functions for both single-factor and two-factor analysis, with or without replication. Its FTEST function tests whether the variances of two samples differ. As you know, Excel only works on datasets with equal sample sizes, that is, equal numbers of data points. However, S-plus, R, and a few other software packages can carry out the ANOVA when sample sizes are unequal.
In the following, I will explain this in more detail. First, I will describe how F statistics are calculated. Then, I will describe the computational procedures in S-plus and in R. I think you will prefer S-plus or R to other software, as both are well suited to this type of work. If you do not have access to S-plus, you can get R, which is free, from www.r-project.org. If you want a quick solution, you may skip the following formulas and just read the implementation part.

----- formulas and math part (you may skip) -----
The F-statistic is calculated as SSB / d.f. B (or SS treat / d.f. treat) divided by SSW / d.f. W. Here, B means "between groups", W means "within groups", and treat refers to the treatments in an experiment. So SSB is the sum of squares between groups (or SS treat is the sum of squares between treatments), and SSW is the sum of squares within groups. As you may know, d.f. is the degrees of freedom. A sum of squares divided by its degrees of freedom is referred to as an MS (mean square). The MS between groups (or between treatments) divided by the MS within groups is the F-statistic. When sample sizes are equal, the calculation is very simple.
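When sample sizes are not equal and there is only one factor, the calculation is not quite as simple but is still straightforward, as follows.

Assume the number of groups is $a$, let $n_i$ be the number of subjects in group $i$, and let $N = \sum_i n_i$ be the total number of subjects. Write $y_{ij}$ for observation $j$ in group $i$, $\bar{y}_i$ for the mean of group $i$, and $\bar{y}$ for the overall mean. Then

$$SS_{treat} = \sum_{i=1}^{a} n_i \, (\bar{y}_i - \bar{y})^2, \qquad DF_{treat} = a - 1$$

$$SS_{within} = \sum_{i=1}^{a} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2, \qquad DF_{within} = N - a$$

$$MS_{treat} = \frac{SS_{treat}}{DF_{treat}}, \qquad MS_{within} = \frac{SS_{within}}{DF_{within}}, \qquad F = \frac{MS_{treat}}{MS_{within}}$$

Then compare F to the $F_{a-1,\,N-a}$ distribution for your decision-making.

When there are two factors, the calculation is a little more complicated, but the principle is the same as above. As we always use software to complete the calculation, the above formulas are mainly there to show the principle behind it.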
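As a minimal sketch of the one-factor calculation with unequal group sizes, the R code below computes the sums of squares, mean squares, and F by hand and then cross-checks the result against aov. The data values and the variable names (y, g, and so on) are made up purely for illustration.

```r
# Hypothetical data: three groups with unequal sizes (4, 3 and 5 observations).
y <- c(23, 25, 21, 27,  30, 28, 32,  22, 20, 24, 19, 25)
g <- factor(rep(c("A", "B", "C"), times = c(4, 3, 5)))

a     <- nlevels(g)            # number of groups
N     <- length(y)             # total number of observations
n_i   <- tapply(y, g, length)  # group sizes n_i
m_i   <- tapply(y, g, mean)    # group means
grand <- mean(y)               # overall mean

# Sums of squares between groups (treatments) and within groups.
ss_treat  <- sum(n_i * (m_i - grand)^2)
ss_within <- sum((y - ave(y, g))^2)  # ave() gives each observation its group mean

df_treat  <- a - 1
df_within <- N - a

F_stat  <- (ss_treat / df_treat) / (ss_within / df_within)
p_value <- pf(F_stat, df_treat, df_within, lower.tail = FALSE)

# Cross-check against R's built-in one-way ANOVA table.
fit <- summary(aov(y ~ g))[[1]]
all.equal(F_stat, fit[["F value"]][1])
```

The p-value comes from the upper tail of the F distribution with a-1 and N-a degrees of freedom, which is the comparison described above.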
----- implementation starts here -----

To compute the F statistics, I suggest using S-plus or R. Using R is the same as using S-plus. In R, after you start R and import your data, please type aov(Dependent ~ Factor1 + Factor2, dataset). Or you may type aov(Dependent ~ Factor1 + Factor2 + Factor1*Factor2, dataset) if you want to test the interaction between the two factors. If there is only one factor, just type aov(Dependent ~ Factor, dataset). Wrapping the call in summary(), for example summary(aov(Dependent ~ Factor, dataset)), prints the ANOVA table, which includes the F-statistics and p-values for your decision-making.
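The calls above might look like this in an R session. This is only a sketch: the data frame and its column names (yield, fert, water) are hypothetical stand-ins for your own dependent variable and factors, and the cell counts are deliberately unequal.

```r
# Hypothetical unbalanced dataset: 'yield' measured under two factors.
dataset <- data.frame(
  yield = c(12, 14, 11, 15, 18, 17, 13, 16, 19, 21, 20),
  fert  = factor(c("lo", "lo", "lo", "lo", "hi", "hi",
                   "lo", "lo", "hi", "hi", "hi")),
  water = factor(c("dry", "dry", "dry", "wet", "dry", "dry",
                   "wet", "wet", "wet", "wet", "wet"))
)

# Two factors, main effects only.
fit_main <- aov(yield ~ fert + water, data = dataset)
summary(fit_main)   # ANOVA table with F values and p-values

# Two factors with their interaction (fert * water expands to
# fert + water + fert:water).
fit_int <- aov(yield ~ fert * water, data = dataset)
summary(fit_int)

# One factor only.
fit_one <- aov(yield ~ fert, data = dataset)
summary(fit_one)
```

Printing the aov object by itself shows only the sums of squares and degrees of freedom; summary() is what adds the F and p-value columns.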
For your convenience, this linked web page contains an example I created of using R to calculate F statistics.

The implementation is rather simple. However, if you are working on a two-factor ANOVA and the sample sizes are unequal, the order in which the factors are analyzed becomes very important. If you have factors A and B, and factor A is analyzed first (that is, entered first in the statistical software), the ANOVA table gives the sum of squares explained by factor A, then the sum of squares explained by B after removing the effects of A. Similarly, if B is entered first, the table gives the sum of squares for B, then for A after removing B's effects. In other words, the ANOVA table needs to be interpreted sequentially: the contribution of each row must be interpreted as that of adding the term to a design that already contains the previous terms.
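To see the order dependence concretely, the following sketch fits the same two-factor model twice, once with A entered first and once with B entered first, and compares the sum of squares attributed to A. The data frame d and its values are invented for illustration only.

```r
# Hypothetical unbalanced two-factor data (cell counts 3, 3, 2, 3).
d <- data.frame(
  y = c(12, 14, 11, 15, 18, 17, 13, 16, 19, 21, 20),
  A = factor(c("a1", "a1", "a1", "a1", "a2", "a2",
               "a1", "a1", "a2", "a2", "a2")),
  B = factor(c("b1", "b1", "b1", "b2", "b1", "b1",
               "b2", "b2", "b2", "b2", "b2"))
)

tab_AB <- summary(aov(y ~ A + B, data = d))[[1]]  # A entered first
tab_BA <- summary(aov(y ~ B + A, data = d))[[1]]  # B entered first

# Row names in the summary table are padded with spaces, so trim them.
row_of <- function(tab, term) trimws(rownames(tab)) == term

ss_A_first <- tab_AB[row_of(tab_AB, "A"), "Sum Sq"]  # A ignoring B
ss_A_last  <- tab_BA[row_of(tab_BA, "A"), "Sum Sq"]  # A after removing B

c(A_entered_first = ss_A_first, A_entered_last = ss_A_last)

# The residual sum of squares does not depend on the order.
all.equal(tab_AB[row_of(tab_AB, "Residuals"), "Sum Sq"],
          tab_BA[row_of(tab_BA, "Residuals"), "Sum Sq"])
```

With balanced data the two sums of squares for A would coincide; here they differ because the cell counts are unequal.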
Therefore, when the sample sizes are unequal, the results will depend on which factor is entered first. In general, you want to enter the less important factor first. If all the factors are equally important to you, you may run the analysis with each order to produce all the ANOVA tables and combine them (using, for each factor, the row obtained after removing the other factors). Also, the function drop1 in S-plus or R shows the effect of dropping each term from your design, and its results can be interpreted without the sequential consideration. I guess you may not want to deal with this issue now. If you need any more assistance with this factor-order matter later, please contact me anytime.
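A minimal sketch of drop1 in R follows (the S-plus call may differ in its arguments). It uses an invented data frame d and checks that the sum of squares reported for factor A is the same whichever order the factors were entered in.

```r
# Hypothetical unbalanced two-factor data.
d <- data.frame(
  y = c(12, 14, 11, 15, 18, 17, 13, 16, 19, 21, 20),
  A = factor(c("a1", "a1", "a1", "a1", "a2", "a2",
               "a1", "a1", "a2", "a2", "a2")),
  B = factor(c("b1", "b1", "b1", "b2", "b1", "b1",
               "b2", "b2", "b2", "b2", "b2"))
)

fit_AB <- aov(y ~ A + B, data = d)
fit_BA <- aov(y ~ B + A, data = d)

# Each row tests one term after adjusting for all the other terms.
d_AB <- drop1(fit_AB, test = "F")
d_BA <- drop1(fit_BA, test = "F")
d_AB

# The adjusted sum of squares for A is identical for both orders.
all.equal(d_AB["A", "Sum of Sq"], d_BA["A", "Sum of Sq"])
```

Each row of the drop1 table matches what the sequential ANOVA table would show for that term if it had been entered last.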
By the way, for experiments, "unequal sample sizes" is referred to as an "unbalanced experiment". In S-plus or R, you can use the functions replications or alias to investigate the pattern of imbalance if needed.
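As a rough illustration in R, the sketch below inspects the balance of a made-up two-factor dataset. The data frame d is hypothetical; replications, xtabs, and alias are standard R functions.

```r
# Hypothetical unbalanced two-factor data.
d <- data.frame(
  y = c(12, 14, 11, 15, 18, 17, 13, 16, 19, 21, 20),
  A = factor(c("a1", "a1", "a1", "a1", "a2", "a2",
               "a1", "a1", "a2", "a2", "a2")),
  B = factor(c("b1", "b1", "b1", "b2", "b1", "b1",
               "b2", "b2", "b2", "b2", "b2"))
)

# Number of replicates for each level of each term. For an unbalanced
# design the result is a list; for a balanced one it is a plain vector.
reps <- replications(y ~ A * B, data = d)
reps

# Cell counts for every A-by-B combination.
cells <- xtabs(~ A + B, data = d)
cells

# alias() reports terms that cannot be estimated separately,
# for example when some cells are empty.
alias(aov(y ~ A * B, data = d))
```

A quick way to test for balance is is.list(reps): it is TRUE when at least one term has unequal replication.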
I hope the above answers most of your questions for this research. If you need any more assistance or need me to take a look at your dataset, please do not hesitate to contact me.

Sincerely,