ANOVA

How to compute ANOVA:
(With example)

Below is a step-by-step description of how to compute one-way ANOVA. Each step also provides an example. If you came here looking for help because you are frustrated with ANOVA, worry no more! You can do it. This page makes it easy: it takes out all the complicated equations and just tells you in plain English, and shows you with examples, how to do it. IT CAN BE EASY TO LEARN ANOVA IF IT'S PRESENTED IN FAMILIAR TERMS!

The purpose of ANOVA is similar to that of a two-mean test (like a t test), but the advantage is that it can be used for more than two means.

The end goal is to calculate an F statistic (similar to Z, t, or chi-square) and see if it is bigger than a critical value that can be looked up in a table.

The F statistic you will compute is literally a ratio of how much variation there is between groups compared to how much variation there is within groups--so the bigger the F statistic, the more likely it is to be statistically significant!

It is often easiest to understand using an example, so let us use the following table to compute ANOVA for test scores in 4 different classes. Each class has only 3 students to keep the example easy to follow. 


Test scores by class

   Mr. Fainzout   Ms. Terry   Mr. Eyus   Ms. Issipppi
   85             45          95         72
   65             88          92         69
   72             58          89         55
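
If you would like to check the arithmetic by computer as we go, here is the same data as a small Python sketch (the dictionary name scores_by_class is just made up for this walkthrough):

scores_by_class = {
    "Mr. Fainzout": [85, 65, 72],
    "Ms. Terry":    [45, 88, 58],
    "Mr. Eyus":     [95, 92, 89],
    "Ms. Issipppi": [72, 69, 55],
}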


1. Compute the total mean: 
add up every observation and divide by the total number of observations.

Observation #   Class          Score
1               Mr. Fainzout   85
2               Mr. Fainzout   65
3               Mr. Fainzout   72
4               Ms. Terry      45
5               Ms. Terry      88
6               Ms. Terry      58
7               Mr. Eyus       95
8               Mr. Eyus       92
9               Mr. Eyus       89
10              Ms. Issipppi   72
11              Ms. Issipppi   69
12              Ms. Issipppi   55

Grand total                                 -> 885
Total number of observations                -> 12
Total mean (grand total / number of obs.)   -> 885/12 = 73.75

SO THE TOTAL MEAN=73.75
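
Here is the same step as a quick Python check (a minimal sketch; the variable names are just illustrative):

all_scores = [85, 65, 72, 45, 88, 58, 95, 92, 89, 72, 69, 55]  # every observation from every class
total_mean = sum(all_scores) / len(all_scores)                 # 885 / 12
print(total_mean)                                              # 73.75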

2. Compute group means: 
(add up the observations in a group and divide by the number of observations in that group; repeat for each group).

Test scores by class

             Mr. Fainzout   Ms. Terry      Mr. Eyus       Ms. Issipppi
             85             45             95             72
             65             88             92             69
             72             58             89             55
Group mean   (85+65+72)/3   (45+88+58)/3   (95+92+89)/3   (72+69+55)/3
             = 74           = 63.67        = 92           = 65.33

Group means are: 74, 63.67, 92 and 65.33. 
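
In Python, the same computation looks like this (a short sketch reusing the made-up scores_by_class dictionary from above):

scores_by_class = {
    "Mr. Fainzout": [85, 65, 72],
    "Ms. Terry":    [45, 88, 58],
    "Mr. Eyus":     [95, 92, 89],
    "Ms. Issipppi": [72, 69, 55],
}
group_means = {name: sum(scores) / len(scores) for name, scores in scores_by_class.items()}
print(group_means)  # Mr. Fainzout: 74.0, Ms. Terry: 63.67, Mr. Eyus: 92.0, Ms. Issipppi: 65.33 (last two rounded)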

Already we see some difference between groups, but due to our very small sample size we don't know if this is just a statistical fluke, so we have to quantify it using ANOVA...

3. Compute TOTAL SUM OF SQUARES (SST):
A. Subtract the total mean from each observation (include all observations from all groups).
B. Square each result from part “a” above.
C.  Add up all the squared results from part “b” above.
D. THE RESULT IN “C” IS YOUR TOTAL SUM OF SQUARES (SST)

NOTICE THAT EACH PART ABOVE IS LABELLED WITH THE CORRESPONDING LETTER IN THE EXAMPLE BELOW 


Observation #   Class          Score   Total mean (A)   Difference (A)   Squared difference (B)
1               Mr. Fainzout   85      73.75            11.25            126.56
2               Mr. Fainzout   65      73.75            -8.75            76.56
3               Mr. Fainzout   72      73.75            -1.75            3.06
4               Ms. Terry      45      73.75            -28.75           826.56
5               Ms. Terry      88      73.75            14.25            203.06
6               Ms. Terry      58      73.75            -15.75           248.06
7               Mr. Eyus       95      73.75            21.25            451.56
8               Mr. Eyus       92      73.75            18.25            333.06
9               Mr. Eyus       89      73.75            15.25            232.56
10              Ms. Issipppi   72      73.75            -1.75            3.06
11              Ms. Issipppi   69      73.75            -4.75            22.56
12              Ms. Issipppi   55      73.75            -18.75           351.56

Grand total                                 -> 885
Total number of observations                -> 12
Total mean (grand total / number of obs.)   -> 885/12 = 73.75
Sum of the squared differences (C)          -> 2878.25


The TOTAL SUM OF SQUARES is 2878.25

Notice that "Total sum of squares" is an abbreviation for "Sum of squared differences from the total mean". That's exactly what we do to compute it-sum (add up) all of the squared differences from the mean!
4. Compute WITHIN GROUP SUM OF SQUARES (SSW):
A. Subtract each group average from the observations that go with that group average.
B. Square each of the differences in “A”.
C. Add up all of the squared differences in “B”.
D. You now have a sum of squared differences for each group.
E. Add all of the summed differences together.
F. The result in “E” is WITHIN GROUP SUM OF SQUARES (SSW)

Mr. Fainzout (class mean = 74)
Score   -   Class mean   Difference (A)   Squared difference (B)
85      -   74           11               121.00
65      -   74           -9               81.00
72      -   74           -2               4.00
                         SS for this group (C, D) ->    206.00

Ms. Terry (class mean = 63.67)
45      -   63.67        -18.67           348.57
88      -   63.67        24.33            591.95
58      -   63.67        -5.67            32.15
                         SS for this group (C, D) ->    972.67

Mr. Eyus (class mean = 92)
95      -   92           3                9.00
92      -   92           0                0.00
89      -   92           -3               9.00
                         SS for this group (C, D) ->    18.00

Ms. Issipppi (class mean = 65.33)
72      -   65.33        6.67             44.49
69      -   65.33        3.67             13.47
55      -   65.33        -10.33           106.71
                         SS for this group (C, D) ->    164.67

Sum of the group SS (E) -> 1361.33


So Sum of Squares Within is 1361.33

Notice that this is another abbreviation. The full name is "The sum of squared differences of each group observation from the within-group mean (the mean of each group)". And that is exactly what we are doing--computing the sum of squared differences of group observations from their groups' means!
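
Here is the same loop in Python (a minimal sketch, reusing the made-up scores_by_class dictionary from earlier):

scores_by_class = {
    "Mr. Fainzout": [85, 65, 72],
    "Ms. Terry":    [45, 88, 58],
    "Mr. Eyus":     [95, 92, 89],
    "Ms. Issipppi": [72, 69, 55],
}
ssw = 0.0
for scores in scores_by_class.values():
    group_mean = sum(scores) / len(scores)               # this group's own mean
    ssw += sum((x - group_mean) ** 2 for x in scores)    # this group's sum of squares (C, D)
print(ssw)                                               # about 1361.33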

5. Compute BETWEEN GROUP SUM OF SQUARES (SSB):

INSERTED USEFUL FACTOID: SST=SSW+SSB 
We could bail out right here--not because we are defeated, but because SST=SSW+SSB! Because we now know SST (2878.25) and SSW (1361.33), we can use simple algebra to solve: 2878.25=1361.33+SSB. So SSB=2878.25-1361.33=1516.92. We will still learn how to compute it (it may be on the test, after all), but you should note that SST=SSW+SSB. If nothing else, it makes a good way to check whether you got all of the Sums of Squares correct at the end!
A. You have a group mean for each group that you computed in part 2.
B. Take each mean and subtract the total mean from part 1.
C.  This is the difference between each group mean and the total mean.
D. Square each of these differences.
E. Multiply each squared difference in "D" by the number of observations in the group associated with it (this is to "adjust" SSB so that it has the same weight as SSW and SST).
F. Add up all the results from "E"
G. The result in “F” is BETWEEN GROUP SUM OF SQUARES (SSB).


                                  Mr. Fainzout    Ms. Terry         Mr. Eyus          Ms. Issipppi
Scores                            85              45                95                72
                                  65              88                92                69
                                  72              58                89                55
Group means (A)                   74              63.67             92                65.33
Total mean                        73.75           73.75             73.75             73.75
Group mean - total mean (B, C)    0.25            -10.08            18.25             -8.42
Squared (D)                       0.06            101.61            333.06            70.90
Multiply by number in group (E)   3(0.06)=0.19    3(101.61)=304.8   3(333.06)=999.2   3(70.90)=212.7

Add them all up (F):   0.19 + 304.8 + 999.2 + 212.7 = 1516.89

So, SSB is 1516.89! 

REMEMBER STEP E! The long name for Sum of Squares Between is just a little different from the other Sums of Squares so far. You could call it "The Sum of Squared Differences Between Group Means and the Total Mean, MULTIPLIED by the Number of Observations in Each Group!" That is a long name--no wonder we stick with Sum of Squares Between, or "SSB" for even shorter!

Why multiply by the number in each group? Because each group mean stands in for every observation in its group, so weighting by group size puts SSB on the same scale as SSW and SST, which use every observation individually.
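
Here is the same calculation in Python (a minimal sketch; note the weighting by group size inside the loop):

scores_by_class = {
    "Mr. Fainzout": [85, 65, 72],
    "Ms. Terry":    [45, 88, 58],
    "Mr. Eyus":     [95, 92, 89],
    "Ms. Issipppi": [72, 69, 55],
}
all_scores = [x for scores in scores_by_class.values() for x in scores]
total_mean = sum(all_scores) / len(all_scores)              # 73.75
ssb = 0.0
for scores in scores_by_class.values():
    group_mean = sum(scores) / len(scores)
    ssb += len(scores) * (group_mean - total_mean) ** 2     # weight by group size (step E)
print(ssb)  # about 1516.92 (the hand calculation above gives 1516.89 because of rounding)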

PAUSE FOR CHECKPOINT!

Remember how SST=SSW+SSB? Now you can check to make sure everything adds up--literally. 

So from part 3 we have SST=2878.25
From part 4, SSW =1361.33
And now for SSB we have 1516.89

So plug these into the equation SST=SSW+SSB to get 2878.25=1361.33+1516.89.

Well 1361.33+1516.89 actually equals 2878.22, but we know this is due to rounding error (as is often the case) so EVERYTHING CHECKS OUT! 

***Remember, here we are not concerned about very small differences that could be due to rounding, but major differences that mean we computed something wrong! More often than not you will be off in the hundredths or even the tenths place so don't sweat it!
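
If you carry full precision in a program instead of rounding at each step, the identity holds to within rounding (a short check using the unrounded values from the sketches above):

sst, ssw, ssb = 2878.25, 1361.3333, 1516.9167   # unrounded values from the earlier sketches
print(abs(sst - (ssw + ssb)) < 0.01)            # True: SST = SSW + SSB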

Now that we have SST, SSW, and SSB, can we compute F?

For ANOVA we compute an F statistic. It is used in basically the same way as Z or t if you are familiar with those, except it gets a special letter to give R. A. Fisher the credit he deserves for developing it.

But just like with Z or t, the bigger the F statistic, the more likely it is to be significant. If it is bigger than the critical value, we reject the null. More on this later...

The formula for F is (SSB/DFb)/(SSW/DFw)

So we have SSB and SSW, but what are DFb and DFw? They are simply the degrees of freedom for the between-group (DFb) and within-group (DFw) calculations. They are computed in a very similar way to other degrees of freedom--we subtract 1 from the number of things we are analyzing.

We need to compute degrees of freedom for between and within! 

6. Compute between group degrees of freedom (DFB):

Remember that to compute SSB we used group data (group means compared to the total mean). So we subtract one from the number of groups.

DFb=number of groups - 1.

In our example we have four groups so DFb=4-1=3. DFb is 3.

7. Compute within group degrees of freedom (DFW):

In our within group calculations (SSW) we used all of the observations, but we did it group by group. So we take the number of observations -1 for each group and then add them together.

DFw=number of observations -1 for each group, then add them all together. 

OR: total observations - number of groups (either works fine!)

Here, each group has 3 students so we have: (3-1)=2, or 2 df for each group. So 2+2+2+2=8.

Some texts call this: Total number of observations - the number of groups. And, some students prefer to just memorize that instead, and that works fine. Notice that we have 12 total observations and 4 groups, so 12 - 4 =8! (The same answer we got by adding 2+2+2+2).

In our example, DFw is 8.
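
As a quick Python check of both degrees of freedom (a minimal sketch):

group_sizes = [3, 3, 3, 3]                       # 3 students in each of the 4 classes
df_between = len(group_sizes) - 1                # number of groups - 1 = 3
df_within = sum(n - 1 for n in group_sizes)      # (3-1)+(3-1)+(3-1)+(3-1) = 8, same as 12 - 4
print(df_between, df_within)                     # 3 8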

8. Compute F:

NOW, armed with SSB, DFb, SSW and DFw, we are ready to compute F and see if there is a statistically significant difference between groups!

Keep in mind the equation for the F statistic: 

(SSB/DFb)/(SSW/DFw)=F

A. The first part: (SSB/DFb) is called "Mean Sum of Squares Between Groups" and it is just that. Dividing by degrees of freedom "averages" SSB.

So, Mean Sum of Squares Between Groups is 1516.89/3=505.63

B. The second part: (SSW/DFw) is called "Mean Sum of Squares Within Groups" and it is just that. Dividing by degrees of freedom "averages" SSW.

So, Mean Sum of Squares Within Groups is 1361.33/8=170.17

SIDE NOTE: The F statistic is literally a ratio of how much variation there is between groups compared to how much variation there is within groups. So the bigger F is, the more likely the groups really are different in the population!

C. Now just divide A. by B.: 505.63/170.17=2.97


F=2.97.

D. To know if this is statistically significant, we have to look it up in a table. Go over the number of degrees of freedom BETWEEN and down the number of degrees of freedom within. FOR OUR EXAMPLE, WE GO OVER 3 AND DOWN 8.



The critical value is 4.07.

SIDE NOTE: "B" sure to start with "B" (DFb). If you go over the number of within group degrees of freedom and down the number of between group degrees of freedom, you would have set the critical value at 8.85, which is not correct (and makes you miss test questions among other things!) "B OVER"ly cautious!
9. Make a decision:

To reject the null at alpha level 0.05, F must be bigger than 4.07 (the critical value). Because F=2.97, it is lower than the critical value and we FAIL TO REJECT THE NULL. We cannot say that there is a statistically significant difference between groups in the population. 
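
Here are steps 8 and 9 together as a Python sketch. The critical value and p-value lookups assume you have the SciPy library installed; everything else is plain Python using the rounded sums of squares from the steps above:

from scipy import stats   # only needed for the critical value and p-value lookups

ssb, ssw = 1516.89, 1361.33                    # sums of squares from steps 4 and 5
df_between, df_within = 3, 8                   # degrees of freedom from steps 6 and 7
ms_between = ssb / df_between                  # Mean Sum of Squares Between = 505.63
ms_within = ssw / df_within                    # Mean Sum of Squares Within  = 170.17
f_stat = ms_between / ms_within                # 2.97
critical_value = stats.f.ppf(0.95, df_between, df_within)   # about 4.07 at alpha = 0.05
p_value = stats.f.sf(f_stat, df_between, df_within)         # about 0.10
print(round(f_stat, 2), round(critical_value, 2), round(p_value, 2))
print(f_stat > critical_value)                 # False -> fail to reject the null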

SIDE NOTE: This is actually not very surprising given our very small sample sizes!


CONCLUSION:

The F test works in a similar way to other statistics: we are quantifying difference from the mean and then testing if that difference is extreme enough to conclude that it is (sufficiently) unlikely to have occurred just by a statistical fluke (like a "funny" sample...). 

The F test gives us an option when we want to compare more than two means and make a conclusion about the statistical significance of those differences (or the lack thereof). 

The F test is literally a ratio of how much difference there is between groups over how much difference there is within groups. 

The bigger F is the more likely it is to be statistically significant. 

F can be easily computed once you know SSB, DFb, SSW and DFw. Although it can be somewhat monotonous to do those calculations, they are very straightforward if you follow the examples given above. Practice doing this a few times with different datasets from your textbook or online and you will soon feel so comfortable doing ANOVA that you will even be a little bored by it!
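
If you want to confirm an entire calculation in one line, SciPy's built-in one-way ANOVA reproduces the same F statistic straight from the raw scores (again, a sketch assuming SciPy is installed):

from scipy.stats import f_oneway

result = f_oneway([85, 65, 72], [45, 88, 58], [95, 92, 89], [72, 69, 55])
print(round(result.statistic, 2))   # 2.97
print(round(result.pvalue, 2))      # about 0.10, so not significant at alpha = 0.05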

Good luck!

