We have seen previously that a major goal of experimental science is to detect differences between measurements that have resulted from different treatments. Early on we learned that it is not possible to assess these differences based on a single measurement of each treatment. Without knowing how much variation existed within a treatment, we could not know whether the difference between treatments was significantly large. The simplest and first formal statistical test we learned about, the t-test of means, provided a mathematical way of comparing the size of the difference between means relative to the variability in the samples used to calculate those means.
Differences caused by an experimental treatment can be thought of as just one part of the overall variability of measurements that originates from many sources. If we measured the strength of the response of cockroach retinas when stimulated by light, we would get a range of measurements. Some of the variability in measurements could be due to the differences in the stimulus (color, intensity, and duration of the light pulse), some of the variability could be due to the differences in the eyes (different surface areas, different chemical composition, different position of the insertion of the electrode), and some of the variability could have no identifiable cause (random uncontrollable variation). In a good experiment, we try to increase the variation due to factors we are interested in (e.g. in a color sensitivity experiment by having widely differing wavelengths of light), while trying to control as many other things as we can by careful manipulations of the environmental conditions (e.g. standardizing type and position of electrode, using the same intensity of light each time, etc.). In the end, our ability to claim significant differences boils down to being able to say that the variation in results (i.e. differences) due to the factor that we manipulate is much larger than the variation due to uncontrollable factors (i.e. "noise"). Variation between treatment groups is our friend; variation within a treatment group is our enemy. [Note: the terms "treatment groups", "treatment categories", and "levels" are used synonymously.]
Analysis of variance (commonly abbreviated ANOVA) is a powerful statistical technique that is commonly used by biologists to detect differences in experimental results. The fundamental principle in ANOVA is to determine how many times greater the variability due to the treatment is than the variability that we cannot explain. The measure of this is called an "F statistic" (named in honor of the inventor of ANOVA, the geneticist R. A. Fisher). A complete understanding of the theoretical underpinnings and assumptions of ANOVA is beyond the scope of this course and is covered thoroughly in the Biometry course. Our purpose in this course is to understand the basic idea of ANOVA and to learn about situations in which it can be useful.
An ANOVA tests the null hypothesis that there is no difference among the mean values for the different treatment groups.
Although it is possible to conduct an ANOVA by hand, no one in their right mind with access to statistical software would do so. Setting up an ANOVA using RStudio is quite easy; the exact method will be described later. The results of an ANOVA are summarized in a table. Table 14 shows the results of an actual experiment comparing 24 measurements each of the amplitude of the response of cockroach eyes stimulated by pulses of blue and of green light having the same brightness and duration. In this particular example, the null hypothesis is that there is no difference between the mean response to blue light and the mean response to green light.
Table 14. Analysis of variance comparing effect of green and blue light on a roach eye
Source       Degrees of freedom   Sum of squares   Mean square   F ratio       P
Model                 1                 53.0            53.0        1.26    0.268
Residuals            46               1941.4            42.2
Total                47               1994.4
The row titled "Model" represents the variation caused by the difference between the blue and green light treatments. In a single-factor ANOVA, statistical software may replace "Model" with the name of the experimental variable being tested (e.g. "color"). The row titled "Residuals" represents the variation within the treatments that cannot be attributed to the light factor. Sometimes the term "error" is used instead of "residuals", which is a bit unfortunate, because this variation is not due to any mistake of the experimenter but rather represents the variation that the experimenter was not able to control.
The column "Degrees of freedom" (df) is related to the numbers of categories and samples. The Model df is the number of categories minus one, the Total df is the total number of measurements minus one, and the Residuals df is the difference between the two. The "Sum of squares" can be thought of as a measure of how the total variation is distributed (or "partitioned") among the sources of variability; the sums of squares are additive, so the Model and Residuals sums of squares add up to the Total.
The "Mean square" is calculated by dividing the sum of squares by the degrees of freedom for that source. The mean square is analogous to the variance (i.e. the square of the standard deviation) of a distribution. Thus a large mean square represents a large variance, and vice versa. The F ratio is simply the model mean square divided by the residuals mean square. As was said previously, the point of ANOVA is to compare how many times greater the variance due to the treatment (=good) is than the variance due to uncontrolled effects (=bad), so the F ratio is the statistic that represents this quantity.
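The partitioning described above can be sketched numerically. The following Python example (using hypothetical simulated data, not the actual roach measurements; the course itself uses RStudio, where `aov()` does the same work) computes the sums of squares, mean squares, and F ratio by hand and checks the F ratio against SciPy's one-way ANOVA:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical example data: two treatment groups of 24 measurements each
blue = rng.normal(20, 6.5, 24)
green = rng.normal(22, 6.5, 24)

data = np.concatenate([blue, green])
grand_mean = data.mean()

# Model (between-group) sum of squares: variation of group means around the grand mean
ss_model = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in (blue, green))
# Residuals (within-group) sum of squares: variation around each group's own mean
ss_resid = sum(((g - g.mean()) ** 2).sum() for g in (blue, green))
ss_total = ((data - grand_mean) ** 2).sum()

df_model = 2 - 1           # number of categories minus one
df_resid = len(data) - 2   # total measurements minus number of categories

ms_model = ss_model / df_model   # mean square = sum of squares / df
ms_resid = ss_resid / df_resid
f_ratio = ms_model / ms_resid    # F = model mean square / residuals mean square

# SciPy's one-way ANOVA should report the same F statistic
f_scipy, p_scipy = stats.f_oneway(blue, green)
```

Note that `ss_model + ss_resid` equals `ss_total`, illustrating the additivity of the sums of squares.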
In this example, one can see that the variance due to the light color treatment ("Model") is larger than the variance that we cannot account for ("Residuals"), but not by much (F=1.26). With this number of samples, the difference caused by the light treatment is not great enough to be considered significant, as demonstrated by the P value of 0.268. We would fail to reject the null hypothesis that the means for the two colors of light are the same.
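As a check on the arithmetic, the P value can be recovered from the F ratio and the two degrees of freedom using the upper tail of the F distribution. A sketch in Python using SciPy (the same calculation happens inside RStudio when it prints an ANOVA table):

```python
from scipy import stats

# stats.f.sf gives the upper-tail (right-tail) probability of the F
# distribution, which is the P value reported in an ANOVA table.
p_blue_green = stats.f.sf(1.26, dfn=1, dfd=46)  # close to the 0.268 in Table 14
p_red_green = stats.f.sf(17.1, dfn=1, dfd=46)   # on the order of the 0.0001 in Table 15
```

The larger the F ratio for a given pair of degrees of freedom, the smaller the P value.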
If we instead consider 24 measurements each taken using red and green light, we get the results in Table 15. In this experiment, our null hypothesis is that there is no difference between the mean response for red light and the mean response for green light.
Table 15. Analysis of variance comparing effect of green and red light on a roach eye
Source       Degrees of freedom   Sum of squares   Mean square   F ratio        P
Model                 1                437.6           437.6        17.1    0.0001
Residuals            46               1174.9            25.5
Total                47               1612.5
We can see that the total variation in the measurements (represented by the total sum of squares) is similar to that in the comparison of blue and green light, but this time much more of the variation can be accounted for by the light color treatment (=model). Comparison of the variance of the light treatment to the unaccountable variance (=residuals) shows that it is 17.1 times larger. With this number of samples, that represents a highly significant difference (P=0.0001). We would reject the null hypothesis that the means for the two colors of light are the same.
Although an ANOVA represents a different way of thinking about the significance of differences than a t-test, for a single factor with two treatments there is no advantage to conducting an ANOVA over performing a t-test. In fact, both tests will result in identical P values. The advantage of an ANOVA comes when considering more complicated experimental designs.
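This equivalence is easy to verify with hypothetical data: for two groups, the ANOVA F statistic equals the square of the pooled-variance t statistic, and the two P values are identical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical data: two treatment groups of 24 measurements each
a = rng.normal(20, 5, 24)
b = rng.normal(24, 5, 24)

t_stat, p_t = stats.ttest_ind(a, b)  # two-sample t-test (equal variances assumed)
f_stat, p_f = stats.f_oneway(a, b)   # single-factor ANOVA

# For two groups: F = t squared, and the P values match exactly.
```

With three or more groups there is no single t statistic to square, which is exactly where ANOVA becomes necessary.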