At the end of the exercise you should be able to:
Analysis of variance (ANOVA) is a statistical test that falls into an important class of analyses called general linear models. In the simple tests that we have learned so far, such as the t-test of means and linear regression, we analyze how a single factor affects the outcome of an experiment. General linear models are a powerful class of statistical tests that allow the effects of several factors to be investigated simultaneously. Other well-known statistical tests in this category are multiple regression and analysis of covariance (ANCOVA).
General linear models are among the most commonly used statistical tests, and if you continue into research, it is very likely that you will encounter them. We introduce ANOVA so that you can see the circumstances in which it can be used, and how it extends the t-test of means and the paired t-test. A full understanding of ANOVA is beyond the scope of this course, but if you take the Biostatistics course, you can learn a lot more about how it works and what it is used for. Also be aware that the statistical tests explored earlier (linear regression, t-test of means, paired t-test, chi-square goodness of fit, chi-square contingency) tend to focus on one thing being tested or on two groups compared to each other. In other words, they suit designs in which outside influences can add little variation or noise to the data. Once you test more than two groups, more than one factor, or more than one possible influence on the data, the design of the experiment becomes more complex, and it becomes more important to think through everything you will test, why, and how.
If you recall, t-tests allow you to compare group A to group B. With ANOVA, however, you may compare A to B to C to D, and so on. This would be a single-factor ANOVA, where you have multiple groups that all test one thing, or one factor. In 1510L, you were asked to look at several papers, including one by Sickbert-Bennett et al. (2005). That paper included a figure comparing 14 chemicals to test their effect on bacterial removal during handwashing. The authors used simple confidence intervals to compare between groups, with a sample size of about 5 per group. So why not use a more powerful statistical test like an ANOVA? You would have to ask the authors that question. Maybe the feeling was that it was not quite appropriate for the experimental design. But the basic idea is there: it is an example where an ANOVA could be used, with multiple groups testing a single factor (the effect of hand-wash agent on bacteria).
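To make the single-factor idea concrete, here is a minimal sketch of the arithmetic behind a one-way ANOVA, written in plain Python. The three groups and their values are made-up numbers for illustration only (they are not from Sickbert-Bennett et al. or from any course data set), and the function name `one_way_anova_f` is our own. The sketch stops at the F statistic; turning F into a p-value requires the F distribution, which statistical software handles for you.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a single-factor ANOVA.

    `groups` is a list of lists, one list of measurements per group.
    """
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual measurements around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)

    # F is the ratio of the two mean squares
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical measurements for three hand-wash agents (illustration only)
agent_a = [1.0, 2.0, 3.0]
agent_b = [2.0, 3.0, 4.0]
agent_c = [6.0, 7.0, 8.0]

f_stat, df_between, df_within = one_way_anova_f([agent_a, agent_b, agent_c])
print(f_stat, df_between, df_within)
```

A large F means the group means differ from each other by much more than the measurements vary within each group, which is the pattern ANOVA is designed to detect.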
Another paper you came across, if you were with us in 1510L in the fall, was a similar study in which a student compared the effect of an antimicrobial soap versus a non-antimicrobial soap in reducing bacteria counts on hands. The student was essentially comparing ‘A vs B.’ What if they had also tested washing hands with water alone, without any soap? And what if the student also wanted to know whether the amount of time spent washing has a measurable effect? In that case, the number of independent variables being tested increases to two ('hand-wash time' and 'agent used on hand'). To handle the increased complexity of the data, an ANOVA should be used. Data from an experiment that varies both wash time and wash agent would call for a two-factor ANOVA, whereas comparing only the wash agents (i.e., A versus B versus C) would call for a single-factor ANOVA.
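The two-factor case can be sketched the same way. The example below assumes a balanced design (the same number of replicates in every combination of agent and wash time) and uses made-up numbers; the factor levels, the values, and the function name `two_way_anova_f` are all hypothetical. It returns one F statistic for each factor and one for their interaction, and again leaves the p-values to statistical software.

```python
from statistics import mean

def two_way_anova_f(cells):
    """F statistics for a balanced two-factor ANOVA with replication.

    `cells` maps (level_of_factor_a, level_of_factor_b) to a list of replicates.
    """
    levels_a = sorted({a for a, _ in cells})
    levels_b = sorted({b for _, b in cells})
    n = len(cells[(levels_a[0], levels_b[0])])  # replicates per cell
    if any(len(cells[(a, b)]) != n for a in levels_a for b in levels_b):
        raise ValueError("this sketch needs equal replicates in every cell")

    grand = mean([x for values in cells.values() for x in values])
    mean_a = {a: mean([x for b in levels_b for x in cells[(a, b)]]) for a in levels_a}
    mean_b = {b: mean([x for a in levels_a for x in cells[(a, b)]]) for b in levels_b}
    cell_mean = {key: mean(values) for key, values in cells.items()}

    # Sums of squares for each factor, the interaction, and the error term
    ss_a = n * len(levels_b) * sum((mean_a[a] - grand) ** 2 for a in levels_a)
    ss_b = n * len(levels_a) * sum((mean_b[b] - grand) ** 2 for b in levels_b)
    ss_ab = n * sum(
        (cell_mean[(a, b)] - mean_a[a] - mean_b[b] + grand) ** 2
        for a in levels_a for b in levels_b
    )
    ss_error = sum(
        (x - cell_mean[key]) ** 2 for key, values in cells.items() for x in values
    )

    df_a = len(levels_a) - 1
    df_b = len(levels_b) - 1
    df_ab = df_a * df_b
    df_error = len(levels_a) * len(levels_b) * (n - 1)
    ms_error = ss_error / df_error

    return {
        "agent": (ss_a / df_a) / ms_error,
        "time": (ss_b / df_b) / ms_error,
        "interaction": (ss_ab / df_ab) / ms_error,
    }

# Hypothetical data: wash agent x wash time in seconds, two replicates per cell
hand_wash = {
    ("water", 15): [1.0, 2.0], ("water", 30): [2.0, 3.0],
    ("plain soap", 15): [3.0, 4.0], ("plain soap", 30): [4.0, 5.0],
    ("antimicrobial soap", 15): [4.0, 5.0], ("antimicrobial soap", 30): [5.0, 6.0],
}

f_values = two_way_anova_f(hand_wash)
print(f_values)
```

In this made-up data set, the extra wash time adds the same amount for every agent, so the interaction F comes out at essentially zero. In real data, a large interaction F would suggest that the effect of wash time depends on which agent is used.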
We will not run an experiment this semester to generate data for either type of ANOVA. However, we do want to give you some data to analyze so that you are exposed to this powerful statistical tool. The following reading therefore shows how to perform single-factor and two-factor ANOVA using some older data sets.