BSCI 1511L Statistics Manual: 3.5 Conducting an ANOVA using RStudio

Introduction to Biological Sciences lab, second semester

3.5.1 Single factor ANOVA

To perform a single factor ANOVA using RStudio, you need to set up a table with two columns.  Although it is possible to enter the data directly into the script, it is more likely that you will want to load the data from a CSV file, probably one created using Excel or some other spreadsheet software.  The format of the table is the same as what was described in section 0.2.1: one column should contain the continuous data to be analyzed and the other should contain values that assign the row to a category (a "grouping variable").  Note: the names used to assign a row to a given category must be exactly the same in every row.  To avoid accidentally misspelling the category name, it is best to type it in the first cell, then paste it into the other cells.  After entering the data, save the file in CSV format in a location where it can be accessed via the "file open dialog" command.  See section 0.2.1 for the details on how to access a file by that method.
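The check for mismatched category names can also be done in R once the table is loaded.  The sketch below is only an illustration: the data values and the deliberate misspelling "gren" are invented, and the table is typed directly into the script rather than read from a file.  Only the column names (response and color) are taken from the script in this section.

```r
# hypothetical data typed directly into the script; the values are invented
exampleData <- data.frame(
  response = c(10.1, 11.3, 9.8, 12.0, 12.6, 11.9),
  color    = c("blue", "blue", "blue", "green", "gren", "green")
)

# count the rows in each category; a misspelled name shows up as an extra category
categoryCounts <- table(exampleData$color)
categoryCounts

# number of distinct category names found (should equal the number you intended)
length(categoryCounts)
```

If the number of categories reported is larger than the number you intended, correct the names in the spreadsheet and save the CSV file again.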

Here is an R script that is set up to run the first ANOVA shown in Section 3.1:

# read in the blue and green color data from a CSV file
ergData <- read.csv(file.choose())

# display the data table
ergData

# fit a linear model to the data
model <- lm(response ~ color, data = ergData)

# run the ANOVA on the model
anova(model)

To test this script, copy it from the raw text of this Gist and paste it into the Source Editor pane of RStudio.  Notice that this script differs somewhat from previous ones.  The left arrow symbol ("<-") assigns the value on the right to the variable on the left, in a manner similar to the equal sign ("=") in earlier examples.  There is also an alternative method of specifying the file location: a URL can be given in place of the file open dialog.

As was the case with the t-test of means, the first argument of the lm function is a formula: the name of the data column, followed by a tilde and the name of the grouping variable.  The second argument, data, names the table that contains those columns.

Highlight the text of the script, then click Run.  Compare the ANOVA table in the results in the Console pane with the ANOVA table in Fig. 14 of Section 3.1.  You can modify the script to use the red and green color data (CSV file linked below) to produce the results shown in Fig. 15, or to use the data for all three colors discussed in Section 3.2.

Since the format required for a single factor ANOVA with two categories is exactly the same as the format required for a t-test of means, the same file could be used for either test.  The resulting P-values should be the same, provided that the t-test is one that assumes equal variances in the two groups.
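This equivalence can be checked directly in R.  The sketch below uses made-up numbers (not the ERG data) and runs both tests on the same table.  It assumes the t-test is run with var.equal = TRUE; by default, R's t.test function performs the Welch test, which does not pool the variances and so generally gives a slightly different P-value.

```r
# hypothetical two-category data; the values are invented for illustration
exampleData <- data.frame(
  response = c(10.1, 11.3, 9.8, 10.7, 12.0, 12.6, 11.9, 13.1),
  color    = c("blue", "blue", "blue", "blue", "green", "green", "green", "green")
)

# single factor ANOVA; pull the P-value and F statistic out of the table
anovaTable <- anova(lm(response ~ color, data = exampleData))
anovaP <- anovaTable[["Pr(>F)"]][1]
anovaF <- anovaTable[["F value"]][1]

# t-test of means, assuming equal variances in the two groups
tResult <- t.test(response ~ color, data = exampleData, var.equal = TRUE)
tTestP <- tResult$p.value
tSquared <- unname(tResult$statistic)^2

# the two P-values agree, and F is the square of t
anovaP
tTestP
all.equal(anovaP, tTestP)
all.equal(anovaF, tSquared)
```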

3.5.2 Two-factor ANOVA

The setup for a two-factor ANOVA in R is similar to that for a single factor ANOVA, except that there are two columns for grouping variables instead of one.  Click here to see the structure of the data for the example in Section 3.3.
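As a rough sketch of that layout, the hypothetical table below has one column of continuous data and two grouping columns.  The values and the category labels ("yes" and "no") are invented for illustration and are not the data from Section 3.3; only the column names are taken from the script that follows.

```r
# hypothetical layout for a two-factor ANOVA; values and labels are invented
layoutExample <- data.frame(
  counts    = c(1200, 900, 2100, 1800),
  soap      = c("yes", "yes", "no", "no"),
  triclosan = c("yes", "no", "yes", "no")
)

# display the table: one row per measurement
layoutExample

# cross-tabulate the two grouping variables to see how many rows fall in each combination
designCounts <- table(layoutExample$soap, layoutExample$triclosan)
designCounts
```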

Here is the R script to run the two-factor ANOVA:

# read in the data for the fake soap experiment
soapData <- read.csv(file.choose())

# display the data table
soapData

# fit a linear model to the data
model <- lm(counts ~ soap + triclosan, data = soapData)

# run the ANOVA on the model
anova(model)

You can copy the script from this raw text Gist.  Notice that the call to the function that fits the linear model ("lm") is very similar to the one used for the single factor ANOVA.  The only difference is that there are two grouping variables ("soap + triclosan") specified after the tilde, instead of one.  Here is the ANOVA table from the output:

Analysis of Variance Table

Response: counts
          Df   Sum Sq Mean Sq F value Pr(>F)  
soap       1  4704500 4704500  7.0898 0.0164 *
triclosan  1   264500  264500  0.3986 0.5362  
Residuals 17 11280500  663559                 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Since this is a two-factor ANOVA, there is a line in the ANOVA table for each of the two factors, and each factor has its own P-value.  The star after the soap P-value indicates that the factor is significant at the P < 0.05 level, as shown by the key of significance codes below the table.  If the P-value had been less than 0.01, two stars would have been used, and if it had been less than 0.001, three stars.  Compare the results with Table 18 in Section 3.3.
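The Mean Sq, F value, and Pr(>F) columns follow from the Df and Sum Sq columns.  As a sketch of that arithmetic, the code below copies the Df and Sum Sq values from the output above and recomputes the other columns using base R's pf function.  The variable names are chosen here for illustration and do not come from the original script.

```r
# Df and Sum Sq columns copied from the ANOVA table above
degFree <- c(soap = 1, triclosan = 1, Residuals = 17)
sumSq   <- c(soap = 4704500, triclosan = 264500, Residuals = 11280500)

# mean square = sum of squares / degrees of freedom
meanSq <- sumSq / degFree

# F = mean square of the factor / mean square of the residuals
factors <- c("soap", "triclosan")
fValue <- meanSq[factors] / meanSq[["Residuals"]]

# P = upper tail probability of the F distribution
pValue <- pf(fValue, degFree[factors], degFree[["Residuals"]], lower.tail = FALSE)

# display the recomputed values, rounded as in the table
round(fValue, 4)
round(pValue, 4)

# assign significance codes using the cutoffs in the key below the table
stars <- cut(pValue,
             breaks = c(0, 0.001, 0.01, 0.05, 0.1, 1),
             labels = c("***", "**", "*", ".", " "),
             include.lowest = TRUE)
as.character(stars)
```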

The setup of the test is essentially the same when one of the two factors is a block effect.  (Strictly speaking, the script should be modified because the block is a random effect, but that is beyond the scope of this class.)  Here is the script to analyze the data described in Section 3.4:

# read in the data for the blocked ERG experiment
colorData <- read.csv(file.choose())

# display the data table
colorData

# fit a linear model to the data
model <- lm(response ~ block + color, data = colorData)

# run the ANOVA on the model
anova(model)

Here's the link to the raw text Gist of this script.  Although the names of the factors and data tables are different, the format of the script is exactly the same as for the two-factor ANOVA above.  Run the script and verify that you obtain the same results as in Table 19 of Section 3.4.