Research Guides: BSCI 1511L Statistics Manual: 3.5 Conducting an ANOVA using RStudio

3.5.1 Single factor ANOVA

To perform a single factor ANOVA using RStudio, you need to set up a table with two columns. Although it is possible to enter the data directly into the script, it is more likely that you will want to load the data from a CSV file, probably one created using Excel or some other spreadsheet software. The format of the table is the same as what was described in Section 0.2.1 (Lab 2 Model Organisms II): one column should contain the continuous data to be analyzed and the other should contain values that assign the row to a category (a “grouping variable”). Note: the names used to assign a row to a given category must be exactly the same in every row. To avoid accidentally misspelling the category name, it is best to type it in the first cell, then paste it into the other cells. After entering the data, save the file in CSV format in a location where it can be accessed via the “file open dialog” command. See Section 0.2.1 for the details on how to access a file by that method.
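As a rough sketch of the two-column layout described above, the following builds a small table, saves it as a CSV file, and reads it back. The values and the names exampleData, checkData, response and color are made up for illustration only (they are not the ERG data), and a temporary file stands in for the file you would save from your spreadsheet.

```r
# Made-up illustration data: one continuous column, one grouping column.
exampleData <- data.frame(
  response = c(10.2, 11.5, 9.8, 14.1, 13.6, 15.0),
  color    = c("blue", "blue", "blue", "green", "green", "green")
)

# Save as CSV (a temporary file is used here instead of a real location).
csvPath <- tempfile(fileext = ".csv")
write.csv(exampleData, csvPath, row.names = FALSE)

# Read it back the same way a script would read a spreadsheet export.
checkData <- read.csv(csvPath)

# Show column names and types: a numeric column and a grouping column.
str(checkData)

# Count rows per category; a misspelled name would appear as an extra category.
categoryCounts <- table(checkData$color)
print(categoryCounts)
```

If the count table lists more categories than you intended, one of the category names is probably spelled differently in some rows.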

YOU SHOULD PERFORM THESE TESTS yourself: try the scripts and confirm that you get the results shown in the tables on the "ANOVA" Brightspace pages. As you did for the t-tests and chi-square tests, get the R scripts, run them on the data, and check that your results match the examples you read about earlier. Remember that you need to use .CSV files with RStudio.

The homework will ask you to perform single factor and two-factor ANOVAs with different data sets, which will require 'hacking' (modifying) the script. It is in your interest to try this out and get a script that works with the example data BEFORE trying it on your own.

Here is an R script that is set up to run the first ANOVA shown in Section 3.1. DO NOT COPY THE SCRIPT FROM THIS PAGE! Copy it from the raw text of the Gist instead.

# read in the blue and green color data from a CSV file
ergData <- read.csv(file.choose())

# display the data table
ergData

# fit a linear model to the data
model <- lm(response ~ color, data = ergData)

# run the ANOVA on the model
anova(model)

To test this script, copy it from the raw text of this Gist and paste it into the Source Editor pane of RStudio. Notice that this script differs somewhat from previous ones. The left arrow symbol (“<-”) assigns the value on the right to the variable on the left, in a manner similar to the equal sign (“=”) in earlier examples. There is also an alternative method of specifying the file location via a URL.

As was the case with the t-test of means, the first argument of the “lm” function is a formula: the name of the data column, followed by a tilde and the name of the grouping variable.

Highlight the text, then click Run. Compare the ANOVA table shown in the Console pane with the ANOVA table in Fig. 14 of Section 3.1. You can modify the script to use the red and green color data (CSV file linked below) to produce the results shown in Fig. 15, and to use the data for all three colors discussed in Section 3.2.

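Since the format required for a single factor ANOVA with two categories is exactly the same as the format required for a t-test of means, the same file could be used for either test. The resulting P-values should be the same, provided the t-test assumes equal variances (R's “t.test” function does so only when var.equal = TRUE is specified; its default is the Welch test, which can give a slightly different P-value).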
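A minimal sketch of this comparison, using made-up numbers rather than the ERG data. The names exampleData, tResult, and anovaTable are chosen only for illustration, and the sketch assumes the pooled-variance form of the t-test.

```r
# Made-up illustration data with two categories (not the ERG data).
exampleData <- data.frame(
  response = c(10.2, 11.5, 9.8, 12.0, 14.1, 13.6, 15.0, 12.9),
  color    = rep(c("blue", "green"), each = 4)
)

# t-test of means with pooled (equal) variances.
# Assumption: var.equal = TRUE; R's default (FALSE) runs the Welch test.
tResult <- t.test(response ~ color, data = exampleData, var.equal = TRUE)

# Single factor ANOVA on the same table.
model <- lm(response ~ color, data = exampleData)
anovaTable <- anova(model)

# Pull the P-value out of each result.
tP <- tResult$p.value
anovaP <- anovaTable[["Pr(>F)"]][1]

# Show both P-values side by side and check that they agree.
print(c(t_test = tP, anova = anovaP))
print(isTRUE(all.equal(tP, anovaP)))
```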

CSV files containing ERG data used in the examples

3.5.2 Two-factor ANOVA

The setup for a multi-factor ANOVA in R is similar to a single factor ANOVA except that there are two columns for grouping variables instead of one. Click here to see the structure of the data for the example in Section 3.3.

Here is the R script to run the two-factor ANOVA:

# read in the data for the fake soap experiment
soapData <- read.csv(file.choose())

# display the data table
soapData

# fit a linear model to the data
model <- lm(counts ~ soap + triclosan, data = soapData)

# run the ANOVA on the model
anova(model)

You can copy the script from this raw text Gist. Notice that the format of the function ("lm") that fits the linear model is very similar to the single-factor ANOVA. The only difference is that there are two grouping variables ("soap + triclosan") specified after the tilde, instead of one. Here is the ANOVA table from the output:

Analysis of Variance Table

Response: counts
          Df   Sum Sq Mean Sq F value Pr(>F)
soap       1  4704500 4704500  7.0898 0.0164 *
triclosan  1   264500  264500  0.3986 0.5362
Residuals 17 11280500  663559
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Since this is a two-factor ANOVA, there is a line in the ANOVA table for each of the two factors, and each factor has its own P-value. The star after the soap P-value indicates that the factor is significant at the P < 0.05 significance level as shown by the key of significance codes below the table. If the P-value had been less than 0.01, two stars would have been used, etc. Compare the results with Table 5 in Section 3.3.
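To see where the F values and P-values in the table come from, the arithmetic can be redone by hand in R. This is only a sketch: the numbers are copied from the ANOVA table above, the variable names are made up for illustration, and the results should agree with the table to within rounding.

```r
# Values copied from the ANOVA table above.
meanSqSoap      <- 4704500
meanSqTriclosan <- 264500
sumSqResidual   <- 11280500
dfResidual      <- 17

# Residual mean square = residual sum of squares / residual degrees of freedom.
meanSqResidual <- sumSqResidual / dfResidual

# F = factor mean square / residual mean square.
fSoap      <- meanSqSoap / meanSqResidual
fTriclosan <- meanSqTriclosan / meanSqResidual

# P-value = upper tail of the F distribution with (1, 17) degrees of freedom.
pSoap      <- pf(fSoap, df1 = 1, df2 = dfResidual, lower.tail = FALSE)
pTriclosan <- pf(fTriclosan, df1 = 1, df2 = dfResidual, lower.tail = FALSE)

print(round(c(F_soap = fSoap, F_triclosan = fTriclosan), 4))
print(round(c(P_soap = pSoap, P_triclosan = pTriclosan), 4))
```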

There is fundamentally no difference in the setup of the test when one of the two factors is a block effect. (Technically, this is not true since we should be making some modifications to the script due to the fact that the block is a random effect. But that's beyond the scope of this class.) Here is the script to analyze the data described in Section 3.4:

# read in the data for the blocked ERG experiment
colorData <- read.csv(file.choose())

# display the data table
colorData

# fit a linear model to the data
model <- lm(response ~ block + color, data = colorData)

# run the ANOVA on the model
anova(model)

Here's the link to the raw text Gist of this script. Although the names of the factors and tables are different, the format of the script is exactly the same. Run the script and verify that you obtain the same results as in Table 6 of Section 3.4.

p-qrs-anova example 2.0

p-qrs-anova-example 2.0.csv

For the 'p-qrs-anova' example above, you should obtain a P-value of 0.000147 for the single factor ANOVA. If you do not, you have made an error. Try again before asking for assistance. Common errors include using column names or category labels in the script that do not exactly match those in the file, adding spaces without realizing it, and not using the script from GitHub.
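One possible way to track down the labeling and spacing errors is sketched below. The table problemData and its contents are made up to contain two of these problems on purpose; they are not the p-qrs-anova data.

```r
# Made-up data containing two common problems:
# a trailing space in one label and a column name that differs from the script.
problemData <- data.frame(
  Response = c(10.2, 11.5, 9.8, 14.1, 13.6, 15.0),
  color    = c("blue", "blue ", "blue", "green", "green", "green")
)

# 1. Column names must match the script exactly (R is case sensitive),
#    so "Response" is not the same as "response".
print(names(problemData))
names(problemData)[names(problemData) == "Response"] <- "response"

# 2. List the distinct labels; the printed quotes make stray spaces visible.
print(unique(problemData$color))

# Remove leading and trailing spaces from the labels.
problemData$color <- trimws(problemData$color)
cleanLabels <- unique(problemData$color)
print(cleanLabels)
```

Fixing the spreadsheet itself and saving the CSV file again is usually the cleaner remedy. The cleanup lines here are mainly a way to confirm what went wrong.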