BSCI 1511L Statistics Manual: 0.3.1 Running a paired t-test using RStudio

Introduction to Biological Sciences lab, second semester

Running RStudio

This exercise assumes that you already have both R and RStudio installed on your computer, or that you are using the lab computers where they are already installed.  If you need to install R and RStudio, see the Installing R and RStudio page.

To start the exercise, double-click on the RStudio icon.  It is possible to do the exercise using R rather than RStudio, but RStudio has additional capabilities, so using it is recommended.

The Source Editor pane

In RStudio, select "New File -> R Script" from the File menu.  This will open a new Source Editor pane.  The Source Editor is where you put together scripts that run a series of R commands in the console.  You could just as easily run those commands by typing them one at a time in the Console pane, but by creating the script in the Source Editor, you can easily change things and re-run the script.  You can also save your script to use as a starting point in the future.

Data format for a paired t-test

We will be doing the example from the previous section, so you should refer to that before starting this exercise.   

The format for doing a paired t-test in R is different from the format for a t-test of means.  It does NOT use a grouping variable.  Rather, the data values are in two columns, with one column for each treatment that is being compared.  Each pair is placed in its own row so that the analysis knows which data belong together.  (Note: it is possible to do an equivalent analysis using a grouping variable and the values in a single column - an ANOVA with blocking, which is discussed in Section 3.4.)  The format of the data is as shown in Fig. 17 in the previous section.  Note that it does not matter whether you include a column that assigns a label to the pairs.  If you do, the test will simply not use that column, because the script never refers to it.
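
As a rough sketch of this layout, the block below builds a small table by hand from the first four pairs of the example data used in the script later on this page.  The name "Differences" is only there for illustration and is not part of the original exercise.  It shows why each pair needs its own row: the paired test works from the difference within each row.

```r
# Illustrative only: the first four pairs from the example data set
Data = data.frame(
  group       = c("a", "b", "c", "d"),
  no_malonate = c(0.026, 0.052, 0.045, 0.09),
  malonate    = c(0.026, 0.047, 0.005, 0.088)
)

# One row per pair, one column per treatment
print(Data)

# The paired test is based on the difference within each row,
# so the two values of a pair must sit side by side
Differences = Data$no_malonate - Data$malonate
print(Differences)
```

If the rows of one column were shuffled, the differences would change, and so would the result of the test.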

Example script

The following script shows how to run a paired t-test using the data from the example.  DON'T try to paste this script into RStudio!  Read on for the reason why!  Also, you will want to get these data from a file, so read the next section to see how!

# get the data into the software
Input =(
"group no_malonate malonate
a 0.026 0.026
b 0.052 0.047
c 0.045 0.005
d 0.09 0.088
e 0.012 0.02
f 0.084 0.087
g 0.006 0.007
h 0.078 0.074
i 0.016 0.015
j 0.005 0.001
k 0.12 0.08
l 0.055 0.035
m 0.055 0.035
n 0.02 0.018
")

# turn the text into an R table
Data = read.table(textConnection(Input),header=TRUE)

# do the paired t-test
t.test(Data$no_malonate, Data$malonate,
       paired=TRUE,
       conf.level=0.95)

Notice that in this script, we specify the columns to be used in the test by listing the name of the table ("Data"), followed by a dollar sign ("$"), then the name of the column (e.g. "no_malonate").  This differs from the arguments used for a t-test of means.  The test is never given the first column ("group"), so that column is ignored.
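
The dollar-sign notation can be tried out on its own.  The following is a minimal sketch that assumes a small hand-built table containing the first three pairs of the example data.  The double-bracket form is an alternative you may come across in other scripts, and it is shown here only for comparison.

```r
# Illustrative only: the first three pairs from the example data
Data = data.frame(
  group       = c("a", "b", "c"),
  no_malonate = c(0.026, 0.052, 0.045),
  malonate    = c(0.026, 0.047, 0.005)
)

# Table name, dollar sign, column name: gives that column as a vector of numbers
Data$no_malonate

# Double square brackets with the column name in quotes pull out the same column
Data[["no_malonate"]]

# Check that both forms return exactly the same values
identical(Data$no_malonate, Data[["no_malonate"]])
```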

Unfortunately, copying this script from this page picks up invisible bad characters.  Instead, go to this GitHub Gist raw file page, copy the script there, and paste the text into the Source Editor pane.  To run the script, highlight it in the Source Editor, then click on the Run button.  You will see each step of the script in the Console pane in blue, followed by the output of that step (if any) in black.  Compare the values of t, df, and P that are given to the values in the example.
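
To make it easier to compare t, df, and P, the result of t.test can be stored in a variable and the individual values pulled out by name.  The sketch below assumes the same typed-in data as the example script; the variable names "Result" and "Check" are hypothetical.  As a cross-check, it also runs a one-sample t-test on the within-pair differences, which is the calculation that a paired t-test carries out.

```r
# Same typed-in data as the example script
Input = (
"group no_malonate malonate
a 0.026 0.026
b 0.052 0.047
c 0.045 0.005
d 0.09 0.088
e 0.012 0.02
f 0.084 0.087
g 0.006 0.007
h 0.078 0.074
i 0.016 0.015
j 0.005 0.001
k 0.12 0.08
l 0.055 0.035
m 0.055 0.035
n 0.02 0.018
")
Data = read.table(textConnection(Input), header=TRUE)

# Store the test result instead of only printing it
Result = t.test(Data$no_malonate, Data$malonate,
                paired=TRUE,
                conf.level=0.95)

Result$statistic   # the value of t
Result$parameter   # the degrees of freedom (number of pairs minus one)
Result$p.value     # the value of P

# Cross-check: a one-sample t-test on the differences, tested against zero
Check = t.test(Data$no_malonate - Data$malonate, mu=0)
all.equal(unname(Result$statistic), unname(Check$statistic))
```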

It would be pretty dumb and annoying to have to type your data into the script like we've done here.  Usually the numbers will be typed into Excel, then exported as a CSV file that can be read in by R.  See the next section for more on that.

Running a script using data from a file

See Section 0.2.1 for more details about CSV files if you aren't already familiar with them.

To load data from a CSV file, in the script above delete the "Input = …" command along with all of the typed-in data, and replace the "Data = read.table …" command with:

Data = read.csv(file.choose())

Another alternative is to put the file on the cloud and read it from a URL.  Here is an example of the command to get the data from a GitHub Gist:

Data = read.csv(file="https://gist.githubusercontent.com/baskaufs/b65183639ee397c14d1b/raw/160610474bea3d7b09f491fcf6e161c4502aadc9/paired-t-test.csv")

There are several alternative syntaxes that you may see in scripts that you find online and want to hack.  A "left arrow" can replace the equal sign:

Data <- read.csv(file="https://gist.githubusercontent.com/baskaufs/b65183639ee397c14d1b/raw/160610474bea3d7b09f491fcf6e161c4502aadc9/paired-t-test.csv")

You can also use the "url" function:

Data = read.csv(url("https://gist.githubusercontent.com/baskaufs/b65183639ee397c14d1b/raw/160610474bea3d7b09f491fcf6e161c4502aadc9/paired-t-test.csv"))

All options will produce the same result.
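
Whichever way the file is loaded, it may be worth checking that the table looks as expected before running the test.  The sketch below is only an illustration: it writes a few of the example pairs to a temporary CSV file (a hypothetical stand-in for a file exported from Excel, with the made-up name "CsvPath") so that it can be run without choosing a file or being online.

```r
# Hypothetical stand-in for an exported CSV file: a few of the example pairs
CsvPath = tempfile(fileext=".csv")
writeLines(c(
  "group,no_malonate,malonate",
  "a,0.026,0.026",
  "b,0.052,0.047",
  "c,0.045,0.005",
  "d,0.09,0.088"
), CsvPath)

# Read the file the same way as a file picked with file.choose()
Data = read.csv(CsvPath)

# Look at the column names and types, and at the first few rows
str(Data)
head(Data)

# Stop with an error if either column needed by the test is missing
stopifnot(all(c("no_malonate", "malonate") %in% names(Data)))

# The test itself is unchanged
t.test(Data$no_malonate, Data$malonate,
       paired=TRUE,
       conf.level=0.95)
```

If the column names in your own file differ from those in the example, the names after the dollar signs need to be changed to match.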

 

Data files

Here is the paired-t-test.csv data file required to load the example data and run the paired t-test as described above.