
BSCI 1511L Statistics Manual: 2.6 Conducting a chi-squared contingency test using R

Introduction to Biological Sciences lab, second semester

Setup

It is assumed that you already have R and RStudio set up on the computer that you are using.  See Section 6 if you need to install them.  It is also assumed that you have done the example in Section 0.2.1 and are familiar with pasting a script into the Source Editor pane and running it.

Data format

When running a chi-squared contingency test using R, the data are organized in the same tabular format as in Excel.  The data must be in the form of an R data structure called a matrix. Details of the matrix data structure are beyond the scope of this class - we will simply use two functions (read.table and as.matrix) to convert a text string into a matrix.  When reading the string, R uses whitespace (spaces, tabs) to determine where one column ends and another begins, and the linefeed at the end of each line to determine when to start a new row.  So be careful not to have extra spaces at the end of each line in the text string.

It is not required for the matrix to have labels for the states of the two factors; however, labels make the table more readable to humans and make it less likely that you will make a mistake entering the numbers.

Here is the Excel contingency table as it was laid out in the previous Excel example:

Here is how the table could be laid out as an R input string:

Input =(
"SecondChild    FirstMale  FirstFemale
 Male           114         131
 Female         132         123
")
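As a rough check of how whitespace separates the columns, the sketch below reads the same table typed two ways and compares the results. The helper name parse_table and the "messy" string are made up for illustration, and the text= argument of read.table is used here as a shortcut in place of textConnection.

```r
# Hypothetical helper: turn a whitespace-delimited text table into a matrix.
parse_table <- function(txt) {
  as.matrix(read.table(text = txt, header = TRUE, row.names = 1))
}

# The table as laid out above, with neatly aligned columns.
tidy <- "SecondChild    FirstMale  FirstFemale
 Male           114         131
 Female         132         123
"

# The same table with uneven spacing and a tab (illustration only).
messy <- "SecondChild FirstMale     FirstFemale
Male 114\t131
   Female      132   123
"

tidy_matrix  <- parse_table(tidy)
messy_matrix <- parse_table(messy)

# Print both matrices and check whether they match.
tidy_matrix
messy_matrix
all.equal(tidy_matrix, messy_matrix)
```

Note that the label in the upper left corner (SecondChild) does not appear in the matrix; it is only there to keep the header row lined up with the data rows.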

Example script

Here is a script that uses the data shown above:

# chi-squared contingency test for association between gender of first and second children

# read in the data as an input text string
Input =(
"Child            FirstMale  FirstFemale
 SecondMale       114         131
 secondFemale     132         123

")

# convert the text string into a table with a header row of labels and a first column of row labels
myMatrix = as.matrix(read.table(textConnection(Input),
                              header=TRUE, 
                              row.names=1))

# print the matrix so we can check that the numbers were read in correctly
myMatrix  

# run the actual chi-squared contingency test
chisq.test(myMatrix, correct=FALSE)      # don't do continuity correction

Note: we print out the matrix before running the test to make sure that there weren't any hidden problems with the way we entered the data between the quotes.  The "correct=FALSE" argument tells R not to apply Yates' continuity correction, which chisq.test applies to 2x2 tables by default.  The correction is something we don't need to worry about here.
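To see what the test computes when the continuity correction is turned off, here is a minimal sketch that works out the statistic by hand from the expected counts and compares it with chisq.test. The variable names (observed, expected, chi_sq, and so on) are chosen for illustration and are not part of the script above.

```r
# Observed counts, entered column by column (FirstMale, then FirstFemale).
observed <- matrix(c(114, 132, 131, 123), nrow = 2,
                   dimnames = list(c("SecondMale", "secondFemale"),
                                   c("FirstMale", "FirstFemale")))

# Expected count for each cell = (row total * column total) / grand total.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Chi-squared statistic: sum of (observed - expected)^2 / expected.
chi_sq <- sum((observed - expected)^2 / expected)

# Degrees of freedom = (rows - 1) * (columns - 1).
deg_free <- (nrow(observed) - 1) * (ncol(observed) - 1)

# P-value from the upper tail of the chi-squared distribution.
p_value <- pchisq(chi_sq, df = deg_free, lower.tail = FALSE)

chi_sq
p_value

# The same statistic from chisq.test, without and with the correction.
uncorrected <- chisq.test(observed, correct = FALSE)
corrected   <- chisq.test(observed, correct = TRUE)
uncorrected$statistic
corrected$statistic
```

The hand calculation should agree with the uncorrected result from chisq.test, while the corrected statistic comes out somewhat smaller.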

When pasting the script into RStudio, use the raw text from this GitHub gist rather than copying and pasting from this page.  Note that although it would be possible to input the data from a file, it probably isn't worth it in this case, since only four numbers need to be typed into the script.

If there are more than two categories for either factor, simply add more columns or rows to the input table. 
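As a sketch of what a larger table might look like, the example below adds a third column. The labels and counts are invented purely for illustration and do not come from any real data set.

```r
# Hypothetical 2 x 3 table: the counts below are made up for illustration.
bigInput = (
"Group    Col1  Col2  Col3
 RowA     30    45    25
 RowB     40    35    25
")

# Same conversion as in the example script.
bigMatrix = as.matrix(read.table(textConnection(bigInput),
                                 header=TRUE,
                                 row.names=1))

# Check that the numbers were read in correctly.
bigMatrix

# For tables larger than 2x2 the continuity correction is not applied,
# so the correct argument has no effect here.
bigResult = chisq.test(bigMatrix, correct=FALSE)
bigResult
```

The degrees of freedom reported by R are (rows - 1) x (columns - 1), so a table with 2 rows and 3 columns has 2 degrees of freedom.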

Output

Here is the output given by R:

> myMatrix  
             FirstMale FirstFemale
SecondMale         114         131
secondFemale       132         123

> # run the actual chi-squared contingency test
> chisq.test(myMatrix, correct=FALSE)      # don't do continuity correction

    Pearson's Chi-squared test

data:  myMatrix
X-squared = 1.3696, df = 1, p-value = 0.2419

The printout of the matrix shows that the numbers went in correctly.  The test results are reported using the standard format, and are the same as those produced using Excel.
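If you want to use the individual numbers from the output (for example, when writing up the results), one possible approach is to store the result of the test in a variable and pull out its parts. This is only a sketch; the variable name testResult is chosen for illustration.

```r
# Rebuild the matrix from the example script.
Input =(
"Child            FirstMale  FirstFemale
 SecondMale       114         131
 secondFemale     132         123
")
myMatrix = as.matrix(read.table(textConnection(Input),
                              header=TRUE,
                              row.names=1))

# Store the test result instead of just printing it.
testResult = chisq.test(myMatrix, correct=FALSE)

testResult$statistic   # the chi-squared value (X-squared)
testResult$parameter   # the degrees of freedom (df)
testResult$p.value     # the P-value
testResult$expected    # the expected counts for each cell
```

The first three values should match the X-squared, df, and p-value shown in the output above.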