Introduction to Biological Sciences lab, second semester

When running a chi-squared contingency test in R, the data are organized in the same tabular format as in Excel. The data must be in the form of an R data structure called a matrix. Details of the matrix data structure are beyond the scope of this class; we will simply use the read.table function to read a text string and the as.matrix function to convert the result into a matrix. read.table uses whitespace (spaces, tabs) to determine where one column ends and the next begins, and the line break at the end of each line to determine when to start a new row. So be careful not to leave extra spaces at the end of each line in the text string.
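To see how the whitespace rule works, here is a minimal sketch (the variable names `spaced`, `tabbed`, `m1` and `m2` are hypothetical) that reads the same table twice: once typed with single spaces and once with tabs and runs of spaces. Both versions should produce the same 2 x 2 matrix.

```r
# The same table typed two ways: single spaces vs. tabs and runs of spaces
spaced <- "Child FirstMale FirstFemale
SecondMale 114 131
secondFemale 132 123"
tabbed <- "Child\tFirstMale   FirstFemale
SecondMale\t114     131
secondFemale\t132\t\t123"

# read.table() splits fields on any run of whitespace (its default sep = "")
# and starts a new row at each line break; as.matrix() then turns the
# resulting data frame into a matrix
m1 <- as.matrix(read.table(textConnection(spaced), header = TRUE, row.names = 1))
m2 <- as.matrix(read.table(textConnection(tabbed), header = TRUE, row.names = 1))

identical(m1, m2)  # TRUE: both layouts give the same matrix
dim(m1)            # 2 rows, 2 columns
```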

The matrix does not need labels for the states of the two factors; however, labels make the table more readable to humans and make it less likely that you will make a mistake entering the numbers.
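For comparison, here is a sketch of the same four counts entered as an unlabeled matrix with R's built-in matrix() function (the names `unlabeled` and `result` are illustrative). Because matrix() fills the matrix column by column by default, the counts are listed one column at a time. The test result does not depend on the labels; only the printout is harder to check.

```r
# Unlabeled 2 x 2 matrix; matrix() fills column by column by default,
# so the first column is 114, 132 and the second column is 131, 123
unlabeled <- matrix(c(114, 132, 131, 123), nrow = 2)
unlabeled

# Same test as with the labeled table
result <- chisq.test(unlabeled, correct = FALSE)
result$statistic  # X-squared, about 1.3696
result$p.value    # about 0.2419
```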

Here is the Excel contingency table as it was laid out in the previous Excel example:

Here is how the table could be laid out as an R input string:

```r
Input =(
"SecondChild FirstMale FirstFemale
Male 114 131
Female 132 123
")
```

Here is a script that uses the data shown above:

```r
# chi-squared contingency test for association between gender of first and second children

# read in the data as an input text string
Input =(
"Child FirstMale FirstFemale
SecondMale 114 131
secondFemale 132 123
")

# convert the text string into a table with a header row of labels and a first column of row labels
myMatrix = as.matrix(read.table(textConnection(Input),
                                header=TRUE,
                                row.names=1))

# print the matrix so we can check that the numbers were read in correctly
myMatrix

# run the actual chi-squared contingency test
chisq.test(myMatrix, correct=FALSE) # don't do continuity correction
```

Note: we print out the matrix before running the test to make sure that there weren't any goofy hidden problems with the way we entered the data between the quotes. The correct=FALSE argument tells R not to apply Yates's continuity correction, which chisq.test otherwise applies by default to 2x2 tables; we don't need to worry about that correction in this class.
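To see what the correct argument changes, a minimal sketch can run the test both ways on the same counts (the names `counts`, `uncorrected` and `corrected` are illustrative). With the correction on, R shrinks each |observed - expected| difference by up to 0.5 before squaring, so the statistic gets smaller and the p-value larger.

```r
counts <- matrix(c(114, 132, 131, 123), nrow = 2,
                 dimnames = list(c("SecondMale", "secondFemale"),
                                 c("FirstMale", "FirstFemale")))

uncorrected <- chisq.test(counts, correct = FALSE)  # no continuity correction
corrected   <- chisq.test(counts)                   # default for 2x2: Yates's correction

# Compare the two statistics and p-values side by side
c(uncorrected = unname(uncorrected$statistic), corrected = unname(corrected$statistic))
c(uncorrected = uncorrected$p.value, corrected = corrected$p.value)
```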

When pasting the script into RStudio, use the raw text from this GitHub gist rather than copying and pasting from this page. Note that although it would be possible to input the data from a file, it probably isn't worth it in this case, since only four numbers need to be typed into the script.
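If you do want to read the table from a file, a sketch like the following should work. Here a temporary file created with tempfile() stands in for your own data file (the names `dataFile` and `fileMatrix` are hypothetical); only the first argument of read.table changes, from a text connection to a file name.

```r
# Write the table to a temporary text file (stand-in for your own data file)
dataFile <- tempfile(fileext = ".txt")
writeLines(c("Child FirstMale FirstFemale",
             "SecondMale 114 131",
             "secondFemale 132 123"), dataFile)

# Same read.table() options as the script, but reading from the file
fileMatrix <- as.matrix(read.table(dataFile, header = TRUE, row.names = 1))
fileMatrix

chisq.test(fileMatrix, correct = FALSE)
```

With a real file you would pass its path (for example a hypothetical "counts.txt" in your working directory) instead of dataFile.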

If there are more than two categories for either factor, simply add more columns or rows to the input table.
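For example, a table with three states for the second factor might be entered as in this sketch. The labels and counts in `Input3` are hypothetical placeholders, not lab data; the point is only that R picks up the extra column and sets the degrees of freedom to (rows - 1) x (columns - 1) = 2. The correct argument only affects 2x2 tables, so it is left out here.

```r
# Hypothetical 2 x 3 table: labels and counts are placeholders only
Input3 = (
"Group  Level1 Level2 Level3
GroupA 20     30     25
GroupB 35     15     25
")

table3 <- as.matrix(read.table(textConnection(Input3),
                               header = TRUE,
                               row.names = 1))
table3

result3 <- chisq.test(table3)
result3$parameter  # df = (2 - 1) * (3 - 1) = 2
```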

Here is the output given by R:

```
> myMatrix
             FirstMale FirstFemale
SecondMale         114         131
secondFemale       132         123
>
> # run the actual chi-squared contingency test
> chisq.test(myMatrix, correct=FALSE) # don't do continuity correction

        Pearson's Chi-squared test

data:  myMatrix
X-squared = 1.3696, df = 1, p-value = 0.2419
```

The printout of the matrix shows that the numbers went in correctly. The test results are reported using the standard format, and are the same as those produced using Excel.
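As an extra check, the statistic can be computed directly from its definition, X² = Σ (O − E)² / E, where each expected count E is (row total x column total) / grand total. The sketch below (variable names are illustrative) should reproduce the X-squared and p-value reported above.

```r
observed <- matrix(c(114, 132, 131, 123), nrow = 2,
                   dimnames = list(c("SecondMale", "secondFemale"),
                                   c("FirstMale", "FirstFemale")))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
expected  # 120.54 and 125.46 in the first column; 124.46 and 129.54 in the second

# Pearson statistic without continuity correction
xSquared <- sum((observed - expected)^2 / expected)
degFree  <- (nrow(observed) - 1) * (ncol(observed) - 1)
pValue   <- pchisq(xSquared, df = degFree, lower.tail = FALSE)

round(c(xSquared = xSquared, df = degFree, pValue = pValue), 4)
```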

© Vanderbilt University · All rights reserved.