Research Guides: BSCI 1511L Statistics Manual: 2.6 Conducting a chi-squared contingency test using R

Data format

When running a chi-squared contingency test using R, the data are organized in the same tabular format as in Excel. The data must be in the form of an R data structure called a matrix. Details of the matrix data structure are beyond the scope of this class; we will simply use two functions (read.table and as.matrix) to convert a text string into a matrix. R uses whitespace (spaces, tabs) to determine where one column ends and another begins, and the linefeed at the end of each line to determine when to start a new row. So be careful not to have extra spaces at the end of each line in the text string.

The matrix is not required to have labels for the states of the two factors; however, labels make the table more readable to humans and make it less likely that you will make a mistake entering the numbers.
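As a minimal sketch of the unlabeled alternative, the four counts from the example later in this section can be typed directly into matrix(). The variable names plainMatrix and labeledMatrix are illustrative and are not part of the class script.

```r
# Build the 2x2 table without labels. matrix() fills column by column,
# so the first two numbers form the first column.
plainMatrix = matrix(c(114, 132, 131, 123), nrow = 2)
plainMatrix

# The same table with labels attached afterwards (label text is illustrative)
labeledMatrix = plainMatrix
dimnames(labeledMatrix) = list(c("SecondMale", "SecondFemale"),
                               c("FirstMale", "FirstFemale"))
labeledMatrix

# The labels do not change the test result, only the readability of the table
plainResult = chisq.test(plainMatrix, correct = FALSE)
labeledResult = chisq.test(labeledMatrix, correct = FALSE)
```

With labels, a swapped row or column is easy to spot when the matrix is printed; without them, the only check is the position of each number.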

You will use the following example in an assignment, so save it! Here is the Excel contingency table as it was laid out in the previous Excel example:

Here is how the table could be laid out as an R input string:

Input =(
"SecondChild  FirstMale  FirstFemale
Male         114        131
Female       132        123
")

Example script

Here is a script that uses the data shown above:

# chi-squared contingency test for association between gender of first and second children

# read in the data as an input text string
Input =(
"Child         FirstMale  FirstFemale
SecondMale    114        131
secondFemale  132        123
")

# convert the text string into a table with a header row of labels and a first column of row labels
myMatrix = as.matrix(read.table(textConnection(Input), header=TRUE, row.names=1))

# print the matrix so we can check that the numbers were read in correctly
myMatrix

# run the actual chi-squared contingency test
chisq.test(myMatrix, correct=FALSE) # don't do continuity correction

Note: we print out the matrix before running the test to make sure that there weren't any goofy hidden problems with the way we entered the data between the quotes. The "correct=FALSE" argument tells R not to apply Yates' continuity correction for 2x2 tables, something we don't need to worry about.
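For the curious, here is a hedged sketch of what the "correct" argument changes. It rebuilds the same 2x2 table with matrix() so that it can run on its own; the names uncorrected and corrected are illustrative only.

```r
# Same 2x2 table as in the example script, built directly for this sketch
myMatrix = matrix(c(114, 132, 131, 123), nrow = 2,
                  dimnames = list(c("SecondMale", "secondFemale"),
                                  c("FirstMale", "FirstFemale")))

# Without the continuity correction (as in the script above)
uncorrected = chisq.test(myMatrix, correct = FALSE)

# With the continuity correction (R's default for 2x2 tables)
corrected = chisq.test(myMatrix, correct = TRUE)

# The correction shrinks each |observed - expected| by up to 0.5 before
# squaring, so the corrected statistic is smaller and its p-value is larger
uncorrected$statistic
corrected$statistic
```

The two versions differ only for 2x2 tables; for larger tables R ignores the "correct" argument.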

When pasting the script into RStudio, use the raw text from this GitHub gist rather than copying and pasting from this page. Note that although it would be possible to input the data from a file, it probably isn't worth it in this case, since only four numbers need to be typed into the script.
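If you ever do want to read the table from a file, a minimal sketch follows. It assumes a whitespace-delimited text file laid out like the input string; the temporary file and the names dataFile and fileMatrix are stand-ins invented for this illustration.

```r
# Write the example table to a temporary file so this sketch is self-contained;
# in practice the file would already exist in your statistics folder
dataFile = tempfile(fileext = ".txt")
writeLines(c("Child FirstMale FirstFemale",
             "SecondMale 114 131",
             "secondFemale 132 123"), dataFile)

# Same read.table() call as in the script, pointed at a file path
# instead of a text connection
fileMatrix = as.matrix(read.table(dataFile, header = TRUE, row.names = 1))

# Check the numbers, then run the test as before
fileMatrix
chisq.test(fileMatrix, correct = FALSE)
```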

You should save the raw text and the R script to your statistics folder.

If there are more than two categories for either factor, simply add more columns or rows to the input table.
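As a rough sketch of a larger table, the input below has two rows and three columns. The counts and labels are HYPOTHETICAL, invented only to show the layout; they are not data from this class.

```r
# HYPOTHETICAL counts, made up purely to illustrate a 2x3 layout
Input =("
Group  CategoryA  CategoryB  CategoryC
Row1   20         30         25
Row2   35         15         40
")

# Same conversion as in the example script
bigMatrix = as.matrix(read.table(textConnection(Input), header=TRUE, row.names=1))
bigMatrix

# The continuity correction only applies to 2x2 tables, so it is omitted here
bigResult = chisq.test(bigMatrix)

# Degrees of freedom = (rows - 1) * (columns - 1)
bigResult$parameter
```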

Output

Here is the output given by R:

> myMatrix
             FirstMale FirstFemale
SecondMale         114         131
secondFemale       132         123
>
> # run the actual chi-squared contingency test
> chisq.test(myMatrix, correct=FALSE) # don't do continuity correction

	Pearson's Chi-squared test

data:  myMatrix
X-squared = 1.3696, df = 1, p-value = 0.2419

The printout of the matrix shows that the numbers went in correctly. The test results are reported using the standard format, and are the same as those produced using Excel.
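To see where these numbers come from, the following sketch recomputes the statistic by hand from the expected counts, which mirrors the spreadsheet arithmetic. The table is rebuilt with matrix() so the block runs on its own, and the names expected, chiSquared, degreesFreedom, and pValue are illustrative.

```r
# Same 2x2 table as in the example script
myMatrix = matrix(c(114, 132, 131, 123), nrow = 2,
                  dimnames = list(c("SecondMale", "secondFemale"),
                                  c("FirstMale", "FirstFemale")))

# Result from R's built-in test, for comparison
result = chisq.test(myMatrix, correct = FALSE)

# Expected counts if the factors are independent:
# (row total * column total) / grand total
expected = outer(rowSums(myMatrix), colSums(myMatrix)) / sum(myMatrix)

# Chi-squared statistic: sum of (observed - expected)^2 / expected
chiSquared = sum((myMatrix - expected)^2 / expected)

# Degrees of freedom = (rows - 1) * (columns - 1)
degreesFreedom = (nrow(myMatrix) - 1) * (ncol(myMatrix) - 1)

# P-value from the upper tail of the chi-squared distribution
pValue = pchisq(chiSquared, df = degreesFreedom, lower.tail = FALSE)

chiSquared
degreesFreedom
pValue
```

The expected counts that R used are also available directly as result$expected.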

Reference

For other options and examples, see the Chi-square Test of Independence page in An R Companion for the Handbook of Biological Statistics.
