When running a chi-squared goodness-of-fit test in R, the actual (i.e. observed) frequencies must be absolute (i.e. counts), while the expected frequencies must be relative (i.e. probabilities or proportions expressed as decimal fractions that sum to 1). This differs from the way the test is conducted in Excel, where both the actual and expected frequencies must be absolute. For example, the Excel example in section 1.5 looks like this:
whereas the values used in R would be:
|category||expected relative frequency||actual absolute frequency|
|heads||0.5||46|
|tails||0.5||41|
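To make the relationship between the two conventions concrete, the following sketch converts expected absolute frequencies (the form a spreadsheet needs) into the expected relative frequencies that R needs. The variable names are illustrative, and the expected counts of 43.5 are an assumption worked out from the 87 flips and the 0.5 probabilities in this example rather than values copied from section 1.5.

```r
# Observed counts from the coin-flip example: (heads, tails)
observedFlips = c(46, 41)

# Expected absolute frequencies, as a spreadsheet would need them.
# Assumption: a fair coin, so each outcome is expected in half of the 87 flips.
expectedCounts = c(43.5, 43.5)

# Dividing by the total converts counts into proportions that sum to 1
expectedProb = expectedCounts / sum(expectedCounts)
print(expectedProb)

# Going the other way: proportions times the number of observations
print(expectedProb * sum(observedFlips))
```

Either form carries the same information; R simply works out the expected counts itself from the proportions and the total number of observations.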
Here is a script that uses the data shown above:
# Chi-squared goodness-of-fit test: do the coin flips differ significantly from what we expect?
observedFlips = c(46, 41) # actual absolute frequencies (counts): (heads, tails)
expectedProb = c(0.5, 0.5) # expected relative frequencies (probabilities): (heads, tails)
chisq.test(
  x = observedFlips,
  p = expectedProb
)
When pasting the script into RStudio, use the raw text from this GitHub gist rather than copying and pasting from this page. Note that although it would be possible to input the data from a file, it probably isn't worth it in this case, since only four numbers need to be typed into the script.
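If there are more than two categories, simply include more numbers within the parentheses, separated by commas. Both lists must contain the same number of values in the same category order, and the expected probabilities must still sum to 1.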
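As an illustration, here is a sketch with three categories. The counts and probabilities are made up for this example and do not come from real data, and the variable names are illustrative.

```r
# Hypothetical data for three categories (A, B, C); not from a real experiment
observedCounts = c(48, 28, 24)     # actual absolute frequencies (counts)
expectedProb = c(0.5, 0.25, 0.25)  # expected relative frequencies, summing to 1

result = chisq.test(
  x = observedCounts,
  p = expectedProb
)
print(result)
```

With three categories the test has two degrees of freedom, one fewer than the number of categories.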
Here is the output given by R:
Chi-squared test for given probabilities
data:  observedFlips
X-squared = 0.2874, df = 1, p-value = 0.5919
Notice that this conforms to the expectations about what is normally reported for a statistical test: the value of the statistic (chi-squared value), the number of degrees of freedom, and the P-value.
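If the three reported values are needed individually, for example to quote them in a report, they can be read from the object that chisq.test returns. The sketch below also recomputes the statistic and P-value by hand as a check on the output; the names result, expectedCounts, manualStat, manualDf, and manualP are illustrative and not part of the original script.

```r
observedFlips = c(46, 41)   # actual absolute: (heads, tails)
expectedProb = c(0.5, 0.5)  # expected relative: (heads, tails)

result = chisq.test(x = observedFlips, p = expectedProb)

# Components of the test result
print(result$statistic)  # chi-squared value
print(result$parameter)  # degrees of freedom
print(result$p.value)    # P-value

# Manual check: sum of (observed - expected)^2 / expected
expectedCounts = expectedProb * sum(observedFlips)
manualStat = sum((observedFlips - expectedCounts)^2 / expectedCounts)
manualDf = length(observedFlips) - 1  # categories minus one
manualP = pchisq(manualStat, df = manualDf, lower.tail = FALSE)
print(c(statistic = manualStat, df = manualDf, p.value = manualP))
```

The manual values should agree with those from chisq.test, which is a quick way to confirm that the observed and expected vectors were entered in the intended form.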