What happens if one flips a penny and a quarter? A basic principle of probability is that the probability of two independent outcomes co-occurring is the product of their individual probabilities. This principle applies in the penny/quarter situation as long as the outcome of one flip doesn't influence the outcome of the other (i.e. they are independent). Another way of describing the situation is to say that the probability of a penny outcome is not associated with the probability of a quarter outcome.
The probability of obtaining the coin flip combinations under an assumption of independence can be calculated using this table:
Table 5. Joint probabilities of flipping two normal coins

| | quarter: heads (0.5) | quarter: tails (0.5) |
| --- | --- | --- |
| penny: heads (0.5) | 0.25 | 0.25 |
| penny: tails (0.5) | 0.25 | 0.25 |
In Table 5, the probabilities for the four possible outcomes sum to one (meaning that it is certain that one of the four will happen). Since the four joint outcomes have the same probability values, we can say that the four outcomes are equally likely.
Now consider the situation where we have trick coins that are loaded to produce heads more often than tails. The trick penny has a probability of 0.6 of obtaining heads, while the trick quarter has a probability of 0.7 of obtaining heads. Under these circumstances, the joint probabilities can be calculated with this table:
Table 6. Joint probabilities of flipping loaded coins

| | trick quarter: heads (0.7) | trick quarter: tails (0.3) |
| --- | --- | --- |
| trick penny: heads (0.6) | 0.42 | 0.18 |
| trick penny: tails (0.4) | 0.28 | 0.12 |
The outcomes are less obvious this time. A quick check shows that again the combination probabilities add up to one. However, this time the four joint outcomes are not equally likely.
In summary, we can predict the probability of two independent kinds of events co-occurring by multiplying the probabilities of the individual kinds of events.
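The multiplication rule can be sketched in a few lines of Python. This is only an illustration: the function name `joint_probabilities` and the dictionary layout are choices made here, not part of the original text, and the calculation is valid only under the assumption that the two events are independent.

```python
def joint_probabilities(p_first, p_second):
    """Return joint probabilities for two events assumed to be independent.

    p_first and p_second map each outcome to its probability.
    """
    return {
        (a, b): pa * pb
        for a, pa in p_first.items()
        for b, pb in p_second.items()
    }


# Loaded coins from Table 6.
trick_penny = {"heads": 0.6, "tails": 0.4}
trick_quarter = {"heads": 0.7, "tails": 0.3}

table6 = joint_probabilities(trick_penny, trick_quarter)
for (penny, quarter), p in table6.items():
    print(f"penny {penny}, quarter {quarter}: {p:.2f}")

# The four joint probabilities should account for every possible outcome.
print(f"total: {sum(table6.values()):.2f}")
```

Passing 0.5 for every outcome reproduces Table 5 in the same way.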
Consider the case of the sex of children in families who have two children. Assume that the probabilities of having males and females are each 0.5 (i.e. it is equally likely to have a boy or a girl). There are various ways that the sexes of children could be distributed among families with two children and still produce an overall relative frequency of 0.5 males and 0.5 females. Some possible distributions are shown in Tables 7 through 9:
Table 7. Absolute frequencies of sexes of children with extreme negative association

| | second child: male | second child: female |
| --- | --- | --- |
| first child: male | 0 | 250 |
| first child: female | 250 | 0 |
Table 8. Absolute frequencies of sexes of children with extreme positive association

| | second child: male | second child: female |
| --- | --- | --- |
| first child: male | 250 | 0 |
| first child: female | 0 | 250 |
Table 9. Absolute frequencies of sexes of children with no association (complete independence)

| | second child: male | second child: female |
| --- | --- | --- |
| first child: male | 125 | 125 |
| first child: female | 125 | 125 |
The examples in Tables 7 through 9 are extreme, but they demonstrate the range of possible distributions. You should notice that in all three examples the sex ratios are the same (half males and half females). The difference is in the way those sexes are distributed within families. In the case of Table 7, the second child born is always the opposite sex of the first child born. In Table 8, the second child born is always the same sex as the first child born. In Table 9, there is no association between the sex of the first child born and the sex of the second child born. An alternative to the term "association" is "contingency". We can say that the second outcome is contingent on the first outcome when the state of the second outcome depends on the state of the first outcome.
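One way to see what "contingent" means in Tables 7 through 9 is to look at the relative frequency of each sex of the second child separately for each sex of the first child. The Python sketch below does this; the helper name `second_given_first` and the nested-dictionary layout are illustrative assumptions, not something taken from the original text.

```python
def second_given_first(table):
    """For each sex of the first child, return the relative frequency of
    each sex of the second child within that row of the table."""
    result = {}
    for first, row in table.items():
        row_total = sum(row.values())
        result[first] = {second: n / row_total for second, n in row.items()}
    return result


# Outer key: sex of first child; inner key: sex of second child.
table7 = {"male": {"male": 0, "female": 250},
          "female": {"male": 250, "female": 0}}
table8 = {"male": {"male": 250, "female": 0},
          "female": {"male": 0, "female": 250}}
table9 = {"male": {"male": 125, "female": 125},
          "female": {"male": 125, "female": 125}}

for name, table in [("Table 7", table7), ("Table 8", table8), ("Table 9", table9)]:
    print(name, second_given_first(table))
```

In Tables 7 and 8 the frequencies for the second child change completely depending on the first child, while in Table 9 they are 0.5 in both rows, which is what independence looks like.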
A scientist collects data on the sexes of children in 500 families having two children and records the following data:
Table 10. Actual absolute frequencies of children in some families with two children

| | second child: male | second child: female |
| --- | --- | --- |
| first child: male | 114 | 131 |
| first child: female | 132 | 123 |
From the data in Table 10, it appears that there may be a small negative association between the sexes of first and second children. However, it is also possible that there is no association and that the deviation from the expected frequencies is due to random variation. This situation can be tested statistically using a special case of the chi squared goodness of fit test that was described in Sections 1.4 and 1.5. This test is called a chi squared contingency test.
In this case, the null hypothesis is that there is no association between the sex of the first and second child (i.e. that the two factors, sex of first child and sex of second child, are independent). So it would seem like we could just compare the cells in Table 10 with those in Table 9 since both were based on 500 families. However, that would actually be inappropriate because it would be testing two different things: whether the sex of the first and second children were associated AND whether the sex ratios of the children were actually 1:1. What we really want to know is this: given the sex ratios that exist, are the sexes of the first and second children associated? So our first task is to determine the actual sex ratios of the first children and actual sex ratios of the second children. We can do this by expanding the table to provide totals for each category:
Table 11. Calculation of actual relative sex frequencies of children in some two child families

| | second child: male | second child: female | total | actual relative frequencies |
| --- | --- | --- | --- | --- |
| first child: male | 114 | 131 | 245 | 0.490 |
| first child: female | 132 | 123 | 255 | 0.510 |
| total | 246 | 254 | 500 | 1.000 |
| actual relative frequencies | 0.492 | 0.508 | 1.000 | |

The totals in Table 11 were used to calculate the actual relative frequencies of males and females for first and second children. These frequencies are near, but not identical to, 0.5. If we assume that the observed relative frequencies represent the probabilities of achieving these states (as discussed in Section 1.6), we can now use these actual relative frequencies to calculate the joint probabilities of the various combinations of sexes for first and second children by multiplying the probabilities of single outcomes, as discussed in Section 2.1. The results of this are in Table 12.
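The totals, relative frequencies, and joint probabilities can be reproduced with a short Python sketch. The variable names and the list-of-rows layout are assumptions made for this illustration; the counts are those of Table 10.

```python
# Rows: sex of first child (male, female).
# Columns: sex of second child (male, female). Counts from Table 10.
observed = [[114, 131],
            [132, 123]]

total = sum(sum(row) for row in observed)           # number of families
row_totals = [sum(row) for row in observed]         # first child: male, female
col_totals = [sum(col) for col in zip(*observed)]   # second child: male, female

# Actual relative frequencies (the margins of Table 11).
row_freq = [n / total for n in row_totals]
col_freq = [n / total for n in col_totals]

# Expected joint probabilities if the two factors are independent.
joint = [[r * c for c in col_freq] for r in row_freq]

print("row totals:", row_totals, "relative:", row_freq)
print("column totals:", col_totals, "relative:", col_freq)
for row in joint:
    print([round(p, 3) for p in row])
```

Rounded to three decimal places, the printed joint probabilities match the cells of Table 12.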
Table 12. Calculation of expected joint probabilities for children in some two child families

| | second child: male | second child: female | actual relative frequencies |
| --- | --- | --- | --- |
| first child: male | 0.241 | 0.249 | 0.490 |
| first child: female | 0.251 | 0.259 | 0.510 |
| actual relative frequencies | 0.492 | 0.508 | |

In order to conduct an actual goodness of fit test, the expected joint probabilities must be converted into expected absolute frequencies of combinations, based on a total sample of 500 (i.e. the test must be performed on counts, not relative frequencies). This has been done in Table 13 by multiplying each expected joint probability by the total number of families observed (500).
Table 13. Expected absolute frequencies of children in some two child families

| | second child: male | second child: female |
| --- | --- | --- |
| first child: male | 120.5 | 124.5 |
| first child: female | 125.5 | 129.5 |
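As a rough check on Table 13, the expected counts can also be computed directly from the row and column totals, since (row total / N) × (column total / N) × N simplifies to row total × column total / N. The Python sketch below uses that shortcut; the variable names are illustrative assumptions.

```python
# Observed counts from Table 10 (rows: first child, columns: second child).
observed = [[114, 131],
            [132, 123]]

total = sum(sum(row) for row in observed)
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]

# Expected count for each cell under independence.
expected = [[r * c / total for c in col_totals] for r in row_totals]

for row in expected:
    print([round(e, 2) for e in row])
```

The unrounded values differ from Table 13 in the second decimal place because the table was built from joint probabilities already rounded to three decimals; rounded to one decimal place they agree.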
We are now in a position to conduct a goodness of fit test to see if the actual (observed) absolute frequencies listed in Table 10 differ significantly from the absolute frequencies we would expect if there were no association (Table 13).
The chi squared term for each combination (cell in Tables 10 and 13) is calculated as described in section 1.4 and the sum of the terms for each combination represents the chi squared value for the test. The number of degrees of freedom in a chi squared contingency test is reduced when compared to a generic goodness of fit test. That is because we lose degrees of freedom when we calculate the relative frequencies based on the data itself (as we did in Table 11). The rule for degrees of freedom in contingency tests is:
$$df = (\text{rows} - 1)(\text{columns} - 1)$$
In this example, there are two rows and two columns, so $(2 - 1)(2 - 1) = 1$ and there is one degree of freedom. As in the regular goodness of fit test, the value of P depends on the chi squared value and number of degrees of freedom, and can be calculated using Excel as will be shown in the following section.
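For readers who want to check the arithmetic outside of Excel, the following Python sketch computes the chi squared value, the degrees of freedom, and P for this example. It is an illustration under stated assumptions: it uses the rounded expected counts from Table 13, and it relies on the fact that for exactly one degree of freedom the upper-tail probability of the chi squared distribution equals `erfc(sqrt(x / 2))`. That shortcut does not apply to larger tables.

```python
import math

# Cells in the same order for both lists:
# (male, male), (male, female), (female, male), (female, female).
observed = [114, 131, 132, 123]          # Table 10
expected = [120.5, 124.5, 125.5, 129.5]  # Table 13

# Sum of (observed - expected)^2 / expected over all cells.
chi_squared = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

rows, columns = 2, 2
df = (rows - 1) * (columns - 1)

# Valid only when df == 1 (see the note above).
p_value = math.erfc(math.sqrt(chi_squared / 2))

print(f"chi squared = {chi_squared:.3f}, df = {df}, P = {p_value:.3f}")
```

With these inputs the chi squared value comes out near 1.35 and P is roughly 0.24.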
A chi squared contingency test is used to determine whether two factors are associated or independent. Each of the two factors must be discontinuous and recorded as one of several possible states. The test is performed on counts of outcomes. If the factors are not independent, they may simply be associated in some unknown way. It is also possible to use the test in circumstances where one variable is suspected to be dependent on the other (i.e. that the state of one variable is affected by the state of the other variable).