Research Guides: BSCI 1510L Literature and Stats Guide: 5.6 Discussing statistics in your scientific writing

5.6 Discussing statistics in your scientific writing

It is relatively easy to enter numbers into statistical software and have the software produce results. It is more difficult to discuss those results in a way that shows you understand what they mean. Beginning students make many mistakes when writing about statistical results. These mistakes immediately flag the writer as a novice and may cause the reader to discount the writer's conclusions. A common way that students reveal a lack of understanding is to state proudly, "I did a Chi test," to which the listener asks, "Which one?" (the student stammers and realizes the mistake). This section discusses how particular terminology is used when writing about statistical results.

You should be aware that scientists are extremely careful about using the word "significantly" (or "significant" or "significance"). In experimental science, the word "significant" is a technical term and should never be used except in the context of an assessment of a statistical analysis. "Significant" has the specific meaning that a statistical test has met the P<0.05 criterion. If you do not mean "significant" in the sense of statistical significance, you should use another word like "important" or "noticeable". Similarly, there is no such thing (statistically speaking) as "insignificant", so do not say it!

You should also note that if the 95% CIs do not overlap we have "shown significance". We do not say that we have PROVEN that the means are different because that is an impossible thing to do. No matter how low the value of P, there is always some remote possibility that the distributions of the two patient groups are actually the same and that the differences that we are seeing are due to chance sampling of unrepresentative individuals.

If the 95% CIs do overlap, we say that we failed to show that the patient responses to the drug are different. That is radically different from saying that we "know" or "have shown" that the patient responses are the same. Remember that we may have failed to show significance simply because we did not measure enough samples to shrink the 95% CIs enough to keep them from overlapping. If we repeat the experiment with more samples, we might be able to show that the means are significantly different.
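The effect of sample size on CI overlap can be illustrated with a small simulation. The following is a minimal sketch, not a method from this guide: the group means (100 and 105), the standard deviation (15), and the sample sizes are made-up values, the function names are hypothetical, and the interval uses the normal critical value 1.96 rather than the t-distribution value that statistical software would use for small samples.

```python
import math
import random
import statistics


def ci95(sample):
    """Approximate 95% CI for the mean (assumes the normal critical value 1.96)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    sem = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
    half_width = 1.96 * sem
    return mean - half_width, mean + half_width


def intervals_overlap(a, b):
    """Return True if the intervals a=(lo, hi) and b=(lo, hi) share any values."""
    return a[0] <= b[1] and b[0] <= a[1]


def compare_at_sample_sizes(sizes=(10, 40, 160, 640), seed=1):
    """Draw two simulated groups at each sample size and report their 95% CIs."""
    rng = random.Random(seed)
    rows = []
    for n in sizes:
        # Hypothetical populations whose true means really do differ (100 vs 105).
        control = [rng.gauss(100.0, 15.0) for _ in range(n)]
        treated = [rng.gauss(105.0, 15.0) for _ in range(n)]
        ci_c, ci_t = ci95(control), ci95(treated)
        rows.append((n, ci_c, ci_t, intervals_overlap(ci_c, ci_t)))
    return rows


for n, ci_c, ci_t, overlap in compare_at_sample_sizes():
    print(
        f"n={n:4d}  control=({ci_c[0]:.1f}, {ci_c[1]:.1f})  "
        f"treated=({ci_t[0]:.1f}, {ci_t[1]:.1f})  overlap={overlap}"
    )
```

Because the width of each interval shrinks roughly in proportion to one over the square root of the sample size, the intervals for larger samples are much narrower. Whether a particular pair of intervals overlaps still depends on the random draw, which is why a failure to show significance in one small experiment says little about whether the groups are really the same.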

A common mistake is to say that there is a 95% chance that the true mean lies within the 95% CI. That is not true. The true mean is either within the 95% CI or it is not - there is no probability involved. Rather, it is correct to say that if we ran a sampling experiment many times, the 95% CI would contain the true mean in 95% of the experiments.
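The "many repeated experiments" interpretation can be checked with a simulation. The sketch below is only an illustration under stated assumptions: the true mean (50), standard deviation (10), sample size (100), and number of experiments (2000) are invented values, the function names are hypothetical, and the interval again uses the normal critical value 1.96.

```python
import math
import random
import statistics


def ci95(sample):
    """Approximate 95% CI for the mean (assumes the normal critical value 1.96)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    sem = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
    half_width = 1.96 * sem
    return mean - half_width, mean + half_width


def coverage(true_mean=50.0, sd=10.0, n=100, experiments=2000, seed=0):
    """Fraction of simulated experiments whose 95% CI contains the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(experiments):
        # Each pass is one "experiment": a fresh sample from the same population.
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        lo, hi = ci95(sample)
        if lo <= true_mean <= hi:
            hits += 1
    return hits / experiments


print(f"Fraction of CIs containing the true mean: {coverage():.3f}")
```

The printed fraction should be close to 0.95. Note that the true mean never changes in the simulation. What varies from experiment to experiment is the interval, and any single interval either contains the true mean or does not.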

You may also have noticed that in this manual we have always framed the questions in terms of assessing the null hypothesis (i.e. that the distributions of the two groups are the same). Why would we choose to test the hypothesis that they are the same, when generally the question that we are interested in is whether two things are different? There are two reasons for this. The first is a general principle in science: it is possible to falsify a hypothesis (within an acceptable margin of error, of course), but never possible to prove that a hypothesis is true, because there may be alternative hypotheses that are also consistent with your observations. Since it is not really possible to prove a hypothesis true, a clever strategy is to define the hypothesis as the opposite of what we are trying to show. If we can show that this opposite hypothesis is very unlikely, then we can infer that what we are actually trying to show is true. In statistics, we demonstrate that two things are different by rejecting the null hypothesis of sameness.

The other reason for framing an experiment in terms of a null hypothesis is a practical one. Assessment of a P-value, directly or indirectly, is a common feature of all statistical tests. By definition, P describes the probability that a deviation at least as large as the one observed would arise from chance unrepresentative sampling if the null hypothesis were true. Since the criterion of significance in all tests is based on comparing P with the critical (alpha) value, framing the question in terms of the null hypothesis is unavoidable.
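One way to see what P measures is a permutation test, which is used here purely as an illustration and is not one of the tests described in this guide. If the null hypothesis were true, the group labels would be arbitrary, so we can shuffle the labels many times and count how often chance alone produces a difference in means at least as large as the one observed. In this minimal sketch the measurements, the function name, and the number of shuffles are all made up for demonstration.

```python
import random
import statistics


def permutation_p_value(group_a, group_b, shuffles=5000, seed=0):
    """Estimate P as the fraction of label shuffles giving a difference in
    means at least as large as the observed one (two-sided)."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(shuffles):
        # Under the null hypothesis, any split of the pooled data is equally likely.
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return extreme / shuffles


# Hypothetical measurements, invented for this example only.
control = [4.1, 5.0, 4.7, 5.3, 4.4, 4.9]
treated = [5.6, 6.1, 5.2, 6.4, 5.9, 5.5]

alpha = 0.05
p = permutation_p_value(control, treated)
verdict = "significant" if p < alpha else "failed to show significance"
print(f"P = {p:.4f} -> {verdict}")
```

The whole calculation is carried out under the scenario of the null hypothesis, which is why the result can only be phrased as rejecting or failing to reject that hypothesis.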

Although these ideas and the terminology may seem foreign to you, you should make the effort to acquaint yourself with them because they form the basis of most data analysis in the scientific literature. This can be seen by a casual examination of nearly any journal article. The idea of a null hypothesis and the P<0.05 criterion for significance are nearly ubiquitous in statistical testing. Because it takes time to absorb statistical concepts and orient your brain to statistical thinking, you may need to come back to this section to refresh your memory in the future.