
BSCI 1511L Statistics Manual: 4 Summary of statistical tests

Introduction to Biological Sciences lab, second semester

Summary of tests

Introduction and Review of Statistical Tests

Over the course of 1510L and 1511L, you have been exposed to several statistical tests, which are ways to decide objectively whether there is a true difference between (usually) two data sets. Confidence intervals have their uses, but they offer only a wholesale comparison between two (or more) groups of data: all you can determine is whether one group's interval overlaps another's, and there is no other value to indicate the strength of the difference. The other statistical tests that you have seen in 1510L and 1511L are summarized below.

| Statistical Test | Value Type | What the data might look like… | Used for: | A GENERAL null hypothesis |
| --- | --- | --- | --- | --- |
| T-test of Means | Continuous (Usually) | There is no reason why a particular value in one group would be related to a particular value in the other group | Comparing two group means | The true means of the two groups are the same (i.e. differences in sample means are due only to chance). |
| Paired T-test | Continuous (Usually) | There is a connection between particular values in one group and corresponding values in the other group | Comparing differences between pairs of values in the two groups | The average difference between pairs of blocked data is zero (i.e. pairs of blocked data deviate randomly). |
| Linear Regression | Continuous (Usually) | Values representing the experimental factor vary continuously rather than fall into discrete categories | Determining whether the line of best fit has a slope different from zero | The slope of the best fit line through the data is zero (i.e. deviations from a best-fit line having slope of zero are random). |
| ANOVA | Continuous | Multiple groups or multiple factors influencing data | Analyzing more than one experimental factor at a time, and determining whether the means of several groups are different | There is no difference among the means of several groups. |
| Chi-Squared Contingency | Numerical Counts (Usually) | | Comparing whether states of one factor are associated with states of another factor | There is no association between the state of one factor and the state of another factor (i.e. the frequencies are the same as those that would be predicted if they were independent). |
| Chi-Squared Goodness of Fit | Numerical Counts (Usually) | | Comparing observed and expected frequencies | The frequencies observed in the categories are the same as the expected frequencies. |

To repeat an earlier note: the null hypotheses above are stated in GENERAL TERMS. You should state them in more specific terms based on the details of the particular design of the experiment that you are discussing.

***Do not just paste the above examples as “the answer”.***
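For reference, each test in the table above has a counterpart in base R. The sketch below is only an illustration of the function calls: all of the numbers and variable names are invented, and the scripts you used in lab may be organized differently. Two defaults are worth knowing when you compare against Excel: R's two-sample t.test() does not assume equal variances unless you add var.equal = TRUE, and chisq.test() applies a continuity correction to 2x2 tables unless you add correct = FALSE.

```r
# Hypothetical data, invented only to illustrate the function calls.
group_a <- c(5.1, 4.8, 6.0, 5.5, 5.2)
group_b <- c(6.3, 5.9, 6.8, 6.1, 7.0)

# T-test of means: two independent groups.
# (Add var.equal = TRUE to match an equal-variance t-test.)
t_means <- t.test(group_a, group_b)

# Paired t-test: each value in group_a is matched with the value
# in the same position in group_b.
t_paired <- t.test(group_a, group_b, paired = TRUE)

# Linear regression: tests whether the slope of the best-fit line is zero.
dose     <- c(1, 2, 3, 4, 5)
response <- c(2.1, 3.9, 6.2, 7.8, 10.1)
fit      <- summary(lm(response ~ dose))
slope_p  <- fit$coefficients["dose", "Pr(>|t|)"]

# ANOVA: compares the means of three groups.
values    <- c(group_a, group_b, 7.2, 7.9, 6.9, 7.5, 8.1)
groups    <- factor(rep(c("A", "B", "C"), each = 5))
anova_fit <- summary(aov(values ~ groups))

# Chi-squared contingency: counts for two factors with two states each.
# correct = FALSE turns off the continuity correction for 2x2 tables.
counts   <- matrix(c(30, 10, 20, 40), nrow = 2)
chi_cont <- chisq.test(counts, correct = FALSE)

# Chi-squared goodness of fit: observed counts vs. expected proportions.
observed <- c(45, 35, 20)
chi_gof  <- chisq.test(observed, p = c(0.5, 0.25, 0.25))

# Each result can be printed to see the test statistic, df, and P-value.
print(t_means)
print(chi_gof)
```

Printing any of the stored results (for example print(t_paired) or print(anova_fit)) shows the same kinds of values you would look for in Excel's output.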

Also, when you choose a test, be aware of the minimum sample size that the test requires. Within reason, you can get Excel to produce a result from almost any numbers you type in, but that does not make the result meaningful. Recall the examples that you have done before: we did not perform a t-test with only one value in group A and one value in group B.
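As a small illustration of the sample size point, R's t.test() refuses to run when each group holds a single value. The sketch below uses two invented numbers and catches the error so that the message can be read instead of stopping the script.

```r
# Hypothetical single values for group A and group B.
result <- tryCatch(
  t.test(c(5.1), c(6.3)),
  # If the test cannot be run, keep the error message as text.
  error = function(e) conditionMessage(e)
)

# Shows the error message explaining why the test could not be run.
print(result)
```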

 

 

There are several reasons for having learned about all the statistical tests and how to use them.

One reason has been stated in the past: to expand your knowledge base and give you experience with a "tool" that is used often in research and science. That is also why we used Excel and R, which are tools in active use in research and science.

Another is to show you how to objectively quantify something, giving you a clear measure for deciding whether there is an effect or not.

A third reason is to help you think about things in life: to be able to hear information about something and decide, based on the data, whether the claim is valid or not.

Statistics and data can be used in many helpful ways. They can also be used in less than forthright ways. Although we generally know better nowadays than to blindly trust the advertising claim for a product, that was not always the case. Years ago, a survey was (supposedly) performed of over 113,000 medical doctors about which tobacco product they preferred. The advertiser claimed that its brand was the most preferred by doctors; the claim said nothing about the product's benefits, only that it was preferred.

Today, leaving aside the fact that tobacco products are unhealthy, one would hope that people would not be immediately swayed by such claims. "Of over 113,000 medical doctors, this product was the most preferred." The claim never gave the details, such as how many of the doctors smoked versus did not smoke, or what percentage that was. Without a good basis for analyzing data, for deciding whether an experimental design was robust enough, and for judging whether the sample size was large enough, you end up with statements that are true (but not quite accurate) from a certain point of view.

 

R Studio recapping


Using R is a challenge. Using Excel (which is not a true "statistical program") is easy: press a button, highlight some data, and poof....results! R actually does the same thing, only without a convenient single button to push. In Excel, the "button" stands in for all the command lines that run in the background where you do not see them.

As a summary for R, you have seen how to take a series of command lines (a script) and modify it in several ways: by taking a previous script for a test and simply changing the numbers (for the Chi-squared tests, the t-test of means, or the paired t-test); by using a command line (file.choose()) to browse for a .csv file that holds your data, read the file, and perform the test; and by using the actual file location on your system (e.g. something like file.c:local drive/1511L/statistics/t-test.csv). You should ALSO have an example of each test, with its script, saved on your system, so that you can simply change the data in the file and re-run the script (after saving it or using a copy).
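The file-based workflow can be sketched as follows. This is a minimal example, not the lab's actual script: the column names groupA and groupB and all of the values are hypothetical, and the script first writes its own small .csv file to a temporary location so that it runs without any file of yours.

```r
# Create a small example .csv so this sketch runs on its own.
# The column names and values are hypothetical.
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(groupA = c(5.1, 4.8, 6.0, 5.5, 5.2),
             groupB = c(6.3, 5.9, 6.8, 6.1, 7.0)),
  csv_path,
  row.names = FALSE
)

# In your own script you would browse for the file instead:
#   my_data <- read.csv(file.choose())
# or give the full location of the file on your system as a quoted path.
my_data <- read.csv(csv_path)

# Optional: look at the first rows to confirm the right file was read.
print(head(my_data))

# Run the test on the two columns (here, a t-test of means).
result <- t.test(my_data$groupA, my_data$groupB)
print(result)
```

To reuse a script like this with new data, only the file (or the path to it) has to change; the lines that read the data and run the test stay the same.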

So you now know that there are multiple ways to analyze data, AND you have at least two programs for doing so....plus you can verify that the numbers are correct by using a different program to get the same results. You are now Statistics Pros!

Remember for R, if it doesn't work, there are a few simple things to check: the script has an error; the file is not in .csv format (comma delimited...NOT UTF8!); what you copied into the script brought along an invisible character (paste into Notepad, then copy and paste from there); or the file you chose is not the correct file, or you didn't save the new data you entered into that file. And sometimes you know the script is correct and it still isn't working....close R and try again. Also, BE AWARE of what commands you have in the script. For instance, it is nice to see the data (display.table), but it is not required to run the test. R has a limit on how many rows of a table it will display (about 250, and no more unless you use a different command line).

R has a default lower limit on the P-value it will show. If you compare the results for the same data in Excel and in R, it is possible that the df will match and the test statistic (chi-squared, t, F, etc.) will match, but the P-value will not. The smallest value R currently displays is about 2.2x10^-16 (or 0.00000000000000022); anything smaller is reported as less than that value.
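The limit applies to how the result is printed, not to what R calculates. The sketch below uses simulated, hypothetical data with a very large difference between groups to show that the printed summary stops at 2.2e-16 while the stored P-value can still be read from the result.

```r
# Simulated (hypothetical) data with a very large difference in means.
set.seed(1)
group_a <- rnorm(50, mean = 0)
group_b <- rnorm(50, mean = 10)

result <- t.test(group_a, group_b)

# The printed summary reports "p-value < 2.2e-16".
print(result)

# The value R actually calculated is stored in the result.
print(result$p.value)

# The display limit is the machine precision for decimal numbers.
print(.Machine$double.eps)
```

If Excel shows a much smaller P-value than R's printed summary for the same data, comparing it against result$p.value is a fairer check than comparing it against the printed "< 2.2e-16".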