As stated earlier, most scientific experimental data are influenced by a wide range of variables, some of which we cannot adequately control, so we manage them with a technique called blocking. The technique works when the researcher understands which variables lie outside reasonable control and has a sample size large enough to identify and group similarly influenced data. Although this is a relatively simple experimental technique, it is very powerful and plays a large role in many important studies. The following example illustrates its importance.
Blocking example:
Did you know that cockroaches ‘see’ differently than humans? They have a diurnal cycle like we do, and they can detect whether a light comes on or off, but they do not see the same wavelengths of light that we do. For example, cockroaches are less able than humans to detect red wavelengths. Imagine that at each station in the BSCI lab there is a box with a cockroach inside. The box can do two things: shine light of a given color (red, green, or blue) at the roach, and give the researcher data representing the cockroach’s response (an electrical response from the 'brain') to that light stimulus.
If we were to run this experiment (which we are not), you could imagine the data we would collect. With a full lab this semester, we would have 30 stations, each generating three types of data points: a red, a green, and a blue light response. After analyzing the data from all 8 sections of 1511L (up to 240 sets of red, green, and blue responses), you notice some interesting trends. You hypothesized that the red-light response would be smaller than the other color responses, and that did happen. More interesting, though, some sets of red response data were much higher than the overall red values. The data sets with higher-than-average red values also contained higher-than-average green and blue values. Likewise, when the red values were low compared to the average red value, the green and blue values from the same stations were also low. Despite these deviations, the red response to light was always less than the responses to the other colors.
When station numbers are compared with the data sets across sections, a few trends emerge. Station 01 tended to give ‘high’ values, and station 15 was generally ‘low’ across sections. To investigate this, the cockroaches at stations 01 and 15 were switched. After the switch, the cockroach that had been giving ‘high’ values gave values on the low side, and the originally ‘low’ cockroach showed the same size of shift in the opposite direction. Therefore, it can be assumed that something associated with the station, not the cockroach, is influencing the response data.
In this case, researchers can incorporate a blocking technique. The reasoning is that there is some ‘thing,’ some unseen and unplanned factor, that is influencing the experimental data. Experiments could be done to investigate these variables, but those could take time and money the researcher might not have, or the researcher might have already collected the data and be unable to repeat the experiment. Separating what the experimental design could control (the light) from what it could not control (an unknown factor, such as equipment that reads high or low) helps the researcher interpret the data. For instance, this experiment did not control various factors in the data collection system, meaning that there was no calibration across the stations. Many factors could influence data collection; for example, some circuit connections in a ‘roach box’ may have a different resistance than others. Blocking the data by station accounts for all of these unknown influences at once, without having to identify each one.
To do this, one column could be made in Excel for each of the three colors (red, green, blue). Each row would then carry a second classifier for the lab station the data came from (e.g., 01, 02…20). The result would be Red 01, Green 01, and Blue 01 values, and so on for each station. So instead of a single-factor ANOVA (the effect of wavelength of light, or 'color,' on roach eye response), you have a two-factor ANOVA: the effect of wavelength and the effect of equipment (the block variable).
NOTE: in practice, station labels such as ‘01’ and ‘20’ should not be treated as actual numbers. Either replace them with letters such as ‘a’ and ‘t,’ or set them up as nominal values (group names). R and Excel require the data to be set up differently for ANOVA, so please pay attention to the later reading.
Additionally, the idea behind ‘blocking’ extends to other aspects of an experiment, such as making sure that you did not accidentally group your data by a second variable. For example, if all older persons were in the group receiving a treatment and all younger persons were in the group not receiving it, age would be a second variable that might affect your data. The same would be true if everyone washing their hands with soap were female while the water-only group were all male.
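The two-factor layout described above (color as the treatment, station as the block) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the response values are made up, the station labels and the function name `blocked_anova` are hypothetical, and each station contributes exactly one reading per color. The course analysis itself is done in R, as described in the later reading.

```python
from statistics import mean

# Hypothetical responses (arbitrary units), one reading per color per station.
# Station labels are strings, so they act as group names rather than numbers.
responses = {
    "01": {"red": 12.0, "green": 20.0, "blue": 19.0},
    "07": {"red": 8.0, "green": 15.0, "blue": 16.0},
    "15": {"red": 4.0, "green": 11.0, "blue": 12.0},
    "20": {"red": 7.0, "green": 16.0, "blue": 14.0},
}


def blocked_anova(table):
    """Partition variation into color (treatment), station (block), and error."""
    stations = list(table)
    colors = list(table[stations[0]])
    n_stations, n_colors = len(stations), len(colors)

    grand_mean = mean(table[s][c] for s in stations for c in colors)
    color_means = [mean(table[s][c] for s in stations) for c in colors]
    station_means = [mean(table[s].values()) for s in stations]

    ss_total = sum((table[s][c] - grand_mean) ** 2
                   for s in stations for c in colors)
    ss_color = n_stations * sum((m - grand_mean) ** 2 for m in color_means)
    ss_station = n_colors * sum((m - grand_mean) ** 2 for m in station_means)
    ss_error = ss_total - ss_color - ss_station

    df_color = n_colors - 1
    df_station = n_stations - 1
    df_error = df_color * df_station
    ms_color = ss_color / df_color

    # With blocking, station-to-station variation is removed from the error term.
    f_blocked = ms_color / (ss_error / df_error)
    # Without blocking (single-factor ANOVA), it stays in the error term.
    f_unblocked = ms_color / ((ss_error + ss_station) / (df_error + df_station))

    return {
        "ss_color": ss_color,
        "ss_station": ss_station,
        "ss_error": ss_error,
        "f_blocked": f_blocked,
        "f_unblocked": f_unblocked,
    }


result = blocked_anova(responses)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

In this made-up data set the stations differ a lot from one another, so moving the station variation out of the error term makes the F ratio for color much larger than it would be in a single-factor analysis of the same numbers.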
This example was meant to illustrate a situation in which you would use ANOVA and blocking, without too much statistical jargon. What follows is the set-up and reading for performing an ANOVA with R. Excel's built-in ANOVA tools are limited, so we will use R for anything beyond a single-factor ANOVA. Below you will practice running ANOVA analyses on similar data for red, green, and blue light cockroach responses. In addition, there will be a second data set, on EKG heart data, to use in the lab next week.
Using an ANOVA, we can simultaneously consider the effects of red, green, and blue light on 24 amplitude measurements per color and answer the question of whether color of light has an effect in general. In this example, the null hypothesis is that there is no difference among the mean responses for the three colors.
Table 3. Analysis of variance comparing effect of three colors of light on a roach eye
| Source | Degrees of freedom | Sum of squares | Mean square | F ratio | P |
|---|---|---|---|---|---|
| Model | 2 | 857.2 | 428.6 | 14.8 | 4.44 × 10⁻⁶ |
| Error | 69 | 1996.4 | 28.9 | | |
| Total | 71 | 2853.6 | | | |
The results in Table 3 show that the variance due to the three color treatments (model mean square) is much greater than the variance due to other factors (error mean square). This is reflected in the large F ratio and the very small P-value. We can reject the null hypothesis and conclude that, in general, color has a significant effect on the response of cockroach retinas.
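The arithmetic behind Table 3 can be checked with a short Python sketch. Each mean square is its sum of squares divided by its degrees of freedom, and the F ratio is the model mean square divided by the error mean square. The helper name `p_value_two_numerator_df` is hypothetical, and the closed-form P-value it uses holds only when the numerator has exactly 2 degrees of freedom (three treatment groups), which happens to be the case here; statistical software such as R computes the general case for you.

```python
# Sums of squares and degrees of freedom copied from Table 3.
ss_model, df_model = 857.2, 2
ss_error, df_error = 1996.4, 69

# Mean square = sum of squares / degrees of freedom.
ms_model = ss_model / df_model
ms_error = ss_error / df_error

# F ratio = variance explained by color / unexplained (error) variance.
f_ratio = ms_model / ms_error


def p_value_two_numerator_df(f, df_denominator):
    """Upper-tail P for an F ratio with exactly 2 numerator degrees of freedom.

    This shortcut is valid only in that special case; it is not a general
    F-distribution routine.
    """
    return (1 + 2 * f / df_denominator) ** (-df_denominator / 2)


p_value = p_value_two_numerator_df(f_ratio, df_error)

print(f"Model mean square: {ms_model:.1f}")
print(f"Error mean square: {ms_error:.1f}")
print(f"F ratio: {f_ratio:.1f}")
print(f"P: {p_value:.2e}")
```

The printed values should agree with the corresponding cells of Table 3 after rounding.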
Notice that this test did not break the comparison down into individual pairs of colors, so it cannot say which particular colors differ from each other. It may be that some colors (e.g., blue and green) do not differ from each other on average; the test shows only that at least one treatment category differs from another. To determine which particular treatments differ, a significant ANOVA can be followed by a posteriori (“after the fact”) pairwise tests or by a comparison of 95% confidence intervals (a good method when there are many treatment categories; remember Sickbert-Bennett?).