The basic concept of "analysis of variance" (ANOVA) is not difficult to grasp. A basic understanding of regression analysis and of indicator (dummy) variables, combined with some skill with statistical software, is an important prerequisite for interpreting ANOVA results, but the concept of the analysis itself is quite simple.
In nonequivalent, quasi-experimental research designs, the analysis of variance is an important tool for understanding the statistical differences between the averages, or "means," of two different populations, however the research design distinguishes them. Once the statistical significance of the difference in means has been assessed, a more accurate comparison can be made between the groups. ANOVA is the tool used to assess that significance.
In an ANOVA, the relationship between the mean and the variance, or "random error," of each group provides the information needed to determine whether the difference between the two groups is significant. To illustrate the idea of means and variance, I will first use part of a lecture given by Karl Popper in 1965.
Clouds and Clocks
Imagine a cloud of gnats flying at you. In some ways, there does not seem to be any order to the mass of bugs moving towards you. But as they approach, you note that this cloud of bugs does have an order to it because, however chaotic each bug appears to be, the movement revolves around a nucleus from which the group does not stray too far. In this way, the cloud of gnats moves closer and closer until the gnats buzz on by, possibly leaving a few stragglers behind. We wouldn't describe the movement of the cloud as efficient or precise, since each individual gnat's movement is not directly associated with the direction of the entire group. The "cloud's" path from point A to point B (your location) can be measured precisely as the average path of the group, but the variance, or random error, will be high, since each bug arrived at the same general destination only indirectly.
Now imagine a clock. A caesium atomic clock keeps time to the billionth of a second, and the only adjustments made to the time it keeps are the occasional "leap seconds" added to keep civil time in step with the Earth's rotation. Most ordinary clocks, for their part, are reliable to the minute, with only periodic adjustments for daylight saving time twice a year. Popper imagined that the systems science measures lie along a continuum of precision, ranging from clouds to clocks:
... some natural phenomena, such as the weather, or the coming and going of clouds, are hard to predict: we speak of the 'vagaries of the weather'. On the other hand, we speak of 'clockwork precision' if we wish to describe a highly regular and predictable phenomenon. There are lots of things, natural processes and natural phenomena, which we may place between these two extremes, the clouds on the left and the clocks on the right. The changing seasons are somewhat unreliable clocks and may therefore be put somewhere towards the right, though not too far.
Karl Popper's discussion of clouds and clocks illustrates the fundamentals of ANOVA. At one end of the continuum are clouds. Clouds are associated with higher random error, with all of the imprecision that comes with higher variance. A cloud is a system with an average or mean but far more variance; that is, many gnats are dispersed throughout the cloud in a seemingly chaotic fashion. Clocks, on the other hand, are far more precise systems with much smaller random error.
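The contrast between clouds and clocks can be made concrete with a small simulation. The sketch below is illustrative only: the names cloud and clock, the shared mean of 10, and the two spreads are assumptions chosen for the example, not values taken from Popper or from any data set.

```python
import random
import statistics

# Assumed, illustrative parameters (not from the text): both systems share
# the same mean, but the cloud scatters far more widely around it.
TRUE_MEAN = 10.0
CLOUD_SPREAD = 5.0   # standard deviation of the gnat-like system
CLOCK_SPREAD = 0.1   # standard deviation of the clock-like system

rng = random.Random(42)  # fixed seed so the run is repeatable

# Draw 1000 simulated measurements from each system.
cloud = [rng.gauss(TRUE_MEAN, CLOUD_SPREAD) for _ in range(1000)]
clock = [rng.gauss(TRUE_MEAN, CLOCK_SPREAD) for _ in range(1000)]

cloud_mean, cloud_var = statistics.mean(cloud), statistics.variance(cloud)
clock_mean, clock_var = statistics.mean(clock), statistics.variance(clock)

print(f"cloud: mean={cloud_mean:.2f}, variance={cloud_var:.2f}")
print(f"clock: mean={clock_mean:.2f}, variance={clock_var:.4f}")
```

Both sample means should land near 10, while the cloud's sample variance comes out far larger than the clock's. The two systems differ in their random error, not in their average.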
The Regression Formula
First of all, an ANOVA assumes that there is a cause-effect relationship. The regression on which the ANOVA is based describes that relationship between cause and effect statistically:
Y = B0 + B1x1 + e
where:
- Y = the effect
- B0 = the intercept
- B1 = the coefficient of the variable
- x1 = the variable
- e = random error
The formula above describes a "mean" or average effect based on either a sample from a population or an entire population. The random error (e) is usually assumed to follow what is called a normal distribution. That is, not every observation falls exactly at the mean. Rather, nearly all of the data (about 99.7 percent) lie within three standard deviations of the mean, on either the positive or the negative side.
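When x1 is a dummy variable coded 0 for one group and 1 for the other, a least squares fit of the regression formula reduces to the group means: B0 is the mean of the group coded 0, and B1 is the difference between the two group means. The following is a minimal sketch under stated assumptions: the measurements are made up for illustration, and least_squares is a hypothetical helper written for this example, not a library function.

```python
import statistics

# Made-up illustrative measurements for two groups (not real data).
group_a = [4.1, 5.0, 5.9, 4.6, 5.4]   # coded x1 = 0
group_b = [6.2, 5.8, 7.1, 6.6, 6.3]   # coded x1 = 1

x = [0.0] * len(group_a) + [1.0] * len(group_b)
y = group_a + group_b

def least_squares(x, y):
    """Return (b0, b1) for the simple regression y = b0 + b1*x + e."""
    x_bar = statistics.mean(x)
    y_bar = statistics.mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

b0, b1 = least_squares(x, y)

# The residuals play the role of e: what is left after the fit.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

print(f"B0 (mean of group A)     = {b0:.3f}")
print(f"B1 (difference in means) = {b1:.3f}")
```

The residuals sum to zero within each group, so what remains is scatter around each group's own mean, the "random error" of the formula.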
An ANOVA calculation rests on a test statistic for significance, the F-statistic, calculated from the least squares fit of the regression formula. When only two groups are compared, the F-statistic is the square of the familiar t-statistic. A mean (Y) that is large relative to the errors (e) indicates a more "precise" system or group from which the data were derived. By comparing the averages and the errors of each group, the significance of the difference between the two groups can be assessed more precisely.
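For two groups, the one-way ANOVA F-statistic equals the square of the pooled-variance t-statistic. The sketch below checks this numerically. It is a hedged illustration: the data are made up, and sum_sq, pooled_t, and one_way_f are hypothetical helper names written for this example.

```python
import math
import statistics

# Made-up illustrative measurements (not real data).
group_a = [4.1, 5.0, 5.9, 4.6, 5.4]
group_b = [6.2, 5.8, 7.1, 6.6, 6.3]

def sum_sq(values):
    """Sum of squared deviations from the group's own mean."""
    m = statistics.mean(values)
    return sum((v - m) ** 2 for v in values)

def pooled_t(a, b):
    """Two-sample t-statistic using a pooled variance estimate."""
    df = len(a) + len(b) - 2
    pooled_var = (sum_sq(a) + sum_sq(b)) / df
    std_err = math.sqrt(pooled_var * (1 / len(a) + 1 / len(b)))
    return (statistics.mean(a) - statistics.mean(b)) / std_err

def one_way_f(*groups):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum_sq(g) for g in groups)
    ms_between = ss_between / (len(groups) - 1)
    ms_within = ss_within / (len(all_values) - len(groups))
    return ms_between / ms_within

t_stat = pooled_t(group_a, group_b)
f_stat = one_way_f(group_a, group_b)
print(f"t = {t_stat:.3f}, t squared = {t_stat ** 2:.3f}, F = {f_stat:.3f}")
```

The printed values of t squared and F should agree up to floating-point rounding. Unlike pooled_t, one_way_f accepts any number of groups, which is what makes the F-statistic the more general tool.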
Clouds and clocks are very different, and clearly there are significant differences between the two systems. Clouds would have high errors around the mean, clocks would have low errors around the mean, and the difference between clouds and clocks is highly significant. But for groups that are more alike than different, ANOVA provides a logical tool for assessing the significance of the difference between two groups. What if a scientist wished to determine whether or not ten degrees Fahrenheit made a difference in the movement of a cloud of gnats? Then the differences are not so clear unless an ANOVA is used to compare the averages and the errors for levels of significance.
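The gnat question can be sketched numerically. In the hypothetical example below, two groups differ by the same 2 units in their means in both comparisons; only the scatter within the groups changes. All numbers are invented for illustration, make_group and one_way_f are helper names assumed for this sketch, and the critical value of about 5.32 is the standard 5 percent cutoff for an F-statistic with 1 and 8 degrees of freedom.

```python
import statistics

def one_way_f(*groups):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(
        sum((v - statistics.mean(g)) ** 2 for v in g) for g in groups
    )
    ms_between = ss_between / (len(groups) - 1)
    ms_within = ss_within / (len(all_values) - len(groups))
    return ms_between / ms_within

# Fixed pattern of deviations, scaled by a "spread" factor (hypothetical data).
DEVIATIONS = [-2.0, -1.0, 0.0, 1.0, 2.0]

def make_group(mean, spread):
    """Five invented measurements centred on `mean` with the given scatter."""
    return [mean + spread * d for d in DEVIATIONS]

# Same 2-unit gap between group means; only the within-group scatter differs.
clock_like_f = one_way_f(make_group(10.0, 0.1), make_group(12.0, 0.1))
cloud_like_f = one_way_f(make_group(10.0, 3.0), make_group(12.0, 3.0))

F_CRITICAL_5PCT = 5.32  # approximate tabled value for 1 and 8 degrees of freedom

print(f"low scatter:  F = {clock_like_f:.2f}, significant: {clock_like_f > F_CRITICAL_5PCT}")
print(f"high scatter: F = {cloud_like_f:.2f}, significant: {cloud_like_f > F_CRITICAL_5PCT}")
```

With low scatter the F-statistic is about 400 and the 2-unit difference is clearly significant. With high scatter it is about 0.44, and the same difference cannot be distinguished from random error.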
Work Cited
Popper, Karl. Objective Knowledge. Oxford: Clarendon Press, 1972.