Error of the third kind

Most of us are familiar with type I and type II errors in statistics. There is also an error of the third kind, or type III error, which is more conceptual than the other two. It arises when inappropriate research methods lead us to the right answer to the wrong question. It is closely linked to choosing the wrong level of analysis and/or drawing incorrect inferences. The term is sometimes also used for the case where we correctly conclude that two groups are statistically different (we reject the null hypothesis) but are wrong about the direction of the difference. Let’s understand this by first looking at type I and type II errors.

The purpose of scientific research is to gain knowledge about the phenomenon of interest. To this end, we use all sorts of tools, techniques, and methodologies available to ensure that knowledge gained is “good” knowledge. The main task of the philosophy of science is to analyze the method of inquiry used in the sciences using formal logic. We use empiricism, rational thought, interpretivism, postmodernism, pragmatism, and/or a combination thereof, to gain scientific knowledge. Whether the science is well-developed then depends on our ability to draw logical inferences and/or explanations. At the same time inferences and explanations often have contextual components. That is, logic dictates science, and in turn, science dictates logic. Thus the philosophy of science and science are done at the same time and they mutually affect each other. However, philosophical reflection can uncover assumptions that are implicit in scientific inquiry. Hence, adopting a paradigm is not enough to produce good science. It is equally important to ask, “Are we asking the right question?”

Empirical research in social science often makes claims about probability. Some claims are explicit about the phenomenon of interest, while others are made at a broader level. Explicit claims about the phenomenon carry a chance of a decision error. Neyman and Pearson (1933) provided a rule to guide the decision of when to reject the null hypothesis. We know the two resulting decision errors as type I (false positive) and type II (false negative) errors. What is often not talked about is the error of the third kind (type III).

Type I and Type II errors

A type I error occurs when we reject the null hypothesis when we should have retained it. That is, the pattern observed in the sample appears statistically significant, so we accept the alternative hypothesis, but the pattern is due to chance and does not exist in the population. We reject a null hypothesis even though it is true.

Similarly, we commit a type II error when we retain (fail to reject) the null hypothesis when we should have rejected it. That is, the pattern exists in the population, but the sample does not give us statistically significant evidence of it, so we keep the null hypothesis. We reject an alternative hypothesis even though it is true.

	Null hypothesis is true	Null hypothesis is false
We accept (fail to reject) null	Correct decision	Type II error
We reject null	Type I error	Correct decision

Table: Type I and Type II errors
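
The four cells of the table above can be written as a small lookup. This is only an illustrative sketch: the function name classify_decision and its two boolean arguments are made up for this example, and in real research we never observe whether the null hypothesis is true.

```python
def classify_decision(null_is_true: bool, reject_null: bool) -> str:
    """Name the outcome of a test decision, following the 2x2 table."""
    if null_is_true and reject_null:
        return "type I error"      # false positive: rejected a true null
    if not null_is_true and not reject_null:
        return "type II error"     # false negative: kept a false null
    return "correct decision"


# Walk through all four combinations of truth and decision.
for null_is_true in (True, False):
    for reject_null in (True, False):
        outcome = classify_decision(null_is_true, reject_null)
        print(f"null true={null_is_true}, reject={reject_null}: {outcome}")
```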

We commit a type I error when we accidentally observe a relationship in the sample that does not exist in the population. When an effect is detected, we need to make sure that it is not due to chance or random variation. This type of error can often be caught later, when a replication is performed. We generally set the acceptable probability of such a chance finding, the significance level (alpha), at 5%. Conventionally, a 5% probability is considered small enough to rule out chance variation. That is, if the null hypothesis is true, we are less than 5% likely to draw a sample from the population that shows a pattern or effect at least as strong as the one observed.
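
A small simulation can make the 5% figure concrete. The sketch below is not from the original discussion and rests on simplifying assumptions: a one-sample, two-sided z-test with a known standard deviation of 1, normally distributed data, and a null hypothesis that is true by construction. The function names and default values are illustrative.

```python
import random
from statistics import NormalDist


def z_test_p_value(sample, mu0=0.0, sigma=1.0):
    """Two-sided p-value of a one-sample z-test with known sigma."""
    n = len(sample)
    sample_mean = sum(sample) / n
    z = (sample_mean - mu0) / (sigma / n ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(alpha=0.05, n=30, trials=20_000, seed=1):
    """Share of simulated studies that reject a TRUE null (population mean is 0)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if z_test_p_value(sample) < alpha:
            rejections += 1  # a type I error: the null is true by construction
    return rejections / trials


print(f"Observed type I error rate: {false_positive_rate():.3f}")
```

Under these assumptions the observed rate should land close to the chosen alpha, which is the sense in which alpha caps the long-run rate of type I errors.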

We can reduce the chance of a type I error by setting our alpha lower. However, lowering alpha also lowers the power of the test. The power of a statistical test depends on the significance level, the reliability of the sample results, and the effect size. The stricter the standard for detecting a pattern in the sample (significance), the lower the power. The complement of power is the probability of failing to reject a false null, that is, of a type II error (beta). Thus, holding everything else constant, reducing the type I error rate increases the type II error rate. Cohen (1988) argued that “mistaken rejection of the null hypothesis is considered four times as serious as mistaken acceptance” (p. 5). Thus, social science research generally sets the error rate for type I errors at 5% and for type II errors at 20%.
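
The trade-off between alpha and power can be computed directly for a simple case. The sketch below assumes a two-sided, one-sample z-test with a standardized effect size d and sample size n; the function name and the values d = 0.5 and n = 30 are illustrative choices, not figures from the sources cited here.

```python
from statistics import NormalDist


def power_two_sided(d, n, alpha):
    """Power of a two-sided one-sample z-test for standardized effect d."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # critical value for the chosen alpha
    shift = d * n ** 0.5                        # how far the true effect moves the test statistic
    # Probability that the statistic falls in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Stricter alpha -> lower power -> higher type II error rate (beta).
for alpha in (0.10, 0.05, 0.01, 0.001):
    power = power_two_sided(d=0.5, n=30, alpha=alpha)
    print(f"alpha={alpha:<6} power={power:.3f} beta={1 - power:.3f}")
```

With the effect size and sample size held fixed, each step down in alpha lowers power and raises beta, which is the inverse relationship described above.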

Sample size calculations based on these levels are common practice these days. Reliability, in turn, depends on the sample size: the larger the sample, the smaller the standard error and the better the reliability. Effect size simply means the degree to which the null hypothesis is false. Power is also affected by the number of tails. A two-tailed test has less power than a one-tailed test at the same alpha, provided the effect lies in the predicted direction.

The error of the third kind

In 1957, statistician A. W. Kimball identified an additional class of errors and termed them “errors of the third kind”. He defined such an error as “giving the right answer to the wrong problem” (p. 134). Mitroff and Featheringham (1974) defined it as solving the wrong representation or formulation of the problem. Kaiser (1960) defined it as an incorrect decision about the direction of the effect after rejecting the null hypothesis in a two-tailed test. From the first two definitions, we can see that this error is not just a statistical error. It can also be a theoretical or methodological error. It can stem from a lack of awareness or from a mismatch between the theory being tested and the research design. Thus, these errors are broader and perhaps more logical or epistemic in nature. Let’s look at an example.
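
Of the three definitions, only the directional one can be quantified easily. The sketch below is an illustration under assumptions of my own choosing: a two-sided, one-sample z-test, a true standardized effect d that is positive, and made-up values for d and n. It computes how often the test rejects the null with the correct sign and how often it rejects with the wrong sign.

```python
from statistics import NormalDist


def directional_outcomes(d, n, alpha=0.05):
    """Return (P(reject with correct sign), P(reject with wrong sign)) for true d > 0."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = d * n ** 0.5
    right_direction = std_normal.cdf(shift - z_crit)   # statistic lands in the upper tail
    wrong_direction = std_normal.cdf(-shift - z_crit)  # statistic lands in the lower tail
    return right_direction, wrong_direction


# A small true effect in a small sample versus a moderate effect in a larger one.
for d, n in ((0.05, 20), (0.5, 30)):
    right, wrong = directional_outcomes(d, n)
    share_wrong = wrong / (right + wrong)
    print(f"d={d}, n={n}: wrong-direction rejections = {wrong:.4f} "
          f"({share_wrong:.1%} of all rejections)")
```

Under these assumptions, directional errors are vanishingly rare when power is reasonable but make up a noticeable share of significant results when the effect and the sample are both small.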

Suppose we are interested in examining the causes of the extremely high rate of recidivism in the US. If we choose to focus on the individual differences (gender, race, age, etc.) that lead to recidivism, without considering the structural barriers faced by formerly incarcerated individuals, we commit a type III error. Research on this topic has highlighted multiple social and structural barriers faced by formerly incarcerated individuals. These barriers include state and federal laws that govern their lives and dictate what they can do, who they can live with, and when they can be out on the street, as well as limited access to housing, financial resources, and employment, and discrimination in the job market. These structural barriers play a much bigger role in driving the recidivism rate than the individual factors. Failing to consider structural barriers would preclude us from understanding the true cause of recidivism. This is a theoretical or methodological error that lies outside the realm of statistics. We might find individual differences (right answer), but the question we formulated would be incorrect (wrong question). Even when we fail to detect individual differences, it will still result in a type II error.

Another example of such an error is the excessive objectification of students’ learning in a course through a focus on evidence-based instruction. Trying to measure learning objectively is fundamentally flawed because learning is a complex construct. Instructors are required to develop course objectives that are tied to course assignments that can objectively measure students’ learning. A course objective that aims to develop a complex and nuanced understanding of our social world is therefore nearly impossible to accommodate. Why? Because it is extremely difficult, if not impossible in some cases, to design an assignment that can objectively evaluate students’ understanding. Instructors are then forced to approximate the construct of understanding so that it can be objectively mapped to an assignment. Course objectives that aspire to develop critical thinking are even more difficult to incorporate, as the development of critical thinking takes time.

Furthermore, learning itself is a complex construct. Learning is constructive. That is, it builds upon previous knowledge. Learning is socially constructed. Learning is individual. Learning is experiential. Learning is interactive. The approximate operationalization of such a complex construct introduces theoretical and methodological errors: it yields an objective score but misrepresents what constitutes learning. This puts instructors in a precarious position. Such policies, often enacted by course design institutes at the university, can deter instructors from setting objectives that may enhance higher-order thinking (metatheoretical and metacognitive) but are harder to measure objectively.

In either case, instructors commit a type III error, either by omitting useful course content or by measuring learning incorrectly. University administrators are often inclined to use these objective scores to make decisions because the scores give them a sense of bias-free objectivity. It is easy to see that, most of the time, this does more harm than good.

In general, a type III error is rejecting the null hypothesis for the wrong reasons, or providing the right answer to the wrong question. In a sense, this type of error arises from how we theorize and how we operationalize our theories.

Mitroff and Featheringham (1974) argued that the progress of science is often measured by its ability to treat complex phenomena and not by the patient accumulation of facts. In that sense, the error of the third kind is a fundamental error, and unless we understand and avoid it, reducing type I and type II errors might be a moot exercise. They refer to it as the fallacy of misplaced precision (p. 393). That is, an incomplete or imprecise answer to the right question is more valuable than a precise answer to the wrong question. Thus we must safeguard against the error of the third kind. Unfortunately, such an error is more common than we might like, as it emanates from the perceptions of the people interpreting reality.

The problem-solving context in organizations is rife with this error when managers fail to distinguish between causes and symptoms. Problem identification is often a process of designing problems rather than discovering them. There is always a tension between devising a symptomatic solution to the visible problems and devising a long-term, fundamental solution that addresses the structures that produce the pattern of behavior in the first place. Fundamental solutions often require deeper understanding, more time, greater commitment, more resources, and greater patience. But organizations often fall prey to quick fixes. Awareness of the error of the third kind can help organizations ensure that problem-solving efforts are not wasted on badly defined problems.

Sources

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Lawrence Erlbaum Associates.

Hopkins, B. (1973). Educational research and type III errors. The Journal of Experimental Education, 41, 31–32.

Kimball, A. W. (1957). Errors of the third kind in statistical consulting. Journal of the American Statistical Association, 52(278), 133–142. doi:10.2307/2280840

Mitroff, I. I., & Featheringham, T. R. (1974). On systemic problem solving and the error of the third kind. Behavioral Science, 19(6), 383–393. doi:10.1002/bs.3830190605

Neyman, J., & Pearson, E. S. (1933). On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 231(694-706), 289–337. doi:10.1098/rsta.1933.0009

Raiffa, H. (1968). Decision analysis. Addison-Wesley.