Threats to external validity

External validity concerns how well results generalize to other populations, settings, periods, treatment variables, and measurement variables. Threats to external validity are the conditions that limit this generalization. Because external validity specifies the conditions under which a relationship established with internal validity can be reproduced, threats to external validity presuppose internal validity. These conditions closely resemble statistical interactions of the treatment with setting, context, population, and history. Two independent variables (IVs) interact when the effect of one IV on the dependent variable (DV) depends on the level, or on the presence or absence, of the other IV, which is then called the moderating variable (MV).

For example, a recent study by Misra et al. found that the visible presence of a smartphone (moderating variable, MV), even when it is not in use, reduces the quality of communication: partners report lower empathy (dependent variable, DV) from each other, especially when they share a close relationship (independent variable, IV). Here the presence of a smartphone (MV) and the closeness of the relationship (IV) interact to produce a negative effect on empathy (DV). In other words, the presence of the smartphone moderates the relationship between closeness and empathy.
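
The idea of an interaction can be made concrete as a "difference of differences": the effect of the IV is computed separately at each level of the moderator, and the two effects are compared. The sketch below is only an illustration. The cell means are invented for this example (they are not taken from Misra et al. or any other study), and the function names `simple_effects` and `interaction_contrast` are hypothetical helpers, not part of any library.

```python
def simple_effects(cell_means):
    """Effect of the IV (treated minus control) at each level of the moderator."""
    return {
        level: means["treated"] - means["control"]
        for level, means in cell_means.items()
    }


def interaction_contrast(cell_means, level_a, level_b):
    """Difference between the IV's effect at two moderator levels.

    A value near zero suggests the effect generalizes across the moderator;
    a large value suggests the effect depends on it (an interaction).
    """
    effects = simple_effects(cell_means)
    return effects[level_a] - effects[level_b]


# Hypothetical mean DV scores for a 2x2 design (illustrative numbers only).
cell_means = {
    "moderator_absent": {"control": 5.0, "treated": 7.0},
    "moderator_present": {"control": 5.0, "treated": 5.5},
}

effects = simple_effects(cell_means)
contrast = interaction_contrast(cell_means, "moderator_absent", "moderator_present")

print("Effect of IV at each moderator level:", effects)
print("Interaction contrast:", contrast)
```

With these made-up numbers the IV raises the DV by 2.0 when the moderator is absent but only by 0.5 when it is present, so the effect found in one condition would not generalize to the other. A real analysis would also need a significance test of the interaction term, which this sketch omits.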

Campbell and Stanley (1963) argued that the question of external validity is, in some sense, never completely answerable, because induction, or generalization, is never fully justified logically (p. 17). This is how science progresses. We can only verify specific instances in which a theory, hypothesis, or claim does or does not hold; we cannot make a general statement about its viability unless we test every scenario. This is the condition of falsifiability: no theory is considered “true” or beyond the threat of disproof in future tests. Such testing is rarely, if ever, complete. Still, we do make generalizations based on cumulative experience rather than logical deduction. We use that experience to make educated guesses about which factors might plausibly interact with the treatment variables, the setting, and so on, and which can be disregarded. We also assume that events closer together in time and space tend to follow the same laws. In general, controlling threats to external validity requires making the experimental conditions closely resemble the real-world conditions to which we want to generalize.

Threats to external validity include:

Treatment by setting (pretest) interaction
Treatment by context (socio-physical surrounding) interaction
Treatment by history interaction
Treatment by selection (population) interaction

Treatment by setting (pretest) interaction

The experimental setting includes everything participants experience because of the experiment, whether planned or not. The effect of the IV on the DV might be due to its combination with some aspect of the experimental arrangements, including a pretest. That is, any experience in the setting that moderates the effect of the treatment on the DV is a threat to external validity. Many experiments also include more than one treatment, which raises the possibility that the treatments interact with each other. If a treatment is effective only in the presence of other treatments, its external validity is limited.

For example, suppose we are interested in evaluating the effect of a smoking cessation video (IV) on attitudes toward smoking (DV). We might observe that the treatment and control groups were similar at the pretest but not at the posttest. One explanation is that the treatment was effective. Another explanation is a treatment by setting interaction: the change in attitude toward smoking resulted from some factor related to the experimental setting, such as the pretest, combined with the treatment. Participants might have been sensitized by the pretest. If we suspect such an interaction, we might want to use the Solomon four-group design.
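
The logic of the Solomon four-group design can be sketched in a few lines. The design crosses treatment (yes/no) with pretest (yes/no), so the treatment effect can be estimated separately for pretested and unpretested participants; if the two estimates differ, the pretest is interacting with the treatment. The posttest scores below are invented for illustration, and the group labels and the function name `pretest_sensitization` are assumptions made for this sketch.

```python
from statistics import mean


def pretest_sensitization(posttests):
    """Compare the treatment effect with and without a pretest.

    `posttests` maps each of the four groups to its posttest scores.
    Returns (effect with pretest, effect without pretest, their difference).
    """
    effect_with_pretest = (
        mean(posttests["pretest_treatment"]) - mean(posttests["pretest_control"])
    )
    effect_without_pretest = (
        mean(posttests["nopretest_treatment"]) - mean(posttests["nopretest_control"])
    )
    interaction = effect_with_pretest - effect_without_pretest
    return effect_with_pretest, effect_without_pretest, interaction


# Hypothetical posttest attitude scores (illustrative numbers only).
posttests = {
    "pretest_treatment": [8, 9, 7, 8],
    "pretest_control": [5, 6, 4, 5],
    "nopretest_treatment": [6, 7, 5, 6],
    "nopretest_control": [5, 5, 4, 6],
}

with_pre, without_pre, interaction = pretest_sensitization(posttests)
print("Treatment effect with pretest:", with_pre)
print("Treatment effect without pretest:", without_pre)
print("Pretest by treatment interaction:", interaction)
```

In this made-up data the video appears to shift attitudes by 3 points among pretested participants but by only 1 point among those who were not pretested, which would suggest pretest sensitization. In practice the interaction would be tested formally, for example with a two-way analysis of variance on the posttest scores.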

Treatment by context (socio-physical surrounding) interaction

When the observed effect of the IV is due to its combination with some aspect of the social or physical environment, we can suspect a treatment by context interaction. Context is defined as the larger social and physical environment surrounding the experiment. For example, a program for drug abusers might work well in rural areas but not in urban areas simply because drugs are much more available and accessible in urban areas than in rural areas.

Treatment by context interaction is more common when research takes place in a laboratory or other “artificial” environment, where all the cues remind participants that they are in a research setting rather than a natural situation. The extent to which laboratory conditions fail to represent the natural world may limit a study's external validity. This is why critics of laboratory studies emphasize ecological validity, the extent to which a research situation represents the natural social environment or the “real world”. A closely related concept is mundane realism, which asks how closely the research setting resembles the natural setting. Higher mundane realism results in higher external validity. However, others have argued that trying to achieve mundane realism can hurt internal validity, and that we should instead aim for experimental realism, in which participants are fully engaged in the experiment.

When we suspect a treatment by context interaction, we should try to replicate the result in a different context.

Treatment by history interaction

As the name suggests, this threat arises when the observed effect of the IV is due to its combination with some recent event or a particular time period. For example, studies of attitudes toward vaccination conducted during the COVID-19 pandemic are likely to be heavily influenced by the pandemic. Cause-and-effect relationships may appear in one time period but not in another. That is, not all causal relationships generalize to other periods. The replication crisis is, in some ways, evidence of this threat.

Treatment by selection interaction

If we wish to generalize our findings to a population, we need a sample that is representative of that population. When a treatment by selection interaction is operating, the observed effect of the IV is due to its combination with some characteristic of the particular sample. For example, if we recruit participants for a vitamin supplement program, advertise predominantly in health food stores, and obtain the bulk of our participants from these stores, we may have a treatment by selection interaction. Participants who visit health food stores and volunteer for the study might differ from those who don’t. That is, our sample might be biased and not representative of the population. In such a case, we might broaden recruitment to include participants who do not visit health food stores so that the sample better represents the population. We can also make statistical adjustments to correct for the overrepresentation of one segment of the population.
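
One simple form of statistical adjustment is post-stratification: estimate the effect within each segment (stratum) of the sample, then combine the estimates using each segment's share of the population rather than its share of the sample. The sketch below assumes the population shares are known and that the effect has already been estimated within each stratum. All numbers are invented for illustration, and the names `shares_from_counts` and `weighted_effect` are hypothetical helpers.

```python
def shares_from_counts(counts):
    """Convert stratum counts into proportions that sum to 1."""
    total = sum(counts.values())
    return {stratum: n / total for stratum, n in counts.items()}


def weighted_effect(stratum_effects, shares):
    """Average the stratum-level effects, weighting each stratum by its share."""
    return sum(shares[stratum] * effect for stratum, effect in stratum_effects.items())


# Hypothetical sample: health food store shoppers are heavily overrepresented.
sample_counts = {"shoppers": 80, "non_shoppers": 20}

# Assumed (made-up) population shares for the same two strata.
population_shares = {"shoppers": 0.2, "non_shoppers": 0.8}

# Hypothetical estimated effect of the supplement within each stratum.
stratum_effects = {"shoppers": 4.0, "non_shoppers": 2.0}

naive_estimate = weighted_effect(stratum_effects, shares_from_counts(sample_counts))
adjusted_estimate = weighted_effect(stratum_effects, population_shares)

print("Estimate weighted by sample shares:", naive_estimate)
print("Estimate weighted by population shares:", adjusted_estimate)
```

With these made-up numbers the sample-weighted estimate is about 3.6, while the population-weighted estimate is about 2.4, because the overrepresented shoppers respond more strongly. The adjustment only helps to the extent that the chosen strata capture the characteristics that actually interact with the treatment.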

Priorities and tradeoffs

As researchers, we have to make choices about balancing external and internal validity. We cannot reasonably expect a single study to address all threats simultaneously. Threats are better thought of as raising questions about priorities and tradeoffs. Just as knowledge and science progress adversarially and cumulatively, it is reasonable to expect research to address threats over time. This gives us the opportunity to make real progress by dealing first with the most serious threat that applies.

Sources

Campbell, D. T., & Stanley, J. C. (1963). Experimental and quasi-experimental designs for research on teaching. In N. L. Gage (Ed.), Handbook of research on teaching (pp. 171–246). Chicago: Rand McNally.

Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation. Boston: Houghton Mifflin.

Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston: Houghton Mifflin.