Validity – Internal vs External

In the previous post, we talked about measurement validity. But validity also relates to the ability of the research design to provide evidence of a cause-and-effect relationship between the independent and the dependent variable. This type of validity is known as internal validity. Validity also relates to our ability to draw valid inferences from the research: our ability to generalize results from the study population to the target population is known as external validity. Then there is ecological validity, the extent to which a research situation represents the natural social environment or the “real world”. Let’s look at them individually.

Internal validity

Internal validity refers to our ability to make causal inferences within the context of a given research study. That is, how confident are we that the change in the dependent variable (DV) is caused by the change in our independent variable (IV)? Such claims of internal validity are based on the procedures and operations used to conduct the research, the choice of design, the way variables are measured, and which variables are measured. As you might have guessed, this is largely a function of research design and our ability to establish conditions of causality and control rival explanations.

Research design is a set of quantitative, qualitative, or combined procedures for collecting, analyzing, and reporting data in a research study.

So in terms of research design, internal validity is then the ability of the study design to generate credible assertions. Campbell and Stanley (1963) considered internal validity to be the sine qua non of good experimental design (p. 175). Why? We conduct experiments to test causal relationships between the independent variables and the dependent variables so that we can draw valid conclusions.

For example, we want to examine the effects of a preschool program on the cognitive growth of children. To do this, we need two groups of children: one group that attends the preschool program and another that does not. Additionally, to isolate the effect of the preschool program, we also need to make sure that the two groups are equivalent to each other in all respects except for attending the preschool program. That is, the two groups are approximately the same on all measured and unmeasured characteristics, except for attending the preschool program.

Here attending the preschool program is our independent variable and the cognitive score is our dependent variable. The group that attends the preschool program is generally referred to as the experimental or treatment group, while the group that does not attend the preschool program is referred to as the control group. If, at the end of the preschool program, we find the cognitive score of the treatment group to be higher than that of the control group, we can conclude that, all else being equal, attending the preschool program results in (causes) cognitive growth in children.
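To make the comparison concrete, here is a minimal sketch in Python of how the post-program scores of the two groups might be compared with an independent-samples t-test. The scores and variable names are made up purely for illustration and are not from any actual study.

```python
import numpy as np
from scipy import stats

# Hypothetical post-program cognitive scores (illustrative data only)
treatment_scores = np.array([102, 110, 98, 115, 107, 111, 104, 109])  # attended preschool
control_scores = np.array([97, 101, 95, 103, 99, 100, 96, 102])       # did not attend

# Independent-samples t-test: is the mean cognitive score of the
# treatment group higher than that of the control group?
t_stat, p_value = stats.ttest_ind(treatment_scores, control_scores, equal_var=False)
print(f"Mean difference: {treatment_scores.mean() - control_scores.mean():.1f}")
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```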

Caveat

However, this conclusion holds only if we have controlled for all other rival explanations (all else being equal) that could affect this cause-and-effect relationship. For example, we would like to make sure that the cognitive scores of both groups were at the same level before the start of the preschool program; that is, there are no preexisting differences. The groups should also be comparable in demographic characteristics and on other confounding variables that can affect cognitive growth in children outside the preschool program, such as the socioeconomic status of the families, the average time parents spend with their children, exposure to violence, etc.

We need to establish group equivalence as confounding variables often compete with the independent variables to explain the causal relationship. Their presence confounds our understanding of the relationship between the IV and DV and threatens the internal validity of the study. So it is clear that to establish an unequivocal causal relationship between two variables, change in the causal factor (IV) has to be produced or observed under conditions that are isolated from confounding factors that may produce a spurious correlation between IV and DV.

Increasing internal validity

To reduce threats to internal validity, confounding variables either need to be held constant (statistically) or need to be uncorrelated with the variation in the independent and dependent variables. In essence, this is the logic of a good experimental design and researchers often use different research designs to gain experimental control over these confounds. Random assignment is one such tool to produce equivalent groups. That is, in the above example, we could randomly assign children to either the treatment or control group. Since the assignment to either group is random, in theory, it should produce probabilistically equivalent groups. It is always a good idea to check for pretest equivalence on desired characteristics to make sure that randomization has worked in eliminating systematic differences between the groups.
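As a rough illustration, the sketch below randomly assigns a hypothetical sample of children to the two groups and then checks pretest equivalence on a couple of covariates. The data, the variable names, and the choice of a t-test for the balance check are all assumptions made for the example, not a prescription.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(seed=42)
n = 200

# Hypothetical sample with a pretest cognitive score and a
# socioeconomic-status measure (illustrative names and values)
children = pd.DataFrame({
    "pretest_score": rng.normal(100, 15, n),
    "ses": rng.normal(0, 1, n),
})

# Random assignment: each child has an equal chance of landing in either group
children["group"] = rng.permutation(["treatment"] * (n // 2) + ["control"] * (n // 2))

# Balance check: large, significant pretest differences would suggest
# randomization failed to produce equivalent groups
for var in ["pretest_score", "ses"]:
    t_group = children.loc[children["group"] == "treatment", var]
    c_group = children.loc[children["group"] == "control", var]
    t_stat, p_value = stats.ttest_ind(t_group, c_group)
    print(f"{var}: treatment mean={t_group.mean():.2f}, "
          f"control mean={c_group.mean():.2f}, p={p_value:.3f}")
```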

Campbell and Stanley (1963) identified several threats to internal validity that can be controlled through experimental design. These designs vary in their ability to control threats depending on the features of the design and what the researcher can control. They include pre-experiments (post-test only, within group, between groups), true experiments (pre-test post-test control group, Solomon four group, post-test only control group), quasi-experiments (interrupted time series, nonequivalent control group design, variations on time series or within-subject designs), counterbalanced designs, and separate-sample pre-test post-test designs.

Greater control over all other variables (confounds), apart from the variation in the independent variable, is a clear strength of a research design for isolating the causal effect. However, it can also be a weakness in terms of the generalizability of the results. That is, such designs are often lower on external validity, except for psychological realism (when the causal processes represented in the laboratory situation are the same as in a nonlaboratory context). Sometimes even psychological realism might not be enough. Experimental designs using random assignment and a control group (also known as randomized controlled trials) provide greater strength for internal validity; however, implementing such designs is not always possible for ethical or practical reasons. Hence, researchers often opt for a quasi-experimental design and use statistical controls rather than experimental controls.

External validity

External validity refers to the extent to which the findings from a research study can be generalized across samples, populations, treatments, settings, time, and space. Put simply, to what extent does the causal relationship (its direction and strength) hold stable across different contexts and samples? It is important to recognize the difference between external and construct validity. While construct validity refers to how well the observed measure captures the construct, that is, how generalizable the measure is to the construct, external validity refers to the generalizability of the causal finding across contexts. External validity is thus an inferential process. Generalization can often be thought of as a two-part process: statistical generalization from the sample to the population (statistical inference validity), and nonstatistical generalization beyond the population from which the sample is drawn.

Shadish, Cook, and Campbell (2002) identified five principles that can aid researchers with causal generalization. These are:

  1. Surface similarity: Researchers should assess the apparent similarities between study operations and the prototypical characteristics of the target of generalization. For example, studies of the effects of secondhand smoke in the workplace seem more similar to the public settings at issue in policy debates than do studies of smoking in private residences.
  2. Ruling out irrelevancies: Researchers should identify those things that are irrelevant because they do not change a generalization. For example, the location of a cognitive science research lab is irrelevant to the finding that people tend to use groups of seven to remember things.
  3. Making discriminations: They should clarify key discriminations that limit generalization. For example, child psychotherapy works in the lab but might not work in the clinic.
  4. Interpolation and extrapolation: They should make interpolations to unsampled values within the range of the sampled instances and, much more difficult, explore extrapolations beyond the sampled range. For example, researchers ask whether the effects of toxic chemicals on small mammals extrapolate to much larger and more biologically complex humans.
  5. Causal explanation: They develop and test explanatory theories about the pattern of effects, causes, and mediational processes that are essential to the transfer of a causal relationship (pp. 353–354).

They argue that none of these principles is necessary or sufficient in itself, but causal generalizations are not complete unless we can provide the knowledge required by these principles. In that sense, establishing evidence for external validity is a search for the moderators and mediators that limit the generalization of a causal relationship.

Some researchers have divided external validity into population validity, ecological validity, treatment variation validity, outcome validity, and temporal validity. Population validity is the degree to which the causal relationship can be generalized to and across the target population. Temporal validity refers to the extent to which the causal relationship could be generalized across time which includes seasonal and cyclical variations. Treatment variation validity is the extent to which the causal relationship is independent of variation in treatment. Outcome validity is the degree to which the causal relationship is stable across different but related dependent variables.

Ecological validity asks whether the research setting represents the real world. In other words, the causal relationship should be independent of the experimental setting. Lab settings can be highly controlled and artificial, making participants aware that they are being studied rather than behaving as they would in a natural setting. For example, the emotional distress a participant feels reading a hypothetical scenario about romantic infidelity may differ drastically from what they would feel if they were actually the victim of romantic infidelity. Long surveys can also lead to survey fatigue, resulting in skewed responses.

Increasing external validity

Just as random assignment helps increase internal validity, a representative sample of our unit of analysis, obtained through probability sampling, helps increase the external validity of the study. Other design elements, such as systematic sampling, stratified random sampling, and cluster sampling, also help increase external validity. Replication with a new sample is another way to boost external validity, whether under a different setting, population, treatment, or time. Research using experimental designs should try to simulate real-world environments as much as possible.
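As a simple illustration, the sketch below draws a proportionate stratified random sample from a hypothetical sampling frame, so that the sample mirrors the population’s composition on the stratifying characteristic. The frame, the "region" stratum, and the 10% sampling fraction are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical sampling frame of 1,000 units stratified by region
frame = pd.DataFrame({
    "id": range(1, 1001),
    "region": ["urban"] * 600 + ["suburban"] * 250 + ["rural"] * 150,
})

# Proportionate stratified sample: draw 10% from each region so the sample
# reflects the population's regional composition
sample = frame.groupby("region").sample(frac=0.10, random_state=1)
print(sample["region"].value_counts())
```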

The need to establish evidence for different types of validity should be evaluated based on the purpose of the study. For example, descriptive research might need an eye for ecological validity, utilitarian research might need external validity, whereas testing an explanatory theory might need attention to construct validity rather than external validity.

Internal and external validity are often in tension with each other. Often, an experiment designed for strong external validity has weaker internal validity and contributes very little to advancing our understanding, because conclusions cannot be drawn about the relationship being studied. When constructing a research design, we need to be aware that efforts to maximize one type of validity may reduce or jeopardize other types. Hence, the best way to evaluate the validity of a research study is to design it in a way that aligns with the purpose for which the research is undertaken.

Sources:

Brewer, M. B., & Crano, W. D. (2014). Research design and issues of validity. In H. T. Reis & C. M. Judd (Eds.), Handbook of research methods in social and personality psychology (pp. 11-26). Cambridge University Press.

Campbell, D. T. (1957). Factors relevant to the validity of experiments in social settings. Psychological Bulletin, 54, 297–312.

Campbell, D. T., & Stanley, J. C. (1963). Experimental and quasi-experimental designs for research on teaching. In N. Gage (Ed.), Handbook of research on teaching (pp. 171–246). Chicago: Rand-McNally.

Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design and analysis issues for field settings. Boston, MA: Houghton Mifflin.

Corrigan, P. W., & Salzer, M. S. (2003, May). The conflict between random assignment and treatment preference: Implications for internal validity. Evaluation and Program Planning, 26(2), 109–121. doi:10.1016/S0149-7189(03)00014-4

Shadish, W., Cook, T., & Campbell, D. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston, MA: Houghton Mifflin.

Wilson, V. L. (1981). Time and the external validity of experiments. Evaluation and Program Planning, 4, 229–238.

Cite this article (APA)

Trivedi, C. (2024, March 26). Validity internal vs external. ConceptsHacked. Retrieved from https://conceptshacked.com/validity-internal-vs-external/

Chitvan Trivedi

Chitvan is an applied social scientist with a broad set of methodological and conceptual skills. He has over ten years of experience in conducting qualitative, quantitative, and mixed methods research. Before starting this blog, he taught at a liberal arts college for five years. He has a Ph.D. in Social Ecology from the University of California, Irvine. He also holds master's degrees in Computer Networks and Business Administration.
