CIAO

The Role of the Congruence Method for Case Study Research

Alexander L. George

MacArthur Program on Case Studies
Georgetown University
March 18-22, 1997

Before describing and illustrating the congruence method in case study research we will place it in the broader context of discussions of the comparative method and, in particular, the method of controlled comparison. We employ the term "comparative method" in one of its customary usages as referring to comparative analysis of a relatively small number of cases by non-statistical means. "Controlled comparison," more specifically, refers to the method of studying two or more cases that resemble each other in every respect but one, thereby achieving or approximating the functional equivalent of an experiment which makes it possible to rely on experimental logic to draw causal inferences.

We shall discuss first the requirements for controlled comparison and emphasize the difficulty of meeting them. Our purpose in doing so is to point up the need to develop an alternative method for case study research that does not rely upon experimental logic. We believe it is urgent to develop such an alternative to the experimental paradigm because it is generally extremely difficult to find two cases that resemble each other in every respect but one. Establishing the control required for an experiment is obviously out of the question for most problems that need to be researched in the fields of international relations and comparative politics. Moreover, the familiar alternative of using statistical analysis to achieve the functional equivalent of an experiment runs into the fact that there are too few cases for many problems that require study. Further, even in those instances in which experimentation is possible, it is often regarded as ethically problematic and discouraged, if not forbidden.

Controlled Comparison: Its Requirements, Logic of Causal Inference, Limitations

1. The Problem of "Too Many Variables, Too Few Cases"

Many writers have noted this well-known problem that bedevils rigorous application of the comparative method and have discussed several possible ways of coping with or minimizing it. A remedy often proposed is simply to redefine the research problem to make the number of cases large enough to permit statistical analysis or to reduce the number of variables. Other suggestions point to the possible utility of non-statistical analysis of a small number of cases.

Thus, for example, Smelser has suggested that the investigator may resort to the "replication of the suspected association at a different analytical level." As an illustration of this practice Smelser cited Durkheim's study of suicide among the military. Replication, he noted, makes possible a multiplication of observations at another level of analysis.

Lijphart addressed this problem in two articles. In the first one he discussed four ways of coping with the problem of "too many variables, too few cases." In his second article he acknowledges that some of his earlier suggestions provided a solution to this problem only in the sense that they made statistical analysis possible by enlarging in some way the number of cases or by reducing the number of variables (by focusing the comparison only on "key" variables, i.e., "a general commitment to theoretical parsimony"). The other solution he identified was what we have referred to here as the strategy of "controlled comparison," although he felt that one had often to settle for a large measure of control since two cases seldom resembled each other in every respect but one.

It should be noted that the problem of establishing "control" among cases arises from a desire to employ in small-n research the same logic of scientific inquiry as in experiments and statistical analysis. In controlled comparison, however, it is seldom the case that one can find cases that permit confident and valid use of the logic. Several ways of doing so, or attempting to do so, will be discussed. But first we need to recall the important conceptual and methodological role John Stuart Mill's methods have played in discussions of the comparative method and to take note of the severe difficulties of applying them.

Mill's Methods and Their Limitations

The essential logic of the comparative method, as numerous writers have noted, is derived from John Stuart Mill's disquisition in A System of Logic (1843) in which he discussed the "method of agreement" and the "method of difference." These are sometimes referred to as the "positive" and "negative" comparative methods. The (positive) method of agreement attempts to identify similarities in independent variables associated with a common outcome in two or more cases. The (negative) method of difference attempts to identify independent variables associated with different outcomes.

Mill himself emphasized the serious obstacles to making effective use of these methods in social science inquiry. He noted that the multiplicity and complexity of causes of social phenomena make it difficult to apply the logic of elimination relied upon by the methods of agreement and difference, thereby making it difficult to isolate the cause of a phenomenon. Mill judged the method of difference to be somewhat stronger than the method of agreement, and he also proposed the method of concomitant variation to deal with some of the limitations of the other two methods.

Mill, then, was anything but optimistic as regards the possibility of satisfactory empirical applications of these logics in social science inquiry. Other logicians and methodologists have since expressed even stronger reservations. However, since the logics associated with Mill's methods are integral to the strategy of controlled comparison, serious questions have to be raised about studies that employ this strategy. In evaluating the results of such studies, it is always necessary to judge, as best one can, how well the investigator has managed to achieve "control" among the cases, whether the logic of these methods has been correctly employed, and whether the causal inferences and theoretical conclusions drawn from a study have been weakened by an inability to identify and/or control all the operative variables that may have influenced the outcomes of the cases.

The well-known (but often ignored or minimized) problems associated with using Mill's methods in controlled comparisons of small-n can be briefly reviewed. (Similar problems arise in statistical analysis of large N and are dealt with by various techniques for analysis of covariance, techniques which of course are not applicable in small-n controlled comparison studies.) The methods of agreement and difference both utilize the logic of what Mill called the "method of elimination." Mill explained that his use of the logic of elimination was analogous to the way in which it is employed in the theory of equations "to denote the process by which one or another of the elements of a question is excluded, and the solution is made to depend on the relation between the remaining elements only."

In the method of agreement the investigator employs the logic of elimination to exclude as candidate causes (independent variables) for the common outcome (dependent variable) in two (or more) cases those conditions that are not present in both cases. A cause or condition that survives this method of elimination can be regarded as possibly associated ("connected," in Mill's terminology) with the case outcome. An inherent weakness of this method of causal inference is that another case may be discovered later in which the same outcome was not associated with the condition/variable that survived the elimination procedure in the comparison of the two earlier cases. Logically speaking, therefore, that condition could not be regarded as a "necessary condition" for that type of outcome. Whether the condition in question might nonetheless be a "sufficient condition" for that outcome remains a possibility, but this causal relationship can be supported only by examining many other cases to ascertain whether that condition is always associated with that type of outcome.
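The elimination step of the method of agreement can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the case names and condition labels (A, B, C, ...) are hypothetical, and representing each case as a set of conditions that are present is our simplification, not something found in Mill.

```python
def method_of_agreement(cases):
    """Return the conditions present in every case that shares the outcome.

    `cases` maps a (hypothetical) case name to the set of conditions
    observed in it. A condition missing from any one case is eliminated
    as a candidate cause of the common outcome.
    """
    condition_sets = [set(conditions) for conditions in cases.values()]
    if not condition_sets:
        return set()
    return set.intersection(*condition_sets)


# Two hypothetical cases with the same outcome on the dependent variable.
cases = {
    "case_1": {"A", "B", "C"},
    "case_2": {"A", "D", "E"},
}
survivors = method_of_agreement(cases)

# A case discovered later has the same outcome but lacks condition A.
later_cases = dict(cases, case_3={"B", "F"})
survivors_after = method_of_agreement(later_cases)

print(survivors, survivors_after)
```

In the first comparison only condition A survives elimination. Once the hypothetical third case is added, nothing survives, which is the weakness described above: A can no longer be regarded as a necessary condition for the outcome.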

In the method of difference in which two cases having different outcomes are compared the investigator employs the logic of elimination to exclude as a candidate cause (independent variable) for the variance in the outcome (dependent variable) any condition that is present in both cases. On the face of it the logic is quite simple; a condition present in both cases cannot account for the difference in case outcomes. On the other hand, any other condition under investigation in a case that survives this method of elimination can be regarded as possibly associated with the case outcome. However, this causal inference is only a conjecture since additional cases may be discovered later in which the condition in question is not associated with that particular case outcome.
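The corresponding elimination step for the method of difference can be sketched as follows. As before, this is an illustrative sketch only: the condition labels are hypothetical, and treating a case as a set of present conditions is our assumption.

```python
def method_of_difference(case_one, case_two):
    """Compare two cases that have different outcomes.

    Conditions present in both cases are eliminated, since they cannot
    account for the difference in outcomes. Conditions present in only
    one of the two cases survive as candidates.
    """
    case_one, case_two = set(case_one), set(case_two)
    return {
        "eliminated": case_one & case_two,   # present in both cases
        "candidates": case_one ^ case_two,   # present in exactly one case
    }


# Hypothetical pair: the outcome occurs in the first case, not the second.
result = method_of_difference({"A", "B", "C"}, {"B", "C"})
print(result)
```

Here B and C are eliminated and A survives. The surviving condition remains a conjecture in the sense stated above: nothing in the procedure rules out a later case in which A is not associated with the outcome.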

In exercises of this kind the investigator cannot be sure that all of the possibly relevant independent variables have been identified or that the study has included a sufficient variety of cases of the phenomenon.

Hence, inferences in both methods of agreement and difference may be spurious and invalid. On the other hand, if a much larger number of independent variables are included, we may well encounter the problem already discussed, of "too many variables, too few cases." This is a dilemma which cannot be easily or adequately resolved so long as the investigator relies solely on the logic of elimination and attempts to find sufficiently comparable cases that provide the functional equivalent of experimental control.

This logic of causal inference for small n comparisons is highly problematic insofar as the phenomenon being investigated has complex, multiple determinants rather than, as in the simple examples of Mill's methods referred to, a single independent variable of presumed causal significance. Thus, in the example of the method of agreement cited above the investigator might eventually discover that a condition that was "eliminated" was associated with the outcome when and only when an additional condition, one not included in the initial study, was also present. Meanwhile, failure to discover this additional condition might lead the investigator to prematurely discard the first condition's significance on the ground that it was not always associated with the type of outcome in question. This highlights the possibility of "false negatives" when applying the logic of elimination that goes along with the more obvious possibility, already alluded to, of "false positives."

Another major difficulty encountered in attempting to employ the logic of elimination occurs when the phenomenon under investigation has alternative determinants-what Mill referred to as the problem of "plurality of causes" (what is referred to as "equifinality" in general systems theory), a characteristic of many social phenomena. For such phenomena the same type of outcome can emerge in different cases via a different set of independent variables. Thus, in the example of the method of agreement we cannot be certain that the type of outcome identified is associated only with a given independent variable. If that phenomenon is subject to "plurality of causes," we may encounter sooner or later one or more additional cases in which that type of outcome occurs in the absence of the conditions with which it was earlier associated.
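A small constructed example may help to show how conjunctural causation and plurality of causes defeat the logic of elimination. The causal rule below is invented purely for illustration (the outcome occurs if conditions A and B are jointly present, or if C is present); X is an incidental condition that happens to appear in both cases.

```python
def outcome_occurs(conditions):
    """Hypothetical causal rule: the outcome arises via (A and B) or via C."""
    return {"A", "B"} <= conditions or "C" in conditions


def shared_conditions(cases):
    """Method-of-agreement elimination: keep what every case has in common.

    Assumes `cases` is a non-empty list of sets.
    """
    return set.intersection(*(set(c) for c in cases))


# Two hypothetical cases reaching the same outcome by different paths.
cases = [{"A", "B", "X"}, {"C", "X"}]
both_have_outcome = all(outcome_occurs(c) for c in cases)

survivors = shared_conditions(cases)            # what elimination retains
eliminated = set().union(*cases) - survivors    # what elimination discards
x_alone_suffices = outcome_occurs({"X"})        # check X against the rule

print(both_have_outcome, survivors, eliminated, x_alone_suffices)
```

Under this invented rule the elimination procedure retains only X, which plays no causal role (a "false positive"), and discards A, B, and C, each of which belongs to a genuine causal path (a "false negative").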

Some specialists on comparative method have identified another variant of Mill's methods which they refer to as the "indirect method of difference." Ragin describes this variant as involving "a double application of the method of agreement." First, the investigator identifies instances of a similar outcome of a phenomenon to see if they display a similar independent variable. If they do, then instances in which that outcome is absent are examined to see if they lack the independent variable associated with the outcome. Ragin discusses the uses and limitations of this indirect method, noting that it "suffers some of the same liabilities as the method of agreement in situations of multiple causation" as well as in the case of phenomena that are affected by "conjunctural causation." More generally, Ragin issues a useful warning against "mechanical" application of Mill's methods.
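The two steps of the indirect method of difference, as summarized above, can be sketched in the same style. This is our reading of the summary, not Ragin's own formalization, and the cases and condition labels are hypothetical.

```python
def indirect_method_of_difference(positive_cases, negative_cases):
    """Two-step elimination over cases with and without the outcome.

    Step 1: keep conditions shared by all cases in which the outcome occurs.
    Step 2: drop any of those conditions that also appears in a case
            in which the outcome is absent.
    """
    positives = [set(c) for c in positive_cases]
    negatives = [set(c) for c in negative_cases]
    if not positives or not negatives:
        return set()
    shared_in_positives = set.intersection(*positives)
    seen_in_negatives = set().union(*negatives)
    return shared_in_positives - seen_in_negatives


# Hypothetical data: outcome present in the first list, absent in the second.
positive_cases = [{"A", "B"}, {"A", "C"}]
negative_cases = [{"B", "C"}, {"D"}]
candidates = indirect_method_of_difference(positive_cases, negative_cases)

# One negative case that nevertheless contains A removes the candidate.
candidates_with_counterexample = indirect_method_of_difference(
    positive_cases, negative_cases + [{"A", "D"}]
)
print(candidates, candidates_with_counterexample)
```

Because step 1 is itself an application of the method of agreement, the sketch inherits the liabilities noted above: if the outcome has multiple or conjunctural causes, the shared set in step 1 may already be empty or misleading.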

We may take note briefly of the considerable controversy in recent years among specialists in comparative politics regarding the utility of Mill's methods for research in their field. A strong proponent is Theda Skocpol, who asserted their value for comparative historical analysis and stated that this was the approach taken in her book States and Social Revolutions. She made no reference to Mill's own sober cautions regarding the difficulty of applying these methods in most research on social phenomena. However, Skocpol did recognize the "inevitable difficulties in applying the method according to its given logic" since "often it is impossible to find exactly the historical cases that one needs for the logic of a certain comparison." Recognizing this and other difficulties, she concluded: "Still, comparative historical analysis does provide a valuable check, or anchor, for theoretical speculation." And, continuing, she came close to recognizing that she had supplemented use of Mill's methods with what we call "process-tracing." Thus, Skocpol noted, comparative historical analysis making use of Mill's methods "encourages one to spell out the actual causal arguments suggested by grand theoretical perspectives ... ." In the book itself, however, and in a subsequent essay on comparative methodology, she did not explain how process-tracing served as a check on the limitations of Mill's methods or could be fruitfully combined with them. In a later joint essay with Margaret Somers, Skocpol did recognize that Mill himself "despaired of the possibility of effectively applying the analytic methods he discussed to sociohistorical phenomena," but she argued that "complete retreat in the face of difficulties is surely unnecessary."

Skocpol's understanding and use of Mill's methods was nonetheless sharply challenged by a number of other scholars. Among them was Elizabeth Nichols. She, however, did not call attention to the importance of process-tracing as a method of compensating for the limitations of Mill's methods or recognize the ancillary role it played in Skocpol's study. This was left to Jack Goldstone who explicitly notes the importance of process-tracing in Skocpol's study and, more generally, in comparative history. Indeed, he emphasizes that "History in this sense is at the heart of comparative case-study methods ... . The key to comparative case-studies in macrosociology is this unraveling of historical narratives. I have called this procedure 'process-tracing'."

The Implications of Equifinality for Theory-Building

The fact that different causal patterns can lead to similar outcomes has profound implications for efforts to develop empirical theory (or general laws). Equifinality challenges and undermines the assumption on which so many efforts to develop general explanations for a phenomenon are based-namely, the assumption that similar outcomes in several cases must have a common cause. Such an assumption misdirects the attention of the investigator by leading him/her to believe that the task of empirical inquiry is to discover a single causal pattern for cases that have similar outcomes on the dependent variable. When a phenomenon is governed by equifinality, this pernicious assumption must be discarded and the task of developing empirical theory redefined. The task becomes that of discovering the different causal patterns that lead to similar outcomes; that is, the investigator must produce a differentiated empirical theory that identifies those patterns. If this research task is taken seriously, it does not suffice for an investigator to content himself with a claim that he has at least discovered a common causal factor for all or many cases that have similar outcomes on the dependent variable. Such an explanation, even if justified, is incomplete, and moreover, it leaves unanswered the question of the causal weight of the common factor in the total explanation. (More on this later.)

Some investigators may attempt to deal with the challenge posed by equifinality by claiming no more than that the relationship embodied in the single causal proposition is a probabilistic one. However, a quantitative description of that probability is usually left unspecified since it would require considerable additional empirical research either on the total universe of relevant cases or a sample thereof.

The phenomenon of equifinality has important implications also for efforts to assess the ability of a deductive theory to make successful predictions. Sensitivity to the possibility that the phenomenon in question is subject to equifinality requires that consideration be given to the likelihood that some undetermined number of outcomes which the deductive theory predicts can be predicted as well, perhaps better, by another deductive theory.

Equifinality also calls attention to the possibility that successful predictions are not necessarily valid explanations, since another theory may be able to claim to explain as well as to predict those outcomes. (For additional discussion of the implications of equifinality for theory development and the use and assessment of theory, see the paper on "Typological Theory.")

* * *

Enough has been said thus far to indicate special difficulties that analysis of covariance encounters in investigations that deal with a small number of cases not subject to statistical analysis. Various logical errors have been noted that can easily creep into efforts to establish associations of presumed causal significance.

Views Regarding the Utility of the Controlled Comparison Method

Given the special difficulties encountered with the controlled comparison method, it is not surprising that investigators should differ in their judgment of its general utility for theory development. Not all investigators, however, believe that the problem is so intractable under any and all circumstances as to warrant abandoning controlled comparison studies altogether. Nonetheless, practically all efforts to make use of the controlled comparison method fail to achieve its strict requirements. This limitation is often recognized by investigators employing the method, but they proceed nonetheless to do the best they can with an admittedly imperfect controlled comparison. They do so because they believe (erroneously in our view) that they have no "scientific" alternative and no way of compensating for the limitations of controlled comparison. (We shall return to this problem later.)

Various suggestions have been made for finding some way to make do with imperfect control or to accept that it is inevitable. Smelser, for example, calls attention to "the method of heuristic assumption." This is a "crude but widely employed method of transforming potentially operative/independent variables into parameters," a method that has on occasion proven to be helpful and fruitful in a variety of investigations. Lijphart, while acknowledging that it is difficult to find cases that are comparable enough and that one seldom can find cases similar in every respect but one, believes that "these objections are founded on a too exacting scientific standard" and that useful research can be accomplished by studies that approximate this standard as closely as possible.

On the other hand, other writers believe that the quest for controlled comparison should be abandoned in favor of a quite different approach. Przeworski and Teune distinguish between a "most similar" design (i.e., the closely matched case of controlled comparison) and a "most different" research design. The former, they argue, runs into serious difficulties in failing to eliminate rival explanations. A "most different" design, in contrast, deliberately seeks cases of a particular phenomenon that differ as much as possible, the research objective being to find similar processes or outcomes in the diverse cases. Przeworski has suggested that the "most different" design approach has contributed to the considerable success of the recent literature on democratization, such as the works of O'Donnell, Schmitter, and Whitehead. These analysts, Przeworski maintains, were forced to distill from highly diverse cases a set of common factors that possessed great explanatory power. The reader may wonder whether the "most different" type of research design bears some resemblance to Mill's method of difference. However, investigators making use of it evidently do not rely mechanically on the logic of elimination to make causal inferences and typically appear to work with multiple variables that play themselves out as part of a process over time. (Further study is needed to establish whether they entail use of process-tracing.)

Other Ways of Achieving a Controlled Comparison

We have discussed in some detail the difficulty of implementing the "solution" offered by Lijphart and other scholars to the problem of "too many variables, too few cases"-namely, to find comparable cases so closely matched that they provide the functional equivalent of an experiment. However, it turns out that history seldom provides the investigator with cases that achieve the necessary "control." There are, however, rare exceptions.

A. The "Before-After" Research Design

There are two ways of trying to achieve a controlled comparison. One of these is the "before-after" research design. Instead of trying to find two different cases that are comparable in all ways but one, the investigator may be able to achieve "control" by dividing a single case into two sub-cases.

In this connection, Collier calls attention to the classic study by Donald Campbell and Julian Stanley in which they noted that the logic of experimental design can be approximated in "quasi-experiments." They had reference to "observational" studies of a phenomenon occurring in a natural setting in which an event or a choice occurs at some point in time that creates the equivalent or approximation of an experimental intervention. This permits the investigator to identify a "before-after" configuration within the sequential development of a longitudinal case. Particularly valuable in their discussion was the warning of pitfalls of too simple an application of this approach.

One of the assumptions or requirements of a "before-after" research design, not easily satisfied to be sure, is that only one variable changes at that given point in time, dividing the longitudinal case neatly in two. Another requirement, this one emphasized by Campbell and Stanley, is that the values of the observed variables should be examined not only immediately before and after the event, but also well before and well after it. Collier sums up the pitfall as follows: "Causal inferences about the impact of discrete events can be risky if one does not have an extended series of observations." As Campbell and Stanley suggested and as subsequent research of that kind demonstrated, this type of quasi-experimental research design, if imaginatively and carefully employed, can be extremely useful in policy evaluation research. Similarly, one may note its resemblance to the method of "process-tracing" that will be discussed later.

B. The Use of a Counterfactual Case or Mental Experiment

Another way of attempting to achieve a controlled comparison when two historical cases closely resembling each other cannot be located is to match the given case with an invented one that does. The case is, of course, a hypothetical one derived through counterfactual analysis of the existing case or, as it is sometimes referred to, the "mental experiment." As James Fearon and others have noted, resort to counterfactual analysis, either explicitly or implicitly, is a common practice in many different types of research. And resort to mental experiments in the service of theory development has a long and often distinguished history.

However frequently it is employed, counterfactual analysis still lacks explicit criteria and standards for distinguishing good practice with this method from its highly speculative, less disciplined uses. This is not the place to attempt to offer a comprehensive list of standards for counterfactual analysis. A few may suffice to suggest relevant criteria. First, since a counterfactual case necessarily builds upon an existing case, it will be difficult to invent an acceptable one unless a plausible explanation for the existing case has already been constructed. This step is important, obviously, because the counterfactual varies what is thought to be the critical variable(s) that presumably accounted for the historical outcome. If the investigator has an erroneous explanation for the historical case, then the counterfactual analysis is likely to be flawed. Second, the relationship among variables hypothesized by the invented case must also be supported in similar fashion. Third, the independent variable that is varied in the existing case in order to produce an invented one must be autonomous; that is, it must be separable from other independent variables that have operated to produce the outcome in the first case. When several independent variables are interconnected, as is often the case for problems that engage the interest of social scientists, it becomes difficult to invent a usable new case via counterfactual analysis by varying only one variable, and the complexity of the interconnected variables may be difficult to identify reliably.

Fourth, if the explanation for the historical case consists of a series of events in sequence over time rather than a single, simple circumscribed event, then constructing an acceptable counterfactual becomes much more difficult. For this would require a counterfactual that involves a long, complex chain of causation involving many variables and conditions. Conversely, a counterfactual case is easier to construct if there were one or only a few decisive points in the historical case that determined the outcome.

The Need for an Alternative to Controlled Comparison: The "Within-Case" Method of Causal Inference

The methods discussed above, including Mill's methods, comparative case studies, and related approaches, fall under what Charles Ragin has termed "variable-oriented" approaches; that is, they attempt to use comparisons between cases to establish the causal powers of particular variables. In contrast, the sections that follow constitute a "case-oriented approach," or methods of what we have termed "within-case" analysis. The approach here focuses not on the analysis of variables across cases, but on the explanation of the particular conjunction of variables or causal path that constitutes a case. This does not mean that within-case analysis is not appropriate to studies involving cross-case comparisons or typological theories-indeed, within-case analyses are essential to such studies and can significantly ameliorate the limitations of Mill's methods and typological theories. Equally important, however, is the fact that the modes of within-case analysis discussed below allow single case studies to contribute to theory development, a point that has often been unappreciated or misunderstood in analyses of case study methods. We focus in this paper on one method of within-case analysis, the congruence method, and we discuss elsewhere the method of process-tracing.

A sober conclusion emerges from the preceding discussion of obstacles to achieving the strict requirements of controlled comparison. As noted, the paradigm of experimental method has quite limited utility for small-n case study research. The method of causal inference based on the logic of elimination cannot be effectively employed in those small-n studies that do not meet the requirement of strict control. Causal inferences in experimental research are derived by observing the effect of manipulating a variable across cases on outcomes. Controlled comparison attempts to make use of the same mode of causal inference. It is correctly described, therefore, as employing a variable-oriented approach, also referred to by the author in previous publications as the "across-case" approach.

Small-n case studies, however, can and often do employ an alternative non-experimental procedure for making causal inferences-referred to in previous publications as the "within-case" approach to explanations. This mode of explanation assesses the causal relationship between variables within the single case and forgoes an effort to note the effect of a change in the experimental variable across cases. Within-case explanation does not make use of the logic of elimination associated with Mill's methods, which is the basis for drawing causal interpretations in controlled comparisons.

Rather, the methodology of within-case explanation is akin to that of historical explanation of single cases. It makes use of "process-tracing" which differs from standard historical explanation couched in narrative form by attempting to convert descriptive historical explanations into analytical ones that are couched in theoretically relevant variables.

Writers on comparative method often allude to what is referred to here as within-case explanation without identifying the critical role it can and sometimes does play in making up for the limitations of Mill's methods and, more generally, those of controlled comparison. Ragin, for example, refers briefly to "interpretive analysis" by which, apparently, he means historical explanation, but he fails to recognize it as an alternative to causal inferences derived via efforts in controlled variable-oriented research. Ragin correctly emphasizes that "case-oriented" methods (in contrast to the variable-oriented cross-case method) require investigators to consider cases as "whole entities" rather than as collections of variables. But he fails to recognize the role of process-tracing within a given case "in interpreting specific cases and in pinpointing the combinations of conditions, the causal complexes, that produce specific outcomes ... ."

There are two distinct modes of within-case explanation: the congruence procedure and the process-tracing method. The first of these will be discussed in some detail in the present paper; a detailed treatment of process-tracing is deferred to another paper in preparation, although the reader will find some reference to it in what follows as well as in the paper on typological theory. For the present, we should like to emphasize-and will illustrate in detail in later publications-that Mill's methods can be used in conjunction with process-tracing. As in Theda Skocpol's study of revolutions, and in many other studies, Mill's methods provide a framework within which process-tracing is employed to assess the causal status of hypotheses suggested by these methods.

The Congruence Procedure

The congruence method can be employed in a single case study when the research objective does not require comparison with other cases. The congruence method can also be used in each of several case studies-each of which is an instance of a particular phenomenon-when these cases are insufficiently comparable for achieving the control required for a controlled comparison.

As this suggests, the congruence method is quite adaptable and can be useful for furthering a variety of research objectives in different research designs. Thus, recalling Eckstein's typology of different ways in which case studies can contribute to theory development, the congruence type of case methodology can be employed in a "disciplined-configurative" study, a "heuristic" (hypothesis-generative study), a "plausibility probe," or a "crucial case" (or "tough test").

The flexibility of the congruence method extends also to the use of one or more case studies to assess the predictive and/or explanatory performance of either a deductive or an empirical theory. It can also be used when the results of individual case studies are used as building blocks in the development of an empirical theory. (For discussion see the paper on typological theory.)

The congruence approach works with either a deductive or empirical theory that purports to predict or explain outcomes on the basis of specified initial conditions. Such a theory may be provided by existing formal or tacit theories; or it may be formulated by the investigator by drawing on the results of previous studies (large N statistical or small n case studies) or of quasi-experimental work. Or the theory may be postulated for the first time by the investigator on the basis of a hunch that it is an interesting theory whose predictive or explanatory potential should be assessed. We turn to a discussion of the uses of the congruence method when a theory, deductive or empirical, is being applied to explain or predict the outcome of a particular case.

Depending on the level of development of the theory being employed, its predictions may be abundant and precise, or they may be scarce and highly general.

Once a starting theory has been identified and one or more cases are singled out for examination, the investigator asks: given the value of the independent variable in this particular case, what prediction(s) can be made from the theory regarding the outcome of the dependent variable? The investigator uses a deductive theory or an empirical generalization to generate a prediction/explanation for the outcome of the dependent variable. If the outcome is consistent with the prediction, then there is at least a presumption or possibility of a causal relationship. (Before proceeding, we should note briefly and save for later discussion that a finding of consistency and a possible causal relationship may be sensitive to the level of concreteness-abstraction with which the value or variance of the dependent variable is defined by the investigator.)
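The basic test just described can be sketched in a few lines of Python. This is only a minimal illustration: the `Case` record, the `congruence_test` function, and the toy theory with its variable values ("threat", "conciliatory", and so on) are hypothetical names introduced here, not part of any published procedure.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet


@dataclass(frozen=True)
class Case:
    """One historical case, reduced to the two values the test needs."""
    name: str
    independent_value: str   # value of the independent variable in this case
    observed_outcome: str    # value of the dependent variable that history produced


def toy_theory(independent_value: str) -> FrozenSet[str]:
    """Hypothetical theory: maps an independent-variable value to predicted outcomes."""
    predictions = {
        "threat": frozenset({"conciliatory"}),
        "no_threat": frozenset({"refractory"}),
    }
    # An empty set means the theory makes no prediction for this value.
    return predictions.get(independent_value, frozenset())


def congruence_test(theory: Callable[[str], FrozenSet[str]], case: Case) -> bool:
    """Return True when the observed outcome is among the theory's predictions."""
    return case.observed_outcome in theory(case.independent_value)


case = Case("hypothetical case", "threat", "conciliatory")
is_consistent = congruence_test(toy_theory, case)
print(is_consistent)  # True
```

A result of `True` establishes no more than the presumption of a causal relationship mentioned above; the safeguards discussed in the rest of this section still have to be applied.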

Another general criterion for congruence tests has been termed "congruity," or similarities in the relative strength and duration of hypothesized causes and observed effects. This does not mean that causes must resemble their effects or be on the same scale, and researchers must avoid the common cognitive bias toward assuming this should be the case. For example, there is a temptation to assume that large or dramatic effects must have large and dramatic causes, but this is not necessarily true. Researchers must take into account theoretical reasons why the effects of hypothesized causes might be amplified, diminished, delayed, or sped up (through expectations effects). Once this has been done, it is possible to address the question of whether the independent and dependent variables are congruent; that is, whether they vary in the expected directions, to the expected magnitude, along the expected dimensions, or whether there is still unexplained variance in one or more dimensions of the dependent variable.

Although consistency is often taken as providing support for a causal interpretation (and, for that matter, for assessing deductive theories generally), this practice is obviously open to misuse and subject to challenge. Ways must be found to safeguard against unjustified, questionable imputation of a causal relationship on the basis of mere consistency, just as safeguards have been developed in statistical analysis to deal with the possibility of spurious correlation.

There are several distinctive ways in which this problem can be addressed. The investigator can employ process-tracing to identify a causal path (the causal chain) that depicts how the independent variable leads to the outcome of the dependent variable. Process-tracing will be discussed in detail later. Here it suffices to call attention to the close connection of process-tracing with "causal mechanisms," the importance of which has received increasing emphasis in recent years in some branches of philosophy of science and in some branches of social science work.

The usefulness of combining the congruence method with process-tracing was demonstrated in the innovative study by Yuen Foong Khong, Analogies at War (Princeton: Princeton University Press, 1992), which will be discussed later. Earlier examples of the use of process-tracing in case studies to elaborate (or assess) the causal standing of an explanation derived in the first instance by applying a deductive theory are to be found in the studies by Vinod Aggarwal, Liberal Protectionism (Berkeley: University of California Press, 1985), and David Yoffie, Power and Protectionism: Strategies of the Newly Industrializing Countries (New York: Columbia University Press, 1983).

Another way in which the investigator can attempt to deal with the limitations of the congruence method is to argue that the deductive theory or empirical generalization being employed is powerful and well validated, that it fits the case at hand extremely well, and that it is not rivaled by competing theories or at least does better than conceivable alternative theories.

By invoking the superior standing of the theory employed and/or by resorting to process tracing, the investigator may be satisfied that the "within-case" approach relied upon suffices and that it does not need to be buttressed by "across-case" comparisons. The "within-case" method of causal interpretation is discussed elsewhere in this volume. Briefly, it is a non-experimental way of making a causal inference in a single case. In this respect it differs from "controlled comparison" in which the investigator tries to find cases that are similar in every respect but one, which then serves as an experimental variable that varies across the several cases. Controlled comparison employs a variable-oriented approach to causal inference and attempts to achieve the functional equivalent of an experiment. Since strictly controlled comparisons are seldom possible, we have emphasized the need for an alternative "within-case" approach that does not attempt to employ experimental logic. As noted here, the congruence method is compatible with either method.

When the investigator cannot have confidence in the adequacy of the within-case method of causal interpretation, he may supplement it by analytic procedures that provide the functional equivalent of orthodox control associated with experimental logic. This is accomplished by making use of counterfactual analysis and mental experiments. When this is attempted, quite obviously the congruence method no longer relies solely on the within-case approach but makes use also of across-case analysis. Depending on how the congruence method is employed, then, it is compatible with both within-case and across-case methods of causal inference and can serve to bridge them.

When proceeding to employ a counterfactual case for this purpose, use of the congruence procedure is required to pass a series of hurdles based on questions inspired by the logic of experiment in order to assess the plausibility of the causal inference suggested by the observed consistency between the independent and dependent variables in a single case. Two questions need to be identified and addressed for this purpose. First, "is the consistency spurious or of possible causal significance?" Second, "is the independent variable a 'necessary condition' for the outcome of the dependent variable? If so, how much explanatory/predictive power does it have?" It is important to address this last question since a condition may be "necessary" but still contribute little to the explanation or prediction of the outcome in question. We must ask, therefore, how much the independent variable contributes to the explanation, whether or not it qualifies as a necessary condition.

Except for tests of deterministic theories stated in terms of necessity and sufficiency, single theory congruence tests are not strong enough to merit "confirmation" or "falsification" of theories. More than one theory may appear to be equally congruent with the outcome, or the outcome may be caused by other factors not identified by any of the theories considered. Researchers thus have to be sensitive to the issues of spuriousness, causal priority, and causal depth in qualifying the strength of inferences made on the basis of congruence tests. Spuriousness occurs when the observed congruence of the cause C and effect E is artificial because both C and E are caused by some third factor Z (whether Z has or has not been identified as a competing theory). Schematically: Z -> C and Z -> E, with no causal path from C to E.

Alternatively, the putative cause C is defined as lacking "causal priority" if C is necessary for E, but C is itself only an intervening variable wholly or largely caused by a necessary prior variable Z. In this instance, both Z and C are necessary for E, but C has no independent explanatory value. Schematically: Z -> C -> E.

A third possibility is that C can be defined as lacking "causal depth" if a third variable Z would have brought about E even in the absence of C. In this instance, it does not matter whether or not Z is related to C. In other words, Z has greater causal depth because it is necessary and sufficient for E, and Z may act through C or through some other variable X. In contrast to the example of causal priority, C is not in this instance a necessary condition for E.
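The three situations can be contrasted with a small counterfactual sketch. It is an illustration under strong simplifying assumptions that are not in the original argument: all variables are Boolean, each situation is reduced to a toy structural model, and the function names and the `do_c` parameter (which forces C to a value regardless of Z) are hypothetical.

```python
from typing import Callable, Dict, Optional

Model = Callable[..., Dict[str, bool]]


def spurious(z: bool, do_c: Optional[bool] = None) -> Dict[str, bool]:
    """Z causes both C and E; C has no causal path to E."""
    c = z if do_c is None else do_c
    return {"C": c, "E": z}


def causal_priority(z: bool, do_c: Optional[bool] = None) -> Dict[str, bool]:
    """Chain Z -> C -> E: C is necessary for E but is itself produced by Z."""
    c = z if do_c is None else do_c
    return {"C": c, "E": c}


def causal_depth(z: bool, do_c: Optional[bool] = None) -> Dict[str, bool]:
    """Z reaches E through C when C is present, otherwise through another route X."""
    c = z if do_c is None else do_c
    x = z and not c              # alternative route, used only when C is missing
    return {"C": c, "E": (z and c) or x}


def probe(model: Model) -> Dict[str, bool]:
    """Ask the same counterfactual questions of each toy model."""
    return {
        # In the observed case Z is present, and C and E both occur.
        "congruent_in_observed_case": model(True) == {"C": True, "E": True},
        # Mental experiment: hold Z present but remove C.
        "E_if_C_removed": model(True, do_c=False)["E"],
        # Mental experiment: Z absent, nothing else forced.
        "E_if_Z_absent": model(False)["E"],
    }


MODELS: Dict[str, Model] = {
    "spurious": spurious,
    "causal_priority": causal_priority,
    "causal_depth": causal_depth,
}
results = {name: probe(model) for name, model in MODELS.items()}
for name, answers in results.items():
    print(name, answers)
```

All three toy models show congruence between C and E in the observed case, which is why congruence alone cannot separate them. Removing C eliminates E only in the causal-priority model. In this coarse encoding the spurious and causal-depth models give the same answers to both mental experiments; what distinguishes them is whether C lies on one of Z's routes to E, a question about the intervening process rather than about outcomes.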

Thus, the appearance of congruence, especially when only or primarily one theory is considered, does not suffice to support an inference of causality, nor does the lack of congruence deny a possible causal role. Moreover, even if a congruence test suggests that a variable played a causal role in a given case, this does not mean that this theory proposes causal factors that are necessary, sufficient, or causal in any sense in other cases where contextual and conjunctive variables are different.

How Plausible is the Claim of "Consistency Not Spurious"?

The possibility that consistency between the values of the independent and dependent variable in a given case is not spurious but possibly causal gains a measure of support if the relationship can be supported by a general law or statistical generalization. For example, a causal inference drawn from the observed consistency between an independent cognitive variable such as the actor's belief and some aspect of that individual's behavior can be supported by psychological theories of cognitive balance which call attention to the fact that individuals generally (at least under certain conditions) strive to achieve consistency between their beliefs and their actions. This, of course, is a very general theory. If more specific generalizations or theories could be adduced, the imputation of a causal relation would be strengthened. In general, the stronger and more precise the general theory, the more confidence we ought to attach to claims that consistency is not spurious.

Is the Independent Variable a "Necessary Condition" for the Outcome of the Dependent Variable?

Assuming that the consistency identified appears to be causal and not spurious, the investigator may wish to pursue the inquiry further by attempting to assess whether the independent variable is a "necessary condition" for the outcome in question. This question, of course, may be difficult to resolve. Efforts to do so will require the investigator to move beyond "within-case" analysis toward controlled comparison. Ideally, one would try to find other cases in which the same type of outcome occurred in the absence of that independent variable. If such a case(s) were discovered, then the independent variable could not be regarded as a necessary condition.

When one or more comparable cases are not available, then the investigator can resort to analytical imagination to think of hypothetical cases that might help to judge whether the same type of outcome might occur in the absence of that independent variable. In other words, as noted earlier, the investigator resorts to counterfactual analysis and mental experiments in an effort to create a controlled comparison. Disciplined use of analytical imagination will at least provide a safeguard against the temptation to move too quickly and confidently from the earlier judgment that consistency was not spurious to the further inference that the independent variable is a necessary condition for the occurrence of that type of outcome. If the grounds for regarding the independent variable as being a necessary condition are shaky or dubious, as is often likely to be the case, then it is advisable to claim no more than that the type of independent variable in question appears to "favor"-make more likely-the occurrence of a certain type of outcome.

Efforts to discipline use of the congruence method must address another question as well: "Is the independent variable that is causally related to this particular outcome of the case also consistent with other possible outcomes?" In the analysis of a single case it must not be forgotten that history provides only one outcome of the dependent variable. Accordingly, it is easy to overlook the possibility that other outcomes, had they occurred, might also have been consistent with the value of that independent variable. Once again, if the investigator cannot locate cases in which the independent variable having the same value was accompanied by diverse outcomes, he/she can resort to disciplined imagination to assess this possibility. It will be useful for this purpose if the investigator immerses himself in the rich details of the historical case being examined. This may enable him to envisage with greater confidence that the outcome might well have gone in different directions even with the independent variable held constant, had variation occurred in other operative independent variables. When there is reason to believe this might have been so, it will be necessary to assign weaker general predictive and explanatory power to the independent variable in question. It should be noted that broadening the assessment of the causal status of the independent variable (or theory) in question requires the investigator to take into account that other independent variables imbedded in the case may have played a role in producing that outcome.

Still another question can be asked to further discipline and refine the effort to assess the causal significance of consistency. Thus, "Is it possible to conceive of any outcomes of the historical case that would not have been consistent with the independent variable?" Once again, by becoming immersed in the historical case the investigator might envisage a number of other possible outcomes interestingly different from the historical outcome that would also have been consistent with the implications of the independent variable. If so, then the independent variable (or deductive or empirical theory in question) may be part of the explanation, but its ability to discriminate among alternative outcomes and its predictive power are much weakened. On the other hand, if the investigator cannot envisage other outcomes that could also plausibly occur in the case in question, then there would be reason to attribute stronger predictive power to the independent variable or theory of which it is a part.

Similarly, if all the conceivable outcomes would be consistent with the theory, then its explanatory power may be limited or negligible. Conversely, if other outcomes might have occurred which were not consistent with the theory, then the investigator has additional presumptive evidence of the explanatory power of the theory for the actual or the other conceivable outcomes identified.

A hypothetical example may be useful to illustrate and clarify how questions of this kind, which attempt to replicate the logic of controlled experiment, can contribute to making more refined and more valid causal interpretations in single case analysis.

In our hypothetical example, the first actor takes an action (independent variable XX) that appears to have a particular impact on the second actor's behavior (outcome A). The investigator finds that independent variable XX (but not YY or ZZ) is consistent with outcome A. The investigator now asks whether XX can explain and predict only outcome A. Or would outcomes B, C, and D-outcomes which did not occur in this historical situation-also have been consistent with XX? If so, while XX may be part of the explanation, its explanatory (and predictive) power is diminished since other explanatory variables are also needed to round out the explanation as to why the second actor's response was A (and not B, C, or D).

These interpretations of the explanatory power of XX are summarized in Figure 1.

FIGURE 1

A more refined analysis is possible. Suppose that although outcome A differs in interesting respects from outcomes B, C, and D, nonetheless all four outcomes share a certain characteristic: for example, all are "conciliatory" responses by the second actor to the first actor's action (though the precise nature of the conciliatory response varies). Suppose further that, in contrast, outcomes G, H, and I are all "hard, refractory" responses to the first actor's behavior. If so, then XX acquires added explanatory and predictive power of a quite useful kind, for it does discriminate between conciliatory and refractory responses (though not by itself between variants of a conciliatory response). (This illustrates the observation made parenthetically above that a causal relationship may be sensitive to the level of concreteness-abstraction with which the investigator defines the value and variance of the dependent variable.)
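The hypothetical example can be made concrete with a short sketch. The outcome labels and their classification follow the text; the two summary measures (the share of conceivable outcomes that XX rules out, and whether XX leaves exactly one outcome standing) are assumptions introduced here for illustration and are not measures proposed in this paper.

```python
from fractions import Fraction
from typing import Dict, Set

# Conceivable responses by the second actor, classified as in the text.
RESPONSE_TYPE: Dict[str, str] = {
    "A": "conciliatory", "B": "conciliatory", "C": "conciliatory", "D": "conciliatory",
    "G": "refractory", "H": "refractory", "I": "refractory",
}
CONSISTENT_WITH_XX: Set[str] = {"A", "B", "C", "D"}   # outcomes XX could account for
OBSERVED = "A"                                        # the one outcome history provided


def excluded_share(consistent: Set[str], conceivable: Set[str]) -> Fraction:
    """Share of conceivable outcomes that the independent variable rules out."""
    return Fraction(len(conceivable - consistent), len(conceivable))


def predicts_uniquely(consistent: Set[str]) -> bool:
    """True when only one outcome is consistent with the independent variable."""
    return len(consistent) == 1


# Concrete level: each specific response counts as a distinct outcome.
concrete_conceivable = set(RESPONSE_TYPE)
concrete_excluded = excluded_share(CONSISTENT_WITH_XX, concrete_conceivable)
concrete_unique = predicts_uniquely(CONSISTENT_WITH_XX)

# Abstract level: outcomes are grouped by type of response.
abstract_conceivable = set(RESPONSE_TYPE.values())
abstract_consistent = {RESPONSE_TYPE[o] for o in CONSISTENT_WITH_XX}
abstract_excluded = excluded_share(abstract_consistent, abstract_conceivable)
abstract_unique = predicts_uniquely(abstract_consistent)

print(concrete_excluded, concrete_unique)   # 3/7 False
print(abstract_excluded, abstract_unique)   # 1/2 True
```

At the concrete level XX is consistent with the observed outcome A but cannot single it out from B, C, and D. At the more abstract level XX is consistent with only one type of response, which is the sense in which it discriminates between conciliatory and refractory outcomes.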

From this hypothetical example we turn to a more general discussion of using the congruence mode to assess the causal role of an actor's beliefs in his/her decisionmaking.

Use of the Congruence Method to Assess the Causal Role of Beliefs in Decisionmaking

Specialists who have focused attention on decisionmaking approaches in the study of foreign policy have long emphasized the importance of cognitive variables. Attention has centered on the impact that a variety of general beliefs about international politics held by decisionmakers can have on their choices of policy. However, important methodological issues arise in attempting to assess the role that such beliefs play in the sequential steps of decisionmaking: (1) in the information processing that precedes the decision finally taken and (2) in the actual choice of policy. The foregoing discussion of the congruence mode is relevant for addressing these issues.

General support for the assumption that a policymaker's beliefs about international politics influence his decisions is provided by cognitive consistency theory. But as is now well known, an individual's beliefs and behavior are not always consistent with one another for various reasons. While a decisionmaker's beliefs play an important role in information processing that precedes his actual choice of action, variables other than these beliefs come into play to affect the choices made. The latter is the case insofar as the policymaker's decisions are likely to be influenced not merely by his own policy preference but also by the need to obtain sufficient support for whatever policy he decides upon, by the need for compromise, by domestic or international constraints on his freedom of action, etc., which may run in a direction that significantly modifies or is contrary to his preferred option.

It is more useful, therefore, to regard an individual's general beliefs as introducing two types of propensities, not determinants, into his decisionmaking:

(a) diagnostic propensities, which extend or restrict the scope and direction of information processing and shape the decision-maker's diagnosis of the situation to be dealt with, and

(b) choice propensities, which lead him to favor certain types of action alternatives over others (but which may give way or be altered in response to decisional pressures).

As a result, psychological consistency theory, although relevant, cannot by itself provide robust support for applying the congruence method to studies of the role of beliefs in decisionmaking. Causal interpretations in such studies must be disciplined by the methodological questions noted earlier which must be asked when the congruence method is employed.

We would like to add here, at the same time, that confidence that consistency between an individual's beliefs and actions is of causal significance is enhanced if it is encountered repeatedly in a sequence of decisions taken by an actor over a period of time. This observation played an important role in Stephen Walker's ingenious study of the role of Henry Kissinger's beliefs in the bargaining he conducted with North Vietnamese leaders. In this study Walker pioneered in developing highly systematic and explicit methods for employing the congruence procedure. He also addressed the important question whether Kissinger's actions were better explained by situational or role variables rather than by his cognitive beliefs. Walker advanced a plausible argument to the effect that Kissinger's operative beliefs were idiosyncratic in important respects and not easily accounted for by situational or role variables. That is, the set of Kissinger's beliefs and his policy actions consistent with those beliefs were probably not those anyone else in his position would have displayed. In this connection, Walker noted that the Nixon administration's policy on Vietnam was controversial and that there were policy preferences that competed with Kissinger's. Moreover, the role of national security adviser that Kissinger occupied at that time was not tightly defined. It permitted the incumbent considerable latitude. For these and other reasons, Walker concluded that Kissinger's role in the prolonged bargaining process with North Vietnamese leaders exemplifies both "action indispensability" and "actor indispensability", as defined by Fred Greenstein.

The causal role of beliefs in decisionmaking was the subject of an exemplary study by Yuen Foong Khong. For reasons he indicates in his study, Khong decided to focus not on operational code beliefs, as had Stephen Walker, but rather on the role historical analogies play in policymaking. Khong confronts the nettlesome problem of how the analyst can decide whether historical analogies are used by policymakers merely to justify decisions they take or whether analogies actually have a causal impact on information processing that precedes decisions and on the subsequent choice of a policy option. Influenced by George's "Causal Nexus" paper which Khong slightly modifies and elaborates, he assesses the role of several historical analogies held by top-level U.S. policymakers at the critical junctures of the Vietnam crisis when they decided in February 1965 to initiate slow-squeeze graduated air attacks on North Vietnam and in July 1965 to expand substantially the deployment of U.S. ground combat forces.

Khong compares the role in these two decisions of three historical analogies as drawn by U.S. policymakers from previous crises: Munich, the Korean War, and Dien Bien Phu. He finds evidence in historical materials and from interviews that each of these analogies was present in the minds of U.S. policymakers. However, by means of an ingenious and complex research strategy that makes use both of the congruence method and process tracing, Khong concludes that the Korean analogy played the most influential role in the making of these U.S. decisions to use slowly graduated air attacks and, later, to put in large-scale ground forces.

Only a brief account of the essence of his rich analysis can be presented here. First, we note that Khong took note of and approved the distinction between diagnostic propensities and choice propensities that are implicit in the beliefs held by policymakers. In fact, Khong considerably elaborates the diagnostic function that an actor's beliefs can play by distinguishing six different but closely related diagnostic tasks. Be it noted, too, that although he labels all six as "diagnostic" they include "choice" propensities. In effect, therefore, Khong collapses the earlier distinction between diagnostic and choice propensities. Khong emphasizes that historical analogies are often used by policymakers to perform diagnostic tasks.

His six diagnostic tasks, briefly stated here, are (1) a definition of the new situation, facilitated by comparing it with a past one; (2) a judgment of what is at stake; (3) an implicit prescription as to how the new situation should be dealt with, i.e., the "solution" to the problem or type of policy response needed; (4) an assessment of the moral acceptability of the implied prescription; (5) an assessment of the likelihood of its success; (6) an estimate or warning of the dangers and risks of the implicit policy should it be adopted.

Khong labels this set of diagnostic tasks the "Analogical Explanation (AE)" framework. Khong converts these six diagnostic tasks into a set of general standardized questions, which become a central feature of his research design, questions to be asked of each of the historical analogies. (p. 62.) The "answers" to these questions provide the data requirements for comparing the role the analogies played in information processing. The study, therefore, constitutes an explicit example of the method of "structured, focused comparison." It is only by asking the same general questions of each case that systematic comparison becomes possible.

Khong establishes the implications that each of the three historical analogies had for these diagnostic tasks via process-tracing by a careful analysis of the available historical record and through his interviews with U.S. policymakers. He then employs the congruence method in order to assess the implications of each analogy's "answer" to the six diagnostic tasks for the various policy options that were being considered at the time.

The question for Khong, then, was which of the various policy options under consideration were consistent with the diagnostic implications of the analogy and which were not. Khong employs a version of the congruence method discussed earlier in this chapter for each of the historical analogies. We will reproduce here only the one he presents for the Korean analogy. (p. 139.)

Having established the version of the diagnostic tasks each analogy provided, Khong then looks for congruity between an analogy's diagnosis and the several policy options that were under consideration by policymakers. According to Khong's analysis, the Korean analogy's answer to the six diagnostic tasks was highly consistent with the policy decision actually taken in the December 1964-February 1965 period to employ a "slow squeeze" version of graduated air attacks. But it was also consistent with another policy option, not taken, that called for heavy, continuous bombing. This left unanswered for the moment why the lesser version of air attacks was chosen. A further challenge for analysis was raised by Khong's finding that the Munich analogy had exactly the same implications regarding these two policy options. Similar results emerged when the congruence method was used to compare the implications of the Korean and Munich analogies for the various policy options under consideration in July 1965.

Therefore, as Khong notes, both historical analogies supported the case for either of the two options. But Khong argues persuasively that the Korean analogy was more influential in the two decisions of February and July. He arrives at this conclusion by attributing decisive importance to the different way in which the two analogies characterized the sixth diagnostic task. Thus the Korean analogy carried with it a strong fear that resort to the stronger of the two options in February 1965 and in July 1965 would trigger once again, as in the Korean War, Chinese intervention. This particular "vision" of the Korean War was deeply etched in the historical memory of U.S. policymakers in 1965. Khong cites ample evidence from archival sources and interviews in support of this observation. In contrast, the "lessons" of Munich contained no such warning of the dangers of making a hard response to aggressions by the Japanese and Germans in the 1930s. Although the Munich analogy could account, as did the Korean analogy, for the rejection of the non-intervention options in 1965, it was unable to suggest why, among the intervention options, the least hard one was selected. (p. 190.)
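The logic of this comparison, as summarized above for the February 1965 decision, can be laid out schematically. The encoding is a deliberate simplification made here for illustration and is not Khong's own table: the option labels, the separation of the sixth diagnostic task into a distinct "risk warning" step, and all function names are assumptions. The Dien Bien Phu analogy is omitted because its implications are not described in this paper.

```python
from typing import Dict, Set

# Options each analogy's diagnosis was broadly consistent with (per the summary above).
CONSISTENT_OPTIONS: Dict[str, Set[str]] = {
    "Korea": {"slow_squeeze", "heavy_bombing"},
    "Munich": {"slow_squeeze", "heavy_bombing"},
}

# Sixth diagnostic task: options the analogy warns against as too dangerous.
# Korea carried the fear of Chinese intervention; Munich carried no such warning.
RISK_WARNINGS: Dict[str, Set[str]] = {
    "Korea": {"heavy_bombing"},
    "Munich": set(),
}

CHOSEN_OPTION = "slow_squeeze"   # the option actually adopted


def options_after_risk_screen(analogy: str) -> Set[str]:
    """Options still favored once the analogy's risk warning is applied."""
    return CONSISTENT_OPTIONS[analogy] - RISK_WARNINGS[analogy]


def singles_out_choice(analogy: str, chosen: str = CHOSEN_OPTION) -> bool:
    """True when the analogy leaves the chosen option as the only one standing."""
    return options_after_risk_screen(analogy) == {chosen}


verdicts = {analogy: singles_out_choice(analogy) for analogy in CONSISTENT_OPTIONS}
print(verdicts)  # {'Korea': True, 'Munich': False}
```

Both analogies pass a first congruence test against the chosen option, so congruence alone cannot rank them. Only the Korean analogy also accounts for the rejection of the harder option, which is the step Khong supports with process-tracing evidence.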

In this exemplary study, Khong has shown how an imaginative, disciplined research design that combines congruence and process-tracing methods can be used to deal with the extremely complicated, difficult task of distinguishing between the justificatory and information processing roles of historical analogies in foreign policy decisionmaking. His study is the most rigorous and disciplined treatment we know of for dealing with the theoretical and methodological issues associated with determining whether historical analogies are being used by policymakers to justify their decisions or whether the analogies play a genuine causal role in the information processing that leads to the decisions taken. Khong states his conclusions with appropriate cautions, noting a number of limitations and questions that remain, but he has succeeded in raising the discussion of this problem to a new level of analytical sophistication.

Use of the Congruence Method In Studies of Deductive Theories that "Black Box" Decisionmaking and Strategic Interaction

We stated earlier in this chapter that the congruence method is applicable in a variety of research projects. It is used not only in working with theories and general hypotheses that focus on the causal role of beliefs in decisionmaking, which was discussed in the preceding section, but also in working with deductive theories that "black box" decisionmaking and/or strategic interaction. Such studies employ a deductive theory to make predictions of outcomes in a single case or in a number of cases too small to permit statistical analysis. The research objective is often to "test" the performance of the deductive theory in question and/or to identify and bound its scope. If its performance proves to be inadequate (i.e., it yields a number of incorrect predictions which cannot be attributed to measurement errors), then the question arises whether the failed predictions indicate that the internal structure or contents of the theory are flawed and in need of reformulation. The possibility that this may be so suggests that the congruence method may be used not to test the performance of a deductive theory but to develop and refine a provisional theory that is known or suspected to be as yet inadequately formulated.

These uses of the congruence method have been applied in several different kinds of international relations research, both in studies that work with structural realist, rational choice and/or game theories, all of which black-box decisionmaking and strategic interaction, and in studies that examine internal decisionmaking processes and the dynamics of strategic interaction directly. The congruence method (though it is not known by this name) also appears to be frequently employed in small n case studies which focus on theories of macro-political processes, such as Theda Skocpol's States and Social Revolutions.

We shall discuss first several studies that illustrate the use of congruence in I.R. studies that black-box decisionmaking and/or strategic interaction. This can best be described by distinguishing several steps involved in such studies. The first step is to formulate a more specific version of the deductive theory, whether structural realism, rational choice, or game theory. This requires formulating a version of the general theory in question that deals more specifically with the phenomenon that is being studied.

elaborate - [Examples: Posen, Aggarwal, Bueno De Mesquita]

A second step is to single out historical cases considered to be relevant to the research objective of the study, i.e. those cases whose outcomes will enable the investigator to apply the congruence method to test, assess, or refine the theory's predictive and explanatory power. Selection of cases is a critical decision in research design and it will be discussed in detail in a later chapter. Suffice it to note here not merely the necessity to avoid "selection bias" but, equally important, to be clear whether a representative sample of the universe of cases of the phenomenon is necessary to satisfy the research objective and for an acceptable statement of the nature and scope of the findings. It is a common misunderstanding to assume or to insist that all small n studies must somehow satisfy the requirement of a representative sample and that the findings of a small n study must be capable of projecting a valid probability distribution of outcomes for the entire universe. This will be addressed in more detail in another chapter.

A third step is to match the predictions/expectations of the theory with the outcomes of the cases to ascertain whether they are consistent. If consistency is noted, then the investigator should not fail to address the several questions that were discussed earlier in this chapter regarding the causal significance that can be properly inferred from congruence. Outcomes not consistent with the predictions/expectations of the theory should receive special attention. How can one account for these discrepant cases? How can the possibility of measurement error be correctly assessed, and how can it be distinguished from the possibility that the failed predictions call for a reexamination of the internal composition and logic of the deductive theory?
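The matching described in this third step can be illustrated with a minimal sketch. It is only a bookkeeping aid under stated assumptions: the `Case` record, the `match_cases` helper, and the three cases and their outcomes are hypothetical placeholders, not data from any study discussed here. A case counted as congruent in this sketch has not thereby been shown to reflect a causal relationship; that judgment still requires the checks discussed earlier in this chapter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    """One historical case (hypothetical record layout)."""
    name: str
    predicted: str  # outcome the deductive theory predicts for this case
    observed: str   # outcome recorded for this case


def match_cases(cases):
    """Split cases into those congruent with the theory and those discrepant."""
    congruent = [c for c in cases if c.predicted == c.observed]
    discrepant = [c for c in cases if c.predicted != c.observed]
    return congruent, discrepant


# Placeholder cases, invented purely for illustration.
cases = [
    Case("case A", predicted="war", observed="war"),
    Case("case B", predicted="no war", observed="war"),
    Case("case C", predicted="no war", observed="no war"),
]

congruent, discrepant = match_cases(cases)

# Discrepant cases are the ones that call for special attention:
# measurement error, or a flaw in the theory's internal logic?
for c in discrepant:
    print(f"{c.name}: predicted {c.predicted!r}, observed {c.observed!r}")
```

Keeping the discrepant cases as an explicit list, rather than reducing the comparison to a single success rate, reflects the point that with a small number of cases each failed prediction has to be examined individually.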

A fourth step is possible. Process-tracing may be applied to the cases (as by Aggarwal and Yoffie, but not by Bueno de Mesquita) (a) to help assess whether the consistency is spurious or causal; and/or (b) to identify an intervening causal process or causal mechanism that connects the deductive theory (independent variable) with the outcome; and/or (c) to provide an explanation for the "deviant" cases in which the theory failed to predict outcomes.

As the third and fourth steps emphasize, one should not be satisfied merely with a finding of consistency. Since the data required for adequate process-tracing are often not available, the other checks regarding the causal significance of consistency noted earlier should be undertaken.

Uses of structural-realist theory to predict outcomes are in special need of supplementary process-tracing and/or these other checks. It should be recognized that Kenneth Waltz's structural-realist theory is not a fully developed deductive theory. It is capable of making only very general probabilistic predictions since it lacks quantification of its probabilistic claims. As a result, a finding that outcomes of the cases are consistent with its probabilistic predictions is not acceptable evidence that a causal relationship exists unless other explanations for the outcomes are considered and eliminated. And even when support for some kind of causal relationship can be mustered, its precise nature remains to be judged, as was suggested in our earlier discussion of questions to be asked regarding whether the independent variable is either a necessary or sufficient condition for the outcome in question and how much it contributes to a full explanation of the outcome.

Another way of characterizing partial, incomplete deductive theories based on structural realism is to note that they lack "operationalization," i.e., the fine-tuning and specification of the theory that permits case-specific rather than general probabilistic prediction of outcomes for each of the cases examined. The only fully operationalized variant of a structural realist theory of which we are aware is that developed by Bruce Bueno de Mesquita in The War Trap. (add some discussion)

In striking contrast to The War Trap is the case that Achen and Snidal offered for rational deterrence theory. Absent from their argument for such a theory was any effort to formulate the level of specification and refinement needed to make concrete predictions; the theory they provided was therefore a quite primitive deductive theory that was in effect non-falsifiable. That is, any outcome, whether deterrence succeeded or failed in particular cases, would be "explainable" by the vague rational deterrence theory they espoused. Even more disconcerting in the argument these authors made on behalf of the superiority of a rational deterrence theory was the lack of awareness of the requirements of a full-fledged, operationalized deductive theory.

Another limitation of deductive theories, even when operationalized, is that they generally fail to identify or provide a satisfactory account of the causal mechanism that links the theory to the outcomes in question. Proponents of deductive theories based on rational choice or game theory might say that a causal mechanism is implicit in the internal logic of such deductive theories and needs no further explication or demonstration if the theory generates successful predictions. Thus, in such an argument the assumption that rational choice operates may itself be regarded as the causal mechanism. (The position of rational choice theorists on this issue is unclear. See the discussion of the Ferejohn and Satz article in the chapter on process-tracing and causal mechanisms.) Other theorists may not find that this argument satisfactorily disposes of the need, for purposes of theory development, to identify causal mechanisms and causal processes.

We stated earlier that the congruence method applies not only to theories that focus on the causal role of beliefs in decisionmaking but, as has now been discussed, also to deductive theories associated with the structural realist theory of international relations and more generally to rational choice and game theories. (In additional work we will consider the role of congruence in small n case studies that have as their objective the development and testing of macro theory.)

 
