Invalid Instrumental Variables in Corruption Research: A Lament

A while back, I posted a critical commentary on Paolo Mauro’s widely-cited paper purporting to show that corruption lowers foreign investment and growth. My criticisms focused on Mauro’s use of a statistical technique called “instrumental variables” (or “IV”) analysis, which, when done properly, can help figure out whether a hypothesized explanatory variable actually causes an outcome of interest, or whether instead the observed statistical correlation is due to the fact that the alleged outcome variable actually influences the proposed explanatory variable (“endogeneity” or “reverse causation”). But an IV analysis requires making certain strong and untestable assumptions about the relationships between the variables. If those assumptions are wrong, the conclusions one draws about causation will be unsound (not necessarily wrong, but not worthy of credence on the basis of the analysis).

This may seem like an issue that only stats nerds should care about, but I actually think it’s important that other researchers, activists, and policy advisers understand the basics of the technique and how it can go wrong (or be misused).  I say this because a surprisingly large amount of the research on the causes and consequences of corruption — research that is often cited, individually or collectively, in discussions of what to do about corruption — relies on this technique. And, I hate to say it, but much of that research uses IV analysis that is clearly inappropriate.

I’ve been thinking about this issue recently because I’ve been going through the literature on the relationship between democracy and corruption for a paper I’m writing, and this issue crops up a lot in that literature. But I’ve seen essentially the same problems in lots of other research on corruption’s causes and consequences, so I’m reasonably confident that this is not an isolated problem.

Let me say a bit more about the essence of the statistical problem, how IV analysis is supposed to solve it, and why much of the IV analysis I’ve seen (focusing on the democracy-corruption context) is not worthy of credence:

Here’s the challenge: Suppose we want to figure out whether democracy reduces corruption, using cross-country data.  Even putting aside all the other concerns we might have about definition and measurement, we’ve got a problem: Suppose we find that there’s a statistically significant (negative) correlation between democracy and corruption. That doesn’t necessarily mean that democracy reduces corruption. It might mean that corruption causes countries to be less democratic (perhaps because corruption facilitates the subversion of democratic institutions, or because corrupt democracies are more vulnerable to anti-democratic coups). Or perhaps both democracy and (lower) corruption are the product of some underlying process that causes them to be associated with one another, even if democracy doesn’t lower corruption (indeed, even if democracy actually increases corruption, all else equal).
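To make the last possibility concrete, here is a minimal simulation sketch. Everything in it is made up for illustration: the variable names, the unobserved "development" factor, and the coefficients are assumptions, not estimates from any real dataset. Democracy is given a true causal effect of exactly zero on corruption, yet a naive regression still finds a clearly negative slope, because a common cause pushes democracy up and corruption down.

```python
import numpy as np

def simulate_confounded(n=5000, true_effect=0.0, seed=0):
    """Simulate democracy and corruption driven by a shared unobserved cause."""
    rng = np.random.default_rng(seed)
    development = rng.normal(size=n)               # hypothetical unobserved common cause
    democracy = development + rng.normal(size=n)   # development raises democracy
    # Corruption depends on democracy only through true_effect (zero here),
    # but development lowers corruption directly.
    corruption = true_effect * democracy - development + rng.normal(size=n)
    return democracy, corruption

def ols_slope(x, y):
    """Slope from a simple bivariate least-squares regression of y on x."""
    x_c = x - x.mean()
    return float(x_c @ (y - y.mean()) / (x_c @ x_c))

democracy, corruption = simulate_confounded()
naive_slope = ols_slope(democracy, corruption)
# In this setup the slope is about -0.5 in expectation, despite a true effect of 0.
print(f"true effect: 0.0, naive OLS slope: {naive_slope:.2f}")
```

The same logic works for reverse causation: swap which variable feeds into the other and the naive slope again reflects something other than the effect we care about.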

We can solve the statistical problem if we can find an “instrumental variable” (or, more succinctly, an “instrument”). To be valid, the instrument must (1) have a direct causal effect on the main explanatory variable (here, democracy), but (2) must not be correlated with the outcome variable (here, corruption), except through its effect on the main explanatory variable. That latter condition is called the “exclusion restriction.” Importantly, satisfaction of the exclusion restriction cannot be tested statistically. Rather, one needs to be confident, based on substantive knowledge of the area and general common sense, that the instrument couldn’t have any effect on the outcome variable except through its effect on the main explanatory variable.
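For readers who find code easier than prose, here is a rough sketch of what a valid instrument buys you, again on simulated data. The instrument `z`, the confounder `u`, and all coefficients are hypothetical. The instrument moves democracy but has no path to corruption except through democracy, so both conditions hold by construction. The estimator shown is the simplest IV estimator for one instrument and one explanatory variable (the ratio of two covariances), not any particular paper's specification.

```python
import numpy as np

def ols_slope(x, y):
    """Slope from a bivariate least-squares regression of y on x."""
    x_c = x - x.mean()
    return float(x_c @ (y - y.mean()) / (x_c @ x_c))

def iv_estimate(z, x, y):
    """Simple IV estimator with one instrument: cov(z, y) / cov(z, x)."""
    z_c = z - z.mean()
    return float(z_c @ (y - y.mean()) / (z_c @ (x - x.mean())))

rng = np.random.default_rng(1)
n = 20000
true_effect = -0.3                      # assumed causal effect of democracy on corruption

z = rng.normal(size=n)                  # instrument: affects democracy only
u = rng.normal(size=n)                  # unobserved confounder
democracy = 0.8 * z + u + rng.normal(size=n)
corruption = true_effect * democracy - u + rng.normal(size=n)

ols = ols_slope(democracy, corruption)       # contaminated by the confounder
iv = iv_estimate(z, democracy, corruption)   # close to the true effect
print(f"true effect: {true_effect}, OLS: {ols:.2f}, IV: {iv:.2f}")
```

The IV estimate recovers the true effect here only because the simulation guarantees the exclusion restriction. With real data there is no such guarantee, which is the whole problem.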

So, the main question one should always ask when confronted with statistical research that uses IV analysis is whether the exclusion restriction is satisfied. (There are other questions as well, but this is usually the most important one.) And remember, this can’t be tested statistically, so don’t get distracted when a study’s authors start throwing out technical terminology and running fancy-sounding tests. You don’t need to know any math to evaluate whether the exclusion restriction holds — again, it’s a matter of substantive knowledge and common sense.

OK, now that we’re clear on that, let me run through some of the instruments that appear in the scholarly literature on the causal effect of democracy on corruption. Again, the key assumption for all of these studies is that the proposed instrument could not possibly have any effect on corruption, except through its effects on democracy:

  • Fraction of the population that speaks a European language (see here). [The implicit assumption is that European influence is associated with more democracy, but could not have any impact on corruption levels through any other channel.]
  • Whether the country is a former British colony (see here).
  • Fraction of Protestants in the population (see here).
  • Education, measured by secondary school enrollment rates (see here).
  • Latitude (see here, here, and here).
  • Whether the country has ever fought a war with a democracy (see here).

Is it plausible that any of these instruments satisfy the exclusion restriction? To my mind, the answer is so clearly no that I’m somewhat puzzled that the authors and journal editors apparently disagreed. I don’t think it’s all that difficult to think of plausible reasons why each of these variables might influence corruption, other than through the effect on democracy. European (or, more specifically, British) heritage may affect a range of political and legal institutions, including those that have direct effects on corruption independent of their effects on democracy. Latitude might have such effects as well. Many of these variables, as well as a country’s religious demographics, may also implicate cultural differences, or may proxy for other variation in historical experience, that may influence corruption levels. And it’s also worth emphasizing that all of these variables (especially but not exclusively latitude) may have direct effects on a country’s economic performance, which may in turn affect corruption levels. As for the last item on the list, whether the country has ever fought a war against a democracy, this is causally posterior to, not prior to, whether the country is itself a democracy, and so is invalid as an instrument. (While many people think that being a democracy makes a country less likely to fight wars with other democracies, no one thinks that whether a country has fought a war with another democracy influences whether that country is, or becomes, democratic.)
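To illustrate what goes wrong when the exclusion restriction fails, here is a small simulation sketch. All names and numbers are assumptions chosen for illustration. Democracy has a true effect of zero on corruption in both runs. The only difference is whether the instrument (think of latitude) also affects corruption through some other channel, such as economic performance.

```python
import numpy as np

def iv_estimate(z, x, y):
    """Simple IV estimator with one instrument: cov(z, y) / cov(z, x)."""
    z_c = z - z.mean()
    return float(z_c @ (y - y.mean()) / (z_c @ (x - x.mean())))

def simulate(direct_effect, n=20000, seed=2):
    """Democracy has NO causal effect on corruption in this simulation.

    direct_effect is the instrument's own effect on corruption through a
    channel other than democracy; any nonzero value violates the
    exclusion restriction.
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                       # instrument
    u = rng.normal(size=n)                       # unobserved confounder
    democracy = 0.8 * z + u + rng.normal(size=n)
    corruption = 0.0 * democracy + direct_effect * z - u + rng.normal(size=n)
    return z, democracy, corruption

valid_iv = iv_estimate(*simulate(direct_effect=0.0))
invalid_iv = iv_estimate(*simulate(direct_effect=-0.4))
print(f"valid instrument:   IV estimate {valid_iv:.2f}")
print(f"invalid instrument: IV estimate {invalid_iv:.2f}")
```

With the invalid instrument the IV estimate converges to roughly the direct effect divided by the first-stage coefficient (here -0.4 / 0.8), so the analysis reports a sizable "effect of democracy" that does not exist. With a single instrument, nothing in the output distinguishes the two cases.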

Again, while I’ve focused on papers discussing the democracy-corruption link (because that happens to be what I’ve been reading up on recently), one sees this same problem in countless other lines of corruption-related research as well. For example, in trying to assess the causal impact of press freedom on corruption (another area where there’s plausibly causation running in both directions), researchers have used as instruments:

  • Whether the country has a “common law” legal system (see here).
  • The degree of ethnic/linguistic fractionalization in the country (see here).
  • Whether the country is democratic (see here).
  • The percentage of the population speaking a European language (see here).

These instruments would only be valid if we thought that there was no conceivable way that a country’s legal system, ethno-linguistic heterogeneity, level of democracy, or European influence could possibly affect its (perceived) corruption level, except insofar as those variables influence the country’s level of press freedom. Is there anyone out there who thinks that any of those assumptions is remotely plausible? Anyone?

This may seem like nitpicking, but it’s really not.  If we’re serious about developing “evidence-based” approaches to anticorruption — as many in the policy community insist — then we’ve got to be scrupulous about the evidence. In my view, it would be better if academics just reported correlations and acknowledged that they cannot, without more, rigorously establish causation than to use fancy-sounding statistical tests that make it seem like they/we know more than we actually do from the quantitative analysis.

6 thoughts on “Invalid Instrumental Variables in Corruption Research: A Lament”

  1. Clearly such poor instrumental variables undermine the validity of any claim to causality. However, can anyone think of any better ones? This is revealing of the intimate and complex relationship between corruption and democracy. Instrumental variables have proven effective for revealing the direction of causality in loose or simple relationships (see Mooij 2014 for example), and useful regarding clearly distinct social phenomena (such as high school employment and educational attainment). But the fact that no instrumental variable for the democracy-corruption correlation seems to exist suggests that the two variables are inseparable, two sides of the same coin. That is to say, corruption is simply a lack of democracy and vice versa. Although this would suggest that the two are directly inversely correlated (which is not true), could this not be due to a difficulty in defining democracy quantitatively? In this sense, could it be that reductionist efforts to separate democracy and corruption in order to find some relation are counterproductive? Getting too “evidence based” may result in us losing sight of the essence of our subject material, that is, fighting corruption IS strengthening democracy and vice versa.

    • I actually think I disagree with this pretty strongly, for a couple of reasons.

      First, the fact that we cannot find a good instrument to identify the causal effect of variable X on variable Y does NOT mean that X and Y are “inseparable, two sides of the same coin.” All it means is that we cannot necessarily interpret any observed statistical correlation as causal. To illustrate using an example where this is more obvious, consider the relationship between democracy and economic growth. There’s a huge endogeneity problem here, and it’s hard to find good instruments for democracy that would allow us to isolate the causal effect of democracy on growth (if there is one). But I don’t think anyone would sensibly conclude from this that “lack of economic growth is simply a lack of democracy and vice versa.”

      Second, even if we put aside the issue about causation for the moment and consider only correlations, it is simply not true that democracy is strongly and consistently correlated with low corruption. Countries that have had robust democracies for a very long time (mostly though not exclusively wealthy countries in Western Europe, North America, and Oceania) do seem to score substantially better on the various corruption perception indexes that we have. But if we exclude those super-established democracies, there doesn’t seem to be any robust correlation between democracy and (perceived) corruption in the rest of the data. So if anything, the correlations appear mostly inconsistent with the claim that “fighting corruption IS strengthening democracy and vice versa.” Of course, we don’t know for sure, because it’s so hard to tease out causality (that’s the main point of the original post).

      Now, you suggest that maybe the reason we don’t see a direct inverse correlation between democracy and corruption is that we haven’t been able to quantify democracy in the right way. Maybe. (One could say the same thing about corruption.) But (a) the papers on this topic use a range of democracy measures, many of which seem fairly plausible, and (b) I start to get nervous when our reaction to a failure to find what we expected (hoped?) to find is to declare that we’re just not measuring the variables in the right way. That’s sometimes the case, sure. But we always need to be careful that we’re not (to paraphrase Coase’s great line) torturing the data until it confesses.

      • Agreed, “two sides of the same coin” is possibly an exaggeration, but I’d maintain that the fact that we cannot necessarily interpret any observed statistical correlation as causal suggests that X and Y are so closely related (as well as spuriously associated with lots of other variables) as to be inseparable for all practical purposes. Despite it being a cliché, possibly the chicken / egg dilemma fits better.

        On your second point, is it really fair to exclude these super established democracies when looking for the democracy / corruption correlation? In fact, their substantially better score on the various corruption perception indexes may be able to tell us something about what makes them so democratically robust. Many of the traditional measures of democracy to me seem far from plausible; the absurd instrumental variables such as colonial heritage, latitude and fraction of Protestants in the population that you listed in the above post are a case in point. However, respect for human rights is increasingly considered an essential element of democracies. This is great as it can also be measured reasonably accurately and correlated with corruption. Higher levels of corruption are related to worse records of human rights protection (Todd Landman and Carl Jan Willem Schudel, Corruption and Human Rights 2007) and Daniel Kaufmann (Human Rights and Governance) found a significant correlation between the degree of civil and political rights violations and the prevalence of corruption.

        As such, I do believe that a refined, possibly human rights based, definition of democracy will reveal an increasingly close correlation with corruption. However, back to the original topic of the post, this doesn’t get us any closer to establishing causality which is probably the trickiest thing to prove even in the field of hard science. Hence, when it comes to the complexity of corruption and democracy, I wonder if attempting to “unpack the causal package” will ever produce meaningful results. So why try? When the two are widely accepted as things we want to encourage / discourage, isn’t correlation good enough?

        • Thanks for the reply. I very much enjoy exchanges like this, and your comments are useful and thought-provoking. Some quick, off-the-cuff reactions:

          First, on what we know about the correlation between democracy and corruption: I think we both agree, roughly speaking, on what we see in the data. Countries that score very high on democracy indexes (e.g. Polity, Freedom House) also tend to score very well on the control-of-corruption indexes (CPI, WGI). Beyond that, though, there’s not very much if any correlation at all, and there’s a huge amount of variance in the data. So, other than the clustering at the top, it’s not true that these variables “are so closely related … as to be inseparable for all practical purposes.”

          But, you say, isn’t that association at the very top important? Yes, absolutely. But it may still be important to understand why it exists. Is it because democratization (eventually) reduces corruption? Because low corruption is necessary for democracy to take hold and flourish? Because the corruption perception indexes are biased, such that the international “experts” are inclined to give good scores to long-established democracies in the West? Because the democracy indexes incorporate absence of corruption as a constituent element of democracy? (This is, by the way, the case for the Freedom House Political Rights Index, which is why it’s a mistake to use that as a measure of democracy when trying to assess the impact on corruption.) We’ve made some progress in trying to answer some of these questions, but not much, and the original point of the post is that the IV techniques that have been deployed have not been very helpful in addressing this problem.

          On measurement, just to be clear, none of these studies use colonial heritage, percentage of Protestants, latitude, etc., as _measures_ of democracy. Usually democracy is measured with something like the Polity score, the Freedom House score, or some more objective factor like electoral contestation or frequency of change in power brought about by elections. These other factors (colonial heritage, latitude, etc.) are proposed as _instruments_ for democracy, i.e. variables that are (causally) associated with democracy, but that (allegedly) don’t have any causal impact on corruption other than through the effect on democracy. A small point, but I think it’s important to be careful with the terminology here.

          Now, you next raise the provocative question whether human rights protection should be incorporated into the definition of democracy for purposes of these studies. I guess I’d say it depends on the question we’re asking. If the question is whether regular elections significantly reduce corruption, then I think I’d probably say that a “thinner” conception of democracy was more appropriate. Indeed, if lower corruption leads to better human rights protection, and human rights protection is baked into the definition of democracy, then the possibility of mistakenly concluding that democracy reduces corruption from the correlational data would be even greater. On the other hand, if our question is whether “liberal” societies–those that not only have regular elections but that also protect human rights–reduce corruption, then absolutely you’d want to use a “thicker” conception of democracy that incorporates human rights. But then we still have the problem of sorting out causality.

          But, you say, why do we even care so much about sorting out causality? Corruption is bad, democracy is good, so let’s fight the former and promote the latter, and not worry so much about what causes what. To a certain point, I’m with you. But, as a social scientist, I do think it’s important to understand how these relationships actually work. And from a more pragmatic perspective, I think it’s important for us to have a realistic (or as realistic as possible) understanding of the degree to which democratization will help (or hurt) progress on fighting corruption. My impression (and here I admit this is not grounded in any solid data, just casual observation) is that in some cases in the recent past reformers were overly optimistic about the degree to which democratization would reduce corruption, partly because of the correlations we’re talking about. There was a tendency, in some cases, to treat corruption as a transitional phenomenon. But in many new/partial democracies, the corruption problem stayed bad or even got worse, in some cases threatening the democratization project itself. Now, there’s an open question about whether, with enough time, democracy will lead to a steady reduction in corruption (at least for most countries), or whether this is not the case, and different and more aggressive action is needed to address the corruption problem (including forms of action that might make pro-democracy types very uncomfortable). There are also questions, germane to a different set of countries, about whether aggressive action to clean up corruption should precede democratization. I have my own views on these questions, and I suspect you do as well, but I also think that my views (and yours) can and should be influenced by better evidence on the relationship between democracy and corruption. And that’s why I think it matters that we do the best we can to figure out the answer, or if we can’t, that we’re honest about the fact that we don’t know what we don’t know.

          • Thanks for getting back to me, I’m learning a lot from this exchange. I also feel like we are arriving at some kind of synthesis. The measures of democracy are certainly a welcome advance on more dated ways to understand democracy (religion, latitude, and language were just a few examples I threw in to demonstrate how far off this can get). However, it is worth noting that it is only civil and political rights that are included. Is this a hangover from the cold war? Although socio-economic rights are still more controversial, I think we can agree that they are becoming more and more accepted as equally important as their civil-political cousins. In fact, the ECHR is swiftly moving towards indivisibility between the two sets of rights. Could this explain the stronger correlation between equality and corruption? And coming back to democracy and corruption, this could explain how the liberal but highly unequal democracies of Latin America and the Indian sub-continent throw our readings off. The fact that Freedom House include corruption in their democracy index is also an interesting and telling point.

            As you can see, I am persisting on the misinterpretation of democracy as the explanation for its loose at best correlation with corruption. This, I’ll admit, comes from my conviction of what I think democracy is (or at least should be) and herein probably lies the reason for people’s willingness to torture the data; when people sense that there should be a correlation, we’ll go to great lengths to find one. Is it dangerous to inject the social sciences with our subjective beliefs? This is a whole new question.

            The point you make in your last paragraph regarding forms of action that might make pro-democracy types very uncomfortable I find especially pertinent. The need to investigate causality can’t be denied, yet I can’t seem to shake the feeling that reductionist techniques may lead us further and further into the squaring of circles. Despite today’s bias for large scale experimental designs based on quantitative data, I wonder if applying qualitative and holistic methodology might render more satisfactory results? However, this raises the question of “satisfactory for whom?” which brings us back to defining democracy, something essential to the question which better evidence may not be able to help us solve. Either way, as you say, the most important thing is honesty about what we do and don’t know. Cheers!

  2. Matt, I find the points you raise about the misconception/misuse of IVs justified.

    I’d like to stress a thing or two, purposely avoiding going into too much detail.
    It seems that general readers of scientific papers that use IVs easily neglect what the IV coefficient actually indicates. Instrumental variables are the standard technique to handle omitted variables, measurement error, and reverse causality. An instrument should be valid (uncorrelated with the error term) and informative (correlated with the endogenous regressor). Finding good instruments is the major challenge in applied work. Strength can be tested, validity cannot. If we assume that causal effects differ by individuals, IV estimates a local average treatment effect (LATE) for compliers.
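    As a rough sketch of the "strength can be tested" half of that statement: with a single instrument, the usual check is the F-statistic from the first-stage regression of the endogenous regressor on the instrument. The data below are simulated and the coefficients are assumptions picked only to contrast a strong and a weak instrument.

```python
import numpy as np

def first_stage_f(z, x):
    """F-statistic for the instrument in a first-stage regression of x on z.

    With one instrument this equals the squared t-statistic of the slope.
    """
    n = len(z)
    z_c = z - z.mean()
    x_c = x - x.mean()
    slope = (z_c @ x_c) / (z_c @ z_c)
    resid = x_c - slope * z_c
    sigma2 = (resid @ resid) / (n - 2)       # residual variance estimate
    se = np.sqrt(sigma2 / (z_c @ z_c))       # standard error of the slope
    return float((slope / se) ** 2)

def simulate_first_stage(pi, n=2000, seed=3):
    """Endogenous regressor x with first-stage coefficient pi on instrument z."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)
    x = pi * z + rng.normal(size=n) + rng.normal(size=n)
    return z, x

strong_f = first_stage_f(*simulate_first_stage(pi=0.8))
weak_f = first_stage_f(*simulate_first_stage(pi=0.02))
print(f"strong instrument F: {strong_f:.1f}")
print(f"weak instrument F:   {weak_f:.1f}")
```

    A common rule of thumb treats a first-stage F below about 10 as a sign of a weak instrument. Note that a large F says nothing about validity: an instrument that violates the exclusion restriction can pass this check easily.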

    In general, one cannot simply compare the coefficient of the original variable and its IV in terms of its magnitude. The reason is that IVs indicate the treatment effect for the marginal participant, giving us the local average treatment effect (LATE), whereas regular estimations of a linear model give us the average treatment effect (ATE). Now, if one can plausibly claim that these groups are almost identical, we don’t have a problem. However, let me use the famous Vietnam Draft Lottery example:
    IV estimates of effects of military service using the draft lottery estimate the effects of military service on men who served because they were draft-eligible, but would not otherwise have served. This obviously excludes volunteers and men who were exempted from military service for medical reasons, but it includes men for whom draft policy was binding.
    Consequently, we now have to change the interpretation of the estimator (for more information on ATE vs LATE and other IV related things, see: http://www.stata.com/meeting/dcconf09/dc09_nichols.pdf ).
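    The draft lottery logic can be sketched with a toy simulation. The group shares and effect sizes below are invented for illustration and are not estimates from the actual lottery. Volunteers ("always-takers") serve regardless of eligibility, "never-takers" never serve, and "compliers" serve only if draft-eligible. The IV (Wald) estimate recovers the effect for compliers only, which can differ a lot from the population average.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000

# Hypothetical population shares: volunteers, compliers, never-takers.
types = rng.choice(["always", "complier", "never"], size=n, p=[0.2, 0.3, 0.5])
eligible = rng.integers(0, 2, size=n)            # randomly assigned draft eligibility (0/1)

# Who actually serves: volunteers always, compliers only when eligible.
served = (types == "always") | ((types == "complier") & (eligible == 1))

# Hypothetical individual effects of service on the outcome, by type.
effect = np.where(types == "always", 3.0,
                  np.where(types == "complier", 1.0, 2.0))
outcome = effect * served + rng.normal(size=n)

# Wald / IV estimate: difference in mean outcomes over difference in service rates.
wald = ((outcome[eligible == 1].mean() - outcome[eligible == 0].mean())
        / (served[eligible == 1].mean() - served[eligible == 0].mean()))

ate = effect.mean()                              # average effect over everyone
late = effect[types == "complier"].mean()        # effect among compliers only

print(f"IV (Wald) estimate: {wald:.2f}")
print(f"complier effect (LATE): {late:.2f}")
print(f"population average effect (ATE): {ate:.2f}")
```

    In this setup the IV estimate lands near the complier effect, not the population average, even though the instrument is perfectly valid. So a gap between an IV coefficient and some other estimate is not, by itself, evidence that either one is wrong.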

    This said, I have a lot of sympathy for the IV approach and one that I agree with in general is the use of Alesina’s linguistic fractionalization index as an IV for corruption (I was surprised to see you mentioning that this index is used to instrument democracy and linking to Chowdhury’s paper. Usually, ELF is used to instrument corruption (as he mentions on p. 95), so I would’ve expected some extreme correlation, which does not show up). However, in my mind, I always discount the coefficients produced by IVs and rather look at the sign (and whether the magnitude changes enormously, which is sometimes an indication that something fishy is going on). If the sign is the same, then the claim is at least plausible, if it has passed the initial test one should apply before even starting: is this really a reasonable instrument?

    After all, believing in IVs is like religion. If a particular IV is used and supported more often by more prominent people, it is probably easier to go along with it and conform.
