A while back, I posted a critical commentary on Paulo Mauro’s widely-cited paper purporting to show that corruption lowers foreign investment and growth. My criticisms focused on Mauro’s use of a statistical technique called “instrumental variables” (or “IV”) analysis, which — when done properly — can help figure out whether a hypothesized explanatory variable actually causes an outcome of interest, or whether instead the observed statistical correlation is due to the fact that the alleged outcome variable actually influences the proposed explanatory variable (“endogeneity” or “reverse causation”). But an IV analysis requires making certain strong and untestable assumptions about the relationships between the variables. If those assumptions are wrong, the conclusions one draws about causation will be unsound (not necessarily wrong, but not worthy of credence on the basis of the analysis).
This may seem like an issue that only stats nerds should care about, but I actually think it’s important that other researchers, activists, and policy advisers understand the basics of the technique and how it can go wrong (or be misused). I say this because a surprisingly large amount of the research on the causes and consequences of corruption — research that is often cited, individually or collectively, in discussions of what to do about corruption — relies on this technique. And, I hate to say it, but much of that research uses IV analysis that is clearly inappropriate.
I’ve been thinking about this issue recently because I’ve been going through the literature on the relationship between democracy and corruption for a paper I’m writing, and this issue crops up a lot in that literature. But I’ve seen essentially the same problems in lots of other research on corruption’s causes and consequences, so I’m reasonably confident that this is not an isolated problem.
Let me say a bit more about the essence of the statistical problem, how IV analysis is supposed to solve it, and why much of the IV analysis I’ve seen (focusing on the democracy-corruption context) is not worthy of credence:
Here’s the challenge: Suppose we want to figure out whether democracy reduces corruption, using cross-country data. Even putting aside all the other concerns we might have about definition and measurement, we’ve got a problem: Suppose we find that there’s a statistically significant (negative) correlation between democracy and corruption. That doesn’t necessarily mean that democracy reduces corruption. It might mean that corruption causes countries to be less democratic (perhaps because corruption facilitates the subversion of democratic institutions, or because corrupt democracies are more vulnerable to anti-democratic coups). Or perhaps both democracy and (lower) corruption are the product of some underlying process that causes them to be associated with one another, even if democracy doesn’t lower corruption (indeed, even if democracy actually increases corruption, all else equal).
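The problem described above is easy to demonstrate with a toy simulation (all numbers and variable names here are hypothetical, chosen purely for illustration): even when democracy has zero causal effect on corruption, reverse causation alone can generate a strong negative correlation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical data-generating process: corruption is driven by
# unobserved country characteristics, and democracy is *reduced* by
# corruption (reverse causation). Democracy has NO causal effect on
# corruption in this setup.
u = rng.normal(size=n)                      # unobserved characteristics
corruption = 1.0 * u + rng.normal(size=n)
democracy = -0.8 * corruption + rng.normal(size=n)

# A naive correlation still shows a strong negative association, which
# could easily be mistaken for "democracy reduces corruption".
r = np.corrcoef(democracy, corruption)[0, 1]
print(round(r, 2))   # strongly negative despite zero causal effect
```

With these (made-up) coefficients the correlation comes out around -0.75, which is exactly the kind of result that, in real cross-country data, would tempt a researcher to infer causation.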
We can solve the statistical problem if we can find an “instrumental variable” (or, more succinctly, an “instrument”). To be valid, the instrument must (1) have a direct causal effect on the main explanatory variable (here, democracy), but (2) must not be correlated with the outcome variable (here, corruption), except through its effect on the main explanatory variable. That latter condition is called the “exclusion restriction.” Importantly, satisfaction of the exclusion restriction cannot be tested statistically. Rather, one needs to be confident, based on substantive knowledge of the area and general common sense, that the instrument couldn’t have any effect on the outcome variable except through its effect on the main explanatory variable.
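When the two conditions genuinely hold, the mechanics are straightforward. The following sketch (again with hypothetical variables and effect sizes) shows a valid instrument recovering the true causal effect via two-stage least squares, here computed in its simplest ratio-of-covariances (Wald) form, while the naive regression is biased by an unobserved confounder:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10000

# Illustrative setup: corruption and democracy share an unobserved
# confounder u, and democracy truly *lowers* corruption by 0.5. The
# instrument z shifts democracy but has no path to corruption except
# through democracy -- i.e., the exclusion restriction holds.
u = rng.normal(size=n)
z = rng.normal(size=n)                       # the instrument
democracy = 1.0 * z + 1.0 * u + rng.normal(size=n)
corruption = -0.5 * democracy + 1.0 * u + rng.normal(size=n)

# Naive OLS slope is biased by the confounder u.
ols = np.cov(democracy, corruption)[0, 1] / np.var(democracy)

# IV estimate: ratio of covariances (equivalent to two-stage least
# squares with a single instrument).
iv = np.cov(z, corruption)[0, 1] / np.cov(z, democracy)[0, 1]

print(round(ols, 2), round(iv, 2))   # OLS pulled toward zero; IV near -0.5
```

The point of the sketch is that the IV machinery works only because the simulation *builds in* the exclusion restriction by construction; with real-world data, that assumption has to be defended on substantive grounds.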
So, the main question one should always ask when confronted with statistical research that uses IV analysis is whether the exclusion restriction is satisfied. (There are other questions as well, but this is usually the most important one.) And remember, this can’t be tested statistically, so don’t get distracted when a study’s authors start throwing out technical terminology and running fancy-sounding tests. You don’t need to know any math to evaluate whether the exclusion restriction holds — again, it’s a matter of substantive knowledge and common sense.
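To see concretely why no statistical test can rescue a bad instrument, consider one more hypothetical simulation: give the instrument a modest direct effect on the outcome (violating the exclusion restriction), and the IV estimate goes badly wrong, even while the conventional diagnostics, such as first-stage strength, look perfectly healthy.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10000

# Hypothetical illustration: z (think "latitude") affects democracy AND
# has its own direct effect on corruption, violating the exclusion
# restriction. Democracy's true effect on corruption is again -0.5.
z = rng.normal(size=n)
u = rng.normal(size=n)
democracy = 1.0 * z + 1.0 * u + rng.normal(size=n)
corruption = -0.5 * democracy + 0.7 * z + 1.0 * u + rng.normal(size=n)

# The IV estimate now absorbs z's direct effect: it comes out around
# +0.2 rather than the true -0.5 -- even the sign is wrong.
iv = np.cov(z, corruption)[0, 1] / np.cov(z, democracy)[0, 1]
print(round(iv, 2))

# Crucially, nothing in the data flags the violation: the first stage
# is strong, so the usual "weak instrument" diagnostics would pass.
first_stage = np.corrcoef(z, democracy)[0, 1]
print(round(first_stage, 2))
```

This is why the validity of an instrument is a matter of substantive argument rather than econometric testing: the data look identical whether the exclusion restriction holds or not.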
OK, now that we’re clear on that, let me run through some of the instruments that appear in the scholarly literature on the causal effect of democracy on corruption. Again, the key assumption for all of these studies is that the proposed instrument could not possibly have any effect on corruption, except through its effects on democracy:
- Fraction of the population that speaks a European language (see here). [The implicit assumption is that European influence is associated with more democracy, but could not have any impact on corruption levels through any other channel.]
- Whether the country is a former British colony (see here).
- Fraction of Protestants in the population (see here).
- Education, measured by secondary school enrollment rates (see here).
- Latitude (see here, here, and here).
- Whether the country has ever fought a war with a democracy (see here).
Is it plausible that any of these instruments satisfies the exclusion restriction? To my mind, the answer is so clearly no that I’m somewhat puzzled that the authors and journal editors apparently disagreed. I don’t think it’s all that difficult to think of plausible reasons why each of these variables might influence corruption other than through its effect on democracy. European (or, more specifically, British) heritage may affect a range of political and legal institutions, including those that have direct effects on corruption independent of their effects on democracy. Latitude might have such effects as well. Many of these variables, as well as a country’s religious demographics, may also implicate cultural differences, or may proxy for other variation in historical experience, that could influence corruption levels. And it’s also worth emphasizing that all of these variables — especially but not exclusively latitude — may have direct effects on a country’s economic performance, which may in turn affect corruption levels. As for the last item on the list — whether the country has ever fought a war against a democracy — this is causally posterior to, not prior to, whether the country is itself a democracy, and so is invalid as an instrument. (While many people think that being a democracy makes a country less likely to fight wars with other democracies, no one thinks that whether a country has fought a war with another democracy influences whether that country is, or becomes, democratic.)
Again, while I’ve focused on papers discussing the democracy-corruption link (because that happens to be what I’ve been reading up on recently), one sees this same problem in countless other lines of corruption-related research as well. For example, in trying to assess the causal impact of press freedom on corruption (another area where there’s plausibly causation running in both directions), researchers have used as instruments:
- Whether the country has a “common law” legal system (see here).
- The degree of ethnic/linguistic fractionalization in the country (see here).
- Whether the country is democratic (see here).
- The percentage of the population speaking a European language (see here).
These instruments would only be valid if we thought that there was no conceivable way that a country’s legal system, ethno-linguistic heterogeneity, level of democracy, or European influence could possibly affect its (perceived) corruption level, except insofar as those variables influence the country’s level of press freedom. Is there anyone out there who thinks that any of those assumptions is remotely plausible? Anyone?
This may seem like nitpicking, but it’s really not. If we’re serious about developing “evidence-based” approaches to anticorruption — as many in the policy community insist — then we’ve got to be scrupulous about the evidence. In my view, it would be better for academics simply to report correlations, and to acknowledge that they cannot, without more, rigorously establish causation, than to deploy fancy-sounding statistical techniques that make it seem like we know more from the quantitative analysis than we actually do.