I will open this post with two apologies: First, this is going to be on a (seemingly) nerdy and technical subject (though one that non-technical folks who read statistical papers on corruption really need to understand). Second, this post is going to return to a subject that I wrote about two years ago, without adding much, except perhaps different examples and somewhat more intemperate language. But the issue is an important one, and one that I think needs more attention, both from the people who produce quantitative empirical studies on corruption and those who consume those studies.
The issue concerns a particular statistical technique sometimes deployed to figure out whether variable X (say, absence of corruption) causes variable Y (say, economic growth), when it’s possible that the correlation between X and Y might arise because Y causes X (or because some third factor, Z, causes both X and Y). The technique is to find an “instrumental variable” (an IV for short). To be valid, the IV must be sufficiently correlated with X, but must not conceivably have any effect on Y except through the IV’s causal effect on X. The actual estimation techniques used in most cases (usually something called “two-stage least squares”) involve some additional statistical gymnastics that I won’t get into here, but to get the intuition, it might help to think about it this way: If your instrumental variable (IV) correlates with your outcome variable (Y), and there’s no plausible way that your IV could possibly affect Y except by affecting your proposed explanatory variable (X), which then has an effect on Y, then you can be more confident that X causes Y. But for this to work, you have to be very sure that the IV couldn’t possibly affect Y except through X. This assumption cannot be tested statistically; it can only be evaluated through common sense and subject-area expertise.
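For readers who find code easier than prose, here is a minimal sketch of the intuition on simulated data. Everything in it is an assumption made up for illustration: the sample size, the coefficients, the variable names, and above all the fact that the simulated instrument affects Y only through X (which is exactly what you cannot take for granted with real data). It is not the procedure of any particular paper, just a bare-bones two-stage least squares with one regressor and one instrument.

```python
import numpy as np


def ols_slope(y, x):
    """Slope from an ordinary least squares regression of y on x (with intercept)."""
    design = np.column_stack([np.ones(len(y)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


def two_stage_least_squares(y, x, z):
    """2SLS slope for one regressor x and one instrument z (with intercepts)."""
    n = len(y)
    # Stage 1: regress X on the instrument and keep the fitted values.
    stage1_design = np.column_stack([np.ones(n), z])
    stage1_coef, *_ = np.linalg.lstsq(stage1_design, x, rcond=None)
    x_fitted = stage1_design @ stage1_coef
    # Stage 2: regress Y on the part of X that the instrument explains.
    stage2_design = np.column_stack([np.ones(n), x_fitted])
    stage2_coef, *_ = np.linalg.lstsq(stage2_design, y, rcond=None)
    return stage2_coef[1]


# Hypothetical data-generating process (all numbers are illustrative).
rng = np.random.default_rng(0)
n = 100_000
true_effect = 1.0                      # the causal effect of X on Y we hope to recover
confounder = rng.normal(size=n)        # unobserved third factor driving both X and Y
instrument = rng.normal(size=n)        # by construction, touches Y only through X
x = 0.8 * instrument + confounder + rng.normal(size=n)
y = true_effect * x + 2.0 * confounder + rng.normal(size=n)

ols_estimate = ols_slope(y, x)
iv_estimate = two_stage_least_squares(y, x, instrument)

print(f"true effect:   {true_effect:.2f}")
print(f"OLS estimate:  {ols_estimate:.2f}  (contaminated by the confounder)")
print(f"2SLS estimate: {iv_estimate:.2f}")
```

In this setup the naive regression overstates the effect, while the two-stage estimate lands near the true value of 1.0. That only happens because the simulation was written so that the instrument has no other route to Y.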
OK, if you’ve slogged your way through that last paragraph, you may be wondering why this is important for corruption research, and why I’m so exercised about it. Here’s the problem:
Lots and lots of quantitative research on corruption (especially research relying on cross-country comparisons) is beset by questions about causation. To take one example, many researchers and policymakers want to understand corruption’s impact on various economic outcomes, such as income, growth, equality, investment, etc. Lots of papers find a negative correlation between measures of (perceived) corruption and these outcomes. For example, there’s a consistent and strong negative correlation between perceived corruption and per capita GDP. The problem is that we don’t know for sure whether this correlation implies that corruption causes lower national income. After all, low national income may, and probably does, worsen perceived corruption through a variety of channels. Many researchers therefore turn to instrumental variables to try to isolate corruption’s impact on income (or other economic variables).
The problem is that the IVs used in these studies are almost always invalid. The most popular “instruments” for corruption, in cross-country studies of corruption’s impact on economic outcomes, are probably:
- Ethno-linguistic heterogeneity
- Distance from the equator
- “Legal origin” (that is, whether the country’s legal system is based mainly on British common law, French civil law, German civil law, or one of a handful of other systems)
- “Settler mortality” (the death rate of European settlers in the colonial period)
Now, it does seem to be the case that these variables do correlate with conventional measures of perceived corruption (with the possible exception of ethnic heterogeneity). The problem is with the notion that it’s inconceivable that any of these variables could possibly have any correlation with economic outcomes (income, growth, inequality, etc.) except through the channel of corruption. That idea is, on its face, too silly to be taken seriously by anyone, let alone by fancy professors publishing fancy articles in fancy academic journals. How could such smart people consistently make such an obvious mistake? I think the answer has to do with the history of the field, and the unfortunate tendency of researchers and journal editors to adopt the attitude of “If someone else used this instrument, for the same thing or something vaguely related, then it must be OK.”
With respect to the “ethno-linguistic heterogeneity” instrument, I suspect the problem has its roots in the fact that this instrument was used in an influential early paper–a paper that, while important for the field, was just wrong on this particular point.
As for the other three popular IVs (latitude, legal origin, settler mortality), these have been used in important (though controversial) research on the impact of institutions writ large on economic outcomes. To simplify a bit (using one of these three IVs as an illustration), when researchers found that settler mortality over a century ago is correlated with contemporary economic performance, even when controlling for modern geographic and disease conditions, these researchers concluded that the causal channel must be through institutional quality: Colonial-era settler mortality (they reasoned) must have had an impact on whether the Europeans established “settler colonies” or instead just tried to extract resources; this in turn had an effect on whether the colonial power set up “good” (participatory) institutions or “bad” (extractive) institutions in the colony; these colonial-era institutions affected post-independence institutions, which in turn affected modern economic performance. So, the argument continues, the correlation between broad measures of “institutional quality” and economic performance in the modern data is due in part to the causal impact of the former on the latter, because there’s no other possible channel through which colonial-era settler mortality could be correlated with modern economic performance (after one controls for modern geographic and health conditions). Again, that argument is contested, but it’s at least plausible, in part because “institutional quality” is defined so broadly and generally. (Indeed, one of the criticisms of this line of research is that it can’t tell us what kinds of institutional features really matter.) But corruption researchers seem to have picked up on this “settler mortality” variable, and other variables like latitude, and attempted to use them as instruments for corruption specifically. 
On top of that, so far as I can tell most of these researchers don’t bother controlling for modern conditions (such as disease burden). There’s a similar problem with the legal origin variable. It was initially suggested as a factor that (partly) explains differences in modern legal systems, but couldn’t plausibly be correlated with modern economic outcomes except through the impact of legal origin on the development of the modern legal system. That argument, controversial in its own domain, does not provide any basis for presuming that the correlation between legal origin and modern economic outcomes could only arise because legal origin had an effect on corruption specifically.
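To make concrete what goes wrong when an instrument has its own route to the outcome, here is a small simulated sketch. All of it is hypothetical: the coefficients, the variable names, and the size of the “direct channel” are invented for illustration and are not estimates from any real study. In this made-up world X has no effect on Y at all. The estimator is the simple ratio of covariances, which is what two-stage least squares reduces to when there is one regressor and one instrument.

```python
import numpy as np


def iv_ratio_estimate(y, x, z):
    """IV estimate as cov(z, y) / cov(z, x), for a single instrument z."""
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]


def simulate(direct_channel, n=200_000, seed=1):
    """Simulate data where X has NO causal effect on Y (illustrative numbers).

    direct_channel is the instrument's own effect on Y, bypassing X.
    A valid instrument has direct_channel == 0.
    """
    rng = np.random.default_rng(seed)
    confounder = rng.normal(size=n)    # unobserved factor driving both X and Y
    instrument = rng.normal(size=n)
    x = 0.8 * instrument + confounder + rng.normal(size=n)
    # Note: x does not appear on the right-hand side, so the true effect is zero.
    y = direct_channel * instrument + 2.0 * confounder + rng.normal(size=n)
    return y, x, instrument


true_effect = 0.0

# Case 1: the instrument affects Y only through X (and X does nothing).
y_valid, x_valid, z_valid = simulate(direct_channel=0.0)
estimate_valid = iv_ratio_estimate(y_valid, x_valid, z_valid)

# Case 2: the instrument also affects Y directly, so it is invalid.
y_bad, x_bad, z_bad = simulate(direct_channel=0.5)
estimate_invalid = iv_ratio_estimate(y_bad, x_bad, z_bad)

print(f"true effect of X on Y:           {true_effect:.2f}")
print(f"estimate, valid instrument:      {estimate_valid:.2f}")
# In the population the bias equals direct_channel / 0.8, here 0.5 / 0.8.
print(f"estimate, invalid instrument:    {estimate_invalid:.2f}")
```

With the invalid instrument, the procedure confidently reports a sizable “effect” of X on Y even though none exists, and nothing in the output warns you. The only defense is the judgment call about whether the instrument could plausibly reach the outcome by another path.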
Some scholars who use these or similar instruments seem unaware that there’s any problem. Others recognize that there’s a problem, but then just brush it aside, often invoking the use of the same instruments in other papers as a justification. This is occasionally quite explicit. Indeed, I was recently reading a paper that purported to assess the impact of corruption on economic growth, and that paper used, as instruments for corruption, all four of the variables mentioned above (ethnic fractionalization, legal origin, settler mortality, distance from the equator). To his credit, the author noted that “it is questionable whether [these instruments] are relevant and exogenous because they may directly affect economic growth.” What he should have said, at that point, is that these instruments are therefore invalid and that he wouldn’t use them. What he said instead–and I found this just shocking–was this: “However, use of [these] variables as instruments of corruption in growth equations by other authors gives them reliability and credibility.” Um, no, no it doesn’t. That single sentence for me captured one of the biggest things wrong with the state of quantitative research on corruption today.
What’s the main takeaway message I want to convey in this post? I suppose there are two messages, for different audiences:
- For those of you who do quantitative research on corruption: Please, please, please stop using clearly invalid instruments, and don’t try to justify or excuse the use of bad instruments on the grounds that other people have used them too. (“If every other economist you knew jumped off a bridge….”)
- And for those of you who read, and sometimes cite or rely on, quantitative corruption research: Treat any paper that uses an instrumental variable technique with extreme skepticism, and scrutinize the instruments carefully. Remember, the question whether an instrument could be correlated with the outcome variable (say, growth) only through its causal impact on the alleged explanatory variable (say, corruption) isn’t a technical issue, and you can evaluate it just as well as any Ph.D. statistician. If that assumption doesn’t make sense to you, then don’t believe the results, no matter how much fancy math the authors throw at you. And if the authors of the paper seem unaware that clearly invalid instruments are clearly invalid, this is a sign that the authors don’t really know what they’re doing, and you might be generally skeptical about everything else in the paper.
OK, end of rant (for now). Thanks for your patience.