I will open this post with two apologies: First, this is going to be on a (seemingly) nerdy and technical subject (though one that non-technical folks who read statistical papers on corruption really need to understand). Second, this post is going to return to a subject that I wrote about two years ago, without adding much, except perhaps different examples and somewhat more intemperate language. But the issue is an important one, and one that I think needs more attention, both from the people who produce quantitative empirical studies on corruption and those who consume those studies.
The issue concerns a particular statistical technique sometimes deployed to figure out whether variable X (say, absence of corruption) causes variable Y (say, economic growth), when it’s possible that the correlation between X and Y might arise because Y causes X (or because some third factor, Z, causes both X and Y). The technique is to find an “instrumental variable” (an IV for short). To be valid, the IV must be sufficiently correlated with X, but could not conceivably have any effect on Y except through the IV’s causal effect on X. The actual estimation techniques used in most cases (usually something called “two-stage least squares”) involve some additional statistical gymnastics that I won’t get into here, but to get the intuition, it might help to think about it this way: If your IV correlates with your outcome variable (Y), and there’s no plausible way that your IV could possibly affect Y except by affecting your proposed explanatory variable (X), which then has an effect on Y, then you can be more confident that X causes Y. But for this to work, you have to be very sure that the IV couldn’t possibly affect Y except through X. This assumption cannot be tested statistically; it can only be evaluated through common sense and subject-area expertise.
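For readers who find code easier to follow than prose, here is a minimal simulation sketch of the intuition above. Everything in it is an assumption made for illustration: the data are simulated rather than drawn from any real study, the coefficients are arbitrary, and names such as `simulate`, `confounder`, and `direct_iv_effect` are hypothetical. It uses NumPy to run a bare-bones two-stage least squares by hand. It first shows a valid IV roughly recovering the true effect of X on Y where a naive regression does not, and then shows what happens when the IV also affects Y directly.

```python
import numpy as np


def ols_slope(y, x):
    """Slope from a naive regression of y on x (with an intercept)."""
    design = np.column_stack([np.ones(len(y)), x])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs[1]


def two_stage_least_squares(y, x, iv):
    """Hand-rolled 2SLS with one explanatory variable and one instrument."""
    n = len(y)
    # Stage 1: predict x using only the instrument.
    stage1 = np.column_stack([np.ones(n), iv])
    gamma, *_ = np.linalg.lstsq(stage1, x, rcond=None)
    x_hat = stage1 @ gamma
    # Stage 2: regress y on the part of x explained by the instrument.
    stage2 = np.column_stack([np.ones(n), x_hat])
    beta, *_ = np.linalg.lstsq(stage2, y, rcond=None)
    return beta[1]


def simulate(n=20000, true_effect=0.5, direct_iv_effect=0.0, seed=0):
    """Simulated data (illustrative assumption, not real corruption data).

    An unobserved confounder drives both x and y. The instrument affects x,
    and affects y directly only when direct_iv_effect is nonzero.
    """
    rng = np.random.default_rng(seed)
    confounder = rng.normal(size=n)
    iv = rng.normal(size=n)
    x = 1.0 * iv + 1.0 * confounder + rng.normal(size=n)
    y = (true_effect * x + 2.0 * confounder
         + direct_iv_effect * iv + rng.normal(size=n))
    return y, x, iv


# Case 1: a valid instrument (no direct path from the IV to y).
y, x, iv = simulate(direct_iv_effect=0.0)
naive_estimate = ols_slope(y, x)
valid_iv_estimate = two_stage_least_squares(y, x, iv)

# Case 2: the instrument also affects y directly, violating the assumption.
y_bad, x_bad, iv_bad = simulate(direct_iv_effect=0.3)
invalid_iv_estimate = two_stage_least_squares(y_bad, x_bad, iv_bad)

print("true effect of x on y:       0.5")
print("naive OLS estimate:         ", round(naive_estimate, 3))
print("2SLS with a valid IV:       ", round(valid_iv_estimate, 3))
print("2SLS with an invalid IV:    ", round(invalid_iv_estimate, 3))
```

In this toy setup the second 2SLS run returns a confident-looking number that is simply wrong, and nothing in the output flags the problem. The only reason we can tell is that we wrote the data-generating process ourselves, which is exactly the knowledge a researcher working with real data does not have.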
OK, if you’ve slogged your way through that last paragraph, you may be wondering why this is important for corruption research, and why I’m so exercised about it. Here’s the problem: