Like many people out there, I'm both a huge fan of Nate Silver (and of the rigorous quantitative approach to election forecasting that he popularized) and at the same time quite disappointed in his FiveThirtyEight website, where the posts (especially those not by Silver himself) often seem to be slapdash efforts by people who have a smattering of statistical knowledge but don't really know much about the topics they're writing about. A depressing recent example, germane to this blog, is a post from last week entitled "It Only Seems Like Politics Is More Corrupt." I normally wouldn't bother to comment on something so slight here (especially because the post appears to have been written by an intern, and I generally try to avoid beating up on people who are just starting out), but many of the errors in the analysis are both sufficiently elementary, and sufficiently common in discussions of corruption trends in other contexts (and by people with far more experience and therefore less of an excuse), that it's worth taking a moment to explain what's wrong.
A quick summary: The author cites recent U.S. Gallup poll data showing that the percentage of Americans who believe that “corruption is widespread” throughout the government in the United States has increased from about 60% in 2006 to a little over 75% in 2013. However, the author argues, the data doesn’t support the idea that corruption in the U.S. has actually worsened. To support that claim, she points to two other data sources:
- U.S. Department of Justice statistics from 1992-2012 show that the number of cases prosecuted by the DOJ’s Public Integrity Section (as well as the number of convictions and number of cases awaiting trial) appears to have declined, or at least hasn’t increased.
- The U.S. score on the Transparency International Corruption Perception Index (CPI) hasn’t changed very much between 1995 and 2013 (although there’s concededly a slight downward trend).
Do these two data sources disprove the idea that corruption in the U.S. has worsened over the last eight years, or more generally that the U.S. public's perception of corruption is inaccurate? In a word, no. There are so many elementary conceptual and statistical errors in this analysis that it's difficult to know where to begin, but let me take a shot at cataloguing the most egregious problems:
Before getting to the data problems, it's worth pointing out that there are some problems at the level of concept and definition, which make the data hard to interpret:
- The post never attempts to figure out whether these different data sources are measuring the same kind of "corruption." As many, including many contributors to this blog, have pointed out, corruption has a range of meanings. Who knows whether the Americans surveyed in the Gallup polls were thinking only of the sorts of "corruption" that the US DOJ would prosecute, as opposed to a more capacious definition of "corruption"?
- Even if we assumed that these different data sources were working with a common understanding of “corruption,” there’s a separate question of how we measure corruption being “widespread.” Does this just mean number of incidents? Does the magnitude of the corruption matter? What if the total number of incidents of corruption hadn’t changed, but the corruption was at a higher level and involved more money or greater distortions of public policy?
But even if we put aside those issues, there are some big problems with the data. Because the post’s author looks to two separate datasets, let me say a few words on each.
First, does the DOJ data, which seems to show an apparent decline (or at least no major increase) in corruption prosecutions and convictions between 1992 and 2012, show that U.S. corruption has not worsened? No, it doesn't:
- The pictures included with the FiveThirtyEight post seem to suggest very different trends for the public opinion data and the law enforcement data, but in fact the two data sets are on different time scales. If one restricts attention to the period where the two data sources overlap (2006-2012), it is far from clear that the two trend lines differ all that much. Just from eyeballing the graphs in the post, there looks to be a slight uptick in corruption perceptions in the Gallup data from 2006-2008, after which the percentage of Americans saying government corruption is widespread hovers at or just below 75%. In the DOJ data, there appears to be a slight upward movement in both cases and convictions between 2006 and 2008, with a notable spike in 2008; after that there's not much movement, though charges drift very slightly downward while convictions drift slightly upward.
- Law enforcement data is a notoriously unreliable measure of corruption levels, because cases and convictions reflect not only underlying rates of criminal activity, but also the effectiveness of law enforcement at detecting and penalizing this activity. The post acknowledges in passing that “the number of prosecutions might not be indicative of the overall problem,” before turning to the Transparency International data for “a more all-encompassing look at the issue” — but that hardly excuses the claim, in the preceding sentence, that the DOJ data shows that “the [corruption] problem certainly hasn’t grown” (my emphasis). I find it a little odd that a website supposedly devoted to rigorous statistical analysis would be so casual about such a fundamental and familiar source of inferential error.
- Related to the preceding point, there may be a significant time lag between changes in actual corruption and both (a) public perception of corruption and (b) law enforcement data on corruption cases and convictions. The former lag could be shorter or longer than the latter; I have no strong prior views about which is more likely. Either way, a difference in the shape of the trend lines (if such existed) would not necessarily show that public opinion was wrong: it could be that the public has picked up on a change that hasn’t yet been reflected in law enforcement data, or that public opinion is finally moving, having internalized (after a delay) a change in corruption levels that was reflected years earlier in law enforcement data.
Now, what about the TI data? Here, the FiveThirtyEight post acknowledges a slight downward trend in the United States’ CPI score, but observes (correctly) that the score hasn’t changed that much. Does that show that the corruption problem in the U.S. is not worsening, or that the perceptions of the U.S. public as reflected in the Gallup poll are wildly off? Again, no:
- Given the way the CPI score is constructed (particularly before 2012 — more on that in a moment), very large within-country changes might not show up as significant changes in the CPI score (and, likewise, changes in the CPI score do not necessarily indicate significant changes in perceived corruption). The pre-2012 CPI score is a relative measure; if the U.S.'s ranking relative to other countries didn't change much, then its CPI score wouldn't change much either. That could happen if (1) trends in other countries roughly matched those in the U.S., or (2) the differences between the U.S. and most of the other countries to which it is compared in the CPI are large enough that within-country changes do not have a big impact on its relative ranking. The latter possibility seems especially likely.
- Continuing on the above point, TI itself could not be any more explicit that pre-2012 CPI scores are not comparable across time, for a variety of statistical and methodological reasons (some but not all of which I summarized in the previous point). (This is a sufficiently important and common error in the use of the CPI that I may devote a future post specifically to that topic.) Indeed, when TI changed its methodology in 2012, it went so far as to change the scaling system from a 0-10 scale to a 0-100 scale, explicitly to emphasize that the pre-2012 scores were calculated differently from the scores for 2012 and after and could therefore not be compared. But the FiveThirtyEight post appears to have disregarded this entirely, and rendered the scales comparable simply by multiplying the pre-2012 CPI scores by 10. Now, the fact that there wasn't much movement from 2011 to 2012 in the U.S. CPI score may suggest this doesn't matter much in this particular case, but it still seems extraordinarily sloppy.
- Somewhat surprisingly for a statistics-oriented website, the FiveThirtyEight post does not report the statistical margin of error (or confidence interval) for either the Gallup poll data or the CPI. (Both are available, though for the pre-2012 CPI, finding the confidence interval would require digging into some of the background material.) The question at issue, after all, is whether the Gallup trend and the CPI trend are different — or, to put this another way, whether we can confidently reject the null hypothesis that the two trends are the same. The FiveThirtyEight post not only doesn't attempt this, it doesn't evince any awareness that this is the question it's trying to answer.
- The CPI is itself a perception measure — it’s a “poll of polls” that aggregates other surveys (generally of international investors and other alleged experts) about perceived corruption in different countries. So even if we believed that there was a meaningful difference between the trend in the Gallup data and the trend in the CPI data, all this would show is that two perception measures diverge; it wouldn’t tell us which one was more likely to be correct.
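The relative-ranking point above is easy to see with a toy example. Here is a minimal sketch (all scores invented for illustration, not real CPI data) of how, when the gaps between a country and its neighbors in the ranking are large, even a sizeable change in that country's own score can leave its rank untouched:

```python
# Illustrative sketch with made-up scores (not real CPI data): when the
# gaps between a country and its neighbors are large, a sizeable change
# in its own score need not move its relative ranking at all.

def rank_of(score, others):
    # 1-based rank among `others`; a higher score means a better rank.
    return 1 + sum(1 for s in others if s > score)

# Hypothetical field of other countries' scores, widely spaced (0-10 scale).
others = [9.5, 9.0, 8.5, 6.0, 5.5, 5.0, 3.0, 2.5]

print(rank_of(7.5, others))  # 4
print(rank_of(7.0, others))  # 4 -- a half-point decline, same rank
```

A relative measure built on rankings like this one can therefore sit still even while the underlying quantity moves.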
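For concreteness, the splice the post appears to have performed looks something like the following (all scores here are invented; the point is that the arithmetic is trivial while the comparability problem it papers over is not):

```python
# Hypothetical scores only. The pre-2012 CPI used a 0-10 scale; from 2012
# on, TI used a 0-100 scale computed under a different methodology.
pre_2012 = {2010: 7.1, 2011: 7.1}   # old 0-10 scale (invented values)
post_2012 = {2012: 73, 2013: 73}    # new 0-100 scale (invented values)

# The naive splice: multiply the old scores by 10 and treat the result as
# one continuous series -- exactly what TI warns against doing.
spliced = {yr: round(s * 10) for yr, s in pre_2012.items()}
spliced.update(post_2012)
print(spliced)  # {2010: 71, 2011: 71, 2012: 73, 2013: 73}
```

The resulting series looks smooth, which is precisely what makes the splice misleading: the apparent continuity is an artifact of the rescaling, not evidence that the two methodologies measure the same thing.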
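To make the margin-of-error omission concrete, here is a minimal sketch of the kind of back-of-the-envelope check the post never performs: a two-proportion z-test for whether two poll readings are distinguishable given sampling error. The sample sizes are invented (Gallup's actual samples and margins differ), so treat this as an illustration of the method, not a result:

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    # z-statistic for the difference between two independent sample
    # proportions (unpooled standard error; normal approximation).
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p2 - p1) / se

# Invented numbers in the ballpark of the Gallup figures: 60% of ~1,000
# respondents in 2006 versus 75% of ~1,000 respondents in 2013.
z = two_proportion_z(0.60, 1000, 0.75, 1000)
print(z > 1.96)  # True: a 15-point swing on samples this size is far
                 # outside the margin of error
```

On these assumptions the Gallup shift itself is clearly real; the harder question, which the post never engages, is whether the Gallup trend and the CPI trend differ from *each other* once both sets of confidence intervals are taken into account.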
Just to be clear, I am not arguing that the corruption problem in the U.S. has in fact gotten worse over the last eight years, nor am I arguing that opinion poll data is a more reliable measure of true corruption than law enforcement statistics or CPI-style perception indices. I am, at least for the moment, agnostic on both these questions. What I am arguing is that FiveThirtyEight's post on this topic is so badly flawed as to be essentially useless, even as a "hey-look-at-this"-style blog post. I realize that this much criticism of a short post by a junior author might seem like overkill. But I think it's worth highlighting the problems of this post as a cautionary example that illustrates a number of recurring problems in analyses of corruption data, especially when making comparisons across data sources and/or across time: failure to specify what kind of "corruption" is being measured, failure to recognize the limits of proxies like enforcement statistics and perception indices, and failure to ask whether apparently different trends are statistically distinguishable at all.