As most readers of this blog are likely aware, last month Transparency International (TI) released the 2017 edition of its important and influential Corruption Perceptions Index (CPI). As usual, the publication of the CPI triggered a fair bit of media coverage, much of it focused on how various countries ranked, and how individual country scores had changed from one year to the next (see, for example, here, here, here, and here).
There’s a lot to say about the most recent CPI—I may devote a post at some point to TI’s interesting decision to focus the press release accompanying the publication of the 2017 CPI less on the index itself than on the connection between (perceived) corruption and a lack of adequate freedom and protections for the media and civil society. But in this preliminary post, I want to take up an issue that regular GAB readers will know has been something of a fixation of mine in past years: the emphasis—in my view mostly misplaced—on how individual country CPI scores have changed from year to year.
In prior posts, I’ve raised a number of related but distinct concerns about the tendency of some commentators—and, more disturbingly, of some policymakers—to attach great significance to whether a country’s CPI score has gone up or down relative to previous years. For one thing, the sources used to construct the CPI for any given country may change from year to year—and adding or dropping an idiosyncratic source can have a substantial effect on the aggregate CPI score. For another, even when the underlying sources don’t change, we don’t know whether those sources are on the same implicit scale from year to year. And even if we put these problems to one side, a focus on changes in the final CPI score can sometimes obscure the statistical uncertainty associated with the estimated CPI—these scores can be noisy enough that changes in scores, even those that seem large, may not be statistically meaningful according to the conventional tests. Although TI always calculates statistical confidence intervals, in prior years these intervals have been buried in hard-to-find Excel spreadsheets, and the changes in CPI scores that TI highlights in its annual press releases haven’t always been statistically significant by TI’s own calculations. In an earlier post, I suggested that at the very least, TI should provide an easy-to-find, easy-to-read table assessing which changes in country scores are statistically significant at conventional levels, preferably over a 4-year period (as 1-year changes are both harder to detect if trends are gradual, and less interesting).
Apparently some folks within TI were thinking along similar lines, and I was pleased to see that the 2017 CPI includes a reasonably prominent link to a spreadsheet showing those countries for which the 2017 CPI score showed a “statistically significant difference” from that country’s CPI score in each of five comparison years (2012, 2013, 2014, 2015, and 2016).
I’ve still got some criticisms and concerns, which—in the spirit of constructive engagement—I’ll turn to in just a moment. But before getting to that, let me pause to note my admiration for TI as an organization, and in this case its research department in particular, for constantly working to improve both the CPI itself and how it is presented and interpreted. It’s easy for folks like me to criticize—and I’ll continue to do so, in the interests of pushing for further improvements—but it’s much more challenging to absorb the raft of criticisms from so many quarters, sift through them, and invest the necessary time and resources to adapt and adjust from year to year. So, in case any folks at TI are reading this, let me first acknowledge and express my appreciation for how much work (often thankless) goes into the creation and continued improvement of this valuable tool.
Having said that, let me now raise some comments, questions, and concerns about TI’s claims regarding countries that appear to have experienced statistically meaningful changes in their CPI scores over the last five years.
- First, just a small technical note: It would be nice if TI would include, along with the tables of allegedly statistically significant changes, a note about the statistical significance threshold TI is using. The usual default threshold is 5%, meaning that, even if there were no actual change in the level of perceived corruption, the probability that the difference between the 2017 score and the base-year score would arise by chance (due to random error in the measures) is less than 5%. But the confidence intervals that TI includes in its larger data table appear to be 90% intervals, which would correspond to a 10% significance threshold. I’m sure if I dug into the data I could back out the confidence level used to determine which changes are “significant,” but TI would save researchers a lot of time by just including a note on the data table itself.
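For readers who want to check the arithmetic themselves, the conventional test for a significant difference between two noisy estimates looks like the sketch below. This is only the textbook approach, not TI’s documented procedure, and the scores and standard errors in the example are made up for illustration:

```python
import math

def change_is_significant(score_a, se_a, score_b, se_b, z_crit=1.96):
    """Two-sided z-test for a difference between two noisy estimates.
    z_crit = 1.96 corresponds to a 5% significance level, 1.645 to 10%.
    (TI's exact procedure isn't stated in the table; this is just the
    conventional test.)"""
    # Errors in the two years are assumed independent, so variances add.
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    z = abs(score_a - score_b) / se_diff
    return z > z_crit

# A hypothetical 9-point change with standard errors of 3.5 in each year:
print(change_is_significant(50, 3.5, 41, 3.5))         # False at the 5% level
print(change_is_significant(50, 3.5, 41, 3.5, 1.645))  # True at the 10% level
```

The example is chosen to show why the threshold matters: the same 9-point change is “significant” at the 10% level but not at the 5% level, which is exactly why TI should state which one it used.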
- Second, and much more importantly: TI’s results do not imply that there has actually been a meaningful change in the CPI score of any of the countries for which there’s a “statistically significant change” between the base year (I’ll just focus on 2012 to keep things simple) and 2017. To understand why, recall that the conventional test for statistical significance asks, “What’s the probability that the difference between these two (noisy) estimates of some unobserved value would be this large, if the value itself were the same in both cases?” More concretely, in this particular context, we’re asking, “What’s the probability that the difference between Country X’s 2017 CPI estimate and its 2012 CPI estimate would be this large, if in fact the real level of perceived corruption in Country X hadn’t changed at all?” If the frequency with which such a large gap in the CPI would occur solely due to random measurement error, absent any real change in perceived corruption, is less than 5%, then we call the change “statistically significant.” That approach works tolerably well when we’re testing a hypothesis specific to a particular country or set of countries. (For example, if we want to know whether countries without a free press are perceived as more corrupt than countries with a free press, we would ask, “What’s the probability that the average CPI score for free-press countries would be that much higher than the average CPI score for no-free-press countries, if in fact perceived corruption is uncorrelated with press freedom?”) But that’s not really what TI is doing when it identifies a handful of countries for which there has been a “statistically significant change” between 2012 and 2017. Rather, TI is looking at all 180 countries in the index and identifying those for which the 2012–2017 difference would occur by chance less than 5% of the time.
(TI finds 15 such countries.) But with 180 countries, even if no country experienced a genuine change in its level of perceived corruption, we’d expect at least some countries to exhibit an unlikely-looking change. Think about it this way: Suppose I give you a coin, which might be fair or biased, and you decide to test it by flipping it five times. If it comes up heads every time, you can conclude with some confidence that it’s probably not a fair coin, because the probability of getting five heads in a row with a fair coin is very low (a little over 3%). Now suppose I give you 180 coins, which might or might not be fair, and you flip each of them five times. Suppose that for seven of those coins you get heads on all five flips, and for eight of those coins you get tails on all five flips. Can you conclude with great confidence that each of those 15 coins must be biased, because the odds of getting five heads (or five tails) in a row with a fair coin are only about 3%? No, of course not. With 180 coins, the odds are that some of the flips will give you a statistically unlikely result even if all the coins are fair. But that’s basically the situation we’ve got with TI’s list of countries with “statistically significant differences” in CPI score between 2012 and 2017: seven countries had a “statistically significant” increase in their CPI scores, while eight had a “statistically significant” decline. Since there was no reason ahead of time to expect that these particular countries, and not others, would experience a change, it seems quite plausible that all we have here are the coins that happened to come up all heads or all tails in five straight flips. The results, so far as I can tell, are entirely consistent with the null hypothesis that no country has experienced any meaningful change in its perceived corruption level since 2012.
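The coin analogy is easy to check with a back-of-the-envelope calculation. The chance that a single fair coin gives five heads or five tails in a row is 2 × (1/2)⁵ = 1/16, so across 180 fair coins we should expect about eleven “anomalous” coins, not zero:

```python
import random

P_EXTREME = 2 * 0.5 ** 5   # P(all heads or all tails in 5 fair flips) = 1/16
N_COINS = 180

# Expected number of "anomalous" coins even if every coin is fair:
print(N_COINS * P_EXTREME)  # 11.25

# A quick simulation to confirm the analytic expectation
random.seed(0)
trials = 2_000
total_anomalous = 0
for _ in range(trials):
    for _ in range(N_COINS):
        flips = random.getrandbits(5)        # 5 fair coin flips, one per bit
        if flips == 0 or flips == 0b11111:   # all tails or all heads
            total_anomalous += 1
print(total_anomalous / trials)  # should come out near 11.25
```

Eleven or so spurious “biased coins” out of 180 is comfortably more than the 15 countries TI flags, which is why the flagged list, on its own, tells us so little.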
- Third, while it’s great that TI is making the effort to identify which countries’ score changes are statistically meaningful, unfortunately TI did not do the other thing I’d recommended in that earlier post: perform the analysis using only a constant set of sources. For none of the 15 countries for which TI finds statistically significant changes was the 2017 CPI based on exactly the same sources as the 2012 CPI. Does this make a difference? Well, for most countries, maybe not: I did some quick-and-dirty calculations and found that in just over half the cases (8 of 15), if the CPI scores are calculated using only the sources available in both 2012 and 2017, the difference between the scores changes by no more than a point. But in 6 of 15 cases, when the calculations are made with only common sources, the magnitude of the CPI change between 2012 and 2017 is at least 3 points smaller than what TI reports in its “statistically significant change” table—and in one case (Myanmar) the magnitude of the reported improvement drops by 40% (from a 15-point improvement to a 9-point improvement). And of course the reduction in the number of sources means that the standard errors in all of these cases go up as well, though I haven’t yet had time to calculate the revised standard errors. (It’s worth noting in passing that for one country—Senegal—there are only two common sources used to calculate the CPI in both 2012 and 2017; since TI usually requires at least three sources for a country to be included in the CPI in the first place, there’s a plausible case for ignoring the results on Senegal’s changed score.) Now, the absolute values of the changes for all these countries are still large enough that they all likely still satisfy the conventional thresholds for statistical significance, but I hope that in the future TI does these comparisons using only common sources.
Otherwise it’s entirely possible that seemingly large changes are an artifact of changing sources. (The opposite is also possible: genuinely large changes might be masked by the dropping or adding of outlier sources.)
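To illustrate the worry, here is a toy sketch of the common-source comparison. The source names and scores are entirely made up (real CPI source data is in TI’s published spreadsheets); the point is only to show how a headline change can shrink once an idiosyncratic source is excluded:

```python
def common_source_change(scores_y1, scores_y2):
    """Recompute the change between two years using only the sources
    present in both years. Inputs are dicts mapping source name -> score."""
    common = sorted(set(scores_y1) & set(scores_y2))
    if len(common) < 3:   # TI normally requires at least 3 sources
        return None
    mean_y1 = sum(scores_y1[s] for s in common) / len(common)
    mean_y2 = sum(scores_y2[s] for s in common) / len(common)
    return mean_y2 - mean_y1

# Hypothetical country: one low outlier (SourceD) is dropped after 2012
# and a high outlier (SourceE) is added in 2017.
y2012 = {"SourceA": 30, "SourceB": 32, "SourceC": 28, "SourceD": 15}
y2017 = {"SourceA": 35, "SourceB": 36, "SourceC": 31, "SourceE": 50}
print(common_source_change(y2012, y2017))  # 4.0
```

In this made-up example, the all-source averages jump from 26.25 to 38.0 (an 11.75-point “improvement”), but restricted to the three common sources the change is only 4 points: most of the headline movement comes from swapping a low outlier for a high one, not from any source actually revising its assessment.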
- Fourth, though of less importance, some of the reported numbers in TI’s table just look a little screwy to me. They may be correct—I haven’t had a chance to check—but I figured it might be worth flagging. For example, the standard error for the CPI estimate for Saint Lucia in 2012 is 0.8 (the smallest in the set of 15 countries in the table), despite the fact that this estimate is based on only three sources. In 2017, with the estimate based again on three sources (two of which were also used in 2012), the standard error is 4.27—the highest of the 15 countries included. Eyeballing the table, it appears that this is because the 2012 scores are all very close together (70, 71, and 73), while in 2017 they are more dispersed due to a low outlier (47, 59, 60). This implies that between 2012 and 2017, our estimate of perceived corruption in Saint Lucia became not only worse, but dramatically less precise. That may well be the right conclusion, but it’s worth highlighting how sensitive these standard error estimates are to seemingly small changes in scores. Part of the issue might be that the standard error calculations (and hence the determinations of which changes are statistically meaningful) assume that all of these sources are measuring the same underlying variable (“perceived corruption”) with essentially random measurement error, and that the random errors in individual sources are independent from year to year. If those assumptions aren’t correct, then the calculations that follow from them may be misleading.
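For what it’s worth, the Saint Lucia numbers can be roughly sanity-checked with the textbook standard-error-of-the-mean formula. TI’s actual procedure involves additional standardization steps, so the match won’t be exact, but the pattern (tiny error in 2012, large error in 2017) reproduces:

```python
import math
from statistics import stdev

def cpi_standard_error(scores):
    # Standard error of the mean across sources, treating each source's
    # error as independent: sample standard deviation / sqrt(n).
    return stdev(scores) / math.sqrt(len(scores))

# Saint Lucia's source scores, as read off TI's table
print(round(cpi_standard_error([70, 71, 73]), 2))  # 0.88 (TI reports 0.8)
print(round(cpi_standard_error([47, 59, 60]), 2))  # 4.18 (TI reports 4.27)
```

With only three sources, a single outlier moves the standard error enormously: replacing the tightly clustered 2012 scores with one low 2017 reading multiplies the estimated error by roughly a factor of five.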
- Finally, it’s worth reiterating yet again that even if we put all the above concerns to one side, we still don’t know whether the individual sources are using a constant scale from year to year. They may well not be, particularly the ones that are focused on comparing countries to one another, rather than measuring change over time.
So, while TI is definitely making admirable progress in how it calculates and presents CPI data, I still don’t think that anyone should put much stock in within-country, year-to-year changes in CPI scores. To its credit, TI’s press release accompanying this year’s CPI does not place nearly as much emphasis on recent CPI movements as past CPI press releases did. And it’s worth highlighting that countries that previous CPI press releases emphasized as the biggest movers (Qatar in 2016, Brazil in 2015, China in 2014) do not appear on TI’s list of the countries with statistically significant changes in corruption between any of the baseline years (2012-2016) and 2017, which should serve as a reminder not to make too much of any single-year changes. (An aside: Somewhat bafflingly, the press release does identify Cote d’Ivoire as a country that has seen a significant improvement in its CPI score in the last several years, even though Cote d’Ivoire doesn’t show up on TI’s list of countries that have seen statistically significant improvements in their CPI scores since 2012. I’m not sure what’s up with that – maybe an error in either the press release or the table?) TI’s emphasis—rightly, in my view—appears to be shifting to more of a focus on the factors that correlate with (and perhaps contribute to) the overall perception of corruption in different countries. More on that in a later post, I hope.