A little while back, I expressed some skepticism about whether Transparency International’s Corruption Perceptions Index (CPI) scores can be compared across time, even after TI changed its methodology in 2012 and claimed that its new scores would now be comparable across years. More recently, I criticized TI’s 2014 CPI for burying the information on the margins of error associated with the CPI values, and for wrongly asserting that changes in the CPI score between 2013 and 2014 for certain countries (most notably China) were substantively meaningful. (In fact, not only does the change in China’s score between 2013 and 2014 seem not to be statistically significant, but the change was due almost entirely to the dropping of a source in which China did abnormally well in 2013, and an abnormally large movement in a single other source.) I decided to follow up on this by taking a closer look at the other ten countries that TI singled out as having experienced significant CPI changes (in either direction) between 2013 and 2014.
Upon closer examination, I’m even more certain that CPI scores cannot be compared over time. I’m also more confident in my judgment that TI has been unforgivably sloppy — and downright misleading — in how it, and its representatives, have portrayed the substantive significance of these CPI changes. It turns out that the problem I found with the China calculations was not unusual. For almost all of the eleven countries TI identified as big movers, the CPI changes were driven by (1) the addition or elimination of sources from year to year for particular countries, and/or (2) abnormally large (indeed, implausibly large) movements in a single source. Until TI fixes its methodology, the safest thing to do is to ignore year-to-year changes in the CPI. And for the sake of preserving its own integrity and credibility, TI should either (A) persuasively explain why I am wrong in my analysis of the data (in which case I will gladly concede error), or (B) issue some sort of retraction or correction to its earlier press releases, and either drop the claim that post-2012 CPI scores can be compared across time or fix its methodology going forward.
Allow me to elaborate my analysis of the data:
First of all, for seven out of the eleven countries that TI identified as having experienced substantial changes in perceived corruption between 2013 and 2014, much–and in some cases most–of the difference was due to the addition or elimination of a source in calculating the CPI. As I noted in my earlier post, most of the drop in China’s 2014 CPI score was due to the fact that a source on which China did unusually well in 2013 was not included in 2014. This is also true for three of the other four countries that TI claimed had experienced a significant worsening in perceived corruption (Angola, Rwanda, and Turkey). In fact, for Rwanda, the worsening of the CPI was due entirely to the fact that the 2014 CPI did not use a source on which Rwanda had scored especially well in 2013. Of the four sources that were used for Rwanda in both years, three exhibited no change and one actually showed a slight improvement. The same phenomenon was at work for three of the seven countries where TI claimed a big improvement in the CPI. For both Egypt and Swaziland, the 2013 CPI was calculated using a source on which those countries got scores notably lower than their scores on other sources, but the 2014 CPI did not include that source. And for Afghanistan, the 2014 CPI incorporated a new source, on which Afghanistan’s score was well above its average from the other sources.
Aside from changes caused by addition or subtraction of sources, most of the rest of the changes in the CPI scores were driven not by a consistent movement picked up in a number of sources, but rather an abnormally large — often an implausibly large — change in a single source. For example, consider Mali, which TI identifies as another country in which perceived corruption significantly decreased. In both 2013 and 2014, Mali’s CPI score was based on six sources. In four of those sources, Mali’s 2013 and 2014 scores were identical, and in one there was a modest improvement (+6 points on a 100 point scale); on the sixth source (Global Insight’s Country Risk Ratings), Mali’s improvement was enormous–a full 20 points (from 22 to 42). Now, it’s not impossible that there was in fact a big change that only this source picked up, and there’s a case to be made, I suppose, that averaging the sources and calculating the confidence interval should address any concerns. But my instinctive view is that if five out of six sources find no more than a small change, and one source detects a massive movement (covering 20% of the scale), it’s more likely that something screwy is going on with that one source.
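To make the Mali arithmetic concrete, here is a rough sketch in Python. It assumes the CPI behaves like a simple, equally weighted average of the rescaled source scores, so that with an unchanged set of sources the change in the average equals the average of the per-source changes. Only Global Insight is named above; the other source labels are placeholders of my own.

```python
# Per-source 2013-to-2014 changes for Mali, as described in the text.
# Labels other than "Global Insight" are placeholders (assumption).
mali_changes = {
    "unchanged_source_1": 0,
    "unchanged_source_2": 0,
    "unchanged_source_3": 0,
    "unchanged_source_4": 0,
    "modest_improver": 6,
    "Global Insight": 20,  # 22 -> 42
}


def mean_change(changes):
    """Average per-source change, assuming equal weighting of sources."""
    return sum(changes.values()) / len(changes)


def leave_one_out(changes):
    """Mean change recomputed with each source dropped in turn."""
    return {
        dropped: mean_change({k: v for k, v in changes.items() if k != dropped})
        for dropped in changes
    }


overall = mean_change(mali_changes)
without = leave_one_out(mali_changes)

print(f"All six sources:        {overall:+.2f}")
print(f"Without Global Insight: {without['Global Insight']:+.2f}")
```

Under that equal-weighting assumption, the six sources together imply an average change of roughly +4.3 points, while dropping Global Insight alone leaves about +1.2. That is the sense in which a single source is doing most of the work.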
And Mali isn’t the only country where the statistically significant change that TI reports appears to be driven by an idiosyncratically large change in a single source. For instance, St. Vincent & Grenadines–like Mali characterized by TI as a big improver–showed no change on two sources, but a very large (17 point) improvement on one (the Economist Intelligence Unit (EIU) index). In an admittedly closer case, Malawi–listed by TI as a country where perceived corruption got much worse–did indeed show significant worsening in two of the eight sources used to calculate its score (a 12 point drop on the World Bank Institutional Assessment, and a 17 point drop on the EIU). But on the other six sources, four showed no change, one had a very small (1 point) worsening, and one actually showed a modest (4 point) improvement.
This is not to say that every significant within-country 2013-2014 CPI change was the result of adding or dropping sources, or of big anomalous changes in only one or two sources. In Jordan, for example, although four of the seven sources used to calculate Jordan’s 2013 and 2014 scores showed no change, three other sources picked up improvements of roughly similar size (+8 on the IMD World Competitiveness Yearbook, +8 on the World Economic Forum (WEF) Executive Opinion Survey, and +10 on Global Insight). To me, that suggests a genuine change in perceptions. Cote d’Ivoire is a bit more ambiguous–much of the improvement from 2013 to 2014 appears due to a facially implausible 27-point jump on one source (the WEF survey), and four of the other seven sources show no change whatsoever. Still, the three remaining sources do all show improvements (of +1, +3, and +12), so there does seem to be evidence that some improvement in perceptions is occurring and getting picked up by multiple sources.
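One crude way to put the Mali, Jordan, and Cote d'Ivoire cases side by side is to ask how many sources moved at all and what share of the total movement comes from the single largest mover. The sketch below uses only the per-source changes quoted in the text; the `summarize` helper and its two measures are my own illustrative choices, not anything TI uses.

```python
# Per-source 2013-to-2014 changes as quoted in the text above.
changes = {
    "Mali": [0, 0, 0, 0, 6, 20],
    "Jordan": [0, 0, 0, 0, 8, 8, 10],
    "Cote d'Ivoire": [27, 0, 0, 0, 0, 1, 3, 12],
}


def summarize(deltas):
    """Summarize how concentrated a country's movement is across sources."""
    total = sum(deltas)
    largest = max(deltas, key=abs)  # the single biggest mover
    return {
        "mean_change": total / len(deltas),
        "sources_moved": sum(1 for d in deltas if d != 0),
        "n_sources": len(deltas),
        # Share of the summed change attributable to the biggest mover;
        # undefined (None) if the changes cancel out to zero.
        "share_from_largest": largest / total if total != 0 else None,
    }


for country, deltas in changes.items():
    s = summarize(deltas)
    print(
        f"{country}: mean {s['mean_change']:+.2f}, "
        f"{s['sources_moved']}/{s['n_sources']} sources moved, "
        f"largest source = {s['share_from_largest']:.0%} of total change"
    )
```

On these numbers the largest single source accounts for a bit over a third of Jordan's movement, but well over half of Cote d'Ivoire's and about three quarters of Mali's, which matches the ordering of how convincing each case seems.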
But for the most part, the changes in CPI scores that TI emphasizes in its press releases and other public statements do not seem to be based on reliable evidence of true changes in corruption perceptions. Most often, the changes are driven by the addition or subtraction of sources from year to year, and/or by implausibly and idiosyncratically large changes in a single source. To me, this seems like a very good reason not to pay any heed to year-to-year changes in CPI scores, even given TI’s welcome changes to its methodology in 2012. (By the way, the numbers I’m using for all of the above discussion come from TI’s data for the 2013 and 2014 CPI, available here and here (click on the “download info package” link below the results table in both cases). I hope others will double-check my calculations. It is entirely possible that I have made some errors, and if so I will happily and promptly correct them.)
Now, maybe there are further adjustments that TI could make in its approach that would allow it to make meaningful cross-year comparisons. In a future post, I might try to be a bit more constructive by trying to suggest some thoughts along those lines. But for now I’ll just double down on my critique: a careful examination of the underlying data reveals that CPI scores should not, as a general matter, be compared across time, and TI’s public statements about which countries improved or worsened significantly are not well-grounded in TI’s evidence, and ought to be retracted or qualified as soon as possible, in order for TI to preserve its credibility.
This is disturbing. A majority of readers will never look past TI’s simplified index or color-coded map, so they are using these inaccurate ratings to draw conclusions about trends in these countries. I hope TI does not dismiss this as a harmless statistical error, because these kinds of inflated figures can cause serious harm. In undergrad, my development economics professor talked about the shift from using a development index based solely on GDP per capita to the human development index, which was a much more accurate measure of a country’s development because it considered factors such as child mortality, literacy, and access to vital resources. He argued that the years it took for institutions like the World Bank and U.N. to switch from the traditional GDP index to the human development index actually caused harm to developing countries everywhere because institutions overestimated the success of existing programs and were unable to design targeted programs based on the needs of society. Similarly, overestimating a country’s progress in its anticorruption efforts can actually cause more harm than good by worsening corruption, especially if institutions, businesses, and individuals are relying on these figures (and they likely are given TI’s prominence in the field) in deciding what anticorruption programs to implement or evaluate, what investments to make, what businesses to open, etc.
I’m very much in agreement, especially with your last point about how erroneous conclusions about a country’s progress in fighting corruption can be counterproductive. And the problem goes in both directions: both overestimating and underestimating progress can be destructive. I think the alleged move in China’s CPI score illustrates this: It was widely reported as evidence that China’s current anticorruption drive is not working. But that conclusion isn’t warranted by the data. And the reverse could just as easily have occurred: Suppose that the 2013 China source that got dropped in 2014 was one on which China performed unusually _poorly_, instead of unusually well. Then, the headline story would likely have been about how China’s anticorruption crackdown was paying off, leading to a big improvement in its CPI score. Neither conclusion is warranted by the data.
Your first point, and the comparison to the HDI, is also useful, though of course there we’re talking about a somewhat different issue. The analogue in the CPI context might be the complaint (which has been discussed on this blog and elsewhere) that the CPI might only be picking up perceptions of certain _kinds_ of corruption (those that are most visible to the various expert observers whose assessments are used to construct the CPI). To TI’s credit, they at least openly acknowledge this issue in the information they provide about the CPI on their website. I wish, though, that they’d provide a similar disclaimer regarding the problem of year-to-year comparisons.
I entirely agree with Matthew’s comment. In 2005, Lambsdorff, the inventor of the CPI, tried to make CPI figures comparable over time by keeping the data sources consistent. I have lost track of his paper, but he was able to sample something like 30-31 countries with consistent data sources. As long as the data sources are not consistent over time, it is wrong to compare one year’s CPI data with the next year’s. I am also wondering why TI claims that its CPI scores from 2012 onwards are comparable, after rescaling from 0-10 to 0-100.
On your last point, in defense of TI, the change in scale (from 0-10 to 0-100) was precisely to signal that the methodology changed in 2012, so pre-2012 CPI scores are not comparable to CPI scores from 2012 or later.
But on the larger issue here, we are very much in agreement. I also remember Professor Lambsdorff making the point that CPI figures were only comparable when the underlying data sources stayed constant (though, like you, I couldn’t find the paper where he said this; he’s written so much, including on the CPI, that it’s hard to track these things down!). Perhaps if TI wants to continue to draw inferences from year-to-year CPI changes, they should do a special analysis that recalculates the scores for each country using only those sources that are available in both of the comparison years.
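A minimal sketch of what such a like-for-like recalculation might look like, in Python. It assumes each country-year is a mapping from source name to a 0-100 score and that the headline score is a simple average of whatever sources are available. The function name and the numbers are invented for illustration and are not actual CPI data; the example merely mimics the Rwanda pattern described in the post (a high-scoring source dropped, the rest flat or slightly up).

```python
def common_source_change(scores_a, scores_b):
    """Compare two years using only the sources present in both.

    scores_a, scores_b: dicts mapping source name -> score on a 0-100 scale.
    Assumes the headline score is an equally weighted average of sources.
    """
    common = sorted(set(scores_a) & set(scores_b))
    if not common:
        raise ValueError("no sources in common between the two years")

    def average(scores, keys):
        return sum(scores[k] for k in keys) / len(keys)

    headline = average(scores_b, list(scores_b)) - average(scores_a, list(scores_a))
    like_for_like = average(scores_b, common) - average(scores_a, common)
    return {
        "headline": headline,            # change using all available sources
        "like_for_like": like_for_like,  # change using common sources only
        "composition": headline - like_for_like,  # part due to source turnover
        "common_sources": common,
    }


# Invented numbers for illustration only; NOT actual CPI source data.
year_2013 = {"A": 50, "B": 48, "C": 52, "D": 50, "E": 70}
year_2014 = {"A": 50, "B": 48, "C": 52, "D": 52}  # source "E" dropped

result = common_source_change(year_2013, year_2014)
print(result)
```

In this made-up case the headline score falls by 3.5 points even though the four common sources show a small improvement of 0.5; the entire decline is a composition effect from dropping source "E".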
Journalists and pundits still aren’t getting the message. Hence this excerpt from an op-ed in the February 20, 2015, New York Times headlined “Indonesia’s Corruption Fighters in the Fight of Their Lives” by Carol Giacomo:
“Reducing corruption in Indonesia is critical if the country is to grow as quickly as Mr. Joko says it must to meet the needs of its people. Commission officials say their prosecution actions have returned millions of dollars to state coffers. Transparency International, which annually rates countries on corruption in their public sectors, says Indonesia has improved its performance on the organization’s “corruption perception index” from 1.9 in 2003 to 34 in 2014; a score of 100 means the public sector is very clean. But Indonesia still ranks 107th out of 175 countries. For comparison, China is 100th on the list of nations, and Russia is 136th.” http://www.nytimes.com/2015/02/20/opinion/indonesias-corruption-fighters-in-the-fight-of-their-lives.html?utm_source=Active+Subscribers&utm_campaign=4bba872b96-MR_021815&utm_medium=email&utm_term=0_35c49cbd51-4bba872b96-64132121&_r=0
As Matthew noted when I showed it to him, what makes this so egregious is that Ms. Giacomo doesn’t realize that TI changed its scoring scale from 0-10 to 0-100 in 2012, making the comparison between a 2003 and a 2014 score all the more meaningless.