As most readers of this blog are likely aware, last month Transparency International (TI) released the 2017 edition of its important and influential Corruption Perceptions Index (CPI). As usual, the publication of the CPI triggered a fair bit of media coverage, much of it focused on how various countries ranked, and how individual country scores had changed from one year to the next (see, for example, here, here, here, and here).
There’s a lot to say about the most recent CPI—I may devote a post at some point to TI’s interesting decision to focus the press release accompanying the publication of the 2017 CPI less on the index itself than on the connection between (perceived) corruption and a lack of adequate freedom and protections for the media and civil society. But in this preliminary post, I want to take up an issue that regular GAB readers will know has been something of a fixation of mine in past years: the emphasis—in my view mostly misplaced—on how individual country CPI scores have changed from year to year.
In prior posts, I’ve raised a number of related but distinct concerns about the tendency of some commentators—and, more disturbingly, of some policymakers—to attach great significance to whether a country’s CPI score has gone up or down relative to previous years. For one thing, the sources used to construct the CPI for any given country may change from year to year—and adding or dropping an idiosyncratic source can have a substantial effect on the aggregate CPI score. For another, even when the underlying sources don’t change, we don’t know whether those sources are on the same implicit scale from year to year. And even if we put these problems to one side, a focus on changes in the final CPI score can sometimes obscure the statistical uncertainty associated with the estimated CPI—these scores can be noisy enough that changes in scores, even those that seem large, may not be statistically meaningful according to the conventional tests. Although TI always calculates statistical confidence intervals, in prior years these intervals have been buried in hard-to-find Excel spreadsheets, and the changes in CPI scores that TI highlights in its annual press releases haven’t always been statistically significant by TI’s own calculations. In an earlier post, I suggested that at the very least, TI should provide an easy-to-find, easy-to-read table assessing which changes in country scores are statistically significant at conventional levels, preferably over a 4-year period (as 1-year changes are both harder to detect if trends are gradual, and less interesting).
Apparently some folks within TI were thinking along similar lines, and I was pleased to see that the 2017 CPI includes a reasonably prominent link to a spreadsheet showing those countries for which the 2017 CPI score showed a “statistically significant difference” from that country’s CPI score in each of five comparison years (2012, 2013, 2014, 2015, and 2016).
I’ve still got some criticisms and concerns, which—in the spirit of constructive engagement—I’ll turn to in just a moment. But before getting to that, let me pause to note my admiration for TI as an organization, and in this case its research department in particular, for constantly working to improve both the CPI itself and how it is presented and interpreted. It’s easy for folks like me to criticize—and I’ll continue to do so, in the interests of pushing for further improvements—but it’s much more challenging to absorb the raft of criticisms from so many quarters, sift through them, and invest the necessary time and resources to adapt and adjust from year to year. So, in case any folks at TI are reading this, let me first acknowledge and express my appreciation for how much work (often thankless) goes into the creation and continued improvement of this valuable tool.
Having said that, let me now proceed to raising some comments, questions, and concerns about TI’s claims about countries that appear to have experienced statistically meaningful changes in their CPI scores over the last five years.
- First, just a small technical note: It would be nice if TI would include, along with the tables of allegedly statistically significant changes, a note on the statistical significance threshold TI is using. The usual default threshold is 5%, meaning that the chance that the difference between the 2017 score and the base-year score would occur by chance (due to random error in the measures), even if there were no actual change in the level of perceived corruption, is less than 5%. But the confidence intervals that TI includes in its larger data table appear to be based on a wider threshold (10%). I’m sure that if I dug into the data I could back out the confidence level used to determine which changes are “significant,” but TI would save researchers a lot of time by just including a note on the data table itself.
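For readers who want to see the mechanics, the kind of test TI appears to be running can be sketched in a few lines. To be clear, the country, scores, and standard errors below are entirely hypothetical, and TI’s actual procedure may differ in its details; this is just the textbook two-sided test for whether the gap between two noisy estimates exceeds what measurement error alone would plausibly produce.

```python
import math

def significant_change(score_old, se_old, score_new, se_new, z_crit=1.96):
    """Two-sided test: is the gap between two noisy score estimates larger
    than random measurement error alone would plausibly produce?
    z_crit = 1.96 corresponds to a 5% threshold, 1.645 to 10%."""
    # Standard error of the difference, assuming independent errors
    se_diff = math.sqrt(se_old ** 2 + se_new ** 2)
    z = (score_new - score_old) / se_diff
    return abs(z) > z_crit, z

# Hypothetical country: 2012 score 45 (SE 2.5), 2017 score 52 (SE 2.0)
sig, z = significant_change(45, 2.5, 52, 2.0)
```

Note how much the answer can turn on the chosen threshold: a borderline z-statistic can clear the 10% bar (1.645) while failing the 5% bar (1.96), which is exactly why a note stating the threshold would be useful.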
- Second, and much more importantly, it’s important to recognize that TI’s results do not imply that there has actually been a meaningful change in perceived corruption in any of the countries for which there’s a “statistically significant change” in CPI score between the base year (I’ll just focus on 2012 to keep things simple) and 2017. To understand why, recall that the conventional test for statistical significance asks, “What’s the probability that the difference between these two (noisy) estimates of some unobserved value would be this large, if the value itself were the same in both cases?” More concretely, in this particular context, we’re asking, “What’s the probability that the difference between Country X’s 2017 CPI estimate and its 2012 CPI estimate would be this large, if in fact the real level of perceived corruption in Country X hadn’t changed at all?” If the frequency with which such a large gap in the CPI would occur solely due to random measurement error, absent any real change in perceived corruption, is less than 5%, then we call the change “statistically significant.” That approach works tolerably well when we’re testing a hypothesis specific to a particular country or set of countries. (For example, if we want to know whether countries without a free press are perceived as more corrupt than countries with a free press, we would ask, “What’s the probability that the average CPI score for free-press countries would be that much higher than the average CPI score for no-free-press countries, if in fact average corruption is uncorrelated with press freedom?”) But that’s not really what TI is doing when it identifies a handful of countries for which there has been a “statistically significant change” between 2012 and 2017. Rather, TI is looking at all 180 countries in the index and identifying those for which the 2012-2017 difference would occur by chance less than 5% of the time.
(TI finds 15 such countries.) But with 180 countries, even if no country experienced a genuine change in its level of perceived corruption, we’d expect at least some countries to exhibit an unlikely change. Think about it this way: Suppose I give you a coin, which might be fair or biased, and you decide to test it by flipping it five times. If it comes up heads every time, you can conclude with some confidence that it’s probably not a fair coin, because the probability of getting five heads in a row with a fair coin is very low (a little over 3%). Now suppose I give you 180 coins, which might or might not be fair, and you flip each of them five times. Suppose that for seven of those coins, you get heads on all five flips, and for eight of those coins, you get tails on all five flips. Can you conclude with great confidence that each of those 15 coins must be biased, because the probability of getting five heads (or tails) in a row with a fair coin is only about 3%? No, of course not. With 180 coins, the odds are that some of the flips will give you the statistically unlikely result, even if all the coins are fair. But that’s basically the situation we’ve got with TI’s list of countries with “statistically significant differences” in CPI score between 2012 and 2017: seven countries had a “statistically significant” increase in their CPI scores, while eight had a “statistically significant” decline, but since there was no reason ahead of time to expect that these particular countries and not others would experience a change, it seems quite plausible that all we have here are the coins that happened to come up all-heads or all-tails on five straight flips. The results, so far as I can tell, are entirely consistent with the null hypothesis that no country experienced any meaningful change in its perceived corruption level since 2012.
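If the coin-flip intuition seems suspicious, it’s easy to check by simulation. The sketch below mirrors the example in the text: it flips 180 fair coins five times each, repeats that whole exercise many times, and reports the average number of coins that come up all-heads or all-tails.

```python
import random

random.seed(2018)  # fixed seed so the simulation is reproducible

def average_extreme_coins(n_coins=180, n_flips=5, n_trials=2000):
    """Average number of fair coins (out of n_coins) that land all-heads
    or all-tails across n_flips, over many simulated repetitions."""
    total = 0
    for _ in range(n_trials):
        for _ in range(n_coins):
            flips = [random.random() < 0.5 for _ in range(n_flips)]
            if all(flips) or not any(flips):
                total += 1
    return total / n_trials

avg = average_extreme_coins()
# Each fair coin lands all-heads or all-tails with probability 2/32 = 1/16,
# so with 180 coins we expect roughly 180/16, or about 11, "extreme" coins.
```

Even though every coin in the simulation is fair, on average about eleven of the 180 come up all-heads or all-tails, which is in the same neighborhood as TI’s fifteen flagged countries.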
- Third, while it’s great that TI is making the effort to try to identify which countries’ score changes are statistically meaningful, unfortunately TI did not do the other thing I’d recommended in that earlier post by performing the analysis only with a constant set of sources. For none of the 15 countries for which TI finds statistically significant changes was the 2017 CPI based on exactly the same sources as the 2012 CPI. Does this make a difference? Well, for most countries, maybe not: I did some quick-and-dirty calculations, and found that in just over half (8 of 15) cases, if the CPI scores are calculated using only the sources available in both 2012 and 2017, the difference between the scores doesn’t change by more than a point. But in 6 of 15 cases, when the calculations are made with only common sources, the magnitude of the CPI change between 2012 and 2017 is at least 3 points smaller than what TI reports in its “statistically significant change” table—and in one case (Myanmar) the magnitude of the reported improvement drops by 40% (from a 15 point improvement to a 9 point improvement). And of course the reduction in the number of sources means that the standard errors in all of these cases go up as well, though I haven’t yet had time to calculate the revised standard errors. (It’s worth noting in passing that for one country—Senegal—there are only two common sources used to calculate the CPI in both 2012 and 2017, and since TI usually requires at least three sources for a country to be included in the CPI in the first place, there’s a plausible case that one should ignore the results on Senegal’s changed score.) Now, the absolute values of the changes for all these countries are still large enough that it’s likely that they all still satisfy the conventional thresholds for statistical significance, but I hope in the future TI does these comparisons using only common sources. 
Otherwise it’s entirely possible that seemingly large changes are an artifact of changing sources. (The opposite is also possible – genuinely large changes might be masked by the dropping or adding of outlier sources too.)
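To see concretely why common-source comparisons matter, here is a toy illustration; the country, source names, and scores are all invented, and TI’s real aggregation averages standardized sources rather than raw ones. The point is just that swapping one source between years can account for much of an apparent change.

```python
# Hypothetical rescaled source scores for one country (source names invented)
scores_2012 = {"src_A": 40, "src_B": 44, "src_C": 36}
scores_2017 = {"src_A": 50, "src_B": 46, "src_D": 60}  # src_C dropped, src_D added

def cpi(scores, sources=None):
    """Simple mean over the chosen sources. (TI's actual aggregation averages
    standardized sources; this sketch skips the standardization step.)"""
    keys = list(sources) if sources is not None else list(scores)
    return sum(scores[k] for k in keys) / len(keys)

common = scores_2012.keys() & scores_2017.keys()  # sources used in both years
change_all_sources = cpi(scores_2017) - cpi(scores_2012)
change_common_only = cpi(scores_2017, common) - cpi(scores_2012, common)
```

In this made-up example the all-sources comparison shows a 12-point improvement, but restricting to the two common sources cuts the improvement in half, to 6 points: the rest is an artifact of replacing one source with a higher-scoring one.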
- Fourth, though of less importance, some of the reported numbers in TI’s table just look a little screwy to me. They may be correct—I haven’t had a chance to check—but I figured it might be worth flagging. For example, the standard error for the CPI estimate for Saint Lucia in 2012 is 0.8 (the smallest in the set of 15 countries in the table), despite the fact that this estimate is based on only three sources. In 2017, with the estimate again based on three sources (two of which were also used in 2012), the standard error is 4.27—the highest of the 15 countries included. Eyeballing the table, it appears that this is because the 2012 scores are all very close together (70, 71, and 73), while in 2017 they are more dispersed due to a low outlier (47, 59, 60). This implies that between 2012 and 2017, our estimate of perceived corruption in Saint Lucia became not only worse but dramatically less precise. That may well be the right conclusion, but it’s worth highlighting how sensitive these standard error estimates are to seemingly small changes in scores. Part of the issue might be that the calculation of these standard errors, and thus the determination of which changes are statistically meaningful, assumes that all of these sources are measuring the same underlying variable (“perceived corruption”) with essentially random measurement error, and that the random errors in individual sources are independent from year to year. If those assumptions aren’t correct, then the calculations that follow from them may be misleading.
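For what it’s worth, the Saint Lucia pattern is easy to reproduce with a back-of-the-envelope calculation on the three scores visible in the table. The naive standard error of the mean below is not exactly how TI computes its published figures (which involve TI’s own standardization), so the numbers come out slightly different from TI’s 0.8 and 4.27, but the jump in imprecision is the same.

```python
import statistics

def standard_error(scores):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(scores) / len(scores) ** 0.5

se_2012 = standard_error([70, 71, 73])  # tightly clustered -> small SE (~0.88)
se_2017 = standard_error([47, 59, 60])  # one low outlier -> large SE (~4.18)
```

With only three sources, a single outlier score dominates the dispersion, which is why the standard error balloons by a factor of almost five.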
- Finally, it’s worth reiterating yet again that even if we put all the above concerns to one side, we still don’t know whether the individual sources are using a constant scale from year to year. They may well not be, particularly the ones that are focused on comparing countries to one another, rather than measuring change over time.
So, while TI is definitely making admirable progress in how it calculates and presents CPI data, I still don’t think that anyone should put much stock in within-country, year-to-year changes in CPI scores. To its credit, TI’s press release accompanying this year’s CPI does not place nearly as much emphasis on recent CPI movements as past CPI press releases did. And it’s worth highlighting that countries that previous CPI press releases emphasized as the biggest movers (Qatar in 2016, Brazil in 2015, China in 2014) do not appear on TI’s list of the countries with statistically significant changes in corruption between any of the baseline years (2012-2016) and 2017, which should serve as a reminder not to make too much of any single-year changes. (An aside: Somewhat bafflingly, the press release does identify Cote d’Ivoire as a country that has seen a significant improvement in its CPI score in the last several years, even though Cote d’Ivoire doesn’t show up on TI’s list of countries that have seen statistically significant improvements in their CPI scores since 2012. I’m not sure what’s up with that – maybe an error in either the press release or the table?) TI’s emphasis—rightly, in my view—appears to be shifting to more of a focus on the factors that correlate with (and perhaps contribute to) the overall perception of corruption in different countries. More on that in a later post, I hope.
Here is my take on Nepal and CPI: http://www.myrepublica.com/news/37097/?categoryId=81
. . .
Is the CPI past its use-by date?
Asking because statistical analysis like that above is (way) above my paygrade. But the advent of organisations such as TJN, the Tax Justice Network, and its studies showing empirical evidence of illicit financial flows, suggests an opposing picture: first world countries are by far the biggest beneficiaries of corruption.
TJN for example report $1tn in flows out of Africa over the last half century. Yet TI paint first world countries with the colour yellow, described in its map keys as “very clean.” TI might claim that the CPI remains a survey focused on public sector corruption, and that illicit financial flows primarily concern the private sector. This would ignore the fact that first world governments have long promised to crack down on bribery and other forms of corruption.
When the CPI first started in 1996 it was an admirable attempt to introduce some comparative ranking around a difficult subject. Addressing corruption “perceptions” was an understandably imprecise approach in a pre-web era when official stats were difficult to come by on a reliable and up-to-date basis. The question must be asked: is the CPI still fit-for-purpose when even TI has more reliable indicators such as the Global Corruption Barometer, which records the percentage of people who report paying an actual (as opposed to perceived) bribe?
In my own country, New Zealand, once again perceived as “very clean,” we have only just voted out a leader who once championed the cause of turning us into the “Switzerland of the South Pacific” – and who denied the need to change Foreign Trust laws even as the Panama Papers scandal broke, with the leaker picking out former PM John Key by name. Despite the denials, public pressure saw an allegedly independent inquiry recommend changes to Foreign Trust laws, after which 12,000 foreign companies deregistered themselves.
As if this were not fishy enough, a request I made to the John Key government for policy advice given to his predecessor, Helen Clark, on her introduction of foreign trust laws in 2009 was denied, because it failed the “public interest” test.
When I asked the head of the NZ TI chapter about the foreign trust laws and causes for concern at the lack of transparency behind their introduction by Clark, and their continuance by Key, in relation to her bid for the Secretary General of the United Nations, and Key’s support for her bid, there was no reply. When I complained about the lack of response in a public comment, a TINZ official claimed they had not received the email query, despite it being sent to the address given on their website. They also explained a lack of a Twitter account as being due to online criticism, a curious stance given most other chapters make themselves available by that channel.
TINZ is a registered charity, whose latest available annual report notes related party payments to the TINZ CEO via the company in which she owns shares, on the basis that “it does not have public accountability” – despite TINZ attracting significant funding from the same government that TI declares to be very clean.
A final minor note – the annual report for the TINZ financial year ending June 2017, due last December, appears to be more than two months overdue, for the first time in many years.
My reason for raising these particulars is to provide background around the growing difficulties facing the “perception” index at the top of the table, with New Zealand once again equal-pegging with Denmark. I am not suggesting that the CPI be tossed aside. Gaps between perception and reality are also instructive.
But the question must be asked whether Transparency International should continue to treat the CPI as its main report, or move focus to more evidence-based approaches.