Guest Post–Assessing Corruption with Big Data

Today’s guest post is from Enestor Dos Santos, principal economist at BBVA Research.

Ascertaining the actual level of corruption is not easy, given that it is usually a clandestine activity, and much of the available data is not comparable across countries or across time. Survey data on corruption experience can be helpful, but it is often limited to very specific kinds of corruption (such as petty bribery). Researchers and analysts have therefore, quite reasonably, tended to rely on subjective corruption perception data, such as Transparency International’s well-known Corruption Perceptions Index (CPI). (The CPI aggregates corruption perception data from a variety of other sources, mostly expert assessments.) But conventional corruption perception measures (including those used to construct the CPI) have well-known problems, including limited coverage (with respect to both years and countries) and relatively low frequency (usually annual). And they rely on the perceptions of a handful of experts, which may not necessarily be representative. These limitations mean that while traditional perception measures like the CPI may be useful for some purposes, they are not as helpful for others, such as measuring the impact of individual events or news reports on corruption perceptions, or how changes in corruption perceptions affect government approval ratings.

To address these concerns, a recent study by BBVA Research, entitled Assessing Corruption with Big Data, offered an alternative, complementary type of corruption perceptions measure, based on Google web searches about corruption. To construct this index, we examined all web searches classified by Google Trends in the “Law and Government” category for individual countries, and calculated the proportion of those searches that contain the word “corruption” (in any language and including its misspellings and synonyms). Our index, which begins in 2004, covers more than 190 countries and, unlike traditional corruption indicators, is available in real time and at high frequency (monthly). Moreover, it can be reproduced very easily and at very low cost.
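To make the construction concrete, here is a minimal sketch of the share calculation for one country. The monthly counts are invented for illustration (Google Trends reports normalized values rather than raw counts), and the names `monthly_searches` and `corruption_search_share` are hypothetical rather than taken from the study.

```python
# Invented monthly search counts for a single country, for illustration only.
# "corruption" stands for searches within the category that contain the word
# "corruption", including translations, misspellings, and synonyms.
monthly_searches = {
    "2015-01": {"law_and_government": 120_000, "corruption": 1_800},
    "2015-02": {"law_and_government": 110_000, "corruption": 2_200},
    "2015-03": {"law_and_government": 125_000, "corruption": 5_000},
}


def corruption_search_share(counts):
    """Return the share of "Law and Government" searches that mention corruption."""
    total = counts["law_and_government"]
    if total == 0:
        return None  # no searches in the category, so the share is undefined
    return counts["corruption"] / total


# One index value per month: the proportion described in the text.
index = {
    month: corruption_search_share(counts)
    for month, counts in monthly_searches.items()
}

for month, share in index.items():
    print(month, f"{share:.3f}")
```

Because the measure is a proportion of category searches rather than a raw count, it should not rise simply because overall search volume in a country grows.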

Here are some of our main findings: Continue reading

On the Political Subtext of Definition Debates, Part 2: Measurement or Moralism?

In my last post, I conjectured that a great deal of what would seem like a dry methodological question—How should we define and measure corruption?—is actually shot through with political-ideological considerations. The reason, I further conjectured, is that “corruption” is both (1) a descriptive sociological term, used to categorize a set of related behaviors, and (2) an evaluative moral term, used to characterize certain behaviors (or people or governments or institutions or countries) as “bad” or “blameworthy.” The fact that the same term has these different functions, coupled with the fact that the word “corruption” is particularly (though not uniquely) ambiguous and open-ended, means that attempts to come up with definitions and measurements that are appropriate for some purposes may seem to others wrongheaded, even offensive.

My illustration of this difficulty in my last post concerned debates over whether “corruption” should be defined (say, by advocacy organizations or researchers) principally as “the abuse of public power for private gain,” or instead should be defined to include purely private sector corruption (“abuse of entrusted power for private gain”). My admittedly speculative conjecture was that many (not all) who argue for the latter position do so not so much because of (plausible) arguments for analytical equivalence, but rather due to an implicit—and in my view incorrect—belief that focusing on public sector corruption suggests a neoliberal/libertarian skepticism of activist government.

Here I want to suggest a similar sort of ideological subtext in debates over whether the definition of corruption (and the sorts of corruption that the leading indicators should seek to capture) ought to be limited to what we might think of as the “direct” or “first-order” dishonest acts by the responsible officials (such as taking bribes or embezzling funds), or whether measures of corruption should also incorporate the activities that facilitate corruption (such as providing safe havens for stolen assets), as well as the ways in which the rich and powerful seek to influence public policy through legal means (such as lobbying and campaign donations). This has come up more than a couple of times in the last few months at various conferences and roundtable discussions I’ve attended. The context is typically a criticism—often impassioned—of Transparency International’s Corruption Perceptions Index (CPI) and the associated graphics (such as the color-coded country map) that are used to illustrate the index results. The criticism usually runs as follows (and here I’m paraphrasing, but I think fairly and accurately): Continue reading

More on the 2017 Corruption Perceptions Index, and the Relationship Between Media/Civil Society Freedom and Corruption

The rest of the anticorruption commentariat (and the mainstream media) may have already moved on from the publication of Transparency International’s 2017 Corruption Perceptions Index (CPI), but I wanted to follow up on my other posts from earlier this month (here and here) to discuss one other aspect of the new CPI. The general overview, press release, and other supporting materials that accompanied the latest CPI stress as their main theme the importance of a free press and a robust, independent civil society in the fight against corruption. As TI states succinctly in the overview page for the 2017 CPI, “[A]nalysis of the [CPI] results indicates that countries with the least protection for press and non-governmental organisations (NGOs) also tend to have the worst rates of corruption.” And from this observation, TI argues that in order to make progress in the fight against corruption, governments should “do more to encourage free speech, independent media, political dissent and an open and engaged civil society,” and should “minimize regulations on media … and ensure that journalists can work without fear of repression or violence.” (TI also suggests that international donors should consider press freedom relevant to development aid or access to international organizations, a provocative suggestion that deserves fuller exploration elsewhere.)

Speaking in broad terms, I agree with TI’s position, and I’m heartened to see TI making an effort to use the publicity associated with the release of the CPI to push for concrete improvements on a particular area of importance, rather than simply stressing the bad effects of corruption (such as the alleged adverse impacts on inequality and poverty), or devoting undue attention to (statistically meaningless) movements in country scores from previous years. Whether TI succeeded in leveraging the CPI’s publicity into more attention to the freedom of the media and civil society is another story, but the effort is commendable.

That said, I spent a bit of time digging into the supporting research documents that TI provided on this issue, and I find myself in the uncomfortable position of finding the proffered evidentiary basis for the link between a free press/civil society and progress in the fight against corruption problematic, to put it mildly—even though my own reading of the larger academic literature on the topic makes me think the ultimate conclusion is likely correct, at least in broad terms. That latter fact, coupled with my recognition that the materials I’m evaluating are advocacy documents rather than academic research papers, makes me reluctant to criticize too harshly. Nonetheless, on the logic that it’s important to hold even our friends and allies accountable, and that in the long term promoting more careful and rigorous analysis will produce both more suitable policy prescriptions and better advocacy, I’m going to lay out my main difficulties with TI’s data analysis on the press freedom-corruption connection: Continue reading

Adjusting Corruption Perception Index Scores for National Wealth

My post two weeks ago discussed Transparency International’s newly-released 2017 Corruption Perceptions Index (CPI), focusing in particular on an old hobby-horse of mine: the hazards of trying to draw substantive conclusions from year-to-year changes in any individual country’s CPI score. Today I want to continue to discuss the 2017 CPI, with attention to a different issue: the relationship between a country’s wealth and its CPI score. It’s no secret that these variables are highly correlated. Indeed, per capita GDP remains the single strongest predictor of a country’s perceived corruption level, leading some critics to suggest that the CPI doesn’t really measure perceived corruption so much as it measures wealth—penalizing poor countries by portraying them as more corrupt, when in fact their corruption may be due more to their poverty than to deficiencies in their cultures, policies, and institutions.

This criticism isn’t entirely fair. Per capita income is a strong predictor of CPI scores, but they’re far from perfectly correlated. Furthermore, even if it’s true that worse (perceived) corruption is in large measure a product of worse economic conditions, that doesn’t mean there’s a problem with the CPI as such, any more than a measure of infant mortality is flawed because it is highly correlated with per capita income. (And of course because corruption may worsen economic outcomes, the correlation between wealth and CPI scores may be a partial reflection of corruption’s impact, though I doubt there are many who think that this relationship is so strong that the causal arrow runs predominantly from corruption to national wealth rather than from national wealth to perceived corruption.)

Yet the critics do have a point: When we look at the CPI results table, we see a lot of very rich countries clustered at the top, and a lot of very poor countries clustered at the bottom. That’s fine for some purposes, but we might also be interested in seeing which countries have notably higher or lower levels of perceived corruption than we would expect, given their per capita incomes. As a crude first cut at looking into this, I merged the 2017 CPI data table with data from the World Bank on 2016 purchasing-power-adjusted per capita GDP. After dropping the countries that appeared in one dataset but not the other, I had 167 countries. I then ran a simple regression using CPI as the outcome variable and the natural log of per capita GDP as the sole explanatory variable. (I used the natural log partly to reduce the influence of extreme income outliers, and partly on the logic that the impact of GDP on perceived corruption likely declines at very high levels of income. But I admit it’s something of an arbitrary choice and I encourage others who are interested to play around with the data using alternative functional forms and specifications.)
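For readers who want to try this themselves, the regression described above can be sketched in a few lines of plain Python. The six rows below are invented placeholders, not the actual CPI or World Bank figures, and the function name `fit_cpi_on_log_gdp` is hypothetical. The sketch fits CPI on the natural log of per capita GDP by ordinary least squares, reports the share of variance explained, and ranks countries by how far their score sits above or below what income alone would predict.

```python
import math

# Invented (country, CPI score, PPP per capita GDP in dollars) rows.
# These are placeholders for illustration, not real data.
rows = [
    ("A", 85, 50_000),
    ("B", 70, 40_000),
    ("C", 60, 12_000),
    ("D", 35, 15_000),
    ("E", 30, 3_000),
    ("F", 20, 1_500),
]


def fit_cpi_on_log_gdp(rows):
    """Ordinary least squares of CPI on ln(per capita GDP), one regressor."""
    xs = [math.log(gdp) for _, _, gdp in rows]
    ys = [float(cpi) for _, cpi, _ in rows]
    n = len(rows)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Closed-form slope and intercept for a single explanatory variable.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Residual = actual CPI minus the CPI predicted from income alone.
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared, residuals


slope, intercept, r_squared, residuals = fit_cpi_on_log_gdp(rows)

# Largest positive residual first: countries that look cleaner than their
# income would predict appear at the top, the opposite at the bottom.
names = [name for name, _, _ in rows]
ranked = sorted(zip(names, residuals), key=lambda pair: pair[1], reverse=True)

print(f"slope={slope:.2f}, R^2={r_squared:.2f}")
for name, resid in ranked:
    print(name, f"{resid:+.1f}")
```

Swapping `math.log(gdp)` for another transformation is the simplest way to experiment with alternative functional forms.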

This single variable, ln per capita GDP, explained about half of the total variance in the data (for stats nerds, the R² value was about 0.51), meaning that while ln per capita GDP is a very powerful explanatory variable, there’s a lot of variation in the CPI that it doesn’t explain. The more interesting question, to my mind, concerns the countries that notably outperform or underperform the CPI score that one would predict given national wealth. To look into this, I simply ranked the 167 countries in my data by the size of the residuals from the simple regression described above. Here are some of the things that I found: Continue reading

The New Corruption Perceptions Index Identifies Countries with Statistically Significant Changes in Perceived Corruption–Should We Credit the Results?

As most readers of this blog are likely aware, last month Transparency International (TI) released the 2017 edition of its important and influential Corruption Perceptions Index (CPI). As usual, the publication of the CPI triggered a fair bit of media coverage, much of it focused on how various countries ranked, and how individual country scores had changed from one year to the next (see, for example, here, here, here, and here).

There’s a lot to say about the most recent CPI—I may devote a post at some point to TI’s interesting decision to focus the press release accompanying the publication of the 2017 CPI less on the index itself than on the connection between (perceived) corruption and a lack of adequate freedom and protections for the media and civil society. But in this preliminary post, I want to take up an issue that regular GAB readers will know has been something of a fixation of mine in past years: the emphasis—in my view mostly misplaced—on how individual country CPI scores have changed from year to year.

In prior posts, I’ve raised a number of related but distinct concerns about the tendency of some commentators—and, more disturbingly, of some policymakers—to attach great significance to whether a country’s CPI score has gone up or down relative to previous years. For one thing, the sources used to construct the CPI for any given country may change from year to year—and adding or dropping an idiosyncratic source can have a substantial effect on the aggregate CPI score. For another, even when the underlying sources don’t change, we don’t know whether those sources are on the same implicit scale from year to year. And even if we put these problems to one side, a focus on changes in the final CPI score can sometimes obscure the statistical uncertainty associated with the estimated CPI—these scores can be noisy enough that changes in scores, even those that seem large, may not be statistically meaningful according to the conventional tests. Although TI always calculates statistical confidence intervals, in prior years these intervals have been buried in hard-to-find Excel spreadsheets, and the changes in CPI scores that TI highlights in its annual press releases haven’t always been statistically significant by TI’s own calculations. In an earlier post, I suggested that at the very least, TI should provide an easy-to-find, easy-to-read table assessing which changes in country scores are statistically significant at conventional levels, preferably over a 4-year period (as 1-year changes are both harder to detect if trends are gradual, and less interesting).
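To illustrate what “statistically significant at conventional levels” might mean in practice, here is a minimal sketch of a two-sided z-test on the change in a country’s score. It assumes each score comes with a standard error and that the errors in the two years are independent and roughly normal. That independence assumption is a simplification (the same sources often feed both years), and I am not claiming this is exactly the test TI uses. The scores, standard errors, and the function name `change_is_significant` are invented for illustration.

```python
import math


def change_is_significant(score_a, se_a, score_b, se_b, z_crit=1.96):
    """Two-sided z-test for a change in score between two years.

    Assumes independent, approximately normal errors, so the standard
    error of the difference is sqrt(se_a^2 + se_b^2). The default
    z_crit of 1.96 corresponds to the conventional 5% level.
    """
    diff = score_b - score_a
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    z = diff / se_diff
    return z, abs(z) > z_crit


# Invented example 1: a 4-point rise with noisy scores.
z_small, sig_small = change_is_significant(40, 3.0, 44, 3.0)

# Invented example 2: a 10-point rise with tighter scores.
z_large, sig_large = change_is_significant(40, 2.0, 50, 2.0)

print(f"4-point change:  z={z_small:.2f}, significant={sig_small}")
print(f"10-point change: z={z_large:.2f}, significant={sig_large}")
```

In the first case a seemingly meaningful 4-point move falls well short of the threshold, which is the sort of change that tends to get reported as an improvement anyway.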

Apparently some folks within TI were thinking along similar lines, and I was pleased to see that the 2017 CPI includes a reasonably prominent link to a spreadsheet showing those countries for which the 2017 CPI score showed a “statistically significant difference” from that country’s CPI score in each of five comparison years (2012, 2013, 2014, 2015, and 2016).

I’ve still got some criticisms and concerns, which—in the spirit of constructive engagement—I’ll turn to in just a moment. But before getting to that, let me pause to note my admiration for TI as an organization, and in this case its research department in particular, for constantly working to improve both the CPI itself and how it is presented and interpreted. It’s easy for folks like me to criticize—and I’ll continue to do so, in the interests of pushing for further improvements—but it’s much more challenging to absorb the raft of criticisms from so many quarters, sift through them, and invest the necessary time and resources to adapt and adjust from year to year. So, in case any folks at TI are reading this, let me first acknowledge and express my appreciation for how much work (often thankless) goes into the creation and continued improvement of this valuable tool.

Having said that, let me now proceed to raise some comments, questions, and concerns about TI’s claims about countries that appear to have experienced statistically meaningful changes in their CPI scores over the last five years. Continue reading

The Bayesian Corruption Index: A New and Improved Method for Aggregating Corruption Perceptions

As most readers of this blog are likely aware, two of the most widely used measures of corruption perceptions—Transparency International’s Corruption Perceptions Index (CPI) and the Worldwide Governance Indicators (WGI) corruption index—are composite indicators that combine perceived corruption ratings from a range of different sources (including private rating agencies, NGOs, international development banks, and surveys of firms and households). The CPI takes a simple average of the available sources for each country; the WGI uses a somewhat fancier “unobserved component model” (UCM), which assumes that each source’s score is a noisy signal of the “true” level of perceived corruption. The UCM differs from a simple average in a few ways, perhaps most notably by giving less weight to “outlier” sources, though in practice the WGI and CPI are highly correlated, and the WGI’s creators report that the results for the WGI turn out not to change very much if one takes a simple average rather than using the UCM.
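The difference between the two aggregation approaches can be illustrated with a toy example. The sketch below compares a simple average with a precision-weighted average, in which noisier sources get less weight. This is a deliberate simplification and not the WGI’s actual procedure: a real UCM estimates each source’s noise from the data, whereas here the error variances are simply assumed. All numbers and function names are invented for illustration.

```python
def simple_average(scores):
    """Equal-weight average of all available source scores."""
    return sum(scores) / len(scores)


def precision_weighted_average(scores, error_variances):
    """Weight each source by the inverse of its assumed error variance."""
    weights = [1.0 / v for v in error_variances]
    weighted_sum = sum(w * s for w, s in zip(weights, scores))
    return weighted_sum / sum(weights)


# Four invented source scores for one country, already on a common scale.
# The last source disagrees sharply with the others and is assumed noisy.
scores = [30.0, 32.0, 34.0, 60.0]
error_variances = [4.0, 4.0, 4.0, 64.0]  # assumed, not estimated

equal_weight = simple_average(scores)
noise_weighted = precision_weighted_average(scores, error_variances)

print(f"simple average:             {equal_weight:.1f}")
print(f"precision-weighted average: {noise_weighted:.1f}")
```

With these made-up inputs the single discordant source pulls the simple average well above the other three scores, while the weighted version stays close to them. When sources mostly agree, the two methods give similar answers.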

These composite indicators have a number of well-known problems, which I won’t bother going into here. Rather, the main purpose of this post is to introduce readers to an alternative index, developed by Samuel Standaert at Ghent University, which he calls the “Bayesian Corruption Index” (BCI). Standaert introduced the BCI in a 2015 article, but so far as I can tell it has not attracted much attention. The BCI certainly doesn’t solve all the problems of the traditional aggregated corruption perceptions indicators (more on this below), but it’s definitely an improvement, and deserves wider use. Let me first say a bit about how the BCI differs from the WGI, why I think it’s an advance over the WGI and CPI, and what some of its limitations are. Continue reading

The 2016 CPI and the Value of Corruption Perceptions

Last month, Transparency International released its annual Corruption Perceptions Index (CPI). As usual, the release of the CPI has generated widespread discussion and analysis. Previous GAB posts have discussed many of the benefits and challenges of the CPI, with particular attention to the validity of the measurement and the flagrant misreporting of its results. The release of this year’s CPI, and all the media attention it has received, provides an occasion to revisit important questions about how the CPI should and should not be used by researchers, policymakers, and others.

As past posts have discussed, it’s a mistake to focus on the change in each country’s CPI score from the previous year. These changes are often due to changes in the sources used to calculate the score, and most of these changes are not statistically meaningful. As a quick check, I compared the confidence intervals for the 2015 and 2016 CPIs and found that, for each country included in both years, the confidence intervals overlap. (While this doesn’t rule out the possibility of statistically significant changes for some countries, it suggests that a more rigorous statistical test is required to see if the changes are meaningful.) Moreover, even though a few changes each year usually pass the conventional thresholds for statistical significance, with 176 countries in the data, we should expect some of them to exhibit statistical significance, even if in fact all changes are driven by random error. Nevertheless, international newspapers have already begun analyses that compare annual rankings, with headlines such as “Pakistan’s score improves on Corruption Perception Index 2016” from The News International, and “Demonetisation effect? Corruption index ranking improves but a long way to go” from the Hindustan Times. Alas, Transparency International sometimes seems to encourage this style of reporting, both by showing the CPI annual results in a table, and with language such as “more countries declined than improved in this year’s results.” After all, “no change” is no headline.
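Two pieces of the reasoning above are easy to make concrete. The sketch below shows the quick overlap check for a pair of confidence intervals, and the back-of-the-envelope arithmetic for how many “significant” changes pure noise would produce across 176 countries. The 5% significance level and the assumption that countries’ errors are independent are mine, chosen as the conventional defaults. The interval endpoints are invented.

```python
def intervals_overlap(lo_a, hi_a, lo_b, hi_b):
    """True if the closed intervals [lo_a, hi_a] and [lo_b, hi_b] share any point."""
    return lo_a <= hi_b and lo_b <= hi_a


# Invented intervals for one country in two consecutive years.
overlapping = intervals_overlap(35, 45, 42, 52)
disjoint = intervals_overlap(35, 40, 41, 50)

# Multiple comparisons: if no country's true score changed at all, each test
# still has probability alpha of coming up "significant" by chance.
n_countries = 176
alpha = 0.05  # assumed conventional threshold

expected_false_positives = n_countries * alpha
prob_at_least_one = 1 - (1 - alpha) ** n_countries  # assumes independent tests

print(f"overlapping example: {overlapping}, disjoint example: {disjoint}")
print(f"expected chance 'significant' changes: {expected_false_positives:.1f}")
print(f"probability of at least one: {prob_at_least_one:.4f}")
```

Under these assumptions, roughly nine countries would show a nominally significant change in a typical year even if nothing had really changed anywhere, so a handful of significant movers is not by itself evidence of real shifts.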

Although certain uses of the CPI are inappropriate, such as comparing each country’s movement from one year to the next, this does not mean that the CPI is not useful. Indeed, some critics have the unfortunate tendency to dismiss the CPI out of hand, often emphasizing that corruption perceptions are not the same as corruption reality. That is certainly true—TI goes out of its way to emphasize this point with each release of a new CPI—but there are at least two reasons why measuring corruption perceptions is valuable: Continue reading