Guest Post–Assessing Corruption with Big Data

Today’s guest post is from Enestor Dos Santos, principal economist at BBVA Research.

Ascertaining the actual level of corruption is not easy, given that it is usually a clandestine activity, and much of the available data is not comparable across countries or across time. Survey data on corruption experience can be helpful, but it is often limited to very specific kinds of corruption (such as petty bribery). Researchers and analysts have therefore, quite reasonably, tended to rely on subjective corruption perception data, such as Transparency International's well-known Corruption Perceptions Index (CPI). (The CPI aggregates corruption perception data from a variety of other sources, mostly expert assessments.) But conventional corruption perception measures (including those used to construct the CPI) have well-known problems, including limited coverage (with respect to both years and countries) and relatively low frequency (usually annual). They also rely on the perceptions of a handful of experts, who may not be representative. These limitations mean that while traditional perception measures like the CPI may be useful for some purposes, they are less helpful for others, such as measuring the impact of individual events or news reports on corruption perceptions, or how changes in corruption perceptions affect government approval ratings.

To address these concerns, a recent study by BBVA Research, entitled Assessing Corruption with Big Data, offered an alternative, complementary measure of corruption perceptions based on Google web searches about corruption. To construct this index, we examined all web searches classified by Google Trends in the "Law and Government" category for individual countries, and calculated the proportion of those searches that contain the word "corruption" (in any language, and including its misspellings and synonyms). Our index, which begins in 2004, covers more than 190 countries and, unlike traditional corruption indicators, is available in real time and at a high (monthly) frequency. It can also be reproduced easily and at very low cost.
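The core calculation is a simple ratio: for each country and month, corruption-related searches divided by all "Law and Government" searches. The Python sketch below illustrates that ratio under loudly stated assumptions. It is not the study's actual pipeline. Google Trends itself publishes normalized relative figures rather than raw search counts, so the `Query` records and their `count` field are hypothetical inputs, and `CORRUPTION_TERMS` is an illustrative, made-up term list standing in for the study's unpublished set of translations, misspellings and synonyms.

```python
from dataclasses import dataclass

# Hypothetical term list: the study matches "corruption" in any language plus
# misspellings and synonyms, but its actual list is not given here.
CORRUPTION_TERMS = ("corruption", "corrupción", "corrupção", "korruption", "coruption")


@dataclass
class Query:
    month: str  # "YYYY-MM"
    text: str   # search text, assumed already limited to "Law and Government"
    count: int  # hypothetical raw number of searches for this text that month


def matches_corruption(text: str) -> bool:
    """True if the search text contains any of the (hypothetical) terms."""
    lowered = text.casefold()
    return any(term in lowered for term in CORRUPTION_TERMS)


def monthly_corruption_share(queries):
    """Share of each month's category searches that mention corruption."""
    totals = {}
    hits = {}
    for q in queries:
        totals[q.month] = totals.get(q.month, 0) + q.count
        if matches_corruption(q.text):
            hits[q.month] = hits.get(q.month, 0) + q.count
    # Skip months with no searches to avoid dividing by zero.
    return {m: hits.get(m, 0) / totals[m] for m in sorted(totals) if totals[m] > 0}


if __name__ == "__main__":
    sample = [
        Query("2016-01", "corruption scandal", 30),
        Query("2016-01", "tax law", 70),
        Query("2016-02", "Corrupción Brasil", 10),
        Query("2016-02", "elections", 40),
    ]
    print(monthly_corruption_share(sample))  # {'2016-01': 0.3, '2016-02': 0.2}
```

Because the measure is a share rather than a raw volume, it is less sensitive to overall growth in internet use or in searches about law and government, which is one reason a ratio of this kind is a natural choice for comparisons over time.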

Here are some of our main findings: