There is no shortage of buzz about Big Data in the anticorruption world. It’s everywhere, from public efforts like Transparency International’s (TI’s) public procurement analysis to cutting-edge private-sector FCPA compliance programs implemented by Ernst & Young. TI has blogged about Big Data and corruption, with titles like “Can Big Data Solve the World’s Problems, Including Corruption?” and “The Potential of Fighting Corruption Through Data Mining.” Ernst & Young’s conclusion is more definite: “Anti-Corruption Compliance Now Requires Big Data Analytics.”
In previous posts, contributors to this blog have written about the anticorruption community’s excitement over social media-style apps (“crowdsourcing”) as anticorruption tools. Apps like iPaidABribe allow citizens to report their encounters with corrupt officials, generating a fertile data set for anticorruption activists. Big Data is a related effort: activists can mine huge amounts of data for patterns that reveal corrupt activity, making it a powerful tool for transparency. However, as the name suggests, Big Data requires massive amounts of data in order to be useful. The anticorruption community should therefore throw its weight behind proposals to open up data sets for Big Data analysis. As with crowdsourced anticorruption efforts, the excitement surrounding Big Data could quickly turn into disappointment unless this tool is integrated into the broader anticorruption effort.
For those who have somehow avoided the bombardment of prognostications about how Big Data changes everything, I should define the phrase. Big Data is an analytic method that uses huge amounts of data to uncover previously unknown correlations (perhaps suggesting causal relationships). For instance, Big Data techniques could uncover patterns of fraud and bribery in public procurement by combing through datasets on government bidding processes (e.g., which firms bid on a job, for how much, and who won), contracting firms’ financial disclosures, the beneficial ownership of contracting firms, public officials’ tax and family records, complaints to the authorities about bribery from competing contractors, and so on. Big Data could find meaningful correlations across these datasets: perhaps when certain firms bid together, one of them always wins, or perhaps certain industries, like IT, are particularly at risk for corrupt behavior. An illustrative example of work in this vein comes from the Corruption Research Center Budapest, which explains:
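The co-bidding pattern mentioned above (certain firms bid together and one of them always wins) is simple enough to sketch. The following Python is a minimal illustration, assuming a hypothetical list of tender records; the firm names, field names, and the `min_meetings` cutoff are invented for this example, not taken from any real dataset or tool.

```python
from collections import Counter
from itertools import combinations

# Hypothetical tender records: each lists the bidding firms and the winner.
tenders = [
    {"bidders": ["Alpha", "Beta", "Gamma"], "winner": "Alpha"},
    {"bidders": ["Alpha", "Beta"], "winner": "Alpha"},
    {"bidders": ["Alpha", "Beta", "Delta"], "winner": "Alpha"},
    {"bidders": ["Gamma", "Delta"], "winner": "Delta"},
    {"bidders": ["Gamma", "Delta"], "winner": "Gamma"},
]

def suspicious_pairs(tenders, min_meetings=3):
    """Return (pair, winner, meetings) for pairs of firms that bid against
    each other in at least `min_meetings` tenders, where the same member of
    the pair won every one. `min_meetings` is an arbitrary illustrative cutoff."""
    meetings = Counter()  # pair -> tenders in which both firms bid
    wins = Counter()      # (pair, firm) -> such tenders won by that firm
    for t in tenders:
        for pair in combinations(sorted(set(t["bidders"])), 2):
            meetings[pair] += 1
            if t["winner"] in pair:
                wins[(pair, t["winner"])] += 1
    flagged = []
    for pair, n in meetings.items():
        if n < min_meetings:
            continue
        for firm in pair:
            if wins[(pair, firm)] == n:  # this firm won every meeting
                flagged.append((pair, firm, n))
    return flagged

print(suspicious_pairs(tenders))
```

On these toy records it flags only Alpha and Beta, which met three times with Alpha winning each time. A real analysis would need far more data and a baseline for how often such streaks occur by chance; a flagged pair is a lead for investigators, not proof of collusion.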
As each awarded contract is tracked and represented visually by a tie between public body and private companies, a clear pattern emerges, which links certain state organisations to firms which are repeatedly awarded contracts. These “network ties” are then weighted by the corruption risk of the underlying contracts using the authors’ novel corruption indicator, the Corruption Risk Index (CRI). Unsurprisingly, the relationships which are found to have a high risk of corruption generally involve state bodies and companies who are awarded contracts on an implausibly frequent basis.
(A link to the full research paper describing the methodology can be found here.)
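The network approach in the quoted passage can be sketched in a few lines of Python. This is not the authors’ Corruption Risk Index: as a stand-in, each hypothetical contract carries a few invented yes/no red flags, its risk score is simply the share of flags present, and each buyer-supplier tie is weighted by its number of contracts times their average risk. Ties that are both frequent and risky rise to the top.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical contracts. The red flags are illustrative placeholders;
# this is NOT the Corruption Risk Index, whose components the authors define.
contracts = [
    {"buyer": "Ministry A", "supplier": "Firm X", "flags": [1, 1, 0]},
    {"buyer": "Ministry A", "supplier": "Firm X", "flags": [1, 1, 1]},
    {"buyer": "Ministry A", "supplier": "Firm X", "flags": [1, 0, 1]},
    {"buyer": "Ministry A", "supplier": "Firm Y", "flags": [0, 0, 0]},
    {"buyer": "Agency B", "supplier": "Firm Y", "flags": [0, 1, 0]},
]

def risk_score(flags):
    """Toy risk score: fraction of red flags present (0.0 to 1.0)."""
    return sum(flags) / len(flags)

def weighted_ties(contracts):
    """Group contracts into buyer-supplier ties, weight each tie, and
    return the ties sorted from heaviest to lightest."""
    scores = defaultdict(list)
    for c in contracts:
        scores[(c["buyer"], c["supplier"])].append(risk_score(c["flags"]))
    ties = []
    for (buyer, supplier), s in scores.items():
        ties.append({
            "buyer": buyer,
            "supplier": supplier,
            "contracts": len(s),
            "avg_risk": round(mean(s), 2),
            # Frequency times average risk: one simple way to combine the two.
            "weight": round(len(s) * mean(s), 2),
        })
    return sorted(ties, key=lambda t: t["weight"], reverse=True)

for tie in weighted_ties(contracts):
    print(tie)
```

Multiplying frequency by average risk is one arbitrary choice among many; it keeps a single risky contract from outranking a tie that is repeatedly awarded risky contracts, which is the pattern the quoted passage describes.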
These new projects resemble Checkbook NYC, which has been around since 2010 and whose motto is “Government Spending at Taxpayer’s Fingertips.” Checkbook NYC is a portal established by the Comptroller of the City of New York that lets citizens monitor the city’s $70 billion annual budget using the city’s own internal accounting data. Citizens can download the data and use Big Data techniques to look for patterns that might indicate corruption, and the knowledge that anyone can run such an analysis is itself a powerful deterrent. Checkbook NYC is also open-source software that other cities can copy and use for themselves.
However, there is a serious impediment to Big Data’s continued success as a corruption-fighting tool: it is only as useful as the data it can reach. Getting datasets like the NYC Comptroller’s complete accounting records or Hungary’s public procurement records out of their bureaucratic silos and onto the web for citizen analysis is not trivial. Activists often face resistance from entrenched players whose interests are not served by the increased transparency that Big Data makes possible. This is where the anticorruption community can lend its support. Below are a few efforts worth watching and promoting:
- Open Contracting. The Open Contracting movement, which the procurement examples above illustrate, seeks to make it easier for civil society groups to monitor their governments’ procurement processes, sometimes using Big Data techniques. While many governments have agreed to make their procurement data public, others release incomplete data sets or none at all. The new Open Contracting Data Standard encourages governments around the world to release their public procurement data in a standardized, streamlined format, and it makes doing so cheap and easy. The logic resembles Checkbook NYC’s decision to make its software open source: by making it so easy to open up city spending data, NYC has made it difficult for other cities to argue that opening their own books is too expensive or too difficult. Once the data is available, Big Data analytics can flag transactions at high risk of corruption, pointing concerned citizens in the right direction.
- Open Corporates. Another movement is working to convince governments around the world to open up their corporate registries for public analysis. In many countries, information about who ultimately owns a registered company is currently kept private, which helps corrupt actors enjoy the fruits of their dirty work. As Transparency International’s “Unmask the Corrupt” campaign puts it, “corrupt politicians and businesspeople continue to escape justice and enjoy luxury lifestyles funded by stolen public money, smuggling illicit funds through shadowy secret companies.” A recent New York Times series about property ownership in Manhattan illustrates not only those luxurious lifestyles but also the infuriatingly complex shield of shell companies that corrupt actors can afford with their citizens’ money. A group called OpenCorporates is hard at work scraping together a database of all the registered companies in the world and, where available, their owners: the kind of data aggregation that made the New York Times piece possible. Affiliated citizens’ groups are using Big Data to draw even more scrutiny to particular sectors. In the oil and gas industry, for example, a project called OpenOil crunches through thousands of databases in real time to show even the least data-literate citizen exactly how oil ventures are structured, revealing surprising and sometimes troubling connections.
- Open Aid. International aid flows are notoriously plagued by corruption, so much so that institutions like the World Bank employ hundreds of corruption investigators and still fail to prevent corruption in too many of their projects. Investigators’ jobs would be easier if concerned citizens were monitoring projects alongside them, perhaps using Big Data techniques to keep an eye on the funds. More than 300 aid organizations, including Oxfam and the Catholic Agency for Overseas Development, have published datasets for citizens to analyze as part of the International Aid Transparency Initiative. As with procurement data, aid data can reveal patterns of corruption that would otherwise evade resource-strapped corruption fighters. By analyzing data the World Bank released on more than 600 projects, a researcher at AidData identified a set of seemingly disparate factors, such as the social group affected and the geographic area covered, that characterized corruption-prone projects and could be used to guide future investments.
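To illustrate how a shared format helps the Open Contracting effort described in the list above, here is a minimal Python sketch that reads a trimmed, hypothetical release package and flags open tenders that drew only one bid, a commonly cited procurement red flag. The field names (`ocid`, `tender.procurementMethod`, `tender.numberOfTenderers`, `awards`, `suppliers`) are modeled on the Open Contracting Data Standard release schema, but the data is invented and the published schema should be consulted before relying on them.

```python
import json

# A hypothetical, heavily trimmed release package. Field names are modeled on
# the Open Contracting Data Standard release schema; check the official schema.
RAW = """
{
  "releases": [
    {"ocid": "ocds-demo-001",
     "tender": {"procurementMethod": "open", "numberOfTenderers": 1},
     "awards": [{"suppliers": [{"name": "Firm X"}],
                 "value": {"amount": 950000, "currency": "USD"}}]},
    {"ocid": "ocds-demo-002",
     "tender": {"procurementMethod": "open", "numberOfTenderers": 6},
     "awards": [{"suppliers": [{"name": "Firm Y"}],
                 "value": {"amount": 400000, "currency": "USD"}}]},
    {"ocid": "ocds-demo-003",
     "tender": {"procurementMethod": "direct"},
     "awards": [{"suppliers": [{"name": "Firm X"}],
                 "value": {"amount": 120000, "currency": "USD"}}]}
  ]
}
"""

def single_bidder_flags(package):
    """Flag open tenders that attracted exactly one bid, an illustrative
    red flag. Non-open tenders and releases without a bidder count are
    skipped, not flagged."""
    flagged = []
    for release in package.get("releases", []):
        tender = release.get("tender", {})
        if tender.get("procurementMethod") != "open":
            continue
        if tender.get("numberOfTenderers") == 1:
            suppliers = [s["name"]
                         for a in release.get("awards", [])
                         for s in a.get("suppliers", [])]
            flagged.append((release["ocid"], suppliers))
    return flagged

print(single_bidder_flags(json.loads(RAW)))
```

Because every publisher using the standard shares the same field names, the same few lines could in principle run over any participating government’s data, which is much of the point of standardization.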
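The shell-company problem described under Open Corporates is, at bottom, a graph problem: to find who really owns a company, follow its shareholders upward until the chain ends in people. The Python sketch below assumes a hypothetical registry mapping each entity to its direct owners and their fractional stakes, and multiplies stakes along each path. The companies, people, and percentages are invented. Note that an entity with no recorded owners is treated as a person, so an undisclosed owner simply becomes a dead end, which is exactly why open registries matter.

```python
# Hypothetical ownership records: entity -> list of (owner, fractional stake).
# Owners that never appear as keys are treated as natural persons.
OWNERS = {
    "Tower Apartment LLC": [("Holding One Ltd", 1.0)],
    "Holding One Ltd": [("Holding Two SA", 0.6), ("Jane Roe", 0.4)],
    "Holding Two SA": [("John Doe", 1.0)],
}

def ultimate_owners(entity, owners=OWNERS, stake=1.0, seen=frozenset()):
    """Return {person: effective share} by walking the ownership chain.
    The effective share is the product of stakes along each path; `seen`
    guards against circular ownership."""
    if entity in seen:
        return {}  # circular holding: stop rather than loop forever
    if entity not in owners:
        return {entity: stake}  # no recorded owners: treat as a person
    result = {}
    for owner, share in owners[entity]:
        sub = ultimate_owners(owner, owners, stake * share, seen | {entity})
        for person, s in sub.items():
            result[person] = result.get(person, 0.0) + s
    return result

print(ultimate_owners("Tower Apartment LLC"))
```

On these records, John Doe’s 60 percent effective stake is hidden behind two holding companies and never appears in the apartment company’s own filing.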
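The AidData finding in the Open Aid item above rests on comparing corruption-prone projects with the rest. A crude version of that comparison is sketched below in Python, assuming hypothetical project records with a yes/no marker for substantiated corruption; it is not a reconstruction of the researcher’s actual method, and with six invented records the numbers are meaningless except as a demonstration. For each value of each factor, it divides that group’s share of corruption-prone projects by the overall share, so a ratio above 1 marks an over-represented group.

```python
from collections import defaultdict

# Hypothetical project records; "corrupt" marks a substantiated finding.
projects = [
    {"region": "North", "sector": "roads", "corrupt": True},
    {"region": "North", "sector": "health", "corrupt": True},
    {"region": "North", "sector": "roads", "corrupt": False},
    {"region": "South", "sector": "roads", "corrupt": True},
    {"region": "South", "sector": "health", "corrupt": False},
    {"region": "South", "sector": "health", "corrupt": False},
]

def risk_by_factor(projects, factors=("region", "sector")):
    """For each value of each factor, return the group's share of
    corruption-prone projects divided by the overall share."""
    overall = sum(p["corrupt"] for p in projects) / len(projects)
    if overall == 0:
        return {}  # no corruption-prone projects: ratios are undefined
    table = {}
    for factor in factors:
        counts = defaultdict(lambda: [0, 0])  # value -> [corrupt, total]
        for p in projects:
            counts[p[factor]][0] += p["corrupt"]
            counts[p[factor]][1] += 1
        for value, (bad, total) in counts.items():
            table[(factor, value)] = round((bad / total) / overall, 2)
    return table

for key, ratio in risk_by_factor(projects).items():
    print(key, ratio)
```

On this toy data the North region and the roads sector both score 1.33. A serious analysis of hundreds of projects would also need to account for factors that travel together, which a simple ratio cannot do.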
These open data initiatives are crucial to the continued efficacy of Big Data as an anticorruption tool. Much valuable data that could be used in the fight against corruption is still trapped in the archives of aid agencies and governments, either because they lack the capacity to free it or because they do not want to be scrutinized so closely. But data is being released at an increasing pace, driven by the demands of civil society groups that want to add Big Data to their toolkit. This trend is encouraging, and it should continue. Although Big Data analytics and the open data movement seem like cutting-edge technological approaches to fighting corruption, they share the same basic spirit of transparency that has animated anticorruption activists for decades.