Big Data and Anticorruption: A Great Fit

There is no shortage of buzz about Big Data in the anticorruption world. It’s everywhere — from public efforts like Transparency International’s public procurement analysis to cutting-edge private-sector FCPA compliance programs implemented by Ernst & Young. TI has blogged about Big Data and corruption, with titles like “Can Big Data Solve the World’s Problems, Including Corruption?” and “The Potential of Fighting Corruption Through Data Mining.” Ernst & Young’s conclusion is more definite: “Anti-Corruption Compliance Now Requires Big Data Analytics.”

In previous posts, contributors to this blog have written about how the anticorruption community was excited about social media-style apps (“crowdsourcing”) in anticorruption efforts. Apps like iPaidABribe allow citizens to report their encounters with corrupt officials, generating a fertile data set for anticorruption activists. Big Data is a related effort: activists can mine huge amounts of data for patterns that reveal corrupt activity, making it a powerful tool for transparency. However, as the name suggests, Big Data requires massive amounts of data in order to be useful. The anticorruption community should throw its weight behind proposals to open up data sets for Big Data analysis. As with crowdsourced anticorruption efforts, the excitement surrounding Big Data could quickly turn into disappointment unless this tool can be integrated into the broader anticorruption effort.

For those who have somehow been able to avoid being bombarded with prognostications about how Big Data changes everything, I should define the phrase. Big Data is an analytic method that entails the use of huge amounts of data to uncover previously unknown correlations (perhaps suggesting causal relationships). For instance, Big Data techniques could uncover patterns of fraud and bribery in public procurement by combing through datasets on government bidding processes (e.g., which firms bid on a job, for how much, and who had the winning bid), contracting firms’ financial disclosures, beneficial ownership of contracting firms, public officials’ tax and family records, complaints to the authorities about bribery from competing contractors, and so on. Big Data could find correlations between different data sets that may be meaningful: perhaps when certain firms always bid together, one of them always wins, or perhaps certain industries like IT are particularly at risk for corrupt behavior. An illustrative example of work in this vein comes from the Corruption Research Center Budapest, which explains:

As each awarded contract is tracked and represented visually by a tie between public body and private companies, a clear pattern emerges, which links certain state organisations to firms which are repeatedly awarded contracts. These “network ties” are then weighted by the corruption risk of the underlying contracts using the authors’ novel corruption indicator, the Corruption Risk Index (CRI). Unsurprisingly, the relationships which are found to have a high risk of corruption generally involve state bodies and companies who are awarded contracts on an implausibly frequent basis.

(A link to the full research paper describing the methodology can be found here.)
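To make the network idea concrete, here is a minimal Python sketch of how such ties might be built and weighted. It is not the Budapest group's method: the `risk` score is a hypothetical stand-in for an indicator like the CRI (the real index is defined in their paper), and the buyer names, supplier names, and thresholds are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical contract records: (public body, supplier, risk score in [0, 1]).
# The risk score is a stand-in for an indicator like the CRI, not the real index.
CONTRACTS = [
    ("Ministry A", "Firm X", 0.8),
    ("Ministry A", "Firm X", 0.9),
    ("Ministry A", "Firm X", 0.7),
    ("Ministry A", "Firm Y", 0.2),
    ("Agency B", "Firm Y", 0.1),
    ("Agency B", "Firm Z", 0.3),
]


def build_ties(contracts):
    """Aggregate contracts into buyer-supplier ties: (contract count, mean risk)."""
    totals = defaultdict(lambda: [0, 0.0])  # tie -> [count, summed risk]
    for buyer, supplier, risk in contracts:
        totals[(buyer, supplier)][0] += 1
        totals[(buyer, supplier)][1] += risk
    return {tie: (n, total / n) for tie, (n, total) in totals.items()}


def flag_ties(ties, min_contracts=3, min_mean_risk=0.6):
    """Return ties that are both frequent and risky. Thresholds are arbitrary."""
    return sorted(
        tie
        for tie, (n, mean_risk) in ties.items()
        if n >= min_contracts and mean_risk >= min_mean_risk
    )


if __name__ == "__main__":
    for buyer, supplier in flag_ties(build_ties(CONTRACTS)):
        print(f"{buyer} -> {supplier}")
```

A flagged tie is only a pointer for further scrutiny. A buyer may have perfectly legitimate reasons to return to the same supplier, which is why the frequency count is combined with a separate risk measure rather than used on its own.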

These new projects are similar to Checkbook NYC, which has been around since 2010 and whose motto is “Government Spending at Taxpayer’s Fingertips.” Checkbook NYC is a portal established by the Comptroller of the City of New York to let citizens monitor the city’s $70 billion annual budget using its own internal accounting data. Citizens can download the data and use Big Data analysis techniques to reveal patterns that might indicate corruption, which makes the portal a powerful deterrent. Checkbook NYC is open-source software that other cities can copy and use for themselves.

However, there is a serious impediment to Big Data’s continued success as a corruption-fighting tool. As its name suggests, Big Data requires massive amounts of data in order to be useful. All of the NYC Comptroller’s accounting data, all of Hungary’s public procurement data — getting these datasets out of their bureaucratic silos and onto the world wide web for citizen analysis is not trivial. Activists often face resistance from entrenched players whose interests are not served by the increased transparency that Big Data makes possible. But here is where the anticorruption community can lend its support. Below are a few efforts worth watching and promoting:

  1. Open Contracting. Already described in some detail above, the Open Contracting movement seeks to make it easier for civil society groups to monitor their government’s procurement processes, sometimes using Big Data techniques. While many governments have agreed to make their procurement data public, others are releasing incomplete data sets or not opening up at all. The new Open Contracting Data Standard encourages governments of the world to release their public procurement data in a standardized and streamlined manner, and makes it cheap and easy for them to do so. The logic is something like that of Checkbook NYC’s decision to make its software open source — by making it so easy to open up city spending data, NYC has made it difficult for other cities to argue that it is too expensive or too difficult to open up their own books. Once the data is available, Big Data analytics can be used to expose transactions that are at high risk of corruption — pointing concerned citizens in the right direction.
  2. Open Corporates. Another movement is working to convince governments around the world to open up their corporate registries for public analysis. As it currently stands, many countries allow corporations registered there to keep information about who owns them private, contributing to the ability of corrupt actors to enjoy the fruits of their dirty work. As Transparency International’s “Unmask the Corrupt” campaign puts it, “corrupt politicians and businesspeople continue to escape justice and enjoy luxury lifestyles funded by stolen public money, smuggling illicit funds through shadowy secret companies.” A recent New York Times series about property ownership in Manhattan illustrates not only the luxurious lifestyles but also the infuriatingly complex shield of shell companies that corrupt actors can afford with their citizens’ money. A group called OpenCorporates is hard at work scraping together a database of all beneficial owners of all of the registered companies in the world, the kind of data aggregation that made the New York Times piece possible. Affiliated citizens’ groups are using Big Data to draw even more scrutiny to certain sectors, such as the oil and gas industry, through a project called OpenOil that crunches through thousands of databases in real time to show even the least data-literate citizen exactly how oil ventures are structured, revealing surprising and sometimes troubling connections.
  3. Open Aid. International aid flows are notoriously plagued by corruption, so much so that groups like the World Bank employ hundreds of corruption investigators and still fail to prevent corruption in too many of their projects. Investigators’ jobs would be made easier if concerned citizens were monitoring projects alongside them, perhaps using Big Data techniques to keep an eye on the funds. More than 300 aid organizations, including Oxfam and the Catholic Agency for Overseas Development, have put up datasets as part of the International Aid Transparency Initiative for citizens to analyze. As with procurement data, aid data can be used to identify patterns of corruption that would otherwise evade resource-strapped corruption fighters. By analyzing data released by the World Bank on more than 600 different projects, a researcher at AidData identified a set of seemingly disparate factors (such as social group affected and geographic area covered) that characterized corruption-prone projects and could be used to guide investments in the future.
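Once procurement data is published in a standardized, machine-readable form, even simple checks become possible. The Python sketch below illustrates two of the patterns mentioned earlier: tenders that attract only one bidder, and pairs of firms that repeatedly bid together with the same firm always winning. The record layout (`bidders`, `winner`), the firm names, and the threshold are assumptions made up for this example. They are not the Open Contracting Data Standard schema or any real dataset.

```python
from collections import Counter
from itertools import combinations

# Hypothetical tender records: who bid, and who won. Invented data.
TENDERS = [
    {"bidders": {"Firm X", "Firm Y"}, "winner": "Firm X"},
    {"bidders": {"Firm X", "Firm Y"}, "winner": "Firm X"},
    {"bidders": {"Firm X", "Firm Y", "Firm Z"}, "winner": "Firm X"},
    {"bidders": {"Firm Y", "Firm Z"}, "winner": "Firm Z"},
    {"bidders": {"Firm Z"}, "winner": "Firm Z"},
]


def single_bidder_share(tenders):
    """Fraction of tenders that attracted exactly one bidder."""
    return sum(len(t["bidders"]) == 1 for t in tenders) / len(tenders)


def lopsided_pairs(tenders, min_joint_bids=3):
    """Find pairs that bid together often, with the same firm winning every time.

    Returns a sorted list of ((firm_a, firm_b), winner) tuples.
    """
    joint = Counter()  # pair -> number of tenders both firms bid on
    wins = Counter()   # (pair, winner) -> number of those tenders the winner took
    for tender in tenders:
        for pair in combinations(sorted(tender["bidders"]), 2):
            joint[pair] += 1
            if tender["winner"] in pair:
                wins[(pair, tender["winner"])] += 1
    return sorted(
        (pair, firm)
        for (pair, firm), count in wins.items()
        if joint[pair] >= min_joint_bids and count == joint[pair]
    )


if __name__ == "__main__":
    print(f"Single-bidder share: {single_bidder_share(TENDERS):.0%}")
    for pair, firm in lopsided_pairs(TENDERS):
        print(f"{pair[0]} and {pair[1]} bid together; {firm} always wins")
```

Neither check proves wrongdoing. Each one narrows a large dataset down to a short list of transactions that concerned citizens or investigators might examine more closely.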

These open data initiatives are crucial for the ongoing efficacy of Big Data as an anticorruption tool. Much valuable data that could be used in the fight against corruption is still trapped in the archives of aid agencies and governments, either because they lack the capacity to free it or because they lack the desire to be so closely scrutinized. But data is being released, and at an increasing pace, due to the demands of civil society groups that want to add Big Data to their toolkit. This trend is encouraging, and it should continue. Although Big Data analytics and the open data movement seem like cutting-edge technological approaches to fighting corruption, they share the same basic spirit of transparency that has animated anticorruption activists for decades.

9 thoughts on “Big Data and Anticorruption: A Great Fit”

  1. Chris, you mention that corrupt governments might not want to release a lot of this data over fear of what it would show. That leads to two natural questions:

    1. Is big data likely to have the biggest effect in reducing corruption in places where it was already fairly low? In a mostly non-corrupt setting, or at least one where those appropriating funds do not want it to be corrupt (like with international aid agencies), it would be in the interests of those with the data to open it up so people could try to find evidence of corruption. But the most corrupt states would have little incentive to release their data.

    2. That in turn raises the question of how to create an incentive. What if eligibility for certain types of aid or investment were tied to the public availability of some of these datasets?

    • You both raise good points about how the use of big data depends upon the availability of that data, which, in turn, depends upon willingness to release it. My related question follows up on what Sarah asked above and concerns your call to integrate big data into wider anticorruption efforts. Even if you can incentivize the release of the data, how can you promote its effective use by enforcers that may lack the will or expertise to operationalize the info? It sounds like you see big data analysis as a tool for procurement, budgetary outlays, and investigations, all of which are largely managed by national authorities. I understand that there is value to transparency in and of itself and that it can be a tool for advocacy groups (maybe you can talk more about this). But the primary benefits – as I see them – require investigators, procurement officials, etc. committed to using the information. Enforcers that lack the will may also lack the capacity to use big data – even with the aid of technology, you still must have specialized knowledge to evaluate the patterns discovered. The World Bank’s program officers, much less its small team of corruption investigators, do not manage procurement for Bank-financed contracts. Country offices do. Thus, the Bank can only make use of big data tools in ex post audits of bid documents once a project has already (allegedly) gone wrong. How might you incentivize and enable the _use_ of big data as a preventative mechanism or a national-level investigative tool?

      • Liz, totally agree. Such is the challenge of technological tools like Big Data analytics. As you point out, however, I think one of the big benefits of open data sets is the possibility that civil society monitors will process the data themselves, thus augmenting the capacity of government and institutional prosecutors. For instance, the Corruption Research Center Budapest is a civil society group that identified patterns of corruption that prosecutors can act on, and perhaps will be forced to act on by the very fact that this information is now public. I’m not sure that even I am totally convinced by that argument, but I guess it’s the one that has to be made if we imagine that the government itself is going to drag its feet in using this tool.

        In the opposite scenario, an ambitious and powerful anticorruption fighter will find this tool extremely helpful. The SEC, for instance, now uses Big Data analytics to crunch through hundreds of thousands of investment accounts looking for instances in which customers are being taken for a ride by their advisors, a move that prompted a flurry of law firm “client alerts” and anxious op-eds, and presumably put the kibosh on a lot of improper behavior. But even here I have to say that advisors who want to game the system still have plenty of ways to get around the harsh glare of Big Data analytics… they just adapt.

    • Yeah I think your first point is spot-on, and that’s the trickiest part of this whole Open Government movement — it really has a hard time working its way down to the jurisdictions that could probably use it the most. Notably, however, the UN’s new 15-year goals (I think they are called the Sustainable Development Goals now) include provisions about open data (“measurements”) that might accelerate its adoption in the poorest countries. The incentive structure you mentioned in your second point is exactly what I would like to see more of, and I think international aid is a perfect area to start. More agencies should make compliance with the International Aid Transparency Initiative a precondition to the receipt of aid — it’s pretty cheap and easy to do, but if the aid recipient really doesn’t have the capacity to comply then perhaps the aid package could come with a portion that must be spent on hiring consultants, etc., to come into compliance. http://www.aidtransparency.net/

  2. Good post, Chris. Having combed through the public contracts database here in Mexico, I have found the contrast between mere “transparency” and actually having accessible data that can be manipulated and analyzed very apparent. It’s a distinction that needs to be made clearly as countries try to figure out how to make transparency work (and to ward off cynicism and fatigue when transparency alone doesn’t seem to generate immediate dividends). To Mexico’s credit, its various reforms have made bid award information and in most cases headline contracts for federal agencies freely available online. But most of the information is in PDFs, and there is little metadata besides the contract amount, date and awarding agency to crunch. The good news is that governments ought to have an independent incentive to reform these systems beyond the perceived integrity gains, so that their procurement agents can benefit from information sharing with other buyers. I thought the first chapter of Transparencia Mexicana’s most recent publication, Feigenblatt’s article on “Open Government as a Tool for Strategic Procurement,” does a good job of arguing for these benefits. (http://www.tm.org.mx/a-new-generation-for-public-control/). Consequently, this is an area where it seems to me that skills training and platforms provided by donors could be both relatively welcome and have the potential to make an immediate difference.

    • Great contribution, thanks Daniel! Yeah this is the right stuff, and I think it even gets at a question that Liz and I were talking about in the above comments, which is why a government agency might be incentivized to begin using these tools. It sounds from your example like the added efficiency a procurement agency gains from adopting such a system might be enough to get a government to build it out, at which point at least one government agency is a stakeholder in the system and perhaps it won’t take long for prosecutors and investigators, especially those within the procurement agency, to start using it for anticorruption purposes. (Although it sounds like we still have a ways to go before the data disclosures are of a decent quality, but I guess that’s what the OCDS is supposed to hurry along.)

      • Related to incentives, this is more of a Small Data than a Big Data point, but last year I wrote about the article below, which discusses e-government implementation in Indian states and makes some interesting points about the adoption of open government techniques. In short, elected officials in states with competitive elections strategically avoided adopting e-government access and data tracking for the government services that were most profitable for bribery, because these bribes were their main source of campaign finance. It gives you an idea of the duality of political and personal interests driving decision making: Elected officials may adopt transparency policies that generate institutional performance gains if they perceive that the political dividends of those changes (voters like me!) will outweigh the more direct dividends of the rents that opacity allows. Proponents of the creation and publication of government’s big data need to work to alter this calculus in favor of transparency as well, perhaps through carrots like the aid funding Sarah mentions, as well as through voter awareness of the issue and why it matters.

        Bussell-EGovernance_and_Corruption_in_the_States.pdf
