Breakthrough in the Use of Artificial Intelligence to Fight Corruption

Whatever peril or promise the future of artificial intelligence holds, Brazilian, Colombian, and Italian researchers show it is a powerful tool for targeting corruption investigations.

Each year Colombia and Italy let thousands of contracts for goods, services, and public works, and each year some percentage is awarded thanks to bribery, conflict of interest, or other corrupt behavior. Each year Brazil’s central government transfers millions of dollars to the country’s 5,500-plus municipal governments, and each year employees of some of those governments steal a portion.

Corruption is discovered through audits or whistleblowing, but a significant percentage goes undetected. The work done in Brazil, Colombia, and Italy shows how AI can help governments deploy their investigative resources to boost the odds of finding a much larger share of it.

The city of Boston faced an analogous problem. Some of its 3,000-plus restaurants violate the city’s hygiene and sanitation rules, and each year health inspectors nail a few. But because inspectors had no way to target the most likely violators, most inspections were for naught; most restaurants passed with flying colors. As with the problem of finding corrupt contracts or corrupt municipal governments, Boston needed a way to sort through the 3,000 eateries to determine which ones were likely unsanitary.

This is the kind of prediction problem where AI shines. To know where to send inspectors, Boston turned to supervised machine learning (SML), a form of AI. Analysts fed (unavoidable pun) a computer information on the characteristics of all city restaurants — location, size, revenues, and so forth — along with any previous citations for health code violations. Researchers ran different SML programs, tweaking each to see which best predicted the likelihood a restaurant had been cited for violations. Once they found a program good at predicting inspection results on the historical data, the health department began using restaurant characteristics to predict which establishments were likely to fail a future inspection and thus which ones inspectors should prioritize.

The researchers’ tweaks in effect “taught” the machine to make better predictions, hence the “supervision” in “supervised machine learning.” The details of Boston’s experience are here. As in other areas where accurate predictions improve outcomes by optimizing resources (here), early analysis showed SML would help Boston inspectors catch a far higher percentage of unsanitary restaurants and waste less time inspecting clean ones (here).
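For readers who want to see the mechanics, here is a minimal sketch of that workflow in Python with scikit-learn. The file name, feature columns, and model choice are illustrative assumptions, not Boston’s actual pipeline:

```python
# Toy illustration of supervised machine learning for inspection targeting.
# All file and column names are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Historical inspections: one row per inspection, with restaurant
# characteristics and the known outcome (the "label").
df = pd.read_csv("inspections.csv")
X = df[["neighborhood_code", "seating_capacity", "annual_revenue", "prior_citations"]]
y = df["failed_inspection"]  # 1 = failed, 0 = passed

# Hold out part of the history to check how well a candidate model
# predicts outcomes it has never seen -- the "tweaking" step above.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print("AUC on held-out inspections:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))

# Rank restaurants by predicted failure risk; inspectors visit the top first.
risk = model.predict_proba(X)[:, 1]
print(df.assign(risk=risk).sort_values("risk", ascending=False).head(10))
```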

The hurdle with SML is always data. Boston had information on 34,879 previous restaurant inspections: not only each restaurant’s characteristics but, more importantly, the inspection’s outcome. Did the restaurant pass or fail? And if it failed, how many violations were found and how serious were they?

While Boston officials used SML to direct inspectors to restaurants likely to be violating the city’s health regulations, Colombian and Italian researchers employed SML to point investigators to contracts likely infected by corruption. They had information on public contracts comparable to what Boston had on restaurants: which companies bid on a contract, what each bid, which one won, and the like.

What they did not have was information equivalent to the health code violations. The comparable data would be contracts that gave rise to a conviction for corruption, but as the Italian researchers lamented, there were simply too few convictions for that data to be useful. The Colombian authors faced a similar scarcity of conviction data.

As a substitute, the Italian authors obtained non-public information listing every company or individual who had been investigated for a corruption-related crime (data details here). The Colombian researchers used two similar measures:

i) whether a company had ever been fined by the agency that oversees public contracting (the Contraloría General de la República); in these cases the fine was almost certainly imposed as a result of a finding of some form of malfeasance, if not corruption; and

ii) whether a company had ever breached a contract with a municipality, a category covering a range of deficient performance, from the use of substandard material in construction to unexcused late performance.

These outcome indicators did not show whether a contract was corruptly awarded. Rather, the criminal history and Contraloría data showed whether a company with a history of corruption or malfeasance won the contract; the Colombians’ breach indicator showed whether the winning company failed to fulfill the contract’s terms. Conviction data would have “taught” the machine to predict which contracts were likely awarded corruptly. What the Colombian and Italian machines “learned” instead was how to predict whether a company with a questionable background won the contract; the breach indicator taught it whether the winning company was likely to perform poorly.
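In code, constructing such proxy labels might look like the following sketch. All file and column names are invented for illustration; the papers’ actual datasets (including the Italians’ non-public register) differ:

```python
# Sketch of building proxy outcome labels for contracts; names are hypothetical.
import pandas as pd

contracts = pd.read_csv("contracts.csv")

# Italian-style proxy: did the winner appear in a register of firms or
# individuals investigated for a corruption-related crime?
investigated = set(pd.read_csv("investigated_firms.csv")["firm_id"])
contracts["winner_investigated"] = contracts["winner_id"].isin(investigated).astype(int)

# Colombian-style proxies: has the winner ever been fined by the
# Contraloría, or ever breached a municipal contract?
fined = set(pd.read_csv("contraloria_fines.csv")["firm_id"])
breached = set(pd.read_csv("municipal_breaches.csv")["firm_id"])
contracts["winner_fined"] = contracts["winner_id"].isin(fined).astype(int)
contracts["winner_breached"] = contracts["winner_id"].isin(breached).astype(int)
```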

Using SML this way is a significant improvement over randomly choosing contracts to audit. Its utility should, as the Italian authors argue, spur authorities to invest in developing better information on outcomes to enhance its predictive power. A first step would be to ensure that any court or administrative finding of corruption in a contract is entered into the database.

Reassuringly, the variables that best predict whether the winning bidder will have a criminal background accord with conventional statistical analyses finding a link between those variables and corruption (here and here). These analyses show that the more discretion the procuring agency enjoys in deciding the winner, the more likely the decision will be corrupt. SML likewise predicts that greater discretion increases the chances a firm with a proclivity for corruption will win a tender. Thus, in Italy the use of criteria besides price to select the winner (MEAT, the most economically advantageous tender) is an excellent predictor. In Colombia, it is contracts let through sole sourcing.

The Colombian authors taught the machine to predict a third outcome as well: whether the final price the government paid exceeded the winning bid, or whether the time for performing the contract was extended. Neither was reported in the contract database, but both could easily be calculated from data that was. This outcome is a measure of problematic, rather than corrupt, contracts. Governments rarely have sufficient staff to adequately oversee the performance of construction, IT, and other contracts; all take months if not years to complete and can go awry in a thousand ways. The value of knowing which ones are likely to run into trouble, and thus where to best deploy oversight resources, cannot be overstated.
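Deriving that third outcome from fields already in the database is simple arithmetic. A sketch, again with invented column names:

```python
# Flag contracts whose final cost or duration exceeded what was awarded.
# Column names are hypothetical; the Colombian database's fields differ.
import pandas as pd

contracts = pd.read_csv("contracts.csv")
contracts["cost_overrun"] = (contracts["final_price"] > contracts["winning_bid"]).astype(int)
contracts["time_extended"] = (contracts["actual_days"] > contracts["contracted_days"]).astype(int)
contracts["problematic"] = (contracts["cost_overrun"] | contracts["time_extended"]).astype(int)
```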

Both the Colombian and Italian authors warn that machines are perhaps not the only ones learning from the use of SML in procurement. Corrupt officials and their accomplices may be learning too. Seeing what predictors governments use to decide which contracts deserve close attention, they may modify their behavior accordingly. The Italian authors speculate that because it is well known that the authorities pay special attention to contracts awarded under emergency procedures, Italian crooks simply do not bid on them. Confirming that suspicion, the authors report that the use of emergency procedures has no value in predicting the chance a firm with a criminal background wins a contract.*

While the Colombian and Italian authors used SML to predict which public contracts merited a closer look, the Brazilian authors used it to predict which municipalities were likely to steal money the central government transferred to them for health, education, and other social services. The predictors were some 150 characteristics of the municipalities, including information on the private sector and measures of financial development, human capital, local politics, public spending, and natural resource dependency. The authors were fortunate to have a useful, telling outcome measure: the results of 1,800-plus audits of municipal governments detailing “irregularities” in their handling of public funds.

The best predictor was private sector competition. Where there were more firms in the area and the relevant market was more evenly divided among them, the municipal government was less likely to filch central government monies. Where construction services accounted for a significant percentage of private sector activity, the opposite was the case: more construction predicted higher levels of theft. Surprisingly, the size and composition of the public sector, local politics, public spending, and natural resource dependency all had low predictive power.

There are portions of the three papers that those with limited quantitative skills can make out, and all deserve a close read by anyone concerned with procurement corruption. The sections on LASSO and Ridge models, NDCG (normalized discounted cumulative gain), and other technical details of the different SML models are slow going despite helpful explanatory asides in all three papers. One area where a bit more explanation would be welcome is the interpretation of the numbers showing which inputs are the best predictors. But in fairness, the authors are reporting important research findings to an audience of data scientists, statisticians, econometricians, and computational engineers, not corruption fighters or policymakers.
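For readers who want a feel for those techniques before tackling the papers: LASSO and Ridge are penalized regressions (here in logistic form) that shrink the weights of uninformative predictors, with LASSO able to zero them out entirely, while NDCG measures how well a model ranks the truly risky cases near the top of its list. A self-contained sketch on synthetic data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import ndcg_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))  # 20 candidate predictors, only 2 informative
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=1000) > 1).astype(int)

# LASSO (L1 penalty) zeroes out weak predictors; Ridge (L2) shrinks them.
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
ridge = LogisticRegression(penalty="l2", C=0.1).fit(X, y)
print("Predictors kept by LASSO:", int(np.sum(lasso.coef_ != 0)), "of 20")

# NDCG: does the predicted ranking put the true positives near the top?
scores = ridge.predict_proba(X)[:, 1]
print("NDCG:", ndcg_score(y.reshape(1, -1), scores.reshape(1, -1)))
```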

The hope is that this post will not only bring the authors’ findings to a larger audience but also spur others to build on their important work, which represents a major step forward in the fight against corruption.

The authors and citations to their papers follow:

Brazil: Colonnelli, Emanuele, Jorge Gallego, and Mounu Prem, “What Predicts Corruption?” chapter 16 in The Economics of Crime (Elgar, 2022), pp. 345-373; earlier version at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3330651

Colombia: Gallego, Jorge, Gonzalo Rivero, and Juan Martínez, “Preventing Rather than Punishing: An Early Warning Model of Malfeasance in Public Procurement,” International Journal of Forecasting vol. 37, 2021, https://www.sciencedirect.com/science/article/pii/S0169207020300935

Italy: Decarolis, Francesco, and Cristina Giorgiantonio, “Corruption Red Flags in Public Procurement: New Evidence from Italian Calls for Tenders,” EPJ Data Science vol. 11, 2022, https://epjdatascience.springeropen.com/articles/10.1140/epjds/s13688-022-00325-x

——-

*This appears to be the case with bid-rigging cartels. For years, simple analysis of bids revealed patterns suggesting collusion. When the cartels learned about the analysis, they scrambled their bid prices to hide their tracks, forcing analysts to devise more complex methods (here). In the battle against procurement cartels, SML is also showing promising results (examples here and here). More to come in a future post.

6 thoughts on “Breakthrough in the Use of Artificial Intelligence to Fight Corruption”

  1. I’m not sure that what is described really requires AI; it looks just like a risk-based approach. Even before ChatGPT, ProZorro in Ukraine, for example, used risk-based automatic flagging of high-risk procurements.

  2. Thanks for the comment.

    I did not know about the ProZorro screen. How many red flags does it use? How many contracts does the analysis show are risky? A long list or a manageable number? And if a long list, does the program rank them in terms of how risky they are?

    The problem I have seen with the red flag analyses I have reviewed is that the list of red flags is often long and produces many, many problematic contracts without prioritizing them. Granted, it narrows the list to contracts where an audit would likely pay off. But how well? This is where I think SML has the upper hand, especially as more and better violation data becomes available.

    Ferwerda, Deleanu, and Unger’s best-fit econometric model needed only eight of the 32 red flags they tested to explain corruption relatively well; they report a pseudo-R² of 0.4. Is ProZorro’s analysis able to identify the important red flags?

    Again, thanks for weighing in. I am sure GAB readers would welcome more info on ProZorro’s method (hint, hint).

  3. That’s an old post, but it is still worth clarifying a few points:

    • A Corruption Risk Indicator (CRI) by Fazekas et al. has been in use for more than ten years now; red flags simply add up, so the higher the score, the higher the risk. The model has been promoted in many countries, as it won IMF and World Bank competitions, and the EBRD trained the GPA countries on it.
    • ProZorro was also inspired by this work, which Fazekas pioneered in Digiwhist, a research project.
    • Yes, it’s an algorithm, but I would stop short of calling it AI.
    • The case for prevention is especially strong if the monitoring is linked to performance management and each contracting authority is judged by it. Still, this never happens, not even in ProZorro. No contracting agent was ever fired for handing over public contracts on the basis of single bidding, though a public manager could very well set competitive procurement as a performance indicator and implement it.
    • And this leads to the last conclusion: what we are missing are governments that really want to do this, not algorithms. People should trust that once the political will exists, science will provide the tools.
  4. Thanks for the comment. I am afraid it shows how poorly I did in trying to explain what the authors were doing with AI, rather than any shortcoming of the technique.

      You are right to single out Fazekas’ work on corruption risk indicators and public procurement. It is clearly a step forward. When choosing which among a large number of procurements should be audited, counting the number of red flags beats selecting one or more at random. But adding up red flags is only a first step. As Kenny and Musatova explained in their World Bank Policy Research Working Paper, “the ubiquity and apparent randomness” of red flags in Bank procurements “suggests that their roll-out as a monitoring tool requires additional thought as to interpretation, context and use.”

      Context is everything with procurement data. Take one of the key red flags: the number of firms submitting a bid. If an auditor must choose between two procurements to examine, and all she knows about them is that one drew several bids and the other only one, selecting the procurement with one bidder makes sense. After all, a common form of procurement corruption is to draft the tender in a way that excludes all but one company from the bidding.

      But the auditor will always know more than the number of bids the tender drew. She will know which industry or sector is involved. Suppose the tender that drew one bid was for a patented pharmaceutical while the one that drew several was for road rehabilitation. In that case it would make sense to investigate the road contract. Bid rigging is rampant in the roads sector of many countries (and U.S. states, I should add), so the number of bidders is meaningless. At the same time, pharmaceutical manufacturers often enter into country-wide exclusive distribution agreements, making the lack of more than one bid readily explainable.

      The example is of course oversimplified. In most procurements there will likely be several red flags: the time for submitting bids may have been shortened, the time between bid opening and award unusually long, and so forth. When all the candidates have multiple red flags, how does our auditor choose which procurement or procurements to examine? As I understand it, the contribution of Fazekas and colleagues was to focus on the count: the more red flags, the more likely the procurement was corruptly awarded.
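      In its simplest form that counting approach is just an unweighted sum, as in this toy sketch (the flag names are invented):

```python
# Toy corruption-risk score: one point per red flag; audit the top scorers.
red_flags = ["single_bidder", "short_submission_window", "long_award_delay"]
tenders = [
    {"id": "T1", "single_bidder": 1, "short_submission_window": 1, "long_award_delay": 0},
    {"id": "T2", "single_bidder": 0, "short_submission_window": 0, "long_award_delay": 1},
]
for t in sorted(tenders, key=lambda t: -sum(t[f] for f in red_flags)):
    print(t["id"], "score:", sum(t[f] for f in red_flags))
```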

      In their 2016 paper, Ferwerda and colleagues go a step further. They show that not all red flags should be weighted equally (“Corruption in Public Procurement: Finding the Right Indicators,” available through ResearchGate). Using an ordered probit model, they report that eight red flags explain corruption relatively well (a pseudo-R² of 0.4).

      AI takes this analysis a step further still. An ordered probit model is limited to finding linear relationships between the red flags and the outcome, and it cannot account for interactions among the red flags. AI can do both. Doesn’t this make it something more than an “algorithm”?
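      A toy illustration of the difference, on synthetic data: when a red flag matters only in combination with another (here, risk spikes when exactly one of two flags fires), a purely additive model cannot capture it, while a tree-based learner can:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(2000, 2))  # two binary red flags
y = X[:, 0] ^ X[:, 1]                   # risky only when exactly one flag fires

# A linear model assigns each flag one additive weight and cannot
# represent this interaction; boosted trees learn it directly.
print("linear accuracy:", cross_val_score(LogisticRegression(), X, y, cv=5).mean())
print("trees accuracy: ", cross_val_score(GradientBoostingClassifier(), X, y, cv=5).mean())
```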

      You are right that procurement staff should not be sanctioned simply because one or more of the procurements they were responsible for had several red flags. Likewise, sanctioning procurement staff over a tender that AI predicts was corruptly let is a step too far. But just as Boston uses AI to decide which restaurants to inspect, AI should be used to determine which procurements should be audited.

