Improving Anti-Money Laundering Models with Synthetic Data

As readers of this blog are well aware, an effective anti-money laundering (AML) regime is crucial for fighting grand corruption, as well as other organized criminal activity. A key part of the AML system is the requirement that banks and other financial institutions identify suspicious transactions and file so-called suspicious activity reports (SARs) with the appropriate government agencies. This is an enormous task, given the volume of financial transactions that banks need to monitor and the challenge of identifying which of those transactions ought to be considered suspicious. Banks spend billions of dollars on AML compliance every year and have developed complex automated systems to help them flag suspect transactions, but the sheer complexity of the task limits how efficiently these systems can sort suspicious from innocent transactions. (False positive rates with current systems, for example, frequently exceed 90%.)

Many believe that artificial intelligence (AI) systems, such as those employing machine learning (ML), hold enormous promise for improving AML compliance and reducing its cost. ML algorithms scrutinize vast datasets to identify patterns that can be used to build predictive models. In the AML context, ML algorithms identify the transaction characteristics (or complex combinations of transaction characteristics) that are associated with money laundering, and use these patterns to identify suspicious transactions more efficiently and effectively.

But some commentators have suggested reasons for skepticism, or at least caution. For example, Mayze Teitler recently wrote on this blog about a number of challenges to operationalizing AI-derived algorithms in the AML context, primarily those arising from limitations in the data on which those algorithms are trained. As Mayze correctly pointed out, ML algorithms require vast datasets from which to learn, and these data demands are compounded by the relative rarity of known money laundering cases in the existing datasets.

Despite these concerns, I am more bullish than Mayze about the promise of AI-based AML systems. Many of the obstacles to developing effective AI systems for AML can be overcome through the use of synthetic data.
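To make the idea concrete, here is one simple way synthetic data can enlarge a scarce set of known laundering cases. This is only an illustration and not necessarily the approach the author has in mind: it is a simplified, SMOTE-style oversampling sketch in which each new record is a random blend of two real ones. The function name `synthesize_minority`, the three features, and their values are all hypothetical.

```python
import random

def synthesize_minority(examples, n_new, seed=0):
    """Create n_new synthetic feature vectors by interpolating between
    randomly chosen pairs of real minority-class (laundering) examples.
    Simplified SMOTE-style sketch: real SMOTE interpolates toward one of
    the k nearest neighbours rather than a random partner."""
    rng = random.Random(seed)  # fixed seed so results are reproducible
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(examples, 2)  # two distinct real examples
        t = rng.random()                # interpolation weight in [0, 1)
        # Each feature of the new row lies on the line between a and b.
        synthetic.append([x + t * (y - x) for x, y in zip(a, b)])
    return synthetic

# Hypothetical features: [amount in thousands, transfers per day, cross-border share]
known_laundering = [
    [9.5, 12, 0.8],
    [9.8, 15, 0.9],
    [4.9, 20, 0.7],
]
new_rows = synthesize_minority(known_laundering, n_new=5)
```

Because every synthetic row lies between two real examples, this kind of oversampling can only fill in patterns already present in the data; it cannot surface laundering typologies that never appear among the known cases.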
