Research Sheds Light on How AI Predicts Compound Potency in Drug Development

A new analysis provides insight into the underlying mechanisms that power the use of artificial intelligence (AI) in pharmaceutical research. The study, published in Nature Machine Intelligence, noted prior confusion about how AI reaches its conclusions and sought to develop a method that shows how some AI applications operate.1

The study authors found that the AI models primarily remember known data and make only sparse use of learned information about specific chemical interactions when predicting drug potency. They note that predicting the potency of compounds in drug design is a popular machine learning application.

“Graph neural networks (GNNs) predict ligand affinity from graph representations of protein–ligand interactions typically extracted from X-ray structures. Despite some promising findings leading to claims that GNNs can learn details of protein–ligand interactions, such predictions are also controversially viewed,” the study authors wrote.2

To determine which drug molecule is most effective, researchers search for efficient active substances that can combat diseases. These compounds frequently attach to enzymes or receptors, triggering a specific chain of physiological actions. They can also inhibit undesirable responses within the body, including an excessive inflammatory response, according to the study.

The immense volume of available chemical compounds creates a significant challenge in the drug discovery process. As such, investigators incorporate scientific models to evaluate which molecules have the best chance of binding to their respective target protein.

This is where machine learning applications, such as GNNs, come into play. GNN models are trained on graph representations of the complexes formed between proteins and chemical compounds.

"How GNNs arrive at their predictions is like a black box we can't glimpse into," said Dr. Jürgen Bajorath, a researcher from the LIMES Institute at the University of Bonn, the Bonn-Aachen International Center for Information Technology (B-IT) and the Lamarr Institute for Machine Learning and Artificial Intelligence in Bonn, in a press release.

The Bonn researchers collaborated with investigators from Sapienza University in Rome to evaluate whether GNNs can actually learn protein-ligand interactions and use these data to accurately estimate how strongly an active substance attaches to its target protein.

Using a specially developed method called “EdgeSHAPer,” investigators screened six different GNN architectures to assess whether the GNNs can learn the most vital interactions among compounds and proteins to predict ligand potency as intended, or whether the AI uses a different method to reach its conclusion.
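As its name suggests, EdgeSHAPer builds on Shapley values, which assign each edge of a graph a share of the model's prediction. The sketch below is only a toy illustration of that idea and not the authors' implementation: it computes exact Shapley values by enumerating every edge ordering, which is feasible only for tiny graphs. The atom names, the `toy_score` function standing in for a trained GNN, and its numbers are all hypothetical.

```python
from itertools import permutations


def edge_shapley(edges, score):
    """Exact Shapley value of each edge for a score defined on edge subsets."""
    totals = {edge: 0.0 for edge in edges}
    orders = list(permutations(edges))
    for order in orders:
        present = set()
        previous = score(frozenset(present))
        for edge in order:
            # Marginal contribution of this edge given the edges added so far.
            present.add(edge)
            current = score(frozenset(present))
            totals[edge] += current - previous
            previous = current
    return {edge: total / len(orders) for edge, total in totals.items()}


# Hypothetical toy graph: two ligand atoms (L1, L2) and one protein atom (P1).
LIGAND_BOND = ("L1", "L2")
INTERACTION_A = ("L2", "P1")
INTERACTION_B = ("L1", "P1")
edges = [LIGAND_BOND, INTERACTION_A, INTERACTION_B]


def toy_score(present):
    """Made-up stand-in for a GNN output; it leans mostly on the ligand bond."""
    value = 0.0
    if LIGAND_BOND in present:
        value += 2.0
    if INTERACTION_A in present:
        value += 0.5
    if LIGAND_BOND in present and INTERACTION_A in present:
        value += 1.0
    return value


values = edge_shapley(edges, toy_score)
for edge, value in values.items():
    print(edge, round(value, 3))
```

In this toy setup the ligand bond receives the largest attribution, and the attributions sum to the difference between the score of the full graph and that of the empty graph. Comparing how much attribution falls on ligand edges versus protein-ligand interaction edges is the kind of question such an analysis can answer.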

"The GNNs are very dependent on the data they are trained with," said study first author Andrea Mastropietro, PhD candidate from Sapienza University in Rome, in a press release.


The GNNs were trained with graphs extracted from the structures of protein-ligand complexes for which the mode of action and the binding strength of the compounds were already known. The investigators then used the EdgeSHAPer analysis to evaluate how the GNNs produced their seemingly promising predictions.
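One way to picture such a graph is sketched below: atoms become nodes, ligand bonds become edges, and a ligand atom and a protein atom are linked by an interaction edge when they lie close together in the structure. This is a simplified assumption for illustration, not the featurization used in the study; the function name `build_complex_graph`, the 4.0 Å cutoff, and the coordinates are hypothetical.

```python
from math import dist


def build_complex_graph(ligand_atoms, ligand_bonds, protein_atoms, cutoff=4.0):
    """Build (nodes, edges) for a toy protein-ligand complex.

    ligand_atoms / protein_atoms map an atom name to (element, (x, y, z)).
    The distance cutoff, in angstroms, is an assumed value.
    """
    nodes = {}
    for name, (element, _) in ligand_atoms.items():
        nodes[name] = {"element": element, "part": "ligand"}
    for name, (element, _) in protein_atoms.items():
        nodes[name] = {"element": element, "part": "protein"}

    # Covalent bonds inside the ligand.
    edges = [(a, b, "ligand_bond") for a, b in ligand_bonds]

    # Distance-based edges between ligand and protein atoms.
    for lig_name, (_, lig_xyz) in ligand_atoms.items():
        for prot_name, (_, prot_xyz) in protein_atoms.items():
            if dist(lig_xyz, prot_xyz) <= cutoff:
                edges.append((lig_name, prot_name, "interaction"))
    return nodes, edges


# Made-up coordinates, purely for illustration.
ligand_atoms = {"C1": ("C", (0.0, 0.0, 0.0)), "N1": ("N", (1.4, 0.0, 0.0))}
ligand_bonds = [("C1", "N1")]
protein_atoms = {"O1": ("O", (0.0, 3.0, 0.0)), "C9": ("C", (10.0, 10.0, 10.0))}

nodes, edges = build_complex_graph(ligand_atoms, ligand_bonds, protein_atoms)
for edge in edges:
    print(edge)
```

Labeling each edge as a ligand bond or an interaction makes it possible to ask later which kind of edge a model's prediction actually depends on.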

They found that most GNNs primarily focused on ligands while learning little about protein-drug interactions.

“To predict the binding strength of a molecule to a target protein, the models mainly 'remembered' chemically similar molecules that they encountered during training and their binding data, regardless of the target protein,” Bajorath said in the release. “These learned chemical similarities then essentially determined the predictions.”
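The behavior described in the quote resembles a simple similarity lookup. As a rough sketch of that kind of simpler method, and not the baseline used in the study, the code below predicts the potency of a query compound by copying the value of the most similar training compound, ignoring the target protein entirely. Similarity is measured with the Tanimoto coefficient on sets of substructure features; the feature names and potency values are invented for illustration.

```python
def tanimoto(features_a, features_b):
    """Tanimoto (Jaccard) similarity between two feature sets."""
    a, b = set(features_a), set(features_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def nearest_neighbor_potency(query, training):
    """Return (potency, similarity) of the most similar training compound."""
    best_features, best_potency = max(
        training, key=lambda item: tanimoto(query, item[0])
    )
    return best_potency, tanimoto(query, best_features)


# Hypothetical training data: (substructure features, potency). All values made up.
training = [
    ({"amide", "phenyl", "pyridine"}, 7.2),
    ({"sulfonamide", "phenyl"}, 5.1),
    ({"indole", "carboxyl"}, 6.0),
]
query = {"amide", "phenyl", "pyridine", "fluoro"}

predicted, similarity = nearest_neighbor_potency(query, training)
print(predicted, similarity)
```

No protein information enters this prediction at all, which is what makes it a useful point of comparison for models that are supposed to learn protein-ligand interactions.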

The investigators said the claim that GNNs learn chemical interactions between active substances and proteins is generally not tenable, and that the models' predictions are significantly overrated because “forecasts of equivalent quality can be made using chemical knowledge and simpler methods.”

Despite this, the researchers note that AI holds promise in some areas of pharmaceutical research, as two GNN models showed a clear tendency to learn more interactions as the potency of test compounds grew. They noted these GNNs may be able to improve in the desired direction via modified representations and training techniques.

"The development of methods for explaining predictions of complex models is an important area of AI research. There are also approaches for other network architectures such as language models that help to better understand how machine learning arrives at its results," Bajorath said in the release.


1. Artificial intelligence: Unexpected results. EurekAlert. News release. November 13, 2023.

2. Mastropietro, A., Pasculli, G. & Bajorath, J. Learning characteristics of graph neural networks predicting protein–ligand affinities. Nat Mach Intell (2023).