The Evolving Role of AI in Shifting the Bottleneck in Early Drug Discovery
Testing in humans in clinical trials and the regulatory approval process itself are candidates for technology solutions where artificial intelligence is playing a role.
The cost of drug discovery, the R&D process that brings new drugs to patients to address unmet medical needs, has grown exponentially over the past few decades. It takes 10-12 years, on average, to bring a new drug from initial concept to regulatory approval.
The days of blockbuster drugs that generate more than $1 billion in sales per year seem to be receding as pressure on drug pricing increases and as personalized medicine, that is, a better understanding of which patients will benefit most from which drugs, appropriately reduces market sizes. There is a continuing need for novel technology to streamline drug discovery, reduce time and cost, and, importantly, accelerate the delivery of new therapies to patients who sometimes desperately need them.
To achieve significant improvement, every step in the drug discovery process needs to be innovated. For the purposes of this article, I'll focus on the preclinical stages: drug target discovery and validation, and the search for, and optimization of, molecules that can interact with drug targets to modulate the diseases they enable. Testing in humans in clinical trials to demonstrate safety and efficacy, and even the regulatory approval process itself, are also candidates for technology solutions where artificial intelligence (AI) can play, and is already playing, a role.
Where data are available, AI has a huge role to play. In the past 20 years, the development of high throughput biology techniques has created large data sets of biological information through genomics, proteomics, metabolomics, etc. (the multi-omics). Although heralded as a breakthrough at the time, these techniques in fact generated a glut of information that was almost impossible to decipher in a systematic way. Then modern AI came onto the stage and, with billions of investment dollars, has started to accelerate our ability to collapse the dimensionality of multi-omics data into understandable correlations that point to new drug targets.
But that creates a new bottleneck: AI-discovered drug targets are, at this point, largely identified as long strings of symbols that describe the sequence of building blocks that make up proteins (that is, "letters" that define their nucleic acid bases and corresponding amino acid sequences). Modern drug discovery demands that we know how those long strings fold into the three-dimensional structures that mediate the function of those drug targets, and how best to design drugs to interact with them.
Physical experimental techniques to determine structure are slow and expensive. X-ray crystallography, NMR, and, more recently, cryo-EM are the gold standards, but with an average timeline of six months and $50-$250K+ to determine just one structure at a time. They have, however, been around for a long time, and progressive structure determination by these physical methods, combined with careful forethought in curating publicly accessible databases, has ultimately created a rich foundation of training data.
Enter AI again. AI-based structure prediction has progressed rapidly over the past few years, with incredible results shown by tools such as AlphaFold.
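To make this concrete, below is a minimal Python sketch of pulling a predicted structure from the publicly accessible AlphaFold Protein Structure Database. The endpoint and response field shown reflect my understanding of the public REST API and should be verified against current documentation; the UniProt accession is just an example.

```python
import requests

def fetch_predicted_structure(uniprot_id: str) -> str:
    """Fetch an AlphaFold-predicted structure (PDB text) for a UniProt accession."""
    # Metadata endpoint of the AlphaFold DB public REST API (assumed stable).
    meta_url = f"https://alphafold.ebi.ac.uk/api/prediction/{uniprot_id}"
    response = requests.get(meta_url, timeout=30)
    response.raise_for_status()
    pdb_url = response.json()[0]["pdbUrl"]  # first model in the response
    return requests.get(pdb_url, timeout=30).text

if __name__ == "__main__":
    pdb_text = fetch_predicted_structure("P00533")  # EGFR, a well-known drug target
    print(pdb_text.splitlines()[0])
```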
Now that we have the structures of the drug target proteins, whether determined physically or predicted virtually, the next step is to find chemical compounds that bind to them and modulate the disease states they cause. High throughput screening of large compound libraries, using expensive automation that would fill a large lab, and combinatorial chemical libraries of ~2 million compounds were considered a breakthrough just before the turn of the century, and screening has since evolved to cover libraries that reach into the billions of compounds.
Even so, this is still only screening a very small portion of the theoretically possible chemical space of drug-like molecules, commonly estimated to exceed 10^60 compounds.
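As a small taste of what one screening pass looks like in code, here is a toy Python sketch of fingerprint-based virtual screening using the open-source RDKit toolkit. The query molecule, three-compound "library," and similarity cutoff are all illustrative stand-ins; a real screen would iterate over millions to billions of structures.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

# Toy query (aspirin) and a stand-in three-compound "library."
query = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")
library = ["CC(=O)Nc1ccc(O)cc1", "c1ccccc1O", "CCO"]

# Morgan (circular) fingerprints are a standard similarity descriptor.
query_fp = AllChem.GetMorganFingerprintAsBitVect(query, 2, nBits=2048)

for smiles in library:
    mol = Chem.MolFromSmiles(smiles)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)
    similarity = DataStructs.TanimotoSimilarity(query_fp, fp)
    if similarity >= 0.2:  # illustrative cutoff, not a recommendation
        print(f"hit: {smiles} (Tanimoto {similarity:.2f})")
```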
Notice I have not been calling these compounds "drugs" or "drug candidates." Finding compounds that bind drug targets is just the beginning of the next stage of drug discovery, in which these compound "hits" have to be synthesized and validated in multiple biological assays in a lab. Then begins a long and expensive process of optimizing hits into leads and, ultimately, drug candidates ready for preclinical and human testing.
Binding to the drug target is just one of 15–20 chemical parameters that need to be optimized in a final drug candidate (e.g., potency, selectivity, solubility, permeability, toxicity, pharmacokinetics, etc.). This process generally takes 3–5 years (~26% of the development timeline of the full drug discovery process mentioned above).
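As a rough illustration of what balancing those parameters looks like in practice, here is a toy Python sketch of a multi-parameter desirability score. The property names, target windows, and equal weighting are illustrative assumptions only, not a validated scoring scheme.

```python
# Map each measured property to a 0-1 "desirability," then average.
TARGET_WINDOWS = {
    "potency_pIC50": (7.0, 10.0),    # want potent target binding
    "solubility_logS": (-4.0, 0.0),  # want adequate aqueous solubility
    "logP": (1.0, 3.0),              # want drug-like lipophilicity
}

def desirability(value: float, low: float, high: float) -> float:
    """1.0 inside the desired window, falling off linearly outside it."""
    if low <= value <= high:
        return 1.0
    edge = low if value < low else high
    return max(0.0, 1.0 - abs(value - edge) / (high - low))

def score(compound: dict) -> float:
    """Average desirability across all tracked parameters."""
    values = [desirability(compound[name], lo, hi)
              for name, (lo, hi) in TARGET_WINDOWS.items()]
    return sum(values) / len(values)

print(score({"potency_pIC50": 8.2, "solubility_logS": -3.5, "logP": 3.6}))
# -> 0.9: potent and soluble, but slightly too lipophilic
```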
It's also where significant failure begins: repetitive design-make-test-analyze (DMTA) cycles produce compounds with one or more of many potential fatal flaws, which may ultimately mean no drug candidate can be identified at all. This happens all too often.
The iterative DMTA cycle in drug discovery is still very much a manual process of synthesis and testing and, unlike the steps described earlier, has no large, well-curated, publicly available databases on which to build AI models. This is due to a mix of reasons, ranging from data confidentiality to a historic lack of consistency in how manual synthesis and testing are conducted and reported, which leads to poor reproducibility. These data points are also expensive to produce by current physical methods.
So now we are at the next major bottleneck in the discovery process. AI is not as well positioned here as at the previous bottlenecks highlighted above, because the data aren't readily available, and generative AI techniques are challenged by the sheer size of the chemical diversity that must be searched without good starting points.
What is the future?
For AI to play a significant role in breaking the molecular discovery bottleneck, it will have to be combined with new ways of rapidly generating smaller, highly accurate, and supervised data sets. The "self-driving lab," which closes the loop between AI-driven design and automated experimentation, is one emerging answer.
At its most basic, this requires three core integrated components (a minimal sketch of how they might connect follows the list):
- AI tools that can make accurate predictions from small data sets.
- Chemistry automation to make designed compounds.
- Biology automation systems to test the synthesized compounds.
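To make the shape of this loop concrete, here is a toy Python sketch of how these three components could be wired together into a closed DMTA cycle. Every class is a hypothetical stand-in, not an existing system: a real platform would wrap an ML model, a synthesis robot, and an assay platform respectively.

```python
import random

class Designer:
    def propose(self, results, n=8):
        # Placeholder for a model retrained on the accumulating results.
        return [f"design-{random.randrange(10**6)}" for _ in range(n)]

class SynthesisRig:
    def make(self, designs):
        # Placeholder for automated synthesis; assume ~75% success rate.
        return [d for d in designs if random.random() < 0.75]

class AssayRig:
    def test(self, compounds):
        # Placeholder for automated assays returning a potency readout.
        return [{"id": c, "pIC50": random.uniform(4.0, 9.0)} for c in compounds]

def dmta_loop(designer, synth, assay, cycles=10, target_pIC50=8.5):
    results = []
    for cycle in range(cycles):
        made = synth.make(designer.propose(results))  # Design + Make
        data = assay.test(made)                       # Test
        results.extend(data)                          # Analyze / feed back
        best = max(data, key=lambda r: r["pIC50"], default=None)
        if best and best["pIC50"] >= target_pIC50:
            print(f"cycle {cycle}: candidate-quality hit {best['id']}")
            break
    return results

dmta_loop(Designer(), SynthesisRig(), AssayRig())
```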
So where are we with these components? Novel tools that use interactive human knowledge as "rules" in model training are beginning to show that accurate predictions can be made from much smaller data sets.
Robust and reliable automated chemistry has, however, lagged due to the varied material states of chemicals and reactions (e.g., liquids, solids, gases, viscous gels, harsh corrosives, hazardous reagents, etc.). General automated synthesis platforms are only now beginning to emerge, offering the potential to greatly expedite chemical synthesis over the classically labor-intensive manual methods used today.
High throughput experimentation (HTE) platforms, which run many miniaturized experiments in parallel, are also emerging to generate the consistent, machine-readable data sets these AI models need.
Parallel advances in computational chemistry may well help this process along. Increasing access to high performance computing is enabling physics-based simulation methods to be applied at a scale that can complement and prioritize physical experiments.
Another area in which AI will have an impact on the bottleneck of molecular discovery is the application of large language models (LLMs). An AI-driven automated lab will require extensive experimental planning, from the synthetic design of target molecules to designing and implementing the automated procedures for making and testing them.
This requires a significant amount of coordinated expertise in AI, computational chemistry and informatics, medicinal chemistry, and biology, working together in rapid iterative sprints toward a drug candidate. While LLMs such as ChatGPT are trained on general text, one can imagine analogous domain-specific models, call them large chemistry models (LCMs), trained to plan and coordinate this kind of work.
In a similar way to interacting with ChatGPT, a scientist would be able to ask an integrated LCM/AI/lab automation platform: "I want a drug candidate that binds to target A with potency X but does not bind to target B, which would cause toxicity." The LCM would then design and implement the complex set of synthesis and testing experiments that attempt to answer that request, or at least take an iterative step in that direction and make a recommendation as to what to do next.
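One way to picture the hand-off is that the platform would first compile such a request into a structured specification before planning any experiments. The schema below is a purely hypothetical Python sketch; the field names and off-target threshold are illustrative assumptions, not part of any existing system.

```python
from dataclasses import dataclass, field

@dataclass
class CandidateRequest:
    bind_target: str                        # "target A"
    min_potency_pIC50: float                # potency "X"
    avoid_targets: list[str] = field(default_factory=list)
    max_offtarget_pIC50: float = 5.0        # i.e., little to no binding

request = CandidateRequest(
    bind_target="A",
    min_potency_pIC50=8.0,
    avoid_targets=["B"],  # binding here would cause toxicity
)
print(request)
```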
Given the current excitement around LLMs and the rate at which this technology is progressing, together with a new future of AI- and automation-driven research, one could envision molecular discovery in the relatively near future becoming accessible to a much wider audience of molecular "designers" asking increasingly challenging questions. Highly interactive AI and data generation tools like these will exponentially leverage human ingenuity.
Imagine how that would drive innovation in the future.
About the Author
Nathan Collins is the cofounder and head of Strategic Alliances and Development at