Using AI & Machine Learning to Better Understand Data and Manage Risk


Applied Clinical Trials

Basheer Hawwash, Principal Data Scientist at Remarque Systems Inc., writes about the potential of artificial intelligence, and more specifically machine learning, to transform clinical trials.

Against a backdrop of stricter regulatory standards and increased emphasis on trial oversight and patient safety, the clinical development landscape is more competitive than ever. Clinical trials have also increased in complexity, driven in large part by the shift toward biomarker-guided drug development and value-based outcomes. With about 7,000 medicines in development globally, nearly three-quarters of which have the potential to be first-in-class treatments,[1] we are living in an era of medical innovation that is on the verge of technological disruption.

Artificial intelligence (AI), and more specifically machine learning, has the potential to transform clinical trials, and healthcare in general, by deriving critical new insights from the vast amount of data generated during the course of healthcare delivery.[2] In January 2019, then-FDA Commissioner Dr. Scott Gottlieb stressed the importance of modernizing the clinical trial process by pairing real-world data with advances in machine learning, stating that “new approaches and new technologies can help expand the sources of evidence that we use to make more reliable treatment decisions.”[3]

Recruitment and medication adherence tracking are two areas of clinical trial management where artificial intelligence solutions have made headway.[4] Risk-based monitoring (RBM) is another area where machine learning is already transforming data into insights and evidence that can be used to guide decision-making. In an environment where the volume, velocity, and variety of study data generated are increasing exponentially, machine learning algorithms are poised to become indispensable tools for safer, more efficient clinical trial management. And yet, adoption of machine learning-driven RBM technologies has been slow.

What are artificial intelligence and machine learning?

Artificial intelligence is broadly defined as the science and engineering of making intelligent machines. AI can use different techniques, including statistical models, systems that rely on if-then statements, and machine learning, to name a few.[2]

Machine learning is an AI technique that can be used to design and train software algorithms to learn from and act on data.[2] This may involve searching data for trends, patterns, and anomalies that might not be obvious to a human observer.

Incorporating AI and machine learning in RBM

The overarching objective of RBM is to focus trial oversight on preventing or mitigating risks to both data quality and processes to ensure patient safety and trial integrity. An essential feature of RBM is that it is dynamic: as data is collected and analyzed on an ongoing basis, monitoring findings and new insights can be used to facilitate continual improvement in both study conduct and trial oversight.

In some RBM technologies, the application of AI through machine learning is being used to support data processing, analytics, visualization, and decision-making. With increasing data volume and integration of data from multiple sources, machine learning is becoming critical for finding the right data at the right time, and providing actionable insights to sponsors, CROs, and clinical monitors. 

Automated, not ‘autonomous’

While the use of machine learning in RBM technologies creates an opportunity for researchers to process, analyze, and visualize more data than ever before, it is not intended to replace human resources. RBM technologies powered by machine learning are only as effective as their algorithms, and their outputs still need to be interpreted and contextualized by appropriate trial management staff. In other words, “automated” is not synonymous with “autonomous.” Machine learning is a complement to, not a substitute for, human judgment. In fact, machine learning algorithms “learn” from their users by observing their interaction with the results and gathering feedback. That feedback further improves the algorithm’s accuracy and effectiveness, which is especially important for clinical trials on rare diseases, where historical data is lacking.

Leveraging the technology for quality risk management

For sponsors, CROs, and clinical monitors who are using RBM software with built-in machine learning, here are five key factors to consider during implementation and execution of this AI-driven technology:

1. Begin with good training

At a fundamental level, the linchpin of a successful machine learning outcome is training. Just as a spam filter must be trained to distinguish legitimate email from junk mail and voice recognition software must listen to countless hours of dialogue to parse speech with accuracy, RBM software machine learning algorithms must be trained with enough data to do their work. Keep in mind that it’s not just volume but also quality of data that matters. When training an algorithm, it’s important to train it with the right data and then verify that the results are valid.

Machine learning-based data libraries focused on particular drug classes, therapeutic areas, and disease states can be used not only for algorithm training, but also for bolstering the data against which clinical trial results are being compared.

2. Know what you are measuring

Machine learning algorithms are designed to generate signals that help researchers monitor risks without having to manually examine each individual data point. Since the algorithms themselves may be quite complex, researchers need to understand what each algorithm is measuring in order to interpret the results correctly.

Ideally, algorithm results are visualized in a manner that is easy to both interpret and act on, with built-in workflows and drill-down capabilities to view raw data if necessary. Results can be displayed in a variety of ways. For example, anomaly detection algorithms might flag abnormal data and generate workflows tailored to the needs of the particular users responsible for certain patients or sites. Researchers can then investigate the discrepancy and determine how best to proceed. Clustering algorithms, on the other hand, group patients or sites into similar clusters, enabling the examination of a patient in the context of other similar patients to detect outliers.
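To make the idea of an anomaly-detection signal concrete, here is a minimal sketch in Python. It is not the method used by any particular RBM product. It assumes a simple modified z-score (based on the median and the median absolute deviation), and the function name, site IDs, adverse-event rates, and the 3.5 cutoff are all hypothetical choices for illustration.

```python
from statistics import median


def flag_anomalies(values_by_site, threshold=3.5):
    """Return the sites whose value is an outlier by the modified z-score.

    The score uses the median and the median absolute deviation (MAD),
    so a single extreme site does not distort the baseline it is
    compared against. The 3.5 cutoff is an assumed, commonly used default.
    """
    values = list(values_by_site.values())
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # No spread around the median: nothing can be scored.
        return []
    flagged = []
    for site, value in values_by_site.items():
        score = 0.6745 * (value - center) / mad
        if abs(score) > threshold:
            flagged.append(site)
    return flagged


# Hypothetical adverse-event rates per site (illustrative numbers only).
ae_rates = {
    "site_01": 0.11,
    "site_02": 0.09,
    "site_03": 0.10,
    "site_04": 0.12,
    "site_05": 0.45,
}
print(flag_anomalies(ae_rates))
```

With these made-up numbers only site_05 is flagged. A monitor would still need to review the underlying records before deciding whether the signal reflects a real problem.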

3. Remember that signals need to be interpreted in the context of the clinical trial

Each clinical trial is unique in its patient population, endpoints, and risk thresholds, so the algorithms used to process and analyze study data must also be unique. While the algorithms may be based on those used for other trials within the same therapeutic area, they must be adjusted to address the specific needs of a particular trial. What is the algorithm measuring? What is it not measuring? Which potential patient results are to be expected? What constitutes a red flag and what can safely be ignored?

Any given set of data can be used to generate a wide range of insights, especially in trials with a large volume or variety of data. These insights only become valuable if they are relevant to the objectives of the clinical trial in question. For example, a slight elevation in blood pressure in a study participant may be important to flag in a cardiology trial but may be immaterial in an oncology study. Researchers must consider results in light of a study’s specific objectives to gauge whether that data warrants further action.

In the optimal scenario, the machine learning system enables researchers to configure and customize algorithms to a particular study, set them to run on a schedule, and establish thresholds for signal generation. The algorithms generate workflows and/or data visualizations, which provide researchers with insights that help them make informed decisions. But at the end of the day, the researchers themselves are responsible for assigning value to the results generated and determining how to act on them.
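As a rough illustration of study-specific thresholds, the Python sketch below applies the same readings to two different rule sets, echoing the blood pressure example above. The `SignalRule` class, the `generate_signals` function, the measure name, and the threshold values are all hypothetical. The numbers are placeholders, not clinical guidance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignalRule:
    """One study-specific rule: raise a signal when a measure exceeds a limit."""
    measure: str
    threshold: float


def generate_signals(readings, rules):
    """Return the readings that exceed the threshold configured for their measure.

    readings: iterable of (patient_id, measure, value) tuples.
    rules: iterable of SignalRule; measures without a rule are ignored.
    """
    limits = {rule.measure: rule.threshold for rule in rules}
    signals = []
    for patient_id, measure, value in readings:
        limit = limits.get(measure)
        if limit is not None and value > limit:
            signals.append((patient_id, measure, value))
    return signals


# Hypothetical thresholds: a cardiology study flags milder elevations
# than an oncology study does.
cardiology_rules = [SignalRule("systolic_bp", 140.0)]
oncology_rules = [SignalRule("systolic_bp", 180.0)]

readings = [("P001", "systolic_bp", 145.0), ("P002", "systolic_bp", 128.0)]

print(generate_signals(readings, cardiology_rules))
print(generate_signals(readings, oncology_rules))
```

The same reading for P001 produces a signal under the cardiology rules and none under the oncology rules. Keeping the thresholds as per-study configuration, not hard-coded logic, is what makes that possible.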


4. Utilize best practices to facilitate regulatory compliance

The FDA requires that any action taken related to clinical trial data be monitored and audited, and machine learning algorithms are no exception. Researchers can meet this requirement by maintaining a trace log that records when an algorithm is created, changed, and executed. Other best practices of algorithm management include:

  • Maintaining a detailed log illustrating each step the algorithm takes in processing data and identifying potential issues.

  • Utilizing a notification system to alert the administrator(s) when the algorithm initiates, when it finishes, and whether any problems occurred. 

  • Comparing snapshots of data across various time periods to validate results and ensure that the algorithm is still functioning as expected.

  • Carefully documenting any updates to the algorithm and communicating them to users to manage expectations. Algorithms may occasionally need to be updated based on new inputs and findings, including new data sources, user interaction, and feedback.
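The trace log described above could be sketched along these lines in Python. This is only an illustration under stated assumptions: the `AlgorithmTraceLog` class, its field names, and the algorithm and user names are hypothetical. A production system would also need secure, tamper-evident storage that satisfies the applicable regulations.

```python
import json
from datetime import datetime, timezone


class AlgorithmTraceLog:
    """Append-only record of when an algorithm is created, changed, or executed."""

    ALLOWED_EVENTS = {"created", "changed", "executed"}

    def __init__(self):
        self._entries = []

    def record(self, algorithm, event, user, detail=""):
        """Append one timestamped entry and return it."""
        if event not in self.ALLOWED_EVENTS:
            raise ValueError(f"unknown event type: {event!r}")
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "algorithm": algorithm,
            "event": event,
            "user": user,
            "detail": detail,
        }
        self._entries.append(entry)
        return entry

    def history(self, algorithm):
        """Return copies of all entries for one algorithm, oldest first."""
        return [dict(e) for e in self._entries if e["algorithm"] == algorithm]

    def to_json_lines(self):
        """Serialize the log as one JSON object per line for archiving."""
        return "\n".join(json.dumps(e, sort_keys=True) for e in self._entries)


# Hypothetical usage with made-up algorithm and user names.
log = AlgorithmTraceLog()
log.record("ae_rate_outliers", "created", "analyst_a")
log.record("ae_rate_outliers", "changed", "analyst_a", "threshold 3.0 -> 3.5")
log.record("ae_rate_outliers", "executed", "scheduler", "nightly run")
print(log.to_json_lines())
```

Recording changes with a free-text detail field also gives a natural place to document algorithm updates before they are communicated to users.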

5. Use common sense

Machine learning algorithms are not infallible, and their accuracy and efficacy are dependent on the quality of data. In computer science, “garbage in, garbage out” describes the concept that flawed input data produces nonsense output. Keep in mind that machine learning algorithms will generate signals based on any valid raw data format, regardless of what those data are. In the context of a clinical trial, a false positive or a false negative could have significant downstream impact on patient safety and study integrity.

To that end, it is important to never act on a result without questioning it first. Algorithms are designed to facilitate decision-making, not dictate action, so if something doesn’t make sense, validate it. Researchers need to bring their own knowledge, judgment, and insights to bear when evaluating the outputs of machine learning. Comparing algorithm results to known benchmarks may be useful for validating the underlying algorithms. These benchmarks could be snapshots of the raw data which were used to train the algorithms and track the accuracy of signal detection.
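One simple way to compare algorithm results against a known benchmark is to count how many flagged items match issues that were already confirmed. The Python sketch below assumes such a benchmark exists. The function name, site IDs, and data are hypothetical.

```python
def signal_accuracy(flagged, known_issues):
    """Compare flagged items against a benchmark of confirmed issues.

    Precision: share of flagged items that were real issues.
    Recall: share of real issues that the algorithm flagged.
    """
    flagged = set(flagged)
    known_issues = set(known_issues)
    true_positives = len(flagged & known_issues)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(known_issues) if known_issues else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "false_positives": sorted(flagged - known_issues),
        "false_negatives": sorted(known_issues - flagged),
    }


# Hypothetical benchmark: sites with confirmed issues in a past data snapshot.
known = ["site_05", "site_07", "site_09"]
flagged = ["site_02", "site_05", "site_07"]

report = signal_accuracy(flagged, known)
print(report)
```

In this made-up example the false positive (site_02) and the false negative (site_09) are exactly the cases a researcher would want to examine by hand before trusting the algorithm's output.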

Transforming clinical trials with AI

Advanced technologies with machine learning algorithms are now able to facilitate many aspects of RBM by accessing, harmonizing, and analyzing data from diverse sources to identify patterns and trends and pinpoint anomalies and potential points of failure. 

When implemented and managed appropriately, these RBM technologies are powerful tools for monitoring risk and improving both patient safety and data quality. Comprehensive RBM technologies approach risk from all angles, alerting users when risks approach predefined thresholds and when safety or data quality is at risk of compromise. In addition to making on-site monitoring more efficient, RBM software provides ongoing oversight of patients, sites, and the study as a whole to facilitate informed decision-making. 

Utilizing a machine learning-driven RBM technology may just be your first step in incorporating AI into clinical development. As sponsors and CROs continue to seek ways to drive efficiency, optimize end-to-end study processes, and accelerate the commercialization pathway, machine learning and other AI techniques will continue to transform the clinical trial landscape. Here are just a few ways AI-driven technologies can make the clinical development process more intelligent:

  • Combining vast volumes of data from many sources to determine in which indication a drug candidate is most likely to succeed

  • Identifying appropriate clinical trial patients and predicting which patients are at risk of dropping out

  • Measuring drug responses

  • Monitoring drug adherence

  • Predicting site performance

  • Enabling siteless virtual trials

Regardless of the technology, AI is designed to augment and assist, not replace, human intelligence, and it is the researcher who is ultimately responsible for the technology’s success. By considering the five key factors above, sponsors and CROs can help ensure that their RBM software or other AI technology functions as expected and delivers on its promises.

Dr. Basheer Hawwash is Principal Data Scientist at Remarque Systems, Inc.



  1. PhRMA. 2018 Biopharmaceutical Research Industry Profile and Toolkit. Available at

  2. U.S. Food and Drug Administration. Artificial Intelligence and Machine Learning in Software as a Medical Device. Available at

  3. Health IT Analytics. FDA: Real-World Data, Machine Learning Critical for Clinical Trials. Available at

  4. CB Insights. The Future of Clinical Trials: How AI & Big Tech Could Make Drug Development Cheaper, Faster & More Effective, August 2018.