Enhancing Clinical Data Validation with AI: A Comprehensive Approach to Accuracy and Efficiency


Merck presentation at SCOPE 2023 highlights company’s AI-assisted data validation approach.

Clinical research has experienced a dramatic increase in data volume, velocity, and variety, especially since the emergence of COVID-19. To manage this data explosion, companies like Merck are incorporating AI and machine learning (ML) into their data validation processes. At SCOPE 2023, Merck's Christopher Lamplugh, associate VP and head of global data management & standards, and Rakesh Maniar, executive director and head of eClinical technologies, global data management & standards, discussed the use of AI in clinical data validation. This article provides an overview of their presentation, including the challenges faced, the methodology adopted, and the outcomes achieved by implementing AI in clinical data validation while ensuring that technology does not impede clinical trials.

Background and challenges

Clinical trials generate massive amounts of data that must be accurately validated, analyzed, and reported. Traditional manual data validation processes are time-consuming and error-prone, leading to delays and increased costs. Merck recognized the need for a more efficient and accurate approach and leveraged AI and ML technologies to improve its data validation processes. However, merging new technologies with existing practices requires a deep understanding of the potential challenges and a well-defined strategy.

Methodology and model training

Merck began by defining the user requirements and identifying the specific challenges it aimed to address with the AI model. These challenges included reducing the time spent on manual data validation, improving prediction accuracy, and ensuring that actionable insights were generated from the data.

Merck trained the AI model on historical data sets, which were annotated and labeled to create a "ground truth" from which the model could learn. The presenters offered an analogy: a model trained on images of cats wearing masks would learn to recognize a cat even when its face is partially obscured.

Through iterative training and validation cycles, Merck raised the AI model's prediction accuracy to as much as 89% after 14 cycles. Data managers were crucial in refining the model: they provided feedback on its predictions and suggested additional features to enhance its performance. In one instance, adding a sequence number feature significantly improved the model's ability to identify discrepancies in data records based on a series of comments.
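To make the idea of feedback-driven training cycles concrete, here is a minimal sketch in Python. It is not Merck's model or pipeline: the word-count classifier, the example comments, and the names `train`, `predict`, `accuracy`, and `run_cycles` are all hypothetical and chosen only for illustration. The sketch shows reviewer-corrected labels being added to the training set each cycle while accuracy is tracked on a separate held-out set.

```python
from collections import Counter

def train(examples):
    """Count word occurrences in comments labelled discrepancy (True) vs. clean (False)."""
    counts = {True: Counter(), False: Counter()}
    for comment, is_discrepancy in examples:
        counts[is_discrepancy].update(comment.lower().split())
    return counts

def predict(model, comment):
    """Flag a comment when its words lean towards the discrepancy class."""
    words = comment.lower().split()
    score = sum(model[True][w] - model[False][w] for w in words)
    return score > 0

def accuracy(model, labelled):
    """Share of held-out comments whose prediction matches the ground-truth label."""
    correct = sum(predict(model, comment) == label for comment, label in labelled)
    return correct / len(labelled)

def run_cycles(initial, feedback_batches, holdout):
    """Retrain after each batch of reviewer-labelled examples; record accuracy per cycle."""
    training = list(initial)
    history = [accuracy(train(training), holdout)]
    for batch in feedback_batches:
        training.extend(batch)  # data-manager feedback becomes new ground truth
        history.append(accuracy(train(training), holdout))
    return history

# Hypothetical toy data, invented for this sketch.
initial = [("dose missing", True), ("visit completed", False)]
feedback_batches = [
    [("date out of window", True), ("no issues noted", False)],
    [("value out of range", True), ("confirmed by site", False)],
]
holdout = [
    ("dose out of range", True),
    ("visit date missing", True),
    ("no issues confirmed", False),
    ("completed by site", False),
]

history = run_cycles(initial, feedback_batches, holdout)
print(history)
```

Keeping the held-out set separate from the reviewer feedback matters here: if corrected examples were also used to measure accuracy, each cycle would look better than it really is.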

Key takeaways and outcomes

Merck's AI-assisted data validation approach offered several benefits:

1. Increased efficiency and accuracy: By automating the data validation process, Merck reduced the time spent on manual tasks, enabling data managers to focus on more strategic activities.

2. Improved prediction accuracy: Through continuous training and feedback, the AI model's accuracy improved over time, resulting in more reliable and actionable insights.

3. Enhanced knowledge retention: The AI model's ability to learn from historical data and institutional knowledge allowed Merck to mitigate the impact of employee attrition and maintain a high level of expertise.

4. Streamlined user experience: By incorporating user experience considerations into the AI model, Merck ensured that technology would not hinder clinical trial processes.

However, Merck also recognized that AI and ML technologies could not replace all rule-based checks and that combining both approaches would be necessary for optimal results.
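One way to picture the combination of rule-based checks and model predictions is the following sketch. It is an assumption-laden illustration rather than a description of Merck's system: the `Record` fields, the blood-pressure range, the keyword-based `model_score` stand-in, and the 0.5 threshold are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Record:
    subject_id: str
    systolic_bp: float
    comment: str

def rule_check(record):
    """Deterministic edit check; the 60-250 range is a made-up example."""
    issues = []
    if not 60 <= record.systolic_bp <= 250:
        issues.append("systolic_bp out of range")
    return issues

def model_score(record):
    """Stand-in for a trained model: returns a discrepancy score between 0 and 1."""
    keywords = ("unclear", "retest", "error")  # hypothetical signal words
    hits = sum(k in record.comment.lower() for k in keywords)
    return min(1.0, hits / 2)

def validate(record, threshold=0.5):
    """Run the rule-based check first, then add a model flag for human review."""
    issues = rule_check(record)
    if model_score(record) >= threshold:
        issues.append("model flagged comment for review")
    return issues

print(validate(Record("001", 300, "ok")))
print(validate(Record("002", 120, "value unclear, retest")))
print(validate(Record("003", 120, "ok")))
```

In this arrangement the deterministic checks stay fully auditable, and the model only adds candidates for a data manager to review; it never overrides a rule.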

My opinion

While AI and ML technologies have made clinical data validation more efficient and accurate, it is important to consider the limitations and potential risks of relying solely on them. Dependence on AI could lead to overconfidence in its predictions at the expense of human intuition and expertise, which could result in missed discrepancies or incorrect assumptions. Additionally, AI models require substantial historical data for accurate training, which may not always be readily available, particularly in fast-evolving domains like clinical research. On a positive note, combining AI and human expertise can create a more robust and reliable system for data validation, as Merck did by developing a process to methodically build and optimize AI models with human input.


As demonstrated in this article, Merck incorporated AI into clinical data validation processes to significantly improve efficiency, accuracy, and resource allocation. By understanding the challenges, adopting a well-defined methodology, and focusing on user experience, companies can successfully integrate AI-assisted data validation into their clinical research processes while maintaining the quality and integrity of their clinical trials.

Moe Alsumidaie, MBA, MSF, is a thought leader and expert in the application of business analytics toward clinical trials, and regular contributor to Applied Clinical Trials.
