The industry has an urgent need for an objective system that evaluates the efficacy of AI tools in clinical settings.
A recently published article in JAMA has brought attention to the need for nationwide assurance laboratories that objectively evaluate the use of AI in healthcare.1 The authors begin by highlighting the creation of the Coalition for Health AI (CHAI) in December 2022, which followed a call earlier that year for more rigorous evaluation of healthcare-related AI. They noted that for areas in which AI models are under regulatory oversight, a framework already exists for the development of safe, reliable, and effective AI.
“However, any AI model falling outside of regulation, such as models for early detection of disease, automating billing procedures, facilitating scheduling, supporting public health disease surveillance, and other uses beyond traditional clinical decision support, should still follow similar rigor in its development, testing, and validation, as well as performance monitoring, when considering development and integration of decision support and/or administrative capabilities,” the authors wrote.
In April 2023, the CHAI community, which includes academia and patient advocacy groups, released a draft blueprint for trustworthy AI implementation, which identified the development of a vision for these assurance laboratories as a potential next step.2
The authors examined past research on trustworthy AI, which found more than 200 recommendations for reporting performance of models or describing characteristics of the source data via “model cards” and “data cards.”3 “While many randomized clinical trials or other types of scientific studies have evaluated the performance of AI models, each uses a different set of evaluation criteria, making it difficult to compare algorithms,” the authors wrote.
According to the authors, the plan for a potential nationwide assurance network covers several areas: a shared resource for development and validation, comprehensive evaluation of AI models, transparent reporting, promotion of regulatory guidance, and ongoing monitoring.
“A network of assurance labs could also provide monitoring of ongoing performance of AI models to ensure their intended objectives are achieved, in addition to offering services supporting federal regulation,” the authors wrote. “Such a network would help clinicians verify the appropriateness of AI models developed for use in health care delivery, whether those models are embedded in systems offered by electronic health record vendors or offered separately by third-party developers or created by the health care organization itself.”
These potential labs could provide different levels of evaluation, ranging from model performance and bias to human-machine teaming. “These labs could partner with model developers to help remediate specific areas (eg, bias) for improved performance and adherence to best practices,” the authors wrote.
The authors also consider a national registry of tested AI tools, which would include the previously mentioned evaluations and be made available to stakeholders, including the general public. The assurance labs could also be leveraged in implementing regulatory guidance.
While the need for such a system is urgent, the authors identified potential limitations and alternative approaches. Rather than a public, nationwide approach, alternatives include local assurance labs, national government-operated labs, commercial labs, or empowering solution developers to conduct local testing. The authors also propose a modest start, with a small number of assurance labs experimenting with diverse approaches.
“A public-private partnership to launch a nationwide network of health AI assurance labs could promote transparent, reliable, and credible health AI,” the authors concluded.