Industry has an urgent need for an objective system to evaluate the efficacy of AI tools in clinical settings.
A recently published article in JAMA has brought attention to the need for nationwide assurance laboratories that objectively evaluate the use of AI in healthcare.1 The authors begin by highlighting the creation of the Coalition for Health AI (CHAI) in December 2022, which followed a call earlier that year for more rigorous evaluation of healthcare-related AI. They noted that for AI models under regulatory oversight, a framework already exists for the development of safe, reliable, and effective AI.
“However, any AI model falling outside of regulation, such as models for early detection of disease, automating billing procedures, facilitating scheduling, supporting public health disease surveillance, and other uses beyond traditional clinical decision support, should still follow similar rigor in its development, testing, and validation, as well as performance monitoring, when considering development and integration of decision support and/or administrative capabilities,” the authors wrote.
In April 2023, the CHAI community, which includes academic institutions and patient advocacy groups, released a draft blueprint for trustworthy AI implementation that included, as a potential next step, developing the vision for these assurance laboratories.2
The authors examined past research on trustworthy AI, which identified more than 200 recommendations for reporting model performance or describing characteristics of the source data via “model cards” and “data cards.”3 “While many randomized clinical trials or other types of scientific studies have evaluated the performance of AI models, each uses a different set of evaluation criteria, making it difficult to compare algorithms,” the authors wrote.
The plan for a potential nationwide assurance network, according to the authors, was broken down into several areas: a shared resource for development and validation, comprehensive evaluation of AI models, transparent reporting, promotion of regulatory guidance, and ongoing monitoring.
“A network of assurance labs could also provide monitoring of ongoing performance of AI models to ensure their intended objectives are achieved, in addition to offering services supporting federal regulation,” the authors wrote. “Such a network would help clinicians verify the appropriateness of AI models developed for use in health care delivery, whether those models are embedded in systems offered by electronic health record vendors or offered separately by third-party developers or created by the health care organization itself.”
These potential labs could provide different levels of evaluation, ranging from model performance and bias to human-machine teaming. “These labs could partner with model developers to help remediate specific areas (eg, bias) for improved performance and adherence to best practices,” the authors wrote.
The authors also consider a national registry of tested AI tools, which would include the previously mentioned evaluations and be made available to stakeholders, including the general public. The assurance labs could additionally be leveraged in implementing regulatory guidance.
While the need for such a system is urgent, the authors identified potential limitations and alternative approaches. Rather than a public, nationwide approach, alternatives include local assurance labs, national government-operated labs, commercial labs, or empowering solution developers to perform local testing. The authors also propose a modest start, with a small number of assurance labs experimenting with diverse approaches.
“A public-private partnership to launch a nationwide network of health AI assurance labs could promote transparent, reliable, and credible health AI,” the authors concluded.