From Regulation to Data Aggregation: Three Machine Learning Trends to Watch
Foundational issues must be addressed to advance best practices.
For over a decade, we’ve discussed the potential of machine learning (ML) in clinical research to objectively gather and analyze data, optimize trial design, and accelerate drug development. While the opportunities of these technologies get a lot of buzz, there is still a long way to go when it comes to proving they can deliver on their promise and ensuring their development is sustainable long-term. We now find ourselves at a crossroads: we must improve confidence in ML among pharmaceutical sponsors and clinicians while finding alternative ways to keep pace with the data-hungry nature of these algorithms.
Three key trends will direct the future of ML: regulatory guidance, an emphasis on model traceability as a means to build trust, and new data aggregation and analysis approaches that may help make ML innovation more practical and cost-effective.
Evolving regulatory guidance
Until recently, federal oversight of ML’s development has been limited, with developers defining best practices based on their own experience. While leaving the development of scientifically sound models to the discretion of data scientists has helped spur innovation, it has also led to faulty or biased models. These not only taint the reputation of ML overall, but can also have serious consequences for a patient’s care. To address the challenges that come with autonomy, FDA, Health Canada, and the UK’s MHRA have jointly identified ten guiding principles to inform the development of Good Machine Learning Practice (GMLP). From data-specific guidance, such as ensuring datasets are representative of the intended patient population, to identifying opportunities for improved cross-industry collaboration, these guidelines aim to promote safe, effective, and high-quality ML while also accounting for its complex and iterative nature.
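The representativeness principle can be made concrete with a simple audit. The sketch below is illustrative only — the demographic groupings, target shares, and function name are assumptions, not drawn from the GMLP guidance — but it shows how a developer might quantify the gap between a training set’s mix and the intended patient population:

```python
from collections import Counter

def representativeness_gap(training_labels, population_shares):
    """Compare a training set's demographic mix against the intended
    patient population and report the absolute gap per group.

    training_labels: iterable of group labels, one per training record
    population_shares: dict mapping group label -> expected fraction
    """
    counts = Counter(training_labels)
    total = sum(counts.values())
    gaps = {}
    for group, expected in population_shares.items():
        observed = counts.get(group, 0) / total
        gaps[group] = abs(observed - expected)
    return gaps

# Hypothetical data: a training set that under-represents patients over 65
train = ["under_65"] * 90 + ["over_65"] * 10
target = {"under_65": 0.6, "over_65": 0.4}

gaps = representativeness_gap(train, target)
# over_65 is observed at 10% against an intended 40% -- a gap of 0.30
# that should flag the dataset for rebalancing before training
```

A check like this is deliberately crude; real representativeness reviews would also consider intersectional subgroups and clinical covariates, but even a coarse gap report makes the principle auditable rather than aspirational.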
GMLP is an important first step toward encouraging the adoption of proven, quality practices, and its evolution will be important to watch. Because these are suggestions rather than required standards, ML companies still hold the reins and can decide to what extent these guidelines influence their solutions. Ultimately, abiding by these principles and baking them into all the behind-the-scenes work of building a model should be nonnegotiable, especially while trust in these solutions remains fragile.
Demystifying machine learning through traceability
Clinicians and pharmaceutical sponsors can be wary of ML’s “magical” element in which it spits out a conclusion without evidence to support it, especially when a patient’s care or the future direction of a trial is on the line. As sponsors and clinicians operate in a highly regulated environment that requires vigilant documentation and robust proof around a drug’s efficacy, increasing the transparency and traceability of ML can help drive trust and give users peace of mind.
When encouraging traceability of ML, some developers may fear this would mean giving up proprietary information about the algorithm’s code. But that is not the case. Instead, the aim is to gain visibility into the controls around that algorithm: the system that determines how data is collected, how the model is trained and tested, and how a specific output is generated—all while keeping intellectual property secure. One day, this could involve an “audit report” or pedigree of a model that shows the workflow of a system and confirms it was developed using best practices and for its intended patient population. By lifting the veil on ML, we can hold developers more accountable to ensure no corners are cut, while also giving end users the boost of confidence they need.
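A model pedigree of this kind could be as simple as a structured record that travels with the model. No standard schema exists yet, so every field name and value below is a hypothetical example — the point is that such a record documents the controls around a model without exposing its code:

```python
from dataclasses import dataclass, field, asdict
import json

@dataclass
class ModelPedigree:
    """Hypothetical audit record for a deployed ML model: captures data
    provenance, training procedure, and intended use, while keeping the
    algorithm itself private."""
    model_name: str
    version: str
    intended_population: str
    data_sources: list = field(default_factory=list)
    training_procedure: str = ""
    validation_summary: dict = field(default_factory=dict)

    def to_report(self) -> str:
        # Serialize to JSON so the pedigree can accompany the model
        # through regulatory review or a sponsor's due diligence.
        return json.dumps(asdict(self), indent=2)

# Entirely illustrative values
pedigree = ModelPedigree(
    model_name="adherence-classifier",
    version="2.1.0",
    intended_population="adults 18+ enrolled in decentralized trials",
    data_sources=["site-collected video, de-identified, 2019-2023"],
    training_procedure="80/20 split, stratified by site; held-out test set",
    validation_summary={"sensitivity": 0.91, "specificity": 0.88},
)
report = pedigree.to_report()
```

Because the record holds only metadata about how the model was built and validated, a sponsor or reviewer can verify the workflow followed best practices without ever seeing the underlying algorithm.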
Combining old and new data approaches
Big data analytics have dominated ML development for decades and will continue to be an important foundation, feeding algorithms large amounts of high-quality, diverse data. But as the industry aims to achieve precision medicine and advance research of rare diseases, there’s a need for data approaches that allow for more targeted analysis. The quantity of data will still be a big factor in teaching ML the basics, but there’s a growing emphasis on extracting actionable insights from smaller, more contextual datasets.
While big data is valuable for surfacing bigger-picture trends and correlations, smaller, context-rich datasets offer the precision needed for patient-level analysis. Combining both approaches—broad data to establish patterns and targeted data to refine them—may prove the most practical and cost-effective path forward.
Paving the way for machine learning’s potential
ML has already made its mark on the healthcare and life sciences industries—many have reaped the benefits of more efficient operations, a better understanding of patients’ responses to treatment, improved forecasting, and more. The future, and reputation, of these relatively novel solutions depends on addressing the foundational issues of regulation, transparency, and optimized data approaches. By serving sponsors’ and clinicians’ unique needs for visibility while reducing the burden on developers, we can advance best practices and innovate with purpose.
Michelle Marlborough, Chief Product Officer, AiCure, LLC