From Regulation to Data Aggregation: Three Machine Learning Trends to Watch

June 3, 2022

Foundational issues must be addressed to advance best practices.

Michelle Marlborough

For over a decade, we’ve discussed the potential of machine learning (ML) in clinical research to objectively gather and analyze data, optimize trial design, and accelerate drug development. While the opportunities these technologies offer get a lot of buzz, there is still a long way to go in proving they can deliver on their promise and in ensuring their development is sustainable over the long term. We now find ourselves at a crossroads: we need to improve confidence in ML among pharmaceutical sponsors and clinicians, while finding alternative ways to keep pace with the data-hungry nature of these algorithms.

Three key trends will direct the future of ML: regulatory guidance, an emphasis on model traceability as a means to build trust, and new data aggregation and analysis approaches that may help make ML innovation more practical and cost-effective.

Evolving regulatory guidance

Until recently, federal oversight of ML’s development has been limited, with developers defining best practices based on their own experience. While leaving the development of scientifically sound models to the discretion of data scientists has helped spur innovation, it has also led to faulty or biased models. Such models not only taint the reputation of ML overall, but can also have serious consequences for a patient’s care. To address the challenges that come with this autonomy, FDA, Health Canada, and the UK’s MHRA have jointly identified ten guiding principles to inform the development of Good Machine Learning Practice (GMLP). From data-specific guidance, such as ensuring data sets are representative of the intended patient population, to identifying opportunities for improved cross-industry collaboration, these guidelines aim to promote safe, effective, and high-quality ML, while also accounting for its complex and iterative nature.

GMLP is an important first step toward encouraging the adoption of proven, quality practices, and its evolution will be important to watch. Because these are suggestions rather than required standards, ML companies still hold the reins and can decide to what extent the guidelines influence their solutions. Ultimately, abiding by these principles and baking them into all the behind-the-scenes work of building a model should be nonnegotiable, especially as trust in these solutions continues to waver.

Demystifying machine learning through traceability

Clinicians and pharmaceutical sponsors can be wary of ML’s “magical” element, in which a model spits out a conclusion without evidence to support it, especially when a patient’s care or the future direction of a trial is on the line. Because sponsors and clinicians operate in a highly regulated environment that requires vigilant documentation and robust proof of a drug’s efficacy, increasing the transparency and traceability of ML can help drive trust and give users peace of mind.

When traceability of ML is encouraged, some developers may fear that it would mean giving up proprietary information about the algorithm’s code. That is not the case. Instead, the aim is to gain visibility into the controls around the algorithm, meaning the system built around it that determines how data is collected, how the model is trained and tested, and how a specific output is generated, all while keeping intellectual property secure. One day, this could involve an “audit report” or pedigree of a model that shows the workflow of a system and confirms it was developed using best practices and for its intended patient population. By lifting the veil on ML, we can hold developers more accountable to ensure no corners are cut, while also giving end users the boost of confidence they need.
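As a rough illustration of what such an “audit report” might record, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `AuditRecord` class, its field names, the `fingerprint` helper, and the sample rows are hypothetical and do not come from GMLP or from any existing tool. The idea it tries to show is that a report can describe where data came from, which data sets were used for training and testing, and which workflow steps were run, without exposing the model’s code or the underlying patient data.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


def fingerprint(rows: List[dict]) -> str:
    """Return a SHA-256 hash of the rows, so a data set can be identified
    in the report without the report containing the data itself."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


@dataclass
class AuditRecord:
    """Hypothetical model pedigree: describes the controls around a model,
    not its code or weights."""
    model_name: str
    intended_population: str
    data_sources: List[str]
    train_fingerprint: str
    test_fingerprint: str
    metrics: Dict[str, float]
    workflow: List[str] = field(default_factory=list)

    def log_step(self, description: str) -> None:
        """Append one workflow step, in the order it was performed."""
        self.workflow.append(description)

    def to_json(self) -> str:
        """Serialize the record so it can be shared with a reviewer."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Made-up rows standing in for real training and test data.
train_rows = [{"age": 54, "adherent": 1}, {"age": 61, "adherent": 0}]
test_rows = [{"age": 47, "adherent": 1}]

record = AuditRecord(
    model_name="example-model",
    intended_population="adult participants (hypothetical)",
    data_sources=["site A (hypothetical)", "site B (hypothetical)"],
    train_fingerprint=fingerprint(train_rows),
    test_fingerprint=fingerprint(test_rows),
    metrics={"held_out_accuracy": 0.0},  # placeholder, not a real result
)
record.log_step("collected data from the listed sources")
record.log_step("trained on the training set")
record.log_step("evaluated on the held-out test set")

report = record.to_json()
print(report)
```

A reviewer holding the same data set could recompute the fingerprint and confirm it matches the report, which is one simple way to tie a model to the data it was built on while the data and the algorithm stay private.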

Combining old and new data approaches

Big data analytics has dominated ML development for decades and will continue to be an important foundation, feeding algorithms large amounts of high-quality, diverse data. But as the industry aims to achieve precision medicine and advance research into rare diseases, there’s a need for data approaches that allow for more targeted analysis. The quantity of data will still be a big factor in teaching ML the basics, but there’s a growing emphasis on extracting actionable insights from smaller, more contextual datasets.

While big data is valuable for identifying bigger-picture trends and correlations, Gartner recently predicted that 70% of organizations will shift their focus from big to “small and wide data” by 2025 to make ML less data-hungry. Wide data brings together disparate data from a variety of sources to support meaningful analysis, while small data uses small, individual sets of data to draw specific, more personalized insights. Together, small and wide data allow ML developers to extract more value from the data available to them and target those insights at a specific problem. In healthcare, this approach is particularly helpful because building training sets of patient data that are big enough can, in some instances, be an impossible feat. Big data is not going away, but when it is combined with small and wide data, we open the door to more pragmatic AI development and more precise patient care.

Paving the way for machine learning’s potential

ML has already made its mark on the healthcare and life sciences industries. Many have reaped the benefits of more efficient operations, a better understanding of patients’ responses to treatment, improved forecasting, and more. The future and reputation of these relatively novel solutions depend on addressing the foundational issues of regulation, transparency, and optimized data approaches. By ensuring we are serving the unique needs of sponsors and clinicians for visibility, while reducing the burden on developers, we can advance best practices and innovate with purpose.

Michelle Marlborough, Chief Product Officer, AiCure, LLC
