The Role of Big Data in Clinical Trials

Dec 01, 2016
By Applied Clinical Trials Editors
Volume 25, Issue 12

Today, big data is already proving its value by driving business decisions in finance, communications and automotive industries, among others. But what is the value of big data—which in R&D is really real-world data—in clinical trials?

In the past, clinical trials used only structured, clinically sourced data, which was relatively easy to organize and mine. But with the advent of the trial in the cloud, connected mHealth devices for remote monitoring of trial participants, and advanced technologies to analyze very large amounts of data quickly, things have changed.

In the following Q&A, James Streeter, Global Vice President, Life Sciences Strategy, for Oracle Health Sciences Business Unit, shares insights on this data-driven shift in drug development.


Q: How does big data work for clinical trials?

STREETER: Today, the cloud allows us to include terabytes of unstructured data from many different real-world data sources (EMRs, genetic profiles, phenotypic data, mHealth devices, etc.). With the ability to scale technology to collect this unstructured, real-world data from myriad systems, organize it into comparable formats, analyze it, and visualize the results, we can deepen our evidence for known relational trial factors and explore the data for unexpected patterns. These unexpected patterns can lead us to new hypotheses that can be validated by the trial data. Also, genetic data can provide deeper insights into the nature and size of the sub-population groups who could be served by a new treatment.

Q: How can unstructured data be organized and included?

STREETER: The key may be a federated approach that stores data from various domains (clinical trial, EMRs, claims, medical device, consumer device, etc.) in multiple repositories. Each repository contains a tool set optimized for managing and preparing its domain's data for high-speed analysis, and for storing that data in both structured and unstructured representations. The resulting federated data sets can then be used together to support a myriad of advanced analyses, patient data visualizations and other use cases to accelerate clinical research and improve patient outcomes.

Systems such as these, with metadata management capabilities, will have the flexibility and scalability to handle all real-world data in the format, size and frequency required as clinical trials evolve. They will streamline the process of accepting, storing, enriching and provisioning patient-related data as needed. In addition to having the ability to trace trial data from beginning to end and speed review and query resolution, they will also offer ease of access to data management and integrated downstream analysis across all patient data types. Finally, they will offer advanced analytics and patient data visualizations for better, faster, actionable insights, and cross-portfolio analysis.
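
As an editorial illustration of the federated approach Streeter describes, the following is a minimal Python sketch, not a description of any Oracle product. The class `DomainRepository`, the function `federated_patient_view`, the patient ID and all record fields are hypothetical names invented for this example. Each repository holds one domain's records, tags what it returns with its source so the data can be traced, and a single query combines the results across domains.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class DomainRepository:
    """Hypothetical in-memory stand-in for one domain-specific data store."""
    domain: str
    records: dict[str, list[dict[str, Any]]] = field(default_factory=dict)

    def add(self, patient_id: str, record: dict[str, Any]) -> None:
        self.records.setdefault(patient_id, []).append(record)

    def fetch(self, patient_id: str) -> list[dict[str, Any]]:
        # Tag every record with its source domain so it stays traceable.
        return [{**r, "source": self.domain} for r in self.records.get(patient_id, [])]


def federated_patient_view(repos: list[DomainRepository], patient_id: str) -> list[dict[str, Any]]:
    """Query each repository and return the combined records, ordered by date."""
    combined: list[dict[str, Any]] = []
    for repo in repos:
        combined.extend(repo.fetch(patient_id))
    # ISO 8601 date strings sort correctly as plain strings.
    return sorted(combined, key=lambda r: r["date"])


# Invented sample data for one fictional patient.
trial = DomainRepository("clinical_trial")
emr = DomainRepository("emr")
device = DomainRepository("consumer_device")

trial.add("P001", {"date": "2016-03-01", "event": "randomized", "arm": "treatment"})
emr.add("P001", {"date": "2016-01-15", "event": "diagnosis", "note": "free-text clinician note"})
device.add("P001", {"date": "2016-03-02", "event": "heart_rate", "value": 72})

view = federated_patient_view([trial, emr, device], "P001")
for row in view:
    print(row["date"], row["source"], row["event"])
```

In this sketch the structured trial record, the free-text EMR note and the device reading stay in their own stores and are only brought together at query time, which is the essence of federation as opposed to copying everything into one warehouse.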

Q: Will the clinical data manager’s role change?

STREETER: Yes. The introduction of large amounts of real-world data into the research study will also change the role of the clinical data manager. Traditionally, he or she asks, “How do we want to use this data?” Now the manager is taking on the role of the clinical data scientist. Instead of just managing the clinical trial information, the data scientist may observe new patterns leading to new hypotheses. So the manager might ask, “What data do I need to collect and analyze to validate this theory?”

For instance, when the National Cancer Institute (NCI) set up a prototype project, the question was asked, “How can we gain more insight into the relationship between genes and cancer?” NCI was able to search a 4.5 million cell matrix in 28 seconds. In this search, NCI cross-referenced the relationships between 15,000 genes and five major cancer types, across 20 million medical publication abstracts. It also cross-referenced genes from 60 million patients. This enabled NCI to gain a deeper understanding of the network of gene-cancer interactions and the state of research in relation to cohort groups treated.
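
The article does not describe how the NCI prototype was built. Purely to illustrate the general idea of cross-referencing genes against cancer types in publication abstracts, here is a toy Python sketch that counts how often a gene and a cancer type appear in the same abstract. The abstracts are invented for this example, the function name `cooccurrence` is hypothetical, and simple word matching stands in for whatever text analysis a real system would use.

```python
from collections import Counter
from itertools import product

# Real gene symbols, but the abstracts below are invented for illustration.
GENES = ["BRCA1", "TP53", "KRAS"]
CANCERS = ["breast", "lung", "colorectal"]

ABSTRACTS = [
    "BRCA1 mutations and breast cancer risk in a family cohort",
    "TP53 and KRAS alterations in lung cancer",
    "KRAS status in colorectal cancer patients",
    "TP53 expression in breast tumours",
]


def cooccurrence(abstracts, genes, cancers):
    """Count abstracts in which a gene and a cancer type are both mentioned."""
    counts = Counter()
    for text in abstracts:
        words = set(text.lower().split())
        for gene, cancer in product(genes, cancers):
            if gene.lower() in words and cancer in words:
                counts[(gene, cancer)] += 1
    return counts


matrix = cooccurrence(ABSTRACTS, GENES, CANCERS)

# Print one row per gene, one column per cancer type.
for gene in GENES:
    print(gene, [matrix[(gene, cancer)] for cancer in CANCERS])
```

Each cell of the resulting gene-by-cancer matrix is a count of supporting abstracts. A clinical data scientist could scan a matrix like this for unexpectedly strong or missing links and turn them into hypotheses to test.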

Analysis of the bigger, more diverse real-world data sets can shed light, via visualizations, on unknown relationships among clinical trial factors, for instance in relation to patient centricity, safety, risk-based monitoring and genomics. Taken together, the ability to amalgamate, organize and analyze real-world data sets with structured data sets can only enhance those questions we know to ask of our clinical trials. But additionally, this capability can provide new, provable indications of hidden relationships that can lead to better support and new hypotheses, and provoke new, potentially life-saving questions that we didn’t think to ask before.
