February 16, 2017

How to Leverage Big Data to Improve Clinical Site Selection

Predictive modeling gives biopharma companies the opportunity to fix inaccurate, inefficient, and misinformed site selection practices by leveraging big data.

The emergence of big data sets, natural language processing, and machine learning has created the opportunity to solve inaccurate, inefficient, and misinformed site selection practices through predictive modeling. Companies that do not use predictive modeling to estimate the probability of achieving their target outcome when selecting sites risk falling behind their industry peers. This article discusses how once-separate data sources can now work together, allowing biopharmaceutical companies to evaluate investigator performance trends. Sponsors should use this information to improve their site selection strategy, moving from choosing only investigators they know to building new relationships with investigators who deliver predictably strong performance.

Historically, clinical research teams at biopharmaceutical companies have worked with the same institutions and investigators repeatedly, relying on internal databases and personal referrals for site identification regardless of the results generated. During the past 10 years, biopharmaceutical companies have begun purchasing subscriptions to external clinical research resources in an effort to improve their approach. These data subscriptions draw from more than 50 publicly available global clinical research information repositories. With all of this new information, one would have expected an increase in the number of new relationships between biopharmaceutical companies and investigators. However, that has not been the case. Poor performance cannot be solved simply by purchasing data sets. These data are only as valuable as their most predictive element of future performance, and most sources lack the most important one: enrollment data. Furthermore, when key performance predictors (KPPs) such as enrollment information are included in the data set, they must be organized and structured so they can be analyzed. Biopharmaceutical companies are now starting to require that all data be mined to generate structured information that can feed a predictive model, one that lets their clinical teams identify the investigators with the highest probability of strong performance.

Biopharmaceutical companies are recognizing that they cannot lead the industry from a data silo. They need to partner with providers who have these data capabilities in order to build the most robust data engine and subsequent predictive analytics tool using structured source performance information. By partnering in this way, biopharmaceutical companies can take advantage of the largest informed data sets available.

Patient enrollment rates and, more specifically, randomization rates have consistently been shown to be the best predictors of future clinical site performance. However, depending on the therapeutic area, there may be additional valuable predictors. For example, when a sponsor is planning to open a heart failure study, which is one of the busiest areas of clinical research today, there are more than 5,000 US investigators to choose from. How should the sponsor narrow the investigator pool to find the best 100 for its study? It is no longer acceptable to choose investigators randomly from a list based on location or familiarity with the medical director. The crowded investigator landscape demands that KPPs, such as the investigator enrollment rate, number of competing trials, and number of patients randomized, be used to run a model to determine the 100 investigators most likely to enroll subjects for that clinical research trial. Similarly, with an ovarian cancer study, it might make sense to choose investigators who have experience using specific companion diagnostics, in addition to desirable enrollment rates. It is essential that sponsors have a dynamic data analytics tool with the flexibility to weight the KPPs that they value most for their trial. While having actionable intelligence, such as companion diagnostic experience, is of paramount importance, how that information is used is also crucial.
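As a rough illustration of weighting KPPs to shortlist investigators, the sketch below scores each investigator on the three KPPs named above and returns the top of the list. It is a minimal example under stated assumptions, not a description of any vendor's model: the `Investigator` fields, the min-max scaling, the linear scoring rule, and the weight values are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Investigator:
    # All fields are hypothetical; a real data set would define its own units.
    name: str
    enrollment_rate: float     # e.g. patients enrolled per month
    competing_trials: int      # trials competing for the same patients
    patients_randomized: int   # historical count of randomized patients


def min_max(values):
    """Scale values to the 0..1 range so KPPs with different units are comparable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]  # no spread: treat everyone as equal
    return [(v - lo) / (hi - lo) for v in values]


def rank_investigators(pool, weights, top_n):
    """Return the names of the top_n investigators by weighted KPP score.

    Enrollment rate and patients randomized raise the score;
    competing trials lower it (an assumption of this sketch).
    """
    rates = min_max([inv.enrollment_rate for inv in pool])
    competing = min_max([inv.competing_trials for inv in pool])
    randomized = min_max([inv.patients_randomized for inv in pool])

    scored = []
    for inv, r, c, p in zip(pool, rates, competing, randomized):
        score = (weights["enrollment_rate"] * r
                 + weights["patients_randomized"] * p
                 - weights["competing_trials"] * c)
        scored.append((score, inv.name))

    # Highest score first; ties broken alphabetically for a stable result.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:top_n]]


if __name__ == "__main__":
    pool = [
        Investigator("A", 2.0, 1, 40),
        Investigator("B", 0.5, 4, 10),
        Investigator("C", 1.0, 0, 20),
    ]
    weights = {"enrollment_rate": 0.5,
               "patients_randomized": 0.3,
               "competing_trials": 0.2}
    print(rank_investigators(pool, weights, top_n=2))
```

Changing the weights is how a sponsor would express which KPPs matter most for a given trial; an additional KPP such as companion diagnostic experience could be added as another weighted term or as a filter.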

The amount of information being collected from the 40,000 active global investigators is immense. Without a dynamic way to organize and analyze that volume of information, the clinical trial data landscape becomes so unwieldy that it is impossible to draw meaningful conclusions. Predictive modeling and trending allow biopharmaceutical companies to remove noise from the data and focus on making the right decisions with actionable data. For example, predictive modeling can analyze the number of investigators who have completed at least three ovarian cancer trials and can determine, on average, based on the number of patients needed to reach statistical significance, how many sites should be opened to meet the target date for the last participant’s last visit. That model could also show how many more investigators would be needed to exceed expectations and beat the target date for the last participant’s last visit.
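The site-count estimate described above reduces, in its simplest form, to dividing the required number of patients by what one average site is expected to enroll in the available time. The sketch below shows that arithmetic. It assumes a constant average enrollment rate per site, that all sites open at the same time, and that the enrollment window has already been derived from the target date; the function name and the example numbers are hypothetical.

```python
import math


def sites_needed(patients_required, avg_rate_per_site_per_month, enrollment_months):
    """Estimate how many sites must be opened to enroll the required patients.

    Simplifying assumptions of this sketch: every site enrolls at the same
    constant average rate and all sites are active for the whole window.
    """
    if avg_rate_per_site_per_month <= 0 or enrollment_months <= 0:
        raise ValueError("rate and enrollment window must be positive")
    patients_per_site = avg_rate_per_site_per_month * enrollment_months
    return math.ceil(patients_required / patients_per_site)


if __name__ == "__main__":
    # Hypothetical inputs: 300 patients, 0.5 patients per site per month.
    on_time = sites_needed(300, 0.5, enrollment_months=12)
    two_months_early = sites_needed(300, 0.5, enrollment_months=10)
    print("sites to meet the target:", on_time)
    print("extra sites to finish two months early:", two_months_early - on_time)
```

A real model would replace the single average rate with per-investigator predictions and account for staggered site start-up, but the same comparison (sites needed for the target window versus a shorter one) shows how many additional investigators it would take to beat the target date.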

Predictive modeling provides a way of evaluating what is “average.” Having real data on which to model performance pushes sponsors and investigators to evaluate how they can improve. If sponsors know that certain investigators always exceed expectations and enroll at a higher rate than average, they can refine their pool to include only high-performing investigators. They can then focus on which of those investigators are familiar with the diagnostic they want to use. The ability to focus on the needs of the specific protocol through the use of predictive modeling is the ultimate differentiator. Hitting average timelines with a mix of high-performing and low-performing investigators is no longer acceptable. Sponsors can now remove the low-performing investigators from their investigator pool and increase the odds of tripling their enrollment rate by making smarter, better-informed choices from the beginning of the study.
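The two-step refinement described above (keep investigators who enroll faster than average, then keep those familiar with the required diagnostic) can be sketched as a simple filter. The record layout, the field names `enrollment_rate` and `diagnostics`, and the sample values are assumptions made for illustration only.

```python
from statistics import mean


def high_performers(investigators, required_diagnostic=None):
    """Keep investigators whose enrollment rate is above the pool average.

    If required_diagnostic is given, further restrict the result to
    investigators who list that diagnostic among those they have used.
    """
    average_rate = mean([inv["enrollment_rate"] for inv in investigators])
    selected = [inv for inv in investigators
                if inv["enrollment_rate"] > average_rate]
    if required_diagnostic is not None:
        selected = [inv for inv in selected
                    if required_diagnostic in inv["diagnostics"]]
    return selected


if __name__ == "__main__":
    # Hypothetical pool; "Dx-1" and "Dx-2" are placeholder diagnostic names.
    pool = [
        {"name": "A", "enrollment_rate": 0.2, "diagnostics": {"Dx-1"}},
        {"name": "B", "enrollment_rate": 0.4, "diagnostics": set()},
        {"name": "C", "enrollment_rate": 1.0, "diagnostics": {"Dx-1"}},
        {"name": "D", "enrollment_rate": 1.2, "diagnostics": {"Dx-2"}},
    ]
    print([inv["name"] for inv in high_performers(pool)])
    print([inv["name"] for inv in high_performers(pool, "Dx-1")])
```

Using the pool average as the cutoff is the simplest reading of "higher rate than average"; a sponsor could just as easily use a percentile or a protocol-specific threshold.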

Sponsors should use investigator performance information to ensure that their clinical trial timelines are achieved or exceeded. It is only through the use of predictive modeling that big data becomes actionable. It is time for biopharmaceutical companies to stop looking inward and start looking outward toward partners with the appropriate tools to gain access to performance data that inform the most important clinical trial decision a biopharmaceutical company makes, namely, site selection.

Suzanne Caruso, VP Clinical Solutions, WIRB-Copernicus Group
