The Importance of Biomarker Management

August 13, 2020
Cliff Culver
Volume 29, Issue 7/8

Exploring the benefits of capturing and integrating molecular biomarker data within clinical trials to build foundations for data assets.

Cell therapy clinical trials typically generate a significant quantity and diversity of biologic data intended to characterize disease and patient biology, as well as data intended to demonstrate evidence of therapeutic mechanisms. With the rise of precision medicine, similar expansions of data generation are occurring across therapeutic areas, though cell therapies particularly emphasize the critical role of molecular biomarkers in the early identification of risk factors for adverse events and in measuring the durability of therapeutic mechanisms.

These biologic data are rich resources that can both inform the ongoing clinical program and further represent an opportunity for organizations pursuing cell therapies to build data assets that create optionality at the program and portfolio levels. However, systems and processes for aggregating and integrating data within clinical programs have not kept pace with sponsors’ ability to produce biomarker data. Just as electronic data capture (EDC) technology revolutionized industry’s approach to clinical data management, we now see an imperative for strategic management of biomarker data.

In this article, we will explore the increasingly critical functions of capturing and integrating molecular biomarker data concurrent with ongoing clinical trials and establishing foundations for program- and enterprise-level data assets. Key elements align with findable, accessible, interoperable, and reusable (FAIR) principles and include:

  • A multilayered and unified approach to data processing that centralizes and aligns all key information across data sources.
  • Analytic and visualization capabilities that empower stakeholders with skill sets covering clinical, biologic, and data sciences.
  • Strategic implementation to support current or potential future efforts to integrate additional external data and information—including from publicly available data sources.

Successfully completing this shift from disjointed data to data asset can lead to massive scientific value creation.

Big data at the heart of precision medicine

Data generation in clinical trials has expanded exponentially as the cost of using complex assays to interrogate biological pathways has decreased, and sponsors continue to invest considerable time and resources to understand and implement cutting-edge modalities. These data, and data in the public domain, have a broad range of applications, from informing patient selection and characterizing mechanism of action to guiding dosing and optimizing study design. This is certainly evident in the discovery and development of cell therapies, for which molecular biomarkers characterize targets, provide evidence of therapeutic durability, and play a critical role in identifying risk factors for adverse events. Unfortunately, these data are collected and generated from a myriad of sources and typically remain disconnected due to their complexity and disparate nature.

Today, these disjointed data have led to a resource-intensive approach to analysis, consisting of ad hoc efforts to compile the data needed to test a particular hypothesis or answer a specific scientific question, e.g., confirming therapeutic engagement with an intended target. Oft-cited studies show that organizations report spending >80% of their time collecting and organizing data, at a significant detriment to capacity for analysis (not to mention that scientific teams report that data cleaning is the least enjoyable data science task).1 This burden can be especially stifling given that the depth and diversity of the data are generally intended as a powerful tool to inform on-trial decisions as well as ongoing development strategy.

This is especially true as an ever-growing set of biomarker assays becomes available, ranging from targeted genomic panels to high-content or high-throughput experiments. Data are now being generated at unprecedented rates from flow cytometry, next-generation sequencing, immunophenotyping, mutational analysis, gene or protein expression, and more. The diversity and complexity of biomarker assays are further compounded by the fact that the resulting data are generated in different formats and structures, creating challenges around harmonization, interpretation, and accessibility. Moreover, specialized labs and associated technologies have emerged, and many have developed proprietary technologies, methods, or panels.
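
To make the harmonization challenge concrete, the following is a minimal sketch of mapping two differently structured lab exports into one common long format. The lab names, column names, and the `harmonize` helper are hypothetical and chosen only for illustration; real assay exports are considerably more varied and would need assay-specific handling.

```python
import csv
import io

# Hypothetical per-source column mappings (assumed for illustration only):
# each lab names its subject, analyte, and result columns differently.
COLUMN_MAPS = {
    "lab_a_flow": {"subject": "SubjectID", "analyte": "Population", "value": "PctParent"},
    "lab_b_expr": {"subject": "patient", "analyte": "gene", "value": "tpm"},
}


def harmonize(source, text, delimiter=","):
    """Convert one delimited lab export into rows with a shared schema."""
    mapping = COLUMN_MAPS[source]
    rows = []
    for raw in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        rows.append({
            "source": source,
            # Normalize subject identifiers so records align across labs.
            "subject_id": raw[mapping["subject"]].strip().upper(),
            "analyte": raw[mapping["analyte"]].strip(),
            "value": float(raw[mapping["value"]]),
        })
    return rows


# Example exports: a comma-separated flow cytometry file and a
# tab-separated expression file (both invented for this sketch).
flow_export = "SubjectID,Population,PctParent\nsubj-001,CD8+ T cells,23.5\n"
expr_export = "patient\tgene\ttpm\nSUBJ-001\tCD19\t12.0\n"

combined = harmonize("lab_a_flow", flow_export) + harmonize(
    "lab_b_expr", expr_export, delimiter="\t"
)
```

Once every source shares the same keys, records for a given subject can be queried together regardless of which lab or assay produced them.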

Empowering flexible insight generation

Scientific advancement is inherently unpredictable and delivering on the promises of precision medicine requires the ability for sponsor teams to flexibly interrogate and analyze data. In order to optimize value creation from generated data, drug developers and innovators need the ability to perform rapid data interrogation within and across platforms, trials, and geographies.

As with any big data opportunity, one key to success lies in developing a systematic approach to harmonizing the data. Our experience has shown that investment in comprehensive data alignment adds considerable value downstream by enabling rapid and flexible analyses across the entirety of data sets. The alternative is a series of ad hoc efforts, each requiring its own incremental data alignment, that cause ongoing delays, stifle momentum, and miss opportunities to inform time-sensitive decisions with data that should already be available.

A multilayered and unified approach to data processing is critical to fully leverage the information and content produced across studies and experiments for scientific insight generation. To handle the complexity and throughput of biomarker assays and make data available to inform decision-making, best-practice approaches leverage:

  • Automated pipelines for the ingestion of data from diverse labs and biomarker assay technologies.
  • Assay-specific workflows and quality control parameters, as well as a capability to incorporate custom workflows.
  • Data reconciliation capabilities to expedite historically time-consuming reconciliation activities between laboratory information management systems (LIMS) and EDC.
  • A centralized database for storage and access to all specialty lab data post-ingestion with the ability to support production of submission-ready data sets that comply with Clinical Data Interchange Standards Consortium (CDISC) standards and can be adapted to shifts in regulatory requirements.
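
As an illustration of the reconciliation step listed above, here is a minimal sketch that compares sample records from a LIMS export against the corresponding EDC entries. The `SampleRecord` fields and the `reconcile` function are assumptions made for this example, not a description of any particular vendor system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SampleRecord:
    # Hypothetical minimal set of fields shared by LIMS and EDC exports.
    subject_id: str
    visit: str
    sample_id: str
    collection_date: str  # ISO 8601 date, e.g. "2020-03-01"


def reconcile(lims_records, edc_records):
    """Compare LIMS and EDC sample records keyed by sample_id.

    Returns a dict with:
      missing_in_edc  - sample IDs present only in LIMS
      missing_in_lims - sample IDs present only in EDC
      mismatched      - (sample_id, field, lims_value, edc_value) tuples
    """
    lims = {r.sample_id: r for r in lims_records}
    edc = {r.sample_id: r for r in edc_records}

    report = {
        "missing_in_edc": sorted(lims.keys() - edc.keys()),
        "missing_in_lims": sorted(edc.keys() - lims.keys()),
        "mismatched": [],
    }
    # For samples known to both systems, flag any field that disagrees.
    for sample_id in sorted(lims.keys() & edc.keys()):
        for field in ("subject_id", "visit", "collection_date"):
            lims_value = getattr(lims[sample_id], field)
            edc_value = getattr(edc[sample_id], field)
            if lims_value != edc_value:
                report["mismatched"].append(
                    (sample_id, field, lims_value, edc_value)
                )
    return report


# Invented example data: one visit label disagrees, and one sample
# was entered in EDC but never received by the lab.
lims = [
    SampleRecord("SUBJ-001", "C1D1", "S1", "2020-03-01"),
    SampleRecord("SUBJ-002", "C1D1", "S2", "2020-03-02"),
]
edc = [
    SampleRecord("SUBJ-001", "C1D1", "S1", "2020-03-01"),
    SampleRecord("SUBJ-002", "C1D8", "S2", "2020-03-02"),
    SampleRecord("SUBJ-003", "C1D1", "S3", "2020-03-03"),
]
report = reconcile(lims, edc)
```

Running a check like this automatically at each data ingestion, rather than manually near database lock, is one way such discrepancies could be surfaced while they are still easy to query with sites and labs.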

Leveraging publicly available data

The increasing availability of public data sets, such as The Cancer Genome Atlas (TCGA) and the Alzheimer’s Disease Neuroimaging Initiative (ADNI), offers a transformational opportunity for insight-driven planning at all phases of the development lifecycle. These public data sets are often of considerable size for life sciences data, compiling billions of data points across multiple modalities and representing data that would have previously required many years and millions of dollars to generate. Many organizations now recognize the opportunity to integrate publicly available data with their proprietary data assets to immediately adopt more rational approaches to trial design, patient stratification, and portfolio strategy.

We see progressive organizations taking a strategic approach to compiling publicly available data and integrating it with data from their own preclinical and clinical programs to exponentially increase their capability to develop and test hypotheses quickly and fuel a more data-driven approach to clinical planning.

A multidisciplinary resource

Precision medicine development is a truly multidisciplinary endeavor, bringing together biological, scientific, clinical development, regulatory, and other stakeholders. Organizations that are able to provide access to organized data sets and facilitate collaboration across these groups have a material advantage in realizing the full potential of these teams, particularly those collaborating on active development programs.

The opportunity for effective collaboration is extended given the prevalence of partnerships and licensing relationships as strategies to support the development of novel therapeutics. The ability to gather data and insights to demonstrate early signs of positive biological response in early-phase trials can further support increasing partnerships, licensing, and fundraising activities.

User-friendly, intuitive web-based tools designed for biomarker analysis and data visualization are a particularly valuable opportunity to facilitate these collaborative efforts. Such platforms have the benefit of expanding access to data exploration to biologists, translational scientists, and other teams for which data science and coding experience may be capacity-constrained resources. However, such tools are unlikely to fully satisfy the requirements of all stakeholders, as data scientists within a bioinformatics team will ultimately require direct access to raw data to enable thorough and more formalized analysis and exploration.

Strategies to comprehensively integrate biomarker data must support the needs of these diverse stakeholders, providing intuitive visualization and analysis capabilities designed to support biomarker exploration while also delivering direct access to raw and processed data, visibility into quality control and processing pipelines, and interoperability with software tools leveraged by both translational and informatics groups (e.g., custom R packages, GraphPad Prism, Spotfire).


Given the investment in generating molecular biomarker data and its potential to radically increase an organization’s capability to advance the understanding of diseases, discover new drug targets, and/or identify biomarkers, we are quickly moving toward an era where proactive and continuous data analysis is the expectation. Moreover, we cannot simply take comfort in the promise of multiyear efforts aligned with timely buzzwords from “Big Data” to “Data Lakes” and “Machine Learning.”

Delivering on the promise of precision medicine requires out-of-the-box thinking and platforms designed to efficiently unlock the full potential of the data generated. Most important, we believe this must prioritize both advancing ongoing clinical programs and setting the foundation for future enterprise systems. Just as EDC technology helped revolutionize clinical data management, technology-based solutions for biomarker data management will now become a requirement for modern clinical trial operations.

We must also recognize that technology alone is not enough. Ultimately, success requires a cross-functional team including biologists, clinical developers, translational scientists, and innovative data scientists with the skill to design, validate, and operate technologies engineered specifically to address the challenges of biomarker data management in the new paradigm of biomarker-guided drug development.


  1. Press G. “Cleaning Big Data: Most Time-Consuming, Least Enjoyable Data Science Task, Survey Says.” Forbes. March 23, 2016. Accessed June 15, 2020.

Cliff Culver is SVP, QuartzBio – Precision for Medicine

Issue: Applied Clinical Trials-08-01-2020