Cancer Biomedical Informatics Grid


Applied Clinical Trials Supplements

Supplements-05-01-2011

Information network allows constituencies to share data and knowledge.

Since the mid-1980s, biomedical science has undergone a major revolution, driven by advances in molecular science, that has ushered in the era of molecularly driven, "personalized" medicine. This has increased the influence of basic research on the clinical process, best illustrated by the advent of new molecular tools, such as next-generation sequencing, which are pushing the boundaries of data handling. The scientific community now recognizes that genotypes are not only key data in clinical settings, but also contributory factors in the development of treatments. While the current state of technology allows for only limited personalized treatment regimens, it is clear that certain genetic traits contribute to the success or failure of drugs within a given population. The need to move rapidly from discovery to development, the essential feature of translational medicine, is driving the technology.


The pace of change in the clinical trial process has not kept up with that of molecular biology. The widespread acceptance in the 1950s of the controlled clinical trial as a best practice radically changed the face of clinical research and drug development. Since that time, however, there have been no changes of comparable magnitude, and the resulting dissonance between the pace of science and that of the clinical trial process has manifested itself in the "pipeline problem": a slowdown, instead of an expected acceleration, in the number of innovative therapies being made available to patients. The US Food and Drug Administration (FDA) noted in 2004 that "the applied sciences needed for medical product development have not kept pace with the tremendous advances in the basic sciences."1 New molecular entity (NME) applications to the FDA's Center for Drug Evaluation and Research have been trending downwards for the past 15 years, from 45 in 1996 to 23 in 2010.2 Meanwhile, research and development spending by pharmaceutical companies more than doubled in the decade following 1996.3

Many of the challenges posed by the paradigm of translational research (the ability to take findings in basic science and rapidly translate them into usable therapies) relate directly to those that bioinformatics seeks to solve: the ability to deal with rapidly increasing amounts of data; the capacity to process the data; and the ability to share the generated data. The sheer amount of data generated is staggering, as illustrated by Duke University, which is currently running near the capacity of its 300 terabytes of storage space at one facility, after encoding 200 whole human genomes and 100 exomes.4 The number of different systems and programs used to produce, collect, and analyze the data further complicates sharing and analysis across multiple data sets and sources.

Furthermore, exploration of new models for clinical trials data management is critical; current models do not scale to the rapid increase in information generation and availability, imposing rate-limiting steps on translation of this information into new therapies. The development of new therapies is no longer based on strictly linear, asynchronous sets of experiments and clinical trials results. The National Cancer Institute's (NCI) 1997 report of the Clinical Trials Program Review Group,5 which noted that the complexity of the clinical trials infrastructure had "eroded the ability of the system to generate new ideas to reduce the cancer burden," was an early expression of the need for new models for managing clinical data. More recently, the Institute of Medicine expressed the imperative need for a "learning healthcare system that is designed to generate and apply the best evidence for the collaborative healthcare choices of each patient and provider; to drive the process of discovery as a natural outgrowth of patient care; and to ensure innovation, quality, safety, and value in healthcare."6

Nowhere has the difference between the pace of scientific discovery and that of technology development been more keenly felt than in oncology. As noted recently by NCI, "Although many still think of a single disease affecting different parts of the body, research tells us—through new tools and technologies, massive computing power, and new insights from other fields—that cancer is, in fact, a collection of many diseases whose ultimate number, causes, and treatment represent a challenging biomedical puzzle."7 The effect of cytochrome p450 polymorphism on drug metabolism, for example, showed that a group of cancer patients hitherto regarded as relatively homogenous in fact had to be divided into those that could take flutamide, tegafur, cyclophosphamide, paclitaxel, or tamoxifen, depending on the polymorphism, or had to be excluded from treatment by these drugs due to highly toxic side effects.8

As personalized medicine becomes more prevalent, and translational medicine becomes the imperative, the need for sophisticated data collection, storage, and analysis will be of vital importance. This was recognized, even prior to the increased availability of next-generation sequencing and the huge amount of data associated with it, in the report of the Clinical Trials Working Group to the National Cancer Advisory Board in 2005.9 The Working Group identified the need for a new clinical trials infrastructure to enhance coordination and communication, scientific quality and prioritization, standardization of tools and procedures, and operational efficiency. In this article the authors will discuss the implications and trends of information and data handling in the modern clinical research setting.

Existing model and new technologies

The increasing number of clinical trials and the rapid pace of technology led to the development of specialized information technology in the late 1980s and early 1990s. Automated analytic and reporting systems were driven by study sponsors and sites—commercial, academic, and government entities—in an effort to manage the growing volume of data and increasingly complex reporting requirements.

As a result, many large academic medical centers have spent years developing highly customized, tightly integrated in-house boutique systems intended to support local clinical workflow, which can vary significantly between organizations. The tailored nature of the applications led to rapid adoption by the organizations that created them, at the price of becoming increasingly ingrained, sclerotic, and resistant to change as time moved on. These systems have become legacy systems, which are maintained at an extremely high cost, since almost any process change or integration with another system will cause them to "break" and require code modification. The effort needed to maintain the systems leaves few resources for the development of enhancements and new functionality. As a consequence, architectural considerations and refactoring become a low priority, ensuring further "entrenchment" of the legacy systems and their monolithic nature. As standards and open-source capabilities become available, these legacy systems are unable to take advantage of them, since their ability to integrate and interoperate easily is limited.

The sustainability and scalability of these legacy systems are being challenged by the need to share large volumes of complex data, and by the wide availability of affordable technologies such as polymerase chain reaction (PCR) and microarray technologies. A Google search for microarray tools yields over 3.6 million results, and a search for PCR tools yields over nine million. As the amount of laboratory-generated data has grown, so has the development of laboratory information management systems (LIMS). The ability to use this data in clinical trials is a sine qua non if researchers and clinicians ever hope to deliver on the promise of a true translational "bench-to-bedside" drug development approach. Clinical trials coordinators increasingly assume that individual patient genetic data is available to evaluate patient eligibility for a trial, and for randomization into an arm of the study. Imaging data is also commonly expected to be readily available for diagnostic and outcomes evaluation, further increasing the complexity of data handling, sharing, and storage.

The development of these disparate systems presents another well-known problem, that of rekeying data. When systems do not interoperate, manual reentry, which is error-prone and a waste of valuable resource time, is inevitable. Rekeying is even more wasteful if the data was "born digital," meaning that the measurements that generated it were performed by machines and instantly rendered digitally without any human intervention beyond the calibration and operation of the machines. A major improvement over rekeying, but still a labor-intensive and error-prone process, is the development of complex semi-manual ad hoc data synchronization procedures involving interim steps (e.g., export files in comma-delimited or spreadsheet format). In these cases, however, the investigators and their scientific teams end up acting as data integration specialists and software engineers to create these processes, limiting their own time for research and increasing the risk of coding errors, transposition errors, data loss, and data corruption. Data collection, storage, and processing have become an important and increasingly challenging task, even without considering the need for semantic data exchange (i.e., the ability of machines to understand the context and meaning of data being shared between systems) to facilitate collaboration between researchers or organizations using disparate systems.
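To make the risk in such export-based procedures concrete, the following is a minimal sketch, in Python, of the kind of check a team might bolt onto a comma-delimited export step. It is an illustration only: the function names `fingerprint` and `verify`, the column names, and the sample values are all hypothetical and are not part of caBIG or of any system described in this article.

```python
import csv
import hashlib
import io


def fingerprint(csv_text: str) -> dict:
    """Summarize an export so the receiving system can check what arrived."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data_rows = (rows[0], rows[1:]) if rows else ([], [])
    return {
        "columns": header,
        "row_count": len(data_rows),
        "sha256": hashlib.sha256(csv_text.encode("utf-8")).hexdigest(),
    }


def verify(expected: dict, received_text: str) -> list:
    """Return a list of problems found; an empty list means the copy matches."""
    actual = fingerprint(received_text)
    problems = []
    if actual["columns"] != expected["columns"]:
        problems.append("column names or order changed")
    if actual["row_count"] != expected["row_count"]:
        problems.append(
            f"expected {expected['row_count']} rows, got {actual['row_count']}"
        )
    # Only fall back to the digest when the structure looks intact.
    if not problems and actual["sha256"] != expected["sha256"]:
        problems.append("content differs (possible transposition or corruption)")
    return problems


# Hypothetical instrument export and two damaged copies of it.
exported = "sample_id,analyte,value\nS1,glucose,5.4\nS2,glucose,6.1\n"
manifest = fingerprint(exported)
truncated = "sample_id,analyte,value\nS1,glucose,5.4\n"
transposed = exported.replace("6.1", "1.6")

print(verify(manifest, exported))
print(verify(manifest, truncated))
print(verify(manifest, transposed))
```

A check like this catches data loss and transposed digits, but it does nothing for meaning: two systems can exchange a byte-identical file and still disagree about what a column represents, which is the gap that semantic data exchange is meant to close.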

Cancer Biomedical Informatics Grid

The Cancer Biomedical Informatics Grid® (caBIG®) program was conceived in response to these challenges. In 2004 the National Cancer Institute (NCI) launched caBIG as an information network enabling all constituencies in the cancer community (researchers, physicians, and patients) to share data and knowledge. Prior to caBIG's launch, NCI consulted with many stakeholder groups and visited nearly all of the NCI-designated Cancer Centers (research sites that have been recognized by NCI as meeting specific standards for scientific excellence and for integrating diverse research approaches to the problem of cancer)10 to gather information and clarify the informatics needs of the scientific community. The challenges involved in managing clinical data emerged as a key pain point, along with many others, including the increasing requirement to share this data across the basic science, clinical research, and patient care continuum. Additionally, the creation of technology and data sharing standards that would allow for the development of common tools and interfaces was seen as one of the major goals for the new program. Based on this assessment, caBIG's mission was defined as developing a collaborative information network that accelerates the discovery of new approaches for the detection, diagnosis, treatment, and prevention of cancer, ultimately improving patient outcomes. Specifically, caBIG's goals are to:

  • Connect scientists and practitioners through a shareable and interoperable infrastructure.

  • Develop standard rules and a common language to more easily share information.

  • Integrate or adapt tools for collecting, analyzing, integrating, and disseminating information associated with cancer research and care.

caBIG is primarily focused on the interfaces between systems. The idea is to create reusable interfaces through open, published standards, allowing organizations access to new data sources as they become available. Additionally, in response to community-identified need and reflecting a trend throughout software engineering,11 caBIG facilitated the development of standards-based, interoperable "plug and play" modules, intending them as adjuncts to supplement in-house and vendor systems.

The caBIG framework was developed to be open and available to all. It is intended to be co-created by all the communities likely to use and propagate these tools. Consequently, since its inception, caBIG has followed four guiding principles:

Open access. All caBIG resources, including software, infrastructure, documents, and data, are freely available to all, in order to maximize opportunities for broad sharing and collaboration.

Open development. caBIG tools and infrastructure are developed through a participatory process open to all.

Federation. caBIG software and resources are widely distributed, interlinked, and potentially available to everyone, but each organization maintains local control over its own resources and data.

Open source. All source code for NCI-funded caBIG modules is freely available to view, download, use, alter, and redistribute without restriction. The caBIG software license12 is non-viral (i.e., derivative works are not subject to the original open source terms, meaning that any of the code can be incorporated into proprietary software). This is consistent with the policy13 of the National Institutes of Health (NIH) that research resources, including software and data, should be broadly disseminated in order to promote further research, development, and applications, and that broad use of these resources will ultimately benefit public health.
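The federation principle above can be illustrated with a small sketch in Python. It is a toy model under stated assumptions, not caBIG's grid software: the class `SiteNode`, the function `federated_query`, the record fields, and the consent-based sharing policy are all hypothetical. The point it shows is that a query fans out to every site, while each site's own policy decides what leaves it.

```python
from typing import Callable, Optional

Record = dict
Predicate = Callable[[Record], bool]


class SiteNode:
    """A hypothetical participating organization that keeps its own records."""

    def __init__(self, name: str, records: list,
                 may_share: Optional[Predicate] = None):
        self.name = name
        self._records = list(records)
        # Local policy: the site alone decides which records may be released.
        self._may_share = may_share or (lambda record: True)

    def query(self, predicate: Predicate) -> list:
        """Answer a query locally, returning copies of shareable matches only."""
        return [dict(r) for r in self._records
                if self._may_share(r) and predicate(r)]


def federated_query(sites: list, predicate: Predicate) -> list:
    """Send the same query to every site and merge the answers."""
    results = []
    for site in sites:
        for record in site.query(predicate):
            results.append({"site": site.name, **record})
    return results


# Center A releases only records with consent; Center B releases everything.
site_a = SiteNode(
    "Center A",
    [
        {"specimen": "S1", "diagnosis": "lung", "consented": True},
        {"specimen": "S2", "diagnosis": "lung", "consented": False},
    ],
    may_share=lambda r: r["consented"],
)
site_b = SiteNode(
    "Center B",
    [
        {"specimen": "S3", "diagnosis": "breast", "consented": True},
        {"specimen": "S4", "diagnosis": "lung", "consented": True},
    ],
)

lung_cases = federated_query([site_a, site_b],
                             lambda r: r["diagnosis"] == "lung")
print([case["specimen"] for case in lung_cases])
```

No central copy of the data exists in this arrangement; the caller sees only what each site chooses to return.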

To align the efforts of caBIG with the structure of translational research, the program was organized around interconnected "workspaces," each representing a particular area of expertise and/or scientific need. These workspaces allowed for a detailed focus on the stakeholders within that domain, as well as the higher-level needs across those realms. The domain workspaces in caBIG comprise the following: integrative cancer research (ICR); in vivo imaging; tissue banks and pathology tools (TBPT); and clinical trials management systems (CTMS).

Each domain workspace works closely with community stakeholders to document specific needs and allow for organizations to move towards increased data sharing and analysis. Additional "cross-cutting" workspaces were focused on the architecture and standards that would support the necessary interoperability between domains and systems: vocabularies and common data elements (VCDE) and architecture.

The goal of these workspaces is to develop an open informatics framework of standards, specifications, and vocabularies to support translational research and care. For caBIG, a services-oriented architecture (SOA) was adopted. Benefits of a SOA-based approach include a focus on business capabilities (and not on the technology), increased agility, and technology-independent specifications providing increased vendor participation and cross-vendor interoperability.


The caBIG clinical trials suite was developed to meet the expressed needs of NCI's cancer research community for the management of oncology clinical trials. The suite is designed as a modular, clinical trials management platform comprised of interoperable components (Figure 1). The following is a list of capabilities, which currently exist as separable, interoperable components of the suite.

  • Clinical participant registry (C3PR) enables efficient and streamlined registration of participants into clinical trials.

  • Patient study calendar (PSC) allows organizations to centrally and consistently manage study participant schedules in clinical trials.

  • Adverse event reporting system (caAERS) is used to manage the collection and reporting of adverse event data obtained during clinical trials.

  • Lab viewer is used to view laboratory data in transit between clinical source systems (typically in-house clinical chemistry laboratory systems) and clinical trials systems (typically clinical data management systems).

  • Clinical connector allows users of third party clinical data management systems (CDMS), both open source and proprietary, to expose caBIG capabilities within the CDMS. All of the CTMS modules are designed to work with a CDMS as the central repository for the clinical data for a trial. The specifications and requirements for all suite modules are free and open-source, so that commercial vendors, academic institutions, and industry alike can utilize and adapt these modules for their use.

  • Integration hub (iHub) enables tools and applications in the caBIG clinical trials suite and beyond to interface seamlessly with one another. iHub acts as a middleware component, using an enterprise service bus (ESB) to provide integration between preclinical, clinical research, and clinical care data, and ensure reliable transaction control (enabling applications to "talk" with one another and confirm receipt of messages and appropriate responses).
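The middleware role described for the integration hub (routing a message to the applications that need it and confirming receipt) can be sketched in a few lines of Python. This is an in-memory illustration under stated assumptions, not iHub's interface: `ServiceBus`, `Message`, `Ack`, the topic name, and the two handlers are hypothetical names chosen for the example.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    topic: str
    payload: dict
    message_id: str


@dataclass(frozen=True)
class Ack:
    """Acknowledgement returned to the sender for each receiving application."""
    message_id: str
    handler: str
    ok: bool
    detail: str = ""


class ServiceBus:
    """Toy message bus: routes by topic and reports receipt per subscriber."""

    def __init__(self):
        self._handlers = {}  # topic -> list of (name, callable)

    def subscribe(self, topic, name, handler):
        self._handlers.setdefault(topic, []).append((name, handler))

    def publish(self, topic, payload):
        message = Message(topic, payload, uuid.uuid4().hex)
        acks = []
        for name, handler in self._handlers.get(topic, []):
            try:
                handler(message)
                acks.append(Ack(message.message_id, name, True))
            except Exception as exc:  # report the failure instead of hiding it
                acks.append(Ack(message.message_id, name, False, str(exc)))
        return acks


calendar = {}


def schedule_participant(message):
    calendar[message.payload["participant_id"]] = message.payload["study_id"]


def open_safety_record(message):
    if "study_id" not in message.payload:
        raise ValueError("study_id is required")


bus = ServiceBus()
bus.subscribe("participant.registered", "study-calendar", schedule_participant)
bus.subscribe("participant.registered", "adverse-events", open_safety_record)

acks = bus.publish("participant.registered",
                   {"participant_id": "P-001", "study_id": "STUDY-A"})
print([(a.handler, a.ok) for a in acks])
```

The sending application never calls the calendar or the adverse event system directly; it only learns, through the acknowledgements, whether each one accepted the message. A production bus would add persistence, retries, and security, none of which is modeled here.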

Figure 1. The suite is a modular, clinical trials management platform comprised of interoperable components.

Key to the ability of the caBIG CTMS modules to interoperate is a single domain analysis model (DAM), i.e., a shared model of the dynamic and static semantics that collectively define a shared domain of interest. The accepted model for the domain of clinical and preclinical research is the Biomedical Research Integrated Domain Group (BRIDG) model, which has been developed jointly by caBIG, the Clinical Data Interchange Standards Consortium (CDISC), Health Level Seven (HL7), and the FDA. Harmonizing an application's data model with BRIDG not only ensures that its semantics are aligned to those of other applications similarly harmonized, it also enables domain experts to understand, and thus to review for correctness and completeness, the semantics of the application. Furthermore, as BRIDG is itself mapped to HL7's Reference Information Model (RIM), mapping an application to BRIDG ensures that the application's semantics are HL7-compliant.

Molecular and translational research tools

In keeping with the integrative mission of caBIG, tools have been developed in response to community-expressed needs in molecular and translational research, including:

  • caArray is a microarray data management system.

  • caIntegrator is a research data mart that allows researchers to set up custom portals to conduct integrative research without programming. These portals bring together heterogeneous clinical, microarray, and medical imaging data to enrich multidisciplinary research.

  • geWorkbench (genomics Workbench) is a platform for integrated genomics.

  • GenePattern combines a powerful scientific workflow platform with more than 90 computational and visualization tools for the analysis of genomic data.

  • The National Biomedical Imaging Archive (NBIA) enables users to securely store, search, and download diagnostic medical images.

  • Annotation and Image Markup (AIM) is the first project to propose/create a "standard" means of adding information/knowledge to an image in a clinical environment.

  • caTissue suite is a biorepository management tool designed for biospecimen inventory, tracking, and annotation.

Infrastructure for interoperability

As systems share data, semantics provide the ability to understand the meaning and context for the data. caBIG leverages a number of related capabilities:

  • Common data elements (CDEs) enable the definition of standard data elements and data types to be used by all clinical systems.

  • Cancer data standards repository (caDSR) stores and provides access to all CDEs.

  • Standard case report forms (CRFs) are shared templates, which utilize the CDEs, for use in clinical trials.

  • Enterprise vocabulary services (EVS) provides access to standardized terminologies and ontologies.
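How shared data elements let two systems agree on meaning can be shown with a short Python sketch. Everything in it is an assumption made for illustration: the identifiers `CDE-0001` and `CDE-0002` are invented and are not real caDSR entries, and the local field names and codes stand in for whatever two sites might use internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommonDataElement:
    cde_id: str                   # hypothetical identifier, not a caDSR ID
    name: str
    permissible_values: frozenset


# A tiny stand-in for a shared registry of data elements.
REGISTRY = {
    "CDE-0001": CommonDataElement(
        "CDE-0001", "Participant Sex",
        frozenset({"Female", "Male", "Unknown"})),
    "CDE-0002": CommonDataElement(
        "CDE-0002", "Adverse Event Grade",
        frozenset({"1", "2", "3", "4", "5"})),
}

# Each site maps its local field to a CDE and its local codes to shared values.
SITE_A_MAPPING = {
    "sex": ("CDE-0001", {"F": "Female", "M": "Male", "U": "Unknown"}),
    "ae_grade": ("CDE-0002", {}),
}
SITE_B_MAPPING = {
    "gender_cd": ("CDE-0001", {"1": "Male", "2": "Female", "9": "Unknown"}),
    "tox_grade": ("CDE-0002", {}),
}


def harmonize(record: dict, mapping: dict) -> dict:
    """Re-express a local record in terms of shared data elements."""
    harmonized = {}
    for local_field, raw in record.items():
        if local_field not in mapping:
            raise KeyError(f"no CDE mapping for local field {local_field!r}")
        cde_id, value_map = mapping[local_field]
        value = value_map.get(str(raw), str(raw))
        cde = REGISTRY[cde_id]
        if value not in cde.permissible_values:
            raise ValueError(f"{value!r} is not permissible for {cde.name}")
        harmonized[cde_id] = value
    return harmonized


from_a = harmonize({"sex": "F", "ae_grade": 3}, SITE_A_MAPPING)
from_b = harmonize({"gender_cd": 2, "tox_grade": "3"}, SITE_B_MAPPING)
print(from_a == from_b)
```

The two source records share no field names or codes, yet after harmonization they are directly comparable, and a value outside the permissible set is rejected rather than silently passed along.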

Case study: adaptive trials

Traditional clinical trials generally test one therapeutic agent against a standard therapy. Patients are randomly assigned to a treatment arm or control group, and the protocol is not changed during the course of the trial. If researchers and clinicians notice a trend in how certain patients are responding to treatment, they cannot respond to that trend while the trial is ongoing.

Conversely, adaptive clinical trials, typically early phase trials, "learn" as they progress. Patients are typically tested for certain biomarkers and assigned a study arm based on that information, and this can change as the trial moves forward and researchers learn which patients are responding. If patients are not responding to a given therapy, the drugs can be removed and replaced with others. Conversely, if patients clearly are responding, the drugs can quickly be "promoted" to a later phase and, again, replaced with others.

Adaptive trials require strong bioinformatics support. The ability to rapidly share information, including images, with multiple sites that might be participating in a trial; to analyze gene sequences for patients; and to perform and share analyses on potentially large data sets, are all critical capabilities.
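The "learning" behavior described above can be sketched as response-adaptive randomization. The Python below is a deliberately simplified illustration, assuming a Beta(1, 1) prior on each arm's response rate within each biomarker group and assignment probabilities proportional to the posterior means. It is not the statistical design of any trial named in this article, and the class name, arm names, and group labels are hypothetical.

```python
import random
from collections import defaultdict


class AdaptiveRandomizer:
    """Toy response-adaptive randomizer keyed by biomarker group and arm."""

    def __init__(self, arms, seed=None):
        self.arms = list(arms)
        self._rng = random.Random(seed)
        # (group, arm) -> [responders, non-responders]
        self._counts = defaultdict(lambda: [0, 0])

    def posterior_mean(self, group, arm):
        responders, non_responders = self._counts[(group, arm)]
        # Mean of Beta(1 + responders, 1 + non_responders).
        return (responders + 1) / (responders + non_responders + 2)

    def assignment_probabilities(self, group):
        means = {arm: self.posterior_mean(group, arm) for arm in self.arms}
        total = sum(means.values())
        return {arm: mean / total for arm, mean in means.items()}

    def assign(self, group):
        """Draw an arm for a new patient, favoring arms that look better."""
        probs = self.assignment_probabilities(group)
        weights = [probs[arm] for arm in self.arms]
        return self._rng.choices(self.arms, weights=weights)[0]

    def record_outcome(self, group, arm, responded):
        self._counts[(group, arm)][0 if responded else 1] += 1


trial = AdaptiveRandomizer(["drug_A", "drug_B"], seed=0)
before = trial.assignment_probabilities("marker_positive")

# Accumulate hypothetical outcomes for one biomarker group.
for _ in range(8):
    trial.record_outcome("marker_positive", "drug_A", responded=True)
    trial.record_outcome("marker_positive", "drug_B", responded=False)

after = trial.assignment_probabilities("marker_positive")
print(before, after)
```

With no outcomes recorded, both arms are equally likely. After eight responders on one arm and eight non-responders on the other, new patients in that biomarker group are steered toward the better-performing arm, while groups with no data yet remain evenly randomized. The sketch also makes the informatics dependency visible: the randomizer is only as current as the outcome data flowing into `record_outcome`.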

Two examples of recent adaptive trials highlight these needs. The first is the BATTLE trial (biomarkers-integrated approaches of targeted therapy for lung cancer elimination), the principal investigator for which is Edward Kim, MD, at MD Anderson Comprehensive Cancer Center. Four drugs were tested on approximately 250 lung cancer patients, many of whom had failed to respond to previous therapy. Biopsies of the patients' tumors were analyzed for mutations or molecular changes in four different pathways that affect growth of tumor cells. As specific patients responded, or failed to respond, to the drugs, new patients were assigned to specific arms of the trial based on that information. Sixteen percent more patients experienced disease control after eight weeks than those on traditional chemotherapy, and an increased survival rate was also shown for these patients.14

Another example of an adaptive trial, and one that is using the caBIG technology platform, is I-SPY TRIAL (Investigation of Serial Studies to Predict your Therapeutic Response with Imaging and Molecular Analysis), the principal investigator for which is Laura Esserman, MD, at the University of California, San Francisco. I-SPY TRIAL is designed to determine whether adding investigational drugs to standard chemotherapy pre-surgery is better than standard chemotherapy alone in women with newly diagnosed, locally advanced breast cancer. The treatment phase of this trial tests multiple investigational drugs that are thought to target the biology of each patient's tumor. The trial uses the information from each participant who completes the study treatment to help determine treatment for women who join the trial in the future, creating a "learning trial" and helping the study researchers rapidly understand which investigational drugs will be most beneficial for women with certain tumor characteristics.15 I-SPY TRIAL uses the caBIG TRANslational Informatics System to Coordinate Emerging Biomarkers, Novel Agents, and Clinical Data (TRANSCEND) platform, a set of integrated caBIG components specifically designed to support adaptive trials. The platform was developed to allow for interoperability not only with caBIG modules, but also with external electronic health record systems, to allow for integration between the clinical research and clinical care domains.

Jens Poschet, PhD, Center for Biomedical Informatics and Information Technology, National Cancer Institute, and Sapient Government Services, Arlington, VA. Eve Shalley, Center for Biomedical Informatics and Information Technology, National Cancer Institute, and Essex Management, Rockville, MD. John Speakman,* Center for Biomedical Informatics and Information Technology, National Cancer Institute, 2115 East Jefferson Street, Rockville, MD, e-mail:

*To whom all correspondence should be addressed.


1. J. Woodcock, "Innovation or Stagnation: Challenge and Opportunity on the Critical Path to New Medical Products," Food and Drug Administration Critical Path Initiative (2004).

2. Food and Drug Administration, New Molecular Entity Statistics 2010,

3. I. M. Cockburn, "Is the Pharmaceutical Industry in a Productivity Crisis?" in Innovation Policy and the Economy, Volume 7, A. B. Jaffe, J. Lerner, and S. Stern, eds. (National Bureau of Economic Research, 2007).

4. J. Perkel, "Sequence Analysis 101," The Scientist, 25 (3) 60 (2011).

5. J. O. Armitage, et al., "Report of the National Cancer Institute Clinical Trials Program Review Group," (1997) National Cancer Institute Division of Extramural Activities,

6. L. Olsen, D. Aisner, and J. M. McGinnis, "The Learning Healthcare System," workshop summary, Institute of Medicine Roundtable on Evidence-based Medicine, March 30, 2007.

7. National Cancer Institute, "Changing the Conversation. The Nation's Investment in Cancer Research," An Annual Plan and Budget Proposal for Fiscal Year 2012, (2011).

8. R. H. van Schaik, "Cancer Treatment and Pharmacogenetics of Cytochrome P450 Enzymes," Invest New Drugs, 23 (6) 513-522 (2005).

9. National Cancer Institute, "Report of the Clinical Trials Working Group of the National Cancer Advisory Board," (2005),

10. National Cancer Institute, "Cancer Centers Program,"

11. J. Markoff, "Software Out There," New York Times (April 5, 2006).

12. National Cancer Institute, "Model caBIG® Open Source Software License," (2008)

13. Department of Health and Human Services, "Principles and Guidelines for Recipients of NIH Research Grants and Contracts on Obtaining and Disseminating Biomedical Research Resources: Final Notice," Federal Register 64 (100) (December 23, 1999)

14. T. H. Saey, "BATTLE Trial Personalizes Lung Cancer Treatment," Science News Magazine, 177 (1) 15 (2010).

15. Breast Cancer ISPY2 Trial, website,
