Outlining the latest government, industry, and public health efforts to promote increased adoption of common standards in data collection and sharing.
Clinical research is only as effective as its ability to have an impact on health. This impact comes when researchers find breakthroughs, discover new diagnostics or treatments, and identify critical pathways that lead to curing diseases. To maximize their utility, clinical research data should be traceable, accessible, interoperable, reproducible, and of good quality, allowing study findings to be imparted and shared in a clear and understandable way.¹ Unfortunately, today, clinical research data are often collected in a variety of formats, making it difficult to share and compare the data effectively under the terms allowed by study participants’ consent. This disconnect creates an evidence gap that slows scientific advances, which can result in ineffective and even harmful treatments and diagnostics that continue to be employed in clinical practice.²
A significant issue that arises when working with research data is the inability to validate and reproduce findings to demonstrate that the experimental result is in fact true. A survey of over 1,500 researchers conducted by Nature in 2016 found that more than 70% of researchers have tried and failed to reproduce another scientist’s experiments, and more than half have failed to reproduce their own experiments.³ This effect is commonly caused by divergence from the protocol and the inability to retrace steps in the process.4 The landmark article by John Ioannidis in 2005, titled “Why Most Published Research Findings Are False” states: “The greater the flexibility in designs, definitions, outcomes, and analytical modes in a scientific field, the less likely the research findings are to be true.”5
While irreproducibility of research results in the field of genetics is encouraging greater transparency in methods and materials, along with the analytic codes that underlie the conclusions, this does not appear to be the case for clinical trials. There are also efforts to leverage big data, which may provide information on trends, signals, or hypotheses to be tested further, but generally does not provide results adequate to support regulatory submissions.
Regulated clinical research has become increasingly global, particularly for areas such as rare diseases for which there is a small population of patients spread throughout the world. Efforts to streamline regulatory submissions for new product approvals have encouraged work, largely through the International Council for Harmonization of Technical Requirements for Pharmaceuticals for Human Use (ICH),6 to standardize and harmonize the structure of these submissions as electronic common technical documents (eCTD). Such standards are useful not only for sponsors who wish to submit in multiple regions simultaneously, but also for regulators to facilitate reviews. ICH has also provided guidelines for global research on protocols, terminologies, and statistical analyses.
Currently, an estimated 85% of research studies do not translate to a meaningful clinical discovery.7 The causes for this low level of translation of promising research into meaningful insights and interventions for human health are multiple. One of many examples is the discovery of the relationship between infant sleeping position and sudden infant death syndrome (SIDS). Had it been possible to aggregate and systematically analyze all the evidence available by the year 1970, over 60,000 infant deaths worldwide could have been prevented.8 Differences in protocols among studies, small sample sizes with few patients and families involved per study, and differences in comparisons between SIDS and unaffected infants were among the factors that may have contributed to the delayed recognition that positioning infants on their backs while sleeping is a protective factor against SIDS. This is one of many cases where critical health findings were present, but hidden in the data.
Regulatory validation of clinical trial findings involves stringent requirements to ensure that regulators can adequately evaluate the safety and efficacy of the medicinal product. Within the flexibilities afforded by the U.S. Federal Food, Drug, and Cosmetic Act, at least two adequate and well-controlled studies, each convincing on its own, are generally needed to establish effectiveness; a similar recommendation was given by the European Medicines Agency (EMA).9,10,11 Review of trial data includes the “validation” needed to establish that the results have clinical meaning and that the findings are not due to chance alone. Furthermore, the need to provide adequate directions for the use of a drug in relevant subgroups requires an assessment of aggregated data from multiple trials. This regulatory review is facilitated by the use of standards for protocol information, outcome definitions, data terminology, and formats.
Adoption of common standards in research becomes pertinent to the regulatory process as data from early discovery (e.g., biomarker discovery and mechanistic studies) are translated into clinical benefit. The terminology standards used in regulatory submissions and healthcare can be similarly adopted in clinical research trials to facilitate this seamless integration of data.12,13
Interoperability is “the ability of different information technology systems and software applications to communicate, exchange data, and use the information that has been exchanged.”14 “Semantic interoperability” refers not only to the exchange of information, but also the exchange of meaning such that the recipient of the information can readily understand and interpret the information accurately in the manner intended by the data generator and/or sender.
Recently, FAIR has been cited as an acronym for four requirements that a data publishing environment should meet, for machines and humans alike, to support appropriate aspects of data sharing.15 These FAIR facets are that data be findable, accessible, interoperable, and reusable.
One key to ensuring semantic interoperability and adherence to the FAIR principles or facets is for parties to use the same data standards and terminologies or ontologies. Clearly, the more parties who agree on the data standards and terminologies, the better. This is the rationale behind consensus-building for a robust standards development process.
To maximize the real-world impact of any research study, the data must be collected and analyzed in a common format. Standardization helps build efficient and interoperable research data networks capable of producing high-quality and more reliable data that can support healthcare decisions, detect safety and other signals, and be utilized to generate new hypotheses and new knowledge. It also streamlines research activities by allowing data to be accrued more efficiently, and makes it possible to consolidate digital data available from different sources to support further research and healthcare decisions.16
Data standards allow research teams to explicitly name and define the different elements and aspects of their studies. By using standard terms, researchers can precisely describe, manage, and share their data, allowing external research teams to understand what the researchers did, how they did it, how to interpret the results, and accurately reproduce these results in future studies. It also lets researchers perform queries across diverse datasets, which allows for data from different research studies to be consolidated into larger datasets for analysis. In addition to supporting collaboration among researchers, standardization ultimately leads to more organized evidence, which can be better understood by audiences possessing limited scientific literacy. This organization can increase the ability of researchers and lay people to comprehend and share important findings.
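As a rough illustration of the point about querying across diverse datasets, the sketch below maps two invented studies, each with its own column names and sex codes, onto one shared vocabulary and then runs a single query over the pooled records. Every name here (the mapping tables, `standardize`, the common variable names, and the sample rows) is hypothetical and is not drawn from any published standard.

```python
from typing import Any, Dict, List

# Hypothetical per-study mappings from local column names and local codes
# to one shared vocabulary. These are illustrative assumptions only.
STUDY_A_MAP = {
    "columns": {"pid": "subject_id", "gender": "sex", "age_yrs": "age"},
    "sex_codes": {"male": "M", "female": "F"},
}
STUDY_B_MAP = {
    "columns": {"patient": "subject_id", "sex_cd": "sex", "age": "age"},
    "sex_codes": {"1": "M", "2": "F"},
}


def standardize(study_id: str, rows: List[Dict[str, Any]],
                mapping: Dict[str, Dict[str, str]]) -> List[Dict[str, Any]]:
    """Rename local columns and recode local values to the shared terms."""
    out = []
    for row in rows:
        rec: Dict[str, Any] = {"study_id": study_id}
        for local_name, common_name in mapping["columns"].items():
            rec[common_name] = row[local_name]
        # Prefix the study so subject identifiers stay unique after pooling.
        rec["subject_id"] = f"{study_id}-{rec['subject_id']}"
        # Unrecognized local codes fall back to "U" (unknown).
        rec["sex"] = mapping["sex_codes"].get(str(rec["sex"]).lower(), "U")
        rec["age"] = int(rec["age"])
        out.append(rec)
    return out


# Made-up sample rows in each study's local format.
study_a = [{"pid": "001", "gender": "Female", "age_yrs": "67"},
           {"pid": "002", "gender": "Male", "age_yrs": "54"}]
study_b = [{"patient": 7, "sex_cd": 2, "age": 71},
           {"patient": 8, "sex_cd": 1, "age": 66}]

pooled = (standardize("A", study_a, STUDY_A_MAP)
          + standardize("B", study_b, STUDY_B_MAP))

# One query now works across both studies.
older_women = [r["subject_id"] for r in pooled
               if r["sex"] == "F" and r["age"] >= 65]
print(older_women)
```

When studies collect data in a shared format from the start, the mapping tables above become unnecessary, which is the after-the-fact transformation effort the article argues standards can avoid.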
There are several clinical research standards in use globally today, which cover the different stages of clinical research. These include those from the Clinical Data Interchange Standards Consortium (CDISC) for clinical and translational research;17 controlled terminology published through the National Cancer Institute’s (NCI) Enterprise Vocabulary Services; MedDRA (Medical Dictionary for Regulatory Activities) for medical history in clinical trials and for adverse event reporting;18 Health Level Seven (HL7) for structured product labels and ECG waveforms; the International Organization for Standardization (ISO) for the identification of medicinal products (IDMP); LOINC (Logical Observation Identifiers Names and Codes) for clinical laboratory tests and observations;19 and the ICH, as previously mentioned. There are also standards that exemplify collaboration among standards development organizations (SDOs) and other entities. For example, the Biomedical Research Integrated Domain Group (BRIDG) Model is a CDISC, HL7, and ISO standard, with NCI and FDA as key stakeholders.20
Over the past two decades, CDISC, a global non-profit organization that develops data standards through a volunteer-driven, consensus-based process, has developed a global, open-access suite of clinical and translational research data standards. These standards support the entire research lifecycle (including preclinical research) from structured protocol information through data collection, exchange, tabulation, analysis, and reporting.21 Standards specific to certain therapeutic areas have been developed collaboratively through the Coalition for Accelerating Standards and Therapies (CFAST), which has included the Critical Path Institute, CDISC, FDA, NCI, and TransCelerate BioPharma, along with medical experts and patient groups working in these therapeutic areas (TAs). Regulators from Europe and Japan have also contributed to the development of these TA standards.
TA user guides specify how to use these standards to structure the data for research on a given disease or treatment, broadening the circle of collaboration with patient representative groups, research investigators, and public-private partnerships. FDA has published specifications for these TAs in their Study Data Technical Conformance Guide.22 Working with data in a common format with controlled terminology makes it easier, faster, and more efficient for pharmaceutical companies, CROs, academic organizations, regulators, and other government entities to collaborate on projects.23 These standards are utilized for both regulated and some non-regulated trials, including interventional and observational studies, nutrition, public health, epidemiology, medical device, and outcomes research. They have even been applied to data from studies on healthy birth, growth, and development.
Data from traditional pharmaceutical, academic, public health, and healthcare enterprises vary in their level of standardization. This interdependent research continuum highlights the need for standards that translate across the evidence divide.7 Implementing standards from protocol through analysis stages can enhance the quality and efficiency of clinical research processes and facilitate traceability, particularly when the standards are implemented from the start. Many research teams have made impactful discoveries with the application of data standards in later stages of the research process, but not without significant data transformation effort at the end of the process. For instance, a research team recently conducted a meta-analysis of chemotherapy in head and neck cancer (MACH-NC) by contacting and requesting individual patient data from several published studies. Analyzing the combined data, which included patient and tumor characteristics, dates of failure and death, treatment details, and toxicities, the researchers demonstrated the superiority of concurrent chemotherapy in the treatment of certain cancers, validating the results of the published studies.24 Their work could have been simplified and enhanced substantially had the different datasets been standardized from the beginning of each individual study.
Standardization allows a significantly faster and less costly avenue for generating evidence and performing robust analyses, by providing the data and processes employed in a common, predictable, and explicit format. A recent research project used open-access clinical trial data standardized using CDISC to answer important questions in prostate cancer, saving time and reducing the costs of the initiative.25 Data standards also provide great potential for semi-automation of the evidence generation process26 and for saving substantial human resources and time in the start-up of a clinical trial.21 If data collection standards are employed from the beginning, study start-up times can be reduced by 70% to 90%, since standard case report forms, edit checks, and validation documentation already exist and many can be reused from trial to trial. Study teams can then focus on protocol-specific additions to the standards, which results in cost savings, faster delivery of results, and higher quality data.27
Data standards also facilitate community engagement, data sharing, and transparency. An open-data, crowdsourced project from Project Data Sphere identified predictors for survival in castration-resistant metastatic prostate cancer through prognostic models that used CDISC-standardized data from the comparator arms of four Phase III clinical trials and involved 50 independent teams.25 These teams developed a comprehensive set of benchmarked models that uncovered key prognostic variables and novel interactions between them. All method predictions and code from this initiative are available for public use, increasing transparency and facilitating collaboration. Project Data Sphere participants noted that the data provided in a known standard format were easier to interpret and more useful than those that were submitted in proprietary formats.
Responses to epidemics and global public health emergencies, such as outbreaks like Ebola and the Zika virus, realize significant benefit from standards by ensuring that decisions are based on the best available evidence. The earlier treatments can be evaluated, the faster outbreaks can be contained. In 2015, the World Health Organization (WHO) conducted a consultation on research data sharing during public health emergencies. A background briefing for this exercise mentioned multiple opportunities for improvement with regard to data sharing, including the “need to build databases where all data are entered in a uniform way, which can be populated when outbreaks occur and are available worldwide.”28 This solution requires that data standards be available prior to outbreaks. WHO convened a diverse group of stakeholders to discuss the development of global norms and standards for more rapid and transparent data sharing during public health emergencies.29 Common research data standards have now been collaboratively developed for Ebola,30 malaria,31 and influenza,32 all of which can be leveraged for responding to new outbreaks.33,34
Data standards in regulatory submissions supporting new product applications have enabled efficient review through automated validation of data quality. A suite of tools and services for clinical and nonclinical standardized data supports high-level analysis early in the review process.35,36 Transparency of the regulatory review processes is enhanced through engagement in the process of standards development and the availability of publicly shared standard analysis scripts.37,38 The incorporation of patient-reported outcome (PRO) measures along with the TA standards could draw an even broader set of stakeholders into the process. These standards are freely available and could be adopted to enable the same transformation in all supported clinical research.
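To make the idea of automated validation of data quality concrete, here is a minimal sketch of rule-based checks that become possible once variable names, permitted values, and date formats are agreed in advance. The required variables, the permitted `sex` values, the `validate` function, and the sample records are all assumptions made for illustration; this does not represent any regulator's actual validation rules or tooling.

```python
from datetime import date
from typing import Dict, List, Tuple

# Hypothetical conformance rules, invented for this example.
REQUIRED = ("subject_id", "sex", "visit_date")
ALLOWED_SEX = {"M", "F", "U"}

Finding = Tuple[int, str, str]  # (record index, variable, message)


def validate(records: List[Dict[str, str]]) -> List[Finding]:
    """Return a list of rule violations found in the records."""
    findings: List[Finding] = []
    for i, rec in enumerate(records):
        # Rule 1: required variables must be present and non-empty.
        for var in REQUIRED:
            if rec.get(var) in (None, ""):
                findings.append((i, var, "missing required value"))
        # Rule 2: coded values must come from the agreed terminology.
        sex = rec.get("sex")
        if sex not in (None, "") and sex not in ALLOWED_SEX:
            findings.append((i, "sex", f"value {sex!r} not in permitted list"))
        # Rule 3: dates must be ISO 8601 calendar dates (YYYY-MM-DD).
        visit = rec.get("visit_date")
        if visit not in (None, ""):
            try:
                date.fromisoformat(visit)
            except ValueError:
                findings.append((i, "visit_date", "not an ISO 8601 date"))
    return findings


# Made-up records: the first is clean, the other two violate the rules.
records = [
    {"subject_id": "A-001", "sex": "F", "visit_date": "2024-01-15"},
    {"subject_id": "A-002", "sex": "female", "visit_date": "15/01/2024"},
    {"subject_id": "", "sex": "M", "visit_date": "2024-02-01"},
]

findings = validate(records)
for index, variable, message in findings:
    print(f"record {index}: {variable}: {message}")
```

Checks of this kind can only be written once and reused across submissions if every submission uses the same structure and terminology, which is why the article ties efficient review to standardized data.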
Downstream standard development efforts built on standardized data include harmonized research protocol templates and outcomes adapted for TAs. These efforts bring us closer to the possibility of even greater efficiency with master protocols for use in clinical trial networks. FDA and ICH developed a common protocol template concurrently with another such development effort by TransCelerate. These templates have now been harmonized and published as one.39 They are now being “technology-enabled” based upon protocol standards developed previously and incorporated into the BRIDG Model. This common protocol template has already proven to be quite useful in a) ensuring that endpoints to be collected are aligned with protocol objectives; and b) allowing information from the protocol to be re-used across multiple downstream documents such as the statistical analysis plan, the clinical study report, and the product label. These efforts have now led to a new protocol project with ICH.
Exchange of “computable biomedical knowledge” (CBK) is also being studied in academia as a way to provide results of research back to practice, as in the final portion of a learning health cycle.40 The Learning Health Community41 has an initiative called Essential Standards to Enable Learning (ESTEL),42 which has published a white paper regarding a framework for learning health system (LHS) standards. These LHS-related efforts do not encourage the development of new standards; rather, they leverage those that already exist and build upon them. The NIH has also recently invested funds in a Center for Data to Health (CD2H), with encouraging the adoption of standards across NIH Clinical and Translational Science Awards (CTSAs) as one goal.43 Another area ripe for standards adoption is electronic health records (EHRs), which will be better leveraged for research purposes when data can readily be shared in a standard format. FDA has issued recent guidance in this regard.44
For research studies intended for regulatory review, concerted efforts have been made to create global guidelines and standards for developing new therapies. The ICH developed guidelines for good clinical practices and formats for new product submissions to regulators for review in Europe, the U.S., and Japan. One key data standard output of ICH was MedDRA, which consisted of a rich and highly specific standardized medical terminology, created to facilitate sharing of regulatory information internationally for medical products used by humans. Global data standards for regulated clinical research were collaboratively developed to complement the ICH work, for example, the clinical trial registry (CTR) standard,45 which can be used to register clinical trials in the NIH/NLM ct.gov, the WHO International Clinical Trial Registry Platform (ICTRP),46 and the EMA’s EudraCT.47 The European Innovative Medicines Initiative (IMI) also encouraged the use of standards for the research studies they fund by offering a “standards starter pack” as a reference.36
Governmental authorities, international public health sponsors and advocates, biomedical research consortia, professional medical societies, and advisory committees charged with recommending ways to improve the efficacy and safety of medicines and other health technologies have promoted data sharing as a way to improve research. At the time of this writing, the NIH is drafting guidelines to foster the development of scientific evidence with explicit, transparent, and consistently reported methods allowing: 1) decisions to be traced to the underlying evidence; 2) additional analyses of the dataset that may be required for decision-making; 3) new knowledge and insights to be gained through the analysis of pooled data; and 4) routine updating of systematic reviews across studies as new evidence becomes available.48 The U.S.’s 21st Century Cures Act49 encourages FDA to develop ways to leverage real-world data (e.g., from EHRs and mobile devices) to augment clinical trial data and specifically referenced CDISC as a standards-setting body. The Patient-Centered Outcomes Research Institute (PCORI)50 has funded, through its Trust Fund, a cross-agency project led by FDA to facilitate the use of real-world data through the harmonization of common data models (CDM) that have been adopted by various research networks, including PCORNet, OHDSI/OMOP, and Sentinel. The “Cures” legislation did not, however, mandate use of standards for federally-funded academic clinical trials.
More generally, funding agencies also have established data-sharing policies, though few require the use of data standards over the course of conducting the funded research. While trials that meet criteria for submission to electronic clinical trial registries will need some degree of protocol description or adverse event standardization, aggregation and secondary use of full datasets is inhibited due to the absence of a requirement that funded researchers utilize standards. As long as federal-funding agencies do not have similar mandates or guidelines for standards as do regulatory agencies, sharing of data between or among agencies is hindered.
Some funding agencies have taken another approach: standardizing data from researchers to common structures and semantics. The U.S. National Institute of Allergy and Infectious Diseases (NIAID) has created a data warehouse that utilizes CDISC’s data-collection and aggregation standards to model and standardize their funded clinical trial data from diverse sources;51,52 also, NIAID is funding the development of a TA standard and implementing CDISC standards for global research studies. Similarly, NIAID’s ImmPort database,53 which aggregates information from diverse translational or clinical immunology studies, uses CDISC to structure data extracts to support secondary use.54 These platforms maximize the NIAID investment in research by providing sources of data that share common meaning. Their data can be readily utilized for meta-analyses with similar regulated trials, as the FDA requires use of CDISC standards for submissions, but adoption and use of a common standard within academic federal funding agencies’ systems is not yet common globally. Thus, policymakers have the opportunity to multiply the value of federally-funded and regulated trials by not only making provision for data sharing, but also by requiring global clinical research standards.
Getting from where we currently operate to a place where standardized research data around the world can truly talk to each other is a great challenge and an immense opportunity. We have a collective responsibility to contribute to this effort; global stakeholders have different roles to play. Researchers and sponsors alike should become aware that the initial training and time required to implement data standards is more than worth the effort, since standards simplify the regulatory submission process, while enabling the data to be repurposed within and outside their research teams. Furthermore, regulatory agencies could continue increasing the amount of information from regulatory submissions that they release, publicly or via controlled access, following the example of EMA, to allow examination by different parties and enable the wider scientific community to conduct research and answer more questions using the increasingly available data. Coupled with the use of standardized data, this should eventually lead to higher quality submissions and regulatory reviews.55
National and international health policymakers have the responsibility to demand a broader evidence base to support their decisions and recommendations, as well as a more rigorous approach for evidence synthesis presented to them or developed by their teams. As FDA and Japan’s Pharmaceuticals and Medical Devices Agency (PMDA) have done, national entities, such as the 27 different institutes and centers that comprise the NIH in the U.S., should avoid unnecessary duplication of efforts and coordinate around existing robust standards that are maintained by global standards development organizations. There are several examples of global standards used within NIH. The National Human Genome Research Institute (NHGRI) relies heavily on the use of international standards to annotate genetic and phenomic data. Without the use of standards such as the Gene Ontology (GO) and the Human Phenotype Ontology (HPO), scientists would not be able to directly compare scientific results. Furthermore, as new discoveries are made, these same scientists contribute back to the ontologies to maintain the standards. Another example of NIH involvement with standards bodies is the Genetic and Rare Diseases Information Center (GARD), which relies heavily on SNOMED, ICD, and Orphanet to find and share resources.
National policymakers should form a team of technical experts to evaluate the best avenues for implementing data standards, adopting and encouraging the use of existing international standards whenever possible, to pave the way for global data exchange. International policymakers, in turn, should promote the adoption of global data standards as means of accelerating and enhancing collaborations among international partners for greater global impact of research. International policymakers are also responsible for providing technical support to countries in the progressive implementation of research data standards, so countries can make more informed national decisions and contribute to the global pool of standardized data. Entities that are part of the healthcare system should continue efforts to bridge the gap between clinical practice and research while implementing data standardization as well.
Imagine a world in which research data can be shared and aggregated seamlessly such that the power of that data can be maximized to accelerate collaborative learning and streamline the path to new therapies. We have an ethical imperative to adopt and leverage robust global data standards that will improve the way research is conducted to benefit all patients.
Barbara Jauregui is an International Consultant, Pan American Health Organization/World Health Organization; Lynn D. Hudson is Chief Science Officer, Critical Path Institute (C-Path); Lauren B. Becnel is Senior Director, RWDnA & Data Strategy-Oncology Client Partner, Pfizer; Eileen Navarro Almario is Associate Director for Clinical Affairs in the Office of Computational Science, OTS, CDER, FDA; Ronald Fitzmartin is Data Standards Staff, Office of the Director, CBER, FDA; Frank Pétavy is Head of Biostatistics and Methodology Support, EMA; Nathalie Seigneuret is Senior Scientific Project Manager, Innovative Medicines Initiative; James K. Malone is Senior Medical Director, Eli Lilly and Company; Fang Liz Zhou is Director, Global Medical Evidence Generation; Jose Galvez is Chief, Office of Biomedical Translational Research Informatics (BTRIS), NIH; Tammy Jackson is Senior Director, Clinical Innovation, PPD Inc.; Nicole Harmon is Chief of Staff, Clinical Data Interchange Standards Consortium (CDISC); Rebecca D. Kush is President, Catalysis Research; Scientific Innovation Officer, Elligo Health Research; and Fellow, Translational Research Center for Medical Innovation, Foundation for Biomedical Research and Innovation, Kobe, Japan.
Authors include representatives of PAHO/WHO, Critical Path Institute, FDA (CBER and CDER), EMA, NIH, IMI, ACRO, CDISC, Lilly (TransCelerate), Sanofi (DataSphere), Pfizer, and Elligo Health Research.