Putting Our Shoulders to the Wheel: Thoughts on Data Sharing


Data sharing in biomedical research has recently attracted widespread attention from physicians, scientists and stakeholders alike. DCRI's Dr. Eric Peterson discusses the context, flaws and positives of how this initiative could be implemented and the effects it can have on the industry.

The topic of data sharing in biomedical research has recently garnered widespread attention, not just among physicians and scientists, but across much wider groups of stakeholders. Patients, consumers, advocacy groups, and the general public are all taking an increasingly intense interest in the practices and policies that shape how data gathered from experiments on human research subjects can be used, shared, and made available for wider scrutiny and discussion. Even President Obama has recently talked about the importance of setting up new funding policies that would encourage academic researchers to stop “hoarding” data.1

Public discussions about this topic were further stoked following the release of a proposal by the International Committee of Medical Journal Editors (ICMJE) calling for data sharing. According to this proposal, authors who published findings from clinical trials in ICMJE member journals would be required, as a condition of publication, to

…share with others the deidentified individual-patient data (IPD) underlying the results presented in the article (including tables, figures, and appendices or supplementary material) no later than 6 months after publication. The data underlying the results are defined as the IPD required to reproduce the article's findings, including necessary metadata.

Responses came swiftly. While some in the research community agreed with the broader aims of data sharing, many had significant concerns about the operational details outlined in the proposal. In fact, the concerns voiced were sufficiently strong that it now seems likely that such a top-down approach will be tabled until those concerns are adequately addressed.2

Why Should Data Be Shared?

Why should researchers support the sharing of data? For one thing, it fosters transparency and openness in research. Although scientific research is peer reviewed prior to being published, peer review does not allow for direct access to the data behind the papers and therefore has limited ability to detect errors or fraud. If other researchers had access to study data and to detailed information about the study’s methods (including how the data were collected and analyzed), they could more quickly confirm study findings.

Expanded access to data also enables other researchers to conduct secondary analyses that allow additional questions to be asked. The data from original trials are often richly detailed and of high quality, which can facilitate valuable observational investigations beyond the original trial question. While some of these analyses are conducted by the original trial leadership committees, it is unlikely that any small group will formulate all the potentially important questions that could be addressed using such data.

In addition, datasets from multiple trials can be combined to perform meta-analyses, so that the safety and efficacy of a particular therapy can be scrutinized in larger or more diverse populations. In other cases, better access to data from a given trial can avoid wasting time and resources that would otherwise be consumed in unnecessarily duplicating effort or in pursuing approaches that are unlikely to work.

All of these reasons are extremely important to patients, research participants, and the public. Combining and sharing data offer an important way to help patients gain access to needed therapies more quickly, and just as importantly, be confident that these therapies are supported by science.

Putting Data Sharing into Context

It’s important to remember that the idea of sharing data from clinical trials is not really new. Re-analysis and meta-analysis of data already play a critical role in evaluating medical evidence, and recent technological innovations are allowing scientists to share data more easily and efficiently. Further, a provision of a federal law that went into effect in 2007 requires any clinical trial conducted under U.S. regulatory oversight to make certain minimal kinds of data (such as information about the study design, patient population, and study outcomes) available to the public through the ClinicalTrials.gov registry. While the law already requires at least some data reporting from clinical trials, recent high-profile research is making a case for in-depth scrutiny of study findings, even after peer-reviewed publication. There’s also a strong case to be made that patients and the public have a basic right to data that are being produced by studies on human volunteers and used to guide treatment decisions that affect their own health and well-being.


Despite this, there is mounting evidence that many researchers are not meeting even the most basic requirements for reporting and sharing data from their research projects. One investigation found that reporting of results to ClinicalTrials.gov as required by law was generally poor, and that academic researchers (as compared with researchers funded by the medical products industry) were especially lax in reporting results. This work has since been confirmed and amplified by other studies.

This scrutiny takes place against a background of growing concern about the reliability of the scientific evidence that shapes our understanding of the risks and benefits of medical therapies, as well as how we use medical products and other health interventions in practice. A growing number of analyses are casting doubt on the methods and conclusions of previous studies,3,4 and a steady procession of cases of research misconduct has prompted worry about whether we can actually trust the science that’s being published in major peer-reviewed journals.

When Simple Gets Complicated

While the idea of creating an open and transparent system of data sharing seems to be one all would support, the devil is in the details. For clinical trialists, it’s not enough simply to take a huge file containing the raw data from a study and post it online. To make sense of clinical trial data, researchers must understand them in the context of the research setting, the trial design, the population(s) of people who participate in the study, and a host of other factors. Data must be painstakingly annotated and curated so that others can understand these factors.

Storage and transfer of datasets may present problems as well. Although the capacities of our computers and information networks have continued to grow at a rapid (even sometimes startling!) pace, it’s worth remembering that many modern studies that gather genetic and other cutting-edge data (often referred to as “omics”), or that tap into data streams from personal electronic devices and health apps, may have truly enormous datasets that are difficult to transfer rapidly and expensive to store, even with the best available IT infrastructure. Data sharing also has important implications for privacy and confidentiality. The kinds of data that the ICMJE proposal discusses include individual patient-level data, whose disclosure is protected by state and federal laws. Researchers will need to develop methods for painstakingly ensuring that data are fully anonymized and that the privacy of study participants is scrupulously protected.

As important as these technical and privacy issues are, though, they are already being tackled and are ultimately solvable, although this will require significant time, effort, and resources. However, other, less clear-cut issues may prove more challenging. First of all, some have voiced concerns about the unintended consequences of data being re-analyzed and combined, especially by individuals who may lack the necessary knowledge and analytical skills. While no one doubts the benefits of data sharing, at the same time no one wants to see a sudden increase in flawed analyses that can muddy the scientific waters and even have a substantial negative impact on public health. Nor does anyone want to see researchers expending limited resources in responding to obviously erroneous critiques.

Another question is how to handle credit for the work done in creating these shared data. Should the authors of the original study be invited to ensure that the new investigators understand their data and use them properly? Should these original investigators be somehow credited for the work and resources it took to carry out the original study? The ICMJE proposal suggests some possible solutions to these issues but does not specify a particular approach.

Imagining a Way Forward

Despite these challenges, I am convinced that there are ways forward that will satisfy the interests of all stakeholders. In my own career as a cardiovascular researcher, I’ve been involved in multiple projects that benefitted from a thoughtful, collegial approach to data sharing. My own institution, the Duke Clinical Research Institute, has developed an innovative partnership with the Bristol-Myers Squibb company and SAS Analytics designed to facilitate data sharing while also ensuring important standards for data quality and patient protection: the SOAR (Supporting Open Access to Researchers) Initiative. A paper recently published in the American Heart Journal lays out what we believe can serve as a model approach in which academia and industry work together to make well-characterized, highly usable datasets available in an open and transparent fashion.5


SOAR is more than just data from clinical trials, however. One of the first major assets offered by the SOAR initiative comprises a remarkable longitudinal dataset incorporating nearly a half-century’s worth of de-identified patient-level data from the Duke Databank for Cardiovascular Diseases. This resource, which leverages analytical insights from SAS and includes detailed data from more than 100,000 cardiovascular procedures in over 50,000 patients, showcases a new path forward, one in which academic institutions and their partners can share immensely valuable data with appropriate governance and safeguards. These safeguards are reinforced by SOAR’s independent review committee, which vets proposals for use of datasets, enforces strict standards for anonymizing data and ensuring patient privacy, and mandates that any manuscripts produced using these shared datasets are independently reviewed before being submitted for publication.

Other important initiatives, including the Yale University Open Data Access (YODA) Project (itself a collaboration between Yale and Medtronic/Johnson & Johnson), are adopting similar approaches to providing transparent access to trial data while also ensuring important quality safeguards and protections for study participants. Another data-sharing project, ClinicalStudyDataRequest.com, was initiated in 2013 by GlaxoSmithKline. Now run independently by the Wellcome Trust, the site has a growing list of participants and facilitates controlled access to data from more than 3,000 clinical trials.6 In addition, the AllTrials Initiative has assembled a large international coalition of academics, publishers, research organizations, and foundations, all of whom are banding together to encourage the universal registration and reporting of clinical trials. Although participation in many of these programs by sponsors and institutions has been gratifying, uptake by the broader research community has been relatively slow in coming,7 and further investigation into the reasons for this may be needed.

Another possible approach that aims to balance the needs of all stakeholders is being put forward by the ACCESS CV consortium.8 This group of academic cardiovascular researchers, of which I am part, is working to adapt the ICMJE’s recommendations by combining transparent and timely data sharing with reasonable provisions that accommodate the legitimate concerns of patients, researchers, and other stakeholders. These concerns include ensuring responsible and scientifically valid use of study data, protecting the privacy and other interests of research participants, facilitating access in ways that acknowledge real-world constraints imposed by logistics and resources, and creating fair and appropriate standards for academic acknowledgment and credit.

Despite lingering complications and some differences of opinion about details, it’s clear that biomedical research is entering a new, more transparent era in which open access to clinical trial data is the rule rather than the rare exception. There’s no doubt that many operational details still need to be worked through. But my hope is that the world of clinical research rises to the challenges of data sharing “done in the right way,” and that we as a society all contribute to shaping workable standards to achieve the greatest good. By building on pioneering efforts in this field, we can show what is possible with a thoughtful, collaborative approach to sharing information that benefits the entire community: researchers, patients, physicians, and the public.

Eric D. Peterson, MD, MPH, FAHA, FACC, Executive Director, Duke Clinical Research Institute; Professor of Medicine, Cardiology


  1. Whitehouse.gov. Remarks by the President in Precision Medicine Panel Discussion. February 25, 2016. Available at: https://www.whitehouse.gov/the-press-office/2016/02/25/remarks-president-precision-medicine-panel-discussion. Accessed November 3, 2016.
  2. Horton R. Offline: Data sharing – why editors may have got it wrong. Lancet 2016;388:1143.
  3. Freedman LP, Cockburn IM, Simcoe TS. The economics of reproducibility in preclinical research. PLOS Biol 2015. http://dx.doi.org/10.1371/journal.pbio.1002165
  4. Ioannidis JPA. Why most published research findings are false. PLoS Med 2005;2(8):e124.
  5. Pencina MJ, Louzao DM, McCourt BJ, et al. Supporting open access to clinical trial data for researchers: the Duke Clinical Research Institute – Bristol-Myers Squibb Supporting Open Access to Researchers Initiative. Am Heart J 2016;172:64-9.
  6. Rockhold F, Nissen P, Freeman A. Data sharing at a crossroads. N Engl J Med 2016;375:1115-17.
  7. Navar AM, Pencina MJ, Rymer JA, Louzao DM, Peterson ED. Use of open access platforms for clinical trial data. JAMA 2016;315:1283-4.
  8. The Academic Research Organization Consortium for Continuing Evaluation of Scientific Studies – Cardiovascular (ACCESS CV). Sharing data from cardiovascular clinical trials – a proposal. N Engl J Med 2016;375:407-9.