Clinical Trial Data Stewardship


Applied Clinical Trials

Amid the push for clinical trials to adopt more modern digital data technologies, the job of assuring adherence to key data quality and integrity principles is achievable under current regulations governing electronic record-keeping. Doing so, however, will require a new and reinvented framework: data stewardship.

The introduction of mobile devices and software as a service (SaaS) into the conduct of clinical trials makes many of us in the clinical quality assurance (CQA) community unsettled. Even today, many CQA professionals, and some regulatory inspectors, do not feel comfortable or confident in the digital data environment, having come of age in the world of three-part paper case report forms, hand-written logs, and hand-signed documents. Still, many of us feel our greatest competence lies in our knowledge and interpretation of the established predicate rules, International Conference on Harmonisation (ICH)-good clinical practice (GCP), which we generally view as completely independent of technology.

Consequently, two phenomena frequently manifest:

  • A highly conservative "this is not permitted" reaction toward digital technologies when they are proposed.

  • A "refer this to the validation/eCompliance group" reaction to regulatory application in the operational sphere when a computer system is involved, however tangentially.

While other heavily regulated industries have adopted modern, digital tools, clinical trials often remain technologically challenged because of these attitudes and perceptions.

Both reactions are unnecessary and result in self-imposed barriers to the successful adoption of key digital technologies in the clinical trial context. Both also indicate a very real need to provide clinical QA professionals with a truly comprehensive and comprehensible framework within which to apply their hard-won predicate rule knowledge and experience. Indeed, applying existing rules and principles to new media should not impede the compliant adoption of new technologies.

In truth, current electronic records, electronic signatures (ERES) regulation, and the predicate rules themselves need offer no significant barriers to the application of advancing technologies, including such advances as web-based tools and mobile devices, in the collection, processing, interpretation, publication, or archiving of clinical trial data. In fact, ERES regulations are written at such a high level that they can afford us measured choices in how to meet these principles in multiple ways.

At the heart of both ERES regulation and the predicate rules is this: What both industry and regulators want are data that are complete, correct, and forensically verifiable throughout their lifecycle.

Part 11 encapsulates this central concern in its definition of the objectives for computerized system validation (21 CFR 11.10 (a)):

  • Ensure accuracy, reliability, and consistent intended performance.

  • The ability to discern invalid or altered records.

These are the two central pillars of data integrity. The detailed requirements of 21 CFR Part 11 elaborate for us the pragmatic yet high-level measures required to attain and maintain these two central pillars. Indeed, these, along with the protection of the health, safety, and rights of patients and clinical trial subjects, are also really the central pillars of much of GCP itself.
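To make the second pillar concrete, here is a minimal sketch of one way a system might make altered records discernible: each audit trail entry carries a hash that also covers the previous entry, so a later change to any entry breaks verification from that point on. This is an illustration under stated assumptions, not a technique prescribed by Part 11; the function names, the field names, and the `GENESIS` placeholder are all hypothetical.

```python
import hashlib
import json

# Hypothetical starting value for the chain; any fixed, documented constant would do.
GENESIS = "0" * 64


def entry_hash(prev_hash, payload):
    """Hash the previous entry's hash together with this entry's content."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("utf-8") + body).hexdigest()


def append_entry(trail, payload):
    """Append a new entry whose hash is linked to the entry before it."""
    prev = trail[-1]["hash"] if trail else GENESIS
    trail.append({"payload": payload, "hash": entry_hash(prev, payload)})


def first_altered_index(trail):
    """Return the index of the first entry that fails verification, or None."""
    prev = GENESIS
    for i, entry in enumerate(trail):
        if entry_hash(prev, entry["payload"]) != entry["hash"]:
            return i
        prev = entry["hash"]
    return None


# Illustrative data only: an original entry followed by a documented correction.
trail = []
append_entry(trail, {"subject": "S-001", "field": "systolic_bp",
                     "value": 128, "user": "site_nurse"})
append_entry(trail, {"subject": "S-001", "field": "systolic_bp",
                     "value": 132, "user": "investigator",
                     "reason": "transcription error"})

print(first_altered_index(trail))  # None while the trail is intact
```

Note that the correction is recorded as a new entry rather than as an overwrite of the first, so the original value remains visible; a silent edit to an earlier payload would cause `first_altered_index` to report that entry.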

Applying and adhering to these key principles is entirely achievable under the current regulations. Doing so will consistently satisfy regulators, even when one is adopting very new technologies, and will also continue to be simply good business. However, to accomplish this, a new and reimagined framework for applying the regulations is required.

This reimagined framework, beyond the paper paradigm, is data stewardship.

Data stewardship defined

As a term, data stewardship is not a radically new concept. Perhaps most relevantly the term was invoked by Karen S. Baker and Lynn Yarmey in their 2009 article, Data Stewardship: Environmental Data Curation and a Web-of-Repositories.1 The authors addressed the topic in the context of the need to appropriately safeguard and preserve academic scientific research data gathered in the field and destined to move across multiple forums, technology systems, and potentially a variety of uses from public to private and commercial. Drawing on the concept of “stewardship” in land use and in the work of ecologists, Baker and Yarmey framed data stewardship this way:

“Stewardship is a term that involves tending a community element not owned solely by one person.”

In their framing of data stewardship, the authors note the press of many of the same major technological changes we are experiencing in the clinical research environment:

“The advent of new measurement capabilities (e.g., autonomous vehicles and streaming technology), the increasing recognition of the value of long-term sampling strategies (e.g., time-series measurements and interdisciplinary studies) and the importance of data-sharing initiatives (e.g., master catalogues and interoperability efforts) all create unfamiliar data arrangements requiring identification of, and agreement on, new categories and descriptive standards, as well as expanded dynamic infrastructures for both local and large-scale data endeavors.

“Data stewardship provides a conceptual framework for envisioning the flow of data amongst and between arenas. It provides an entrée to the notion of collective practices.”

This parallel in experience between the academic field research arena and the clinical research arena should not be surprising, nor should the applicability of data stewardship to our GCP environment, where responsibility for data also ranges across a very wide swath of actors, technologies, systems, infrastructures, and even potentially ultimate-end applications.



Stewardship for clinical trial data is a relevant topic for sponsors, health authorities, technology suppliers, and data management organizations. These stakeholders, along with the clinicians who collect clinical data, are the intended audience for this document. They are Clinical Data Stewards.

The term “data stewardship” implies an ethical obligation not necessarily bound or defined by ownership alone. Whoever one judges to be the “owner” of the data collected, clinical data are collected from patients for the ultimate benefit of patients in both safety and efficacy. This is the first principle and foundation of data stewardship. From the first moment of data collection, data stewards share the custody of that data and a profound responsibility for maintaining its integrity. Similar to a familial context, custody is not necessarily the responsibility of a single party. It is often a shared responsibility defined within the construct of legal agreements.

But shared custody is still real custody, and the shared responsibilities between sponsors, health authorities, and data management organizations have always been there. Out of respect for the owners of the data, this article is intended to clarify the roles, responsibilities and expectations for ethical handling of patient data collected in a clinical trial. It also suggests that a holistic understanding of the generation, management, and review of clinical data is necessary to promote effective data stewardship.

Data stewardship is a phrase specifically used in this article to guide the responsible parties that handle data during clinical trials. The research and development community consistently encounters new technologies, regulatory guidance, and organizational paradigms that affect the way clinical data is handled. These changes affect the operational models that govern the design, conduct, and reporting for clinical trials, and leave the interpretation and implementation up to the data stewards.

For these data stewards, each new organizational change, each new internal business process that is re-engineered, each new regulatory opinion that is published, and each new data collection system introduced to market can and does disrupt the status quo. While these disruptive changes can be successful and immediate, they still disrupt the clinical trial landscape.

Disruption has the potential to affect all data stewards. Any novelty, innovation, or streamlining can be daunting to what has been traditionally (and necessarily) a conservative and herculean endeavor for the research and development community. But if the novelty enhances data integrity by enabling more straightforward implementation of technology, if it makes data entry more intuitive and less burdensome for investigators and patients, or if it increases compliance with protocol and regulations, then the novelty can be considered true innovation to the clinical trial process. The disruption can be considered a positive one.

Good old-fashioned communication and contracts between the sponsor, the investigator, and the technology providers should provide clarity and establish boundaries for who handles the data along the clinical data chain-of-custody. Strong partnership with well-defined lines of communication is a key driver for successful data stewardship.

Internal organizational change (for all data stewards) often results in shifts with respect to internal data stewardship roles; so while it is important to establish shared responsibility between organizational entities external to each other, it is equally important to ensure that the chain-of-custody is well understood by each internal party as well.

For instance, a sponsor’s IT department may provide technology and validation support, but true stewardship is the responsibility of the business process owner. Sponsor quality assurance can only reflect, and not resolve, problems in this area. If the owner cannot answer questions about data integrity, then CQA has a challenge to educate them.



Sponsor efforts to be leaner, less bureaucratic, and more automated often cause a loss of institutional memory about systems and processes. While there is a chain-of-custody model to be established between sponsor, investigator, and data management (including technology providers), sponsors have unique challenges to maintain a strong sense of internal stewardship between business, QA, and IT. Sponsors who embrace data stewardship internally can expect healthier relationships with their suppliers as well as with health authorities.

Health authorities are data stewards, as they serve their respective publics and are responsible for public safety. Perhaps this is stating the obvious, but they face unique challenges as well. Health authorities are clearly challenged by the ever-advancing transition from the paper-based clinical trial landscape to the modern technologies that support digital clinical trials. They have to abandon the illusion of stronger custody where paper is the first point of data collection, and accept that well-implemented systems can effectively accelerate and improve the paper process.

Health authorities are also challenged to modernize their perspective on risk and to better communicate across jurisdictions and with other data stewards. Published guidance from different agencies and locations does not always coincide perfectly on paper; resources are constrained to the point that health authorities cannot be everywhere at once, and they need to take care not to be so prescriptive that their guidance becomes a constraint on innovation.

Technology providers, data management organizations, and contract research organizations (CROs) get the fun part of the job, as they are most poised to introduce novelty and innovation to the process landscape, and their business success largely depends upon it. One reason is that these suppliers are simply more nimble than sponsors and health authorities. They can rapidly develop systems and are typically not encumbered by 10-year R&D timelines. While they are at the mercy of the sponsor (as much as the sponsor is at the mercy of the health authorities), they have opportunities to innovate by designing compliance into their solutions.

If the tech providers can introduce compliant, useful tools that enable data management organizations to leverage the clinical data via efficient and clear processes, then they, too, are data stewards inasmuch as a computerized system and its infrastructure are themselves inextricably connected with the completeness and integrity of the data. This is an emerging ethical and contractual reality we believe is inescapable in the digital world, one clearly defined in the privacy realm by HIPAA for SaaS and infrastructure as a service (IaaS) cloud providers.

Reduces risk to data integrity

Effective data stewardship creates a paradigm where clinical data can be used to deliver safe and effective therapies to patients, evolve the clinical trial model, and satisfy the needs of the public with greater transparency.

By recasting the application of ERES regulation and the predicate rules within the framework of data stewardship, it becomes more readily feasible to provide guidance for data stewards by outlining specific responsibilities during key events along the data chain-of-custody. These events start with trial setup and end with database lock.

Ahead, we discuss the key regulations in more detail to indicate the compatibility and, indeed, clarification and enrichment data stewardship brings to existing regulations and guidance.

ICH-GCP and data stewardship

cGCP and ICH both distinguish between the record-keeping responsibilities of the investigator and the sponsor. Broadly speaking, both require the investigator (ICH E6 4.9/21 CFR 312.62)2 to keep a complete and accurate patient case history and require the sponsor (ICH E6 5.5/ICH E9 3.6/21 CFR 312.57) to keep complete, timely, and accurate records directly related to the investigational drug and the conduct of the trial.

ICH E6 6.10 establishes the sponsor responsibility to ensure investigators will permit "direct access" to source data/documents in order to enable virtual monitoring, audits, and regulatory inspections.3 It is important to note that in the contemporary digital environment such "direct access" needs to extend well beyond the line-of-sight, scope, and control of investigators through multiple entities, including service and infrastructure providers as well as data sources such as electronic health records.



ICH E6 8.0 looks to focus record-keeping more specifically still by defining the essential trial documents and identifying for each the principal document owner/holder.4 However, we need to acknowledge that in the digital age, a document, rather than being a discrete physical entity, can actually be one of many potential aggregations of data that are themselves merely logically assembled, malleable as to association and format, and highly location-independent.

The document is truly a metaphor, a fiction retained mostly for our linguistic convenience. It is the data content on which we need to focus; otherwise, the regulation will ultimately founder on a metaphor rather than adapt to actual reality. Fortunately, the currently proposed Addendum to E6 8.1 uses phrasing that can allow us to more reliably know how to apply the documentation requirements of E6 8.0 in this new digital context, as it opens its focus to more technologically adaptable concepts such as data access and data control. These concepts are, in turn, also relevant and well applicable within the framework of data stewardship.

As early as the preamble to the original publication of the final rule, FDA has specifically sought to assure industry that 21 CFR Part 11 is meant to enable, not inhibit, the adoption of new technology.5

“The agency believes that the provisions of the final rule afford firms considerable flexibility while providing a baseline level of confidence that records maintained in accordance with the rule will be of high integrity.” (21 CFR Part 11 Preamble III. C. 3)

FDA further went to great lengths to tie 21 CFR Part 11 very explicitly to the predicate rules in its (still in force) Guidance for Industry: Part 11 Electronic Records; Electronic Signatures Regulation-Scope and Application (2003),6 in which the phrase “predicate rules” is used no fewer than 34 times in a document of nine pages. At the time this guidance was issued, the agency also stated its belief and apparent intention to revise the Part 11 rule, although the authors believe a close reading shows the agency’s concern was focused on the cGMP application of the rule. In all the subsequent years, however, it is clear the agency and industry have found themselves quite capable of applying the existing rule adequately in the GCP-ICH environment.

Interestingly, in 2014, Yashashri Shetty and Aafreen A. Saiyed published Analysis of warning letters issued by the US Food and Drug Administration to clinical investigators, institutional review boards and sponsors: a retrospective study, an analysis of warning letters issued to investigators and institutional review boards (IRBs) during 2011 and 2012 and to sponsors during the period from January 2005 to December 2012.7

While their analysis showed that 40% of warning letters issued to investigators and 30% of those issued to sponsors were attributed to failures in recordkeeping (clearly a Part 11-implicated area), they also noted that investigator performance in the maintenance of case histories and the retention and production of records actually appeared to improve when they compared the results of their study to those of previously published analyses. It is precisely in the predicate rules pertaining to recordkeeping (case histories, study data, and drug shipment records) that one would expect the preponderance of citations related to Part 11 to be present, yet it was in this area that, at least for investigators, Shetty and Saiyed found evidence of improved performance. It is hard to read this as evidence of problems applying or understanding the electronic records regulation via the predicate rules.

Of necessity, the GCP regulations have generally left open a significant margin for interpretation by the regulators and industry. Over time, that margin does become somewhat narrowed by practice, as industry defines detailed processes and deliverables that establish lower level regulatory expectations. An example of this would be the document sets and associated practices and processes that now comprise the regulatory expectations associated with the broad regulatory requirement for computerized system validation. Similarly, it is very reasonable to shift the wider GCP/ICH paradigm for existing practice from a document-based framework to a true and genuinely comprehensive data stewardship framework that acknowledges and embraces the chain of custody concept.

The FDA Guidance for Industry: Electronic Source Data in Clinical Investigations (2013)8 also displays the agency’s increased awareness of the complexities and nuances that can develop in the interactions among systems and users, as “data sources” and “data originators” are carefully defined and delineated entirely contextually. This focus on context and the recognition that a user’s role can shift across multiple systems is an important development. It means the agency is looking to a more fundamental level, one always underlying the mere technology strata, to enforce the regulated person’s responsibilities. This most recent guidance appears to renew and reinforce the agency’s commitment to Part 11 (referenced nine times throughout the new guidance) and to technology-agnostic reliance on the predicate rules, a stance that might allow the agency to remain more nimble in the face of the still rapidly evolving electronic records environment.



As detailed earlier, we can see that the regulations (and FDA guidance) already acknowledge some degree of shared responsibility for data integrity, although only the investigator and the sponsor share the responsibility articulated. In the digital world of contemporary clinical data collection, monitoring, interpretation, and archiving, a much larger set of actors is usually involved. This includes eClinical vendors, Internet service providers (ISPs), document management systems/providers, and other infrastructure service providers.

Just who is a data steward?

This leads us to a critical question: just who is a data steward? Some, especially at the infrastructure level, are arguably very far removed from the clinical trial data itself. The answer, we believe, is that even those concerned solely and entirely with infrastructure are still data stewards within their domain. The reality is that these data never cease being regulated clinical data, and the networks and other virtual infrastructure that comprise the conduits and switches for these data are also integral to their sustained integrity and are de facto links in their chain of custody. All those responsible for the environment inhabited by the data, as well as for operations on or affecting those data, are data stewards who must have unambiguous responsibility for the impacts of their actions on the integrity of the data.

Focusing on data stewardship accomplishes the genuine ends the regulations are intended to achieve, but does so by acknowledging the digital reality: the complex web within which data are actually collected, maintained, managed, transmitted, shared, and protected in the current electronic environment. In the digital environment, multiple entities have both shared and unique responsibilities for the protection of the data from its production through to its final archival preservation and, ultimately, its destruction. This approach supports attributable, legible, contemporaneous, original, and accurate (ALCOA) principles. Data integrity responsibility can be shared transparently through the entire complex data chain of custody, a "chain" that is far more persistent, a molecular chain of sorts, than it is sequential in the contemporary digital environment.
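As an illustration only, the sketch below shows how a single custody event might be screened against those ALCOA attributes that can be checked mechanically. Everything in it is an assumption made for the example: the `CustodyEvent` fields, the 24-hour `MAX_RECORDING_DELAY` threshold, and the sample values are hypothetical, and a non-empty value is only a crude stand-in for legibility. Accuracy is deliberately left out, since it cannot be judged from metadata alone and still depends on source verification.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical threshold: how long after the observation an entry may be
# recorded and still count as contemporaneous in this sketch.
MAX_RECORDING_DELAY = timedelta(hours=24)


@dataclass(frozen=True)
class CustodyEvent:
    steward: str           # who acted on the data (attributable)
    system: str            # where the data resided when the action occurred
    action: str            # e.g. "capture", "transfer", "archive"
    observed_at: datetime  # when the observation was made
    recorded_at: datetime  # when it was entered into the system
    value: str             # the recorded content (legible)
    is_original: bool      # first capture or certified copy, per local SOP


def alcoa_gaps(event):
    """Return the ALCOA attributes this event fails, in ALCOA order."""
    gaps = []
    if not event.steward.strip():
        gaps.append("attributable")
    if not event.value.strip():
        gaps.append("legible")
    delay = event.recorded_at - event.observed_at
    if delay < timedelta(0) or delay > MAX_RECORDING_DELAY:
        gaps.append("contemporaneous")
    if not event.is_original:
        gaps.append("original")
    return gaps


observed = datetime(2015, 3, 2, 9, 30, tzinfo=timezone.utc)
good = CustodyEvent("site_coordinator", "eSource tablet", "capture",
                    observed, observed + timedelta(minutes=5),
                    "systolic_bp=128", True)
late = CustodyEvent("", "EDC", "transcribe",
                    observed, observed + timedelta(days=3),
                    "systolic_bp=128", False)

print(alcoa_gaps(good))
print(alcoa_gaps(late))
```

Because each event names both a steward and a system, the same check can in principle be applied at every link in the chain of custody, whichever organization holds the data at that point.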

Just as computerized system validation has adopted a full life-cycle approach, so, too, must the regulations, as they utilize the comprehensive concept of data stewardship. Data stewards vary somewhat, but largely they are:

  • Trial sponsors

  • Investigators

  • CROs

  • Software developers, including trial-specific application developers

  • IT hosting vendors

  • Data destruction/archival vendors

While the regulation will always place the ultimate responsibility for data integrity with the trial sponsors and principal investigators, it will, likewise, always need to be applied across the highly interconnected entities upon which every sponsor and investigator must rely (to at least some extent) in the digital world. The interpretation of regulation will need to adjust from the old location-dependent paradigm of physical "original" documents to implementation in a world in which the data exist as "original" simultaneously across multiple networks, including often-independent domains (think "Cloud" storage here, as one very significant example).

When the “original” is a virtual entity, the concept of a “copy” is often problematic. As noted earlier, the newly proposed Addendum to E6 8.1 is a clear shift in this direction, a shift more aligned with the FDA’s 2013 Guidance on eSource.

As the FDA’s own eSource guidance reflects, the same data can be and often are appropriately represented on multiple systems in novel ways appropriate to a specific stage in the data acquisition, data entry, data management, and data analysis life cycle. These copies are just that, copies, and yet each is still "original" data in a given system. This is a world quite different from that in which multiple entities might receive physical copies of a single multipart form.


Data stewardship is an approach that can translate the obligations, responsibilities, and intention of the regulation into this more subtle and complex environment. The paper metaphor regulators have traditionally applied to their interpretations ignores the reality of the technology and hampers the successful assurance of quality and integrity the regulation is meant to ensure.

This article is a call to action to rethink the entire data "chain" as a full and persistent data life cycle. It endeavors to define data stewardship as a platform. It should inform and encourage data stewards to rethink their mission according to a more modern reality, where adherence to GCP is not only attainable, but also potentially more effective in the digitized world.


Tom Haag is Data Integrity Process Expert, eClinical Quality Assurance, Novartis Pharmaceuticals Corporation; Tom Koch is President, Lead Consultant, WTKoch GCP QA Consulting, LLC

Acknowledgements: Grant Simmons, Novartis Pharmaceuticals Corporation; Monica Cahilly, Green Mountain Quality Assurance, LLC; Phil Coran, Medidata Corporation



1. Baker, Karen S., and Yarmey, Lynn. Data Stewardship: Environmental Data Curation and a Web-of-Repositories. The International Journal of Data Curation, Issue 2, Vol. 4. (2009)

2. US Code of Federal Regulations Title 21 Part 312

3. ICH E6 Good Clinical Practice

4. ICH E8 General Considerations for Clinical Trials

5. US Code of Federal Regulations Title 21 Part 11

6. FDA Guidance for Industry: Part 11 Electronic Records; Electronic Signatures Regulation-Scope and Application (2003)

7. Shetty, Yashashri C., and Saiyed, Aafreen A. Analysis of warning letters issued by the US Food and Drug Administration to clinical investigators, institutional review boards and sponsors: a retrospective study. J Med Ethics, published online June 25, 2014

8. FDA Guidance for Industry: Electronic Source Data in Clinical Investigations (2013)
