Anonymization and Redaction of Clinical Trials According to the EU Regulation


Applied Clinical Trials

Outlining the techniques for anonymization of clinical study reports and the identification and redaction of commercially confidential information to comply with EMA's Policy 0070 on trial data disclosure and transparency.

The European Medicines Agency (EMA) is committed to continuously extending its approach to clinical trial data transparency. In October 2014, the agency released Policy 0070/2014, with the aim of making medicine development more efficient, fostering public scrutiny of clinical study information by the scientific community, and developing knowledge in the interest of public health, while promoting a better-informed use of medicines.1 According to EMA, “A high degree of transparency will take regulatory decision-making one step closer to EU citizens, and promote better-informed use of medicines. […] access to clinical data will benefit public health in future.”1

The scope of EMA Policy 0070

The scope of the EMA policy on publication of clinical data for medicinal products for human use1 relates to proactively sharing study-level and patient-level clinical data (i.e., clinical reports and individual patient data [IPD]) submitted under the centralized marketing authorization procedure after Jan. 1, 2015.

The policy serves as a complementary tool ahead of the implementation of the EU Clinical Trial Regulation No. 536/2014.2

The policy does not concern:

  • Clinical data held by EMA for marketing authorization applications (MAAs) submitted under the centralized procedure before Jan. 1, 2015, and for extension of indication applications and line extension applications submitted before July 1, 2015.

  • Clinical data (either data provided to EMA before Jan. 1, 2015, or data not yet held by the agency) submitted to EMA for non-centrally authorized products.

These clinical data continue to be made available to external requesters on a reactive basis in accordance with the EMA’s policy on access to documents related to medicinal products for human and veterinary use (POLICY/0043) effective from Dec. 1, 2010.3

The publication procedure

The procedure goes through two sequential phases:

  • The first phase of disclosure (i.e., the publication of clinical reports) covers centralized MAAs submitted after Jan. 1, 2015, and, starting from July 1, 2015, extension of indication and line extension applications relating to existing centrally authorized medicinal products.

  • The second phase, regarding the provision of IPD, was foreseen for 2016 but, at the time of writing (February 2017), EMA had not released any indication about it.

The publication procedure for clinical reports is based on two pillars:

  • Terms of use (ToU), which govern the access to and use of clinical reports.

  • A user-friendly technical tool allowing access to such clinical reports.

The policy establishes methods for balancing the sharing of clinical trial data against the protection of patients’ privacy, through the anonymization/de-identification of protected personal data (PPD), and identifies topics that may be considered commercially confidential information (CCI) for redaction.

Part 1 - Anonymization of clinical reports for publication

On July 6, 2015, EMA issued a guidance to the pharmaceutical industry on anonymization of clinical reports,4 in the context of Phase 1 of the policy (i.e., the publication of clinical reports on the EMA website). The guidance, whose terms and indications were briefly anticipated during an EMA webinar on June 24, 2015, aims at assisting companies by recommending methods, techniques, and processes that could be applied to clinical reports, for the purpose of achieving adequate anonymization while retaining a maximum of scientifically useful information on medicinal products for the benefit of the public. A new release of this EMA guidance, accompanied by a summary of changes,5 was issued April 11, 2017.6

Marketing authorization holders (MAHs)/applicants have the responsibility for submitting clinical reports that were previously rendered anonymous/de-identified.

Anonymization techniques4,6

The data in the clinical reports must be processed in such a way that they can no longer be used to identify a natural person by using “all the means likely reasonably to be used” by either the controller or a third party.7,8

The same data can be adequately anonymized in different ways, depending on the context of the data release. When selecting the most appropriate technique, the specificities of the clinical data should be taken into consideration.

Anonymization is a rapidly evolving field of active research that offers MAHs/applicants several techniques, each with its own strengths and weaknesses. According to the Article 29 Working Party Opinion, the techniques that could be applicable to clinical reports are:

  • Masking

  • Randomization

  • Generalization

Randomization and generalization techniques are recommended in order to optimize the clinical usefulness of the information published.6
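As an illustration of randomization, the Python sketch below shifts all dates belonging to one participant by the same random offset, so that the intervals between that participant's events are preserved while the calendar dates are perturbed. This is only one possible form of noise addition, not a method prescribed by the EMA guidance; the function name `shift_dates`, the 30-day window, and the sample visits are assumptions made for the example.

```python
import random
from datetime import date, timedelta


def shift_dates(visits, max_shift_days=30, seed=None):
    """Shift each participant's dates by one random offset per participant.

    `visits` is a list of (participant_code, visit_date) tuples.
    The same offset is reused for every date of a given participant,
    so the time between that participant's visits is unchanged.
    """
    rng = random.Random(seed)
    offsets = {}  # participant_code -> offset in days
    shifted = []
    for participant, visit_date in visits:
        if participant not in offsets:
            offsets[participant] = rng.randint(-max_shift_days, max_shift_days)
        shifted.append(
            (participant, visit_date + timedelta(days=offsets[participant]))
        )
    return shifted


# Hypothetical sample data, for illustration only.
sample_visits = [
    ("P1", date(2016, 3, 1)),
    ("P1", date(2016, 3, 15)),
    ("P2", date(2016, 4, 2)),
]
shifted_visits = shift_dates(sample_visits, max_shift_days=30, seed=7)
```

In this sketch the offsets are not stored outside the function, so the original dates cannot be recovered from the output alone; whether that is appropriate depends on the context of the data release.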

Options to establish data set anonymization

Two options are available to establish if the dataset is anonymized:8

1. Demonstrate that, after anonymization, the following actions are no longer possible:

  • Singling out: possibility to isolate some records of an individual in the dataset.i

  • Linkability: ability to link at least two records concerning the same data subject or a group of data subjects (in the same database or in two different databases).

  • Inference: the possibility to deduce, with significant probability, the value of an attribute from the values of a set of other attributes.

It is up to a sponsor taking due account of the ultimate purpose and use of the clinical reports to decide which option to use (demonstrate that after anonymization all three criteria are fulfilled: singling out, linkability and inference, or perform a risk assessment).6

The sponsor is also in charge of deciding which anonymization techniques to use in order to achieve adequate anonymization, while retaining a maximum of scientifically useful information. The legislation is not prescriptive about the techniques to be used by data controllers.8


i In the context of Phase 1 of Policy 0070, the dataset is the set of clinical reports published by EMA.


2. Perform an analysis of re-identification risk.

It is important to note that de-identification does not reduce the risk of re-identification of a data set to zero. Rather, the process produces data sets for which the risk of re-identification is very small.9

There are in fact three plausible re-identification attacks on the data by an adversary that need to be protected against, as summarized in Table 2.10

Measuring the risk of re-identification involves selecting an appropriate metric, a suitable threshold and the actual measurement of the risk in the clinical data information to be disclosed. The choice of a metric depends on the context of data release.

Setting an acceptable threshold encompasses:

  • The evaluation of the existing mitigation controls (none in the context of public disclosure).

  • The extent to which a particular disclosure would be an invasion of privacy to the trial participant.

  • The motives and the capacity of the attacker to re-identify the data.

Once a threshold has been determined, the actual probability of re-identification can be measured. MAHs/applicants are encouraged to use quantitative methods to measure the risk of re-identification as soon as they are in a position to do so.6
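A minimal Python sketch of such a measurement is given below. It assumes that each participant's probability of re-identification is 1 divided by the number of participants sharing the same combination of quasi-identifier values, and it compares the highest such probability with a threshold. The field names, the sample records, and the 0.1 threshold are illustrative assumptions, not values taken from the EMA guidance.

```python
from collections import Counter


def reidentification_risk(records, quasi_identifiers):
    """Estimate re-identification risk from equivalence class sizes.

    Records sharing the same values on all quasi-identifiers form one
    equivalence class; the risk for a record is 1 / (size of its class).
    """
    keys = [tuple(record[q] for q in quasi_identifiers) for record in records]
    class_sizes = Counter(keys)
    per_record = [1 / class_sizes[key] for key in keys]
    return {
        "max_risk": max(per_record),
        "avg_risk": sum(per_record) / len(per_record),
        # Records that are unique on the quasi-identifiers can be singled out.
        "singled_out": sum(1 for key in keys if class_sizes[key] == 1),
    }


# Hypothetical participants, for illustration only.
participants = [
    {"age_band": "40-49", "sex": "F", "country": "IT"},
    {"age_band": "40-49", "sex": "F", "country": "IT"},
    {"age_band": "50-59", "sex": "M", "country": "IT"},
    {"age_band": "50-59", "sex": "M", "country": "IT"},
    {"age_band": "60-69", "sex": "F", "country": "DE"},
]

ILLUSTRATIVE_THRESHOLD = 0.1  # assumption; the sponsor sets the real value

risk = reidentification_risk(participants, ["age_band", "sex", "country"])
needs_more_anonymization = risk["max_risk"] > ILLUSTRATIVE_THRESHOLD
```

Here the last participant is unique on the three quasi-identifiers, so the maximum risk is 1 and further generalization or redaction would be needed before the threshold could be met.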

EMA recommendation to best achieve anonymization of PPD of trial participants

There are several sections with data results in clinical reports that may contain personal data of trial participants. These include:

  • Disposition (recruiting, pre-assignment, period/arms of the trial, etc.) of trial participants

  • Protocol deviations

  • Demographics

  • Other baseline characteristics

  • Treatment compliance

  • Pharmacodynamics

  • Pharmacokinetics

  • Efficacy

  • Safety (adverse events, laboratory findings, and vital signs).6

In general, clinical overviews (CTD mod. 2.5) and clinical summaries (CTD mod. 2.7) do not contain personal data related to trial participants, with the exception of the Narratives of the Clinical Summary. In addition, some of the tables included in the clinical overviews and clinical summaries may also contain personal data.

Anonymization of direct identifiers and quasi identifiers

A classification of variables into direct and quasi-identifiers for clinical trials has been completed by a PhUSE (Pharmaceutical Users Software Exchange) working group.11

Any direct identifiers (e.g., name, email, phone number, social security number, signature, full address, clinical trial participant numbers, and medical device serial numbers) should be removed, or in the case of unique identifiers, like Patient ID numbers, at least pseudonymizedii. There are established standards for such pseudonymization.12

The Guidance Regarding Methods for De-identification of Protected Health Information in Accordance with the Health Insurance Portability and Accountability Act (HIPAA) identifies 18 direct identifiers within the frame of the Safe Harbor method.13


ii Pseudonymization consists of replacing one attribute (typically a unique attribute) in a record by another. The natural person is still likely to be identified indirectly. Pseudonymization reduces the linkability of a dataset with the original identity of a data subject.
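One common way to replace a unique identifier consistently is a keyed hash. The Python sketch below uses HMAC-SHA256 from the standard library; it is an illustration under stated assumptions and is not presented as the method of the ISO standard cited above. The function name `pseudonymize`, the 12-character code length, and the sample key are hypothetical.

```python
import hashlib
import hmac


def pseudonymize(patient_id: str, secret_key: bytes, length: int = 12) -> str:
    """Replace a patient ID with a keyed-hash pseudonym.

    The same ID and key always give the same pseudonym, so records of one
    participant stay linked to each other within the reports, while the
    original ID cannot be recomputed without the secret key.
    """
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:length]


# Hypothetical key; in practice it would be generated randomly and kept
# by the sponsor, separate from the published reports.
example_key = b"replace-with-a-randomly-generated-secret"
code_a = pseudonymize("SITE01-0042", example_key)
code_b = pseudonymize("SITE01-0043", example_key)
```

Because the pseudonym is stable, the natural person may still be identified indirectly through the remaining quasi-identifiers, which is why pseudonymization alone is not sufficient anonymization.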


Quasi-identifiers, which consist mostly of dates, location information, demographics, socioeconomic information, rare diagnoses, concomitant illnesses and medications, and serious adverse events such as death, hospitalization, and birth defects, cannot just be removed as these variables are very useful for the analysis.10 More sophisticated techniques (e.g., generalization) need to be applied to retain the value of these variables but also reduce the probability that these variables can re-identify participants.14 The need to redact quasi identifiers will depend on the following aspects:

  • Number of quasi identifiers per trial participant

  • Frequency of trial participants with same category/value on a set of the quasi identifiers (group size)

  • Size of a population

It is up to the MAH/applicant to decide which quasi identifiers need to be redacted and which could remain in the reports. The rationale for the decision should be included in the risk assessment section of the anonymization report to be provided to EMA.6

A detailed approach to a wider set of de-identification techniques for quasi-identifiers is available in El Emam et al, 2015.10
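Generalization of two typical quasi-identifiers, age and dates, can be sketched in Python as follows. The 10-year bands, the top-coding at 90 years, and the choice of expressing dates as days relative to a per-participant reference date are assumptions chosen for the example; the appropriate level of generalization depends on the risk assessment for the specific trial.

```python
from datetime import date


def generalize_age(age: int, band_width: int = 10, top_code: int = 90) -> str:
    """Replace an exact age with an age band, top-coding the oldest ages."""
    if age >= top_code:
        return f"{top_code}+"
    lower = (age // band_width) * band_width
    return f"{lower}-{lower + band_width - 1}"


def to_relative_day(event_date: date, reference_date: date) -> int:
    """Replace a calendar date with the number of days since a reference date.

    The reference date (for example, the participant's first dose) is a
    hypothetical choice here; day 0 is the reference date itself.
    """
    return (event_date - reference_date).days


# Hypothetical values, for illustration only.
age_band = generalize_age(47)
oldest_band = generalize_age(93)
adverse_event_day = to_relative_day(date(2016, 3, 15), date(2016, 3, 1))
```

Both transformations keep information that is useful for analysis (approximate age, timing relative to treatment) while enlarging the group of participants who share the same value.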

De-identification and data sharing: Other available standards

While considering the existing EMA guidance, other recommendations have been produced with the aim of promoting an approach that balances data utility and privacy risk and is applicable across clinical trial data holders.13 The HIPAA Privacy Rule15 sets out two general approaches to de-identification:

  • Expert Determination: Requires a statistical expert to apply statistical and scientific principles in order to render data not individually identifiable or such that the risk of re-identification is very small.

  • Safe Harbor: Requires removal of 18 direct identifiers that could be used to identify the individual or the individual’s relatives, employers, or household members. Many of these identifiers are not routinely collected in clinical trials, and the method applies to the U.S. population.13

Both independent authors16,17,18,19,20 and private organizations21,22 have published their own opinions on responsible clinical data sharing, while outlining the evolving role of statisticians in data sharing and in its success.23 In particular, TransCelerate BioPharma Inc. published a guidance, Data De-identification and Anonymization of Individual Patient Data in Clinical Studies. Other standards on anonymization are available.11,24,25



Personal data of individuals other than trial participants

The EMA performed a privacy impact assessment (PIA) to establish the functionalities of the database, in particular with regard to the data fields to be made publicly accessible.

Personal data of individuals other than patients (i.e., investigators, sponsor staff, and MAH/applicant staff) will not be published, with the exception of the sponsor and coordinating investigator signatories of the clinical study report and the identities of the investigator(s) who conducted the trial and their sites.

In any case, the contact details and signatures of these individuals should be redacted. Data pertaining to the above exception in other parts of the clinical study report (CSR) will be redacted, as they may give away geographical information (e.g., site number, site address, investigator names) that could be linked to patients and, hence, may enable their identification.6 It was noted that the redactions published until April 2017 were largely inconsistent with the EMA guidance in this regard.26

Part 2 - Identification and redaction of commercially confidential information

On Dec. 19, 2016, EMA published a new release of the External guidance on the identification and redaction of commercially confidential information (CCI) in clinical reports submitted to the agency for the purpose of publication in accordance with EMA Policy 0070.27 The key contents of these guidelines were anticipated during the EMA webinar28,29 concerning Policy 0070.

The guidance is a working tool and a reference document for pharmaceutical companies, aimed at supporting them in the preparation of their justifications regarding CCI in documents that fall under the scope of Policy 0070. Annex 3 of the policy provides MAHs/applicants with “redaction principles” to identify certain types of information that can potentially be considered CCI.1 EMA will scrutinize the justification for the redaction of CCI in order to assess whether the definition of CCI applies.

While providing a comprehensive overview and concise details on how the redaction of CCI is to be handled within the context of Policy 0070, the guidance ensures a common understanding of what can or cannot be considered CCI within clinical reports. It also ensures a good quality of the justifications for the proposed redactions.

Points to consider for the preparation of the redaction proposal of a clinical report

CCI shall mean any information contained in the clinical reports submitted to the EMA by the MAH/applicant that is not in the public domain or publicly available and where disclosure may undermine the legitimate economic interest of the MAH/applicant.26

Prior to proposing any redactions, the MAH/applicant should be aware of the level of information already available in the public domain concerning its product’s development, scientific knowledge, and advancements within the relevant therapeutic area(s). Such preparatory work by the MAH/applicant is essential, as it enables an expedited consultation process, and thereby reduces the probability that EMA will reject proposed redactions because the information is already in the public domain.

Information that EMA does not consider CCI

  • Information available in the public domain from various sources (company website; scientific guidelines; clinical trial registries; websites of other regulatory authorities within and outside of the EU; scientific literature and articles such as textbooks, PubMed, Medline, patent application).

  • Information that does not bear any innovative features (information reflecting common knowledge shared within the scientific community via scientific literature and articles [Textbooks, PubMed, Medline] or scientific and regulatory guidelines and guidance documents, treatment guidelines).

  • Information of public interest that, in the EMA’s view, does not constitute CCI.


Information that may be considered CCI

The information listed in Policy 0070 Annex III1 may be considered CCI, and any proposal to redact it must be adequately justified.

The Redaction Principles should not be perceived by the MAH/applicant as an open and unconditional invitation to propose, on a regular basis, the redaction of information.

If the MAH/applicant identifies a piece of information (a word or figure, part of a sentence, or part of a paragraph) that it wishes to include among the proposed redactions, it has to ensure that the information in question:

  • DOES NOT fall under Section 34.2 of the External guidance document1

  • DOES fall under the types of information that may potentially be considered CCI according to Policy 0070 Annex 3.26

The Justifications suggested in Annex III are not considered relevant, and, therefore, will be rejected.26

The MAH/applicant is discouraged from proposing the redaction of entire pages, subsections of a report, or full tables, especially when, in their view, only some sentences within the text or some specific figures within the tables fall under the types of information described in Annex 3.

The justification table and its use

The EMA considers the Justification Table a living document reflecting the justifications the company puts forward and the EMA’s conclusions (see Figure 1).

According to Annex III, the justification table should contain justifications for all pieces of text considered as CCI and proposed for redaction. Should the company highlight a piece of text proposed for redaction, but fail to explain its redaction in the justification table, the proposal will be considered invalid and sent back to the company for clarification.iii

The justification table is used as a communication tool between EMA and the sponsor during the whole redaction consultation process.

Each table should list all proposed CCI redactions of a clinical report, and should be fully completed by the MAHs/applicants. The justification table is not part of the documents to be published.
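The completeness rule described above (every piece of text proposed for redaction must have a corresponding justification) lends itself to a simple pre-submission check. The Python sketch below is a hypothetical internal aid for a sponsor, not an EMA tool or format; the class `ProposedRedaction` and its field names are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ProposedRedaction:
    """One row of a sponsor's working copy of the justification table."""
    report: str          # clinical report the redaction belongs to
    location: str        # e.g., section or page reference
    text: str            # the piece of text proposed for redaction
    justification: str = ""


def missing_justifications(redactions):
    """Return the proposed redactions that lack a justification."""
    return [r for r in redactions if not r.justification.strip()]


# Hypothetical entries, for illustration only.
proposals = [
    ProposedRedaction(
        report="CSR Study A",
        location="Section 9.4.2",
        text="batch release specification",
        justification="Relates to an ongoing development program; "
                      "not in the public domain.",
    ),
    ProposedRedaction(
        report="CSR Study A",
        location="Section 16.1",
        text="assay vendor name",
    ),
]

incomplete = missing_justifications(proposals)
```

Running such a check before submission flags entries like the second one, which would otherwise be sent back for clarification.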

Expected level of details for the justification

The EMA External guidance26 underlines that applicants are expected to submit a specific, pertinent, relevant, not overstated, and appropriate justification for each of the pieces of text proposed to be redacted.

The justification wording has to meet the following criteria:

  • Clearly refer to/identify the information proposed to be redacted.

  • Highlight the innovative features of the information.

  • Explicitly indicate to which ongoing development program the information relates.

  • Explain how the disclosure of the concerned information would undermine the MAH’s/applicant’s economic interest or competitive position.

According to a recent paper by El Emam,26 “It is evident from the redaction approaches that have been applied thus far that the manufacturers have erred toward being more conservative and tilting the privacy/utility balance toward protecting patient privacy.”

iii Each submitted clinical report requires a separate justification table that has to be submitted as a Word document. Accordingly, the MAH/applicant is expected to indicate clearly which justification table corresponds to which clinical report.


Evaluation process of the proposed CCI redactions

If EMA considers the justification insufficiently detailed, additional clarifications will be requested. Failure to provide the requested clarifications within a reasonable time frame would render the available justification insufficient.

Should the agency consider the provided justification not sufficiently specific or too vague, the following rejection code will be included in the justification table: CCI – Rejection 04 – Insufficient justification.

Whenever the justification provided by the MAH/applicant does not correspond to/match the (type of) information proposed for redaction (i.e., is not relevant to the information proposed to be redacted), the following rejection code will be used in the justification table: CCI – Rejection 05 – Irrelevant justification.

The current debate on the nature of CCIs

It is worth mentioning that the debate on the definition of CCI is still open. A draft policy EMA published in June 201330,31 suggested that CSRs do not contain CCI and, therefore, could be released with no redactions. Later, commenting on the conclusions of the European Ombudsman about the EMA’s partial refusal to give public access to studies related to the approval of a medicinal product,32,33 the agency’s spokesman argued that there is no agreed or binding definition of CCI in European Union legislation, and that its own guidance “makes clear that the vast majority of the information contained in clinical reports is not considered CCI. The guidance clarifies which type of data the EMA would typically refuse as being CCI and how the redaction of such data will be handled.”34

Direct experience of anonymization and redaction

In our experience, it is quite difficult to put into practice the theoretical principles expressed in the EMA Guidance and in the other reference documents available so far. This is due not only to the amount of information within the clinical reports that can be subject to interpretation, but also to the lack of practical examples in the guidelines. The most recent update of the External guidance shows an improvement in this direction.6

When anonymizing and redacting the information contained in clinical trial synopses, we agreed with the sponsors to de-identify:

  • The name and surname of the principal/coordinating investigator and of possible other investigators.

  • The study centers’ names, addresses, and respective countries, especially when they correspond to the investigator’s address.

The only exception in the anonymization of this information was:

  • The investigators’ names and reference centers are of public domain because they are already published (e.g., published on public sites like

  • The patients when they are reported in the text with a specific number code (e.g., in the conclusions and/or in the safety sections of the synopses).

We suggest leaving the investigators’ titles and positions within their organizations/institutions unredacted.

As far as redaction is concerned, we agreed with the sponsors to prudently redact the batch numbers of the test product and the reference therapies, together with their expiry/recheck dates, to avoid any possible identification.

In the case of publication of clinical reports under Policy 0070, all clinical reports submitted as part of a regulatory application are subject to publication and, therefore, need to be redacted. Specific attention should be paid to the leaf title naming in the index XML of the eCTD submission and to the corresponding file names of the PDF documents.

The completion of the redaction procedure also involves the editing of:

  • The Anonymization Report (see the Annex 1.2 of the External guidance)

  • A Justification Report for each CTD module that underwent the removal of one or more CCIs (see the Annex 1.10 of the External guidance). 


The publication of clinical data for medicinal products for human use is one of the clinical trial transparency (CTT) procedures that MAHs/applicants are expected to cope with from now on. Nevertheless, only some of them appear to have a clear view of this complex and time-consuming activity. It is our opinion that CTT will impact both the regulatory status of medicinal products and, in the mid-term, the MAH/applicant’s reputation. We therefore encourage MAHs/applicants to dedicate the needed resources to CTT activities. These include the procedure to upload clinical trial information to the EudraCT platform, the publication of CSRs supporting centralized MAAs, the editing of layperson summaries, and the disclosure of patient-level data in response to specific requests from the scientific community.


M. Zaninelli, MA, and E. Ornago, MSc, both with Maxer Consulting s.r.l.; A. Ferrari, MD, PhD, with Erydel S.p.A.



1. EMA policy on publication of clinical data for medicinal products for human use (Policy 0070).

2. Regulation (EU) No 536/2014 of the European Parliament and of the Council of 16 April 2014 on clinical trials on medicinal products for human use, and repealing Directive 2001/20/EC

3. European Medicines Agency policy on access to documents (related to medicinal products for human and veterinary use) - POLICY/0043

4. Dias M. Guidance on the anonymization of clinical reports for the purpose of publication in accordance with policy 0070.

5. Summary of changes to the “External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use”

6. External guidance on the anonymization of clinical reports for the purpose of publication in accordance with EMA Policy

7. Directive 95/46/EC of the European Parliament and of the Council of 24 October 1995 on the protection of individuals with regard to the processing of personal data and on the free movement of such data.

8. The Working Party on the Protection of Individuals with regard to the Processing of Personal Data. Article 29 Data Protection Working Party. WP216. Opinion 05/2014 on Anonymization Techniques. 2014.

9. Information and Privacy Commissioner of Ontario, De-identification Guidelines for Structured Data, 2016.

10. El Emam K et al, De-identifying Clinical Trials Data, Applied Clinical Trials, 2015.

11. PhUSE De-Identification Working Group, De-Identification Standards for CDISC SDTM 3.2, 2015.

12. Health informatics. Pseudonymization, ISO, International Standard ISO/TS 25237:2008, 2008, Data on file

13. Tucker K et al, Protecting patient privacy when sharing patient-level data from clinical trials, BMC Medical Research Methodology 2016, 16(Suppl 1):77.

14. El Emam K and Arbuckle L, Anonymizing Health Data: Case Studies and Methods to Get You Started. O’Reilly, 2013.

15. US Department of Health and Human Services. Guidance Regarding Methods for Deidentification of Protected Health Information in Accordance with the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule 2012.

16. Aggarwal CC, Yu PS. A General Survey of Privacy-Preserving Data Mining Models and Algorithms. In: Aggarwal CC, Yu PS, editors. Privacy-Preserving Data Mining: Models and Algorithms. Boston, MA: Springer US; 2008. p. 11–52., Data on file

17. Mello MM, Francer JK, Wilenzick M, Teden P, Bierer BE, Barnes M. Preparing for Responsible Sharing of Clinical Trial data. N Engl J Med. 2013;369:1651–8.

18. Hughes, S., Wells, K., McSorley, P. and Freeman, A. Preparing individual patient data from clinical trials for sharing: the GlaxoSmithKline approach. Pharmaceut. Statist. 2014; doi: 10.1002/pst.1615. Data on file

19. Gibson B, Multi-Sponsor Data Transparency: A Group Approach To Sharing, PhUSE, 2014.

20. Taichman DB, Backus J, Baethge C, Bauchner H, de Leeuw PW, Drazen JM, Fletcher J, Frizelle FA, Groves T, Haileamlak A. Sharing Clinical Trial Data: A Proposal from the International Committee of Medical Journal Editors. PLoS Med. 2016;13(1):e1001950.

21. European Federation of Pharmaceutical Industries and Associations (EFPIA) – PhRMA. Principles for Responsible Clinical Trial Data Sharing: Our Commitment to patients and researchers. 2013.

22. TransCelerate - Data De-identification and Anonymization of Individual Patient Data in Clinical Studies – A Model Approach.

23. Manamley N, Mallett S, Sydes MR, Hollis S, Scrimgeour A, Burger HU, Urban HJ. Data sharing and the evolving role of statisticians. BMC Med Res Methodol. 2016 Jul 8;16 Suppl 1:75. doi: 10.1186/s12874-016-0172-9.

24. Information Commissioner’s Office (ICO) Code of Practice. Anonymization: managing data protection risk

25. Sharing clinical trial data: Maximizing benefits, minimizing risk. Institute of Medicine (IOM)

26. El Emam K, An Analysis of Anonymization Practices in Initial Data Releases Pursuant to EMA Policy 0070, Applied Clinical Trials, April 13, 2017

27. Chapter 4 - External guidance on the identification and redaction of commercially confidential information in clinical reports submitted to EMA for the purpose of publication in accordance with EMA Policy 0070

28. Henry-Eude Anne-Sophie, Redaction Consultation Process, Assessment of justification for proposed redactions of commercially confidential information, June 24, 2015, London (webinar)

29. Henry-Eude Anne-Sophie, Guidance to pharmaceutical industry on redacting commercially confidential information (CCI) in clinical reports, June 24, 2015, London (webinar)

30. EMA (European Medicines Agency). 2013. Publication and access to clinical-trial data.  (accessed October 15, 2014).

31. Wathion, N., and EMA. 2014. Finalisation of EMA policy on publication of and access to clinical trial data.  (accessed December 16, 2014).

32. O'Reilly E, Decision on own-initiative inquiry OI/3/2014/FOR concerning the partial refusal of the European Medicines Agency to give public access to studies related to the approval of a medicinal product, European Ombudsman, 2016.

33. Gøtzsche PC, AbbVie considers harms to be commercially confidential information: sign a petition, BMJ 2013;347:e7569. Data on file.

34. Silverman E, European ombudsman urges regulator to get tough on redacting study data, Stat, 2016.
