Stephen Senn, PhD, CStat, is Professor of Pharmaceutical and Health Statistics at University College London, Department of Statistical Science, 1-19 Torrington Place, London WC1E 6BT, UK, +44 20 7679 1698, fax +44 87 0052 3357, email: email@example.com. He is a member of the Applied Clinical Trials editorial board. His book, Dicing with Death (2003), a popular account of medical statistics, is published by Cambridge University Press.
How applying statistics to the design, conduct, and analysis of Phase I trials can improve clinical research.
On the morning of March 13, 2006, eight healthy volunteers forming the first cohort of a planned group escalation study of TeGenero's monoclonal antibody TGN 1412 at a Phase I clinical research facility adjacent to Northwick Park Hospital were given their treatment. Within a short period of time, the six men who had been allocated to TGN 1412 began showing signs of an adverse reaction. In fact, they were all showing the first signs of a severe cytokine storm. By that evening, all six had been admitted to intensive care.
None of the six died as a result of the treatment, but all suffered severe adverse reactions—and in some cases, these have had long-term consequences on their health.
This was possibly the most dramatic example of adverse events due to pharmaceuticals since the thalidomide tragedy of the early 1960s and the incident attracted considerable media attention for several days. It was also, of course, the cause of much scientific debate and regulatory discussion and the impact is clearly being felt in Europe in regulatory legislation and guidance for Phase I studies.
One of the bodies that reacted was the Royal Statistical Society (RSS), which set up a working party under my chairmanship to make recommendations from a statistical perspective as to what might be done to improve the design, conduct, and analysis of such trials. That this is a relevant concern of a statistical society might come as a surprise to readers of Applied Clinical Trials, but it will not surprise most statisticians.
First, the RSS has a tradition of commenting on matters of public interest. Indeed, more than 15 years ago a working party of the RSS looked at the issue of the competence of European drug regulatory agencies in statistical matters [1] and came to the conclusion that there was an urgent need for them to employ statisticians. One should not mistake subsequence for consequence, but it is widely believed that the report was influential in improving the regulatory position, although an important impetus was also given by the International Conference on Harmonization (ICH). Whatever the reason, it is certainly now the case that many European regulatory authorities (but not all) employ statisticians.
Table 1: Study Outcome in a Conventional Format
The second point is that statistics is, of course, highly relevant to the analysis of Phase I studies and, inevitably, their design, although in view of the cursory and often ambiguous statements on intended analysis of Phase I studies, one would hardly think so. Of course, when several individuals suffer severe cytokine storms within a few hours of being given a monoclonal antibody, you do not need a formal statistical analysis to tell you what happened. Indeed, a conventional statistical analysis would be extremely misleading, as it would ignore most of the information. For example, Table 1 summarizes the outcome of the study in conventional form.
Such a table is conventionally analyzed using Fisher's exact test, which yields a one-sided P-value of 0.036. Clearly, this does not remotely begin to do justice to the situation. The point is that such an analysis makes no use of the fact that the background rate of such events is extremely low. Ignoring the background rate might perhaps be appropriate when analyzing the occurrence of the sort of common and nonserious "nocebo" effects, such as headache or upset stomach, that could also easily occur under placebo.
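The arithmetic behind that P-value can be sketched in a few lines of Python. The cell counts below are an assumption inferred from the account above (six of six subjects on TGN 1412 with a cytokine storm, neither of the two placebo subjects affected), and the background rate of 1 in 1000 is purely hypothetical, chosen only to illustrate how much information the conventional test discards.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact P-value for the 2x2 table [[a, b], [c, d]].

    Returns the probability, with all margins fixed, of seeing `a` or more
    events in the first row (hypergeometric upper tail).
    """
    row1 = a + b          # subjects in the first group
    col1 = a + c          # total subjects with the event
    n = a + b + c + d     # all subjects
    total = comb(n, row1)
    p = 0.0
    for k in range(a, min(row1, col1) + 1):
        # k events fall in the first group, the rest in the second group
        p += comb(col1, k) * comb(n - col1, row1 - k) / total
    return p

# Assumed counts: rows are (TGN 1412, placebo), columns are (reaction, none).
p_value = fisher_one_sided(6, 0, 0, 2)
print(f"{p_value:.3f}")  # prints 0.036

# Hypothetical background rate of a cytokine storm per healthy volunteer.
background_rate = 1 / 1000
# Chance that all six treated subjects react if only background risk applied.
p_all_six = background_rate ** 6
print(p_all_six)
```

With only eight subjects, the most extreme table possible has probability 1 in 28, so Fisher's test cannot return anything smaller than about 0.036 however dramatic the events; a calculation that uses even a rough background rate gives a vastly smaller probability.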
So, we are all agreed that you do not need statistics to work out that TGN 1412 was poison in the doses administered. However, that is not the point. First, such clinical trials are not designed with the expectation that they will have the sort of disastrous and dramatic conclusion that the trial of TGN 1412 did. On the contrary, this sort of outcome, while it cannot be excluded, ought to be extremely rare.
Furthermore, while an important consideration is the risk from the point of view of the individual (and this is not reduced for the first individual taking the treatment by staggering administration), it is nevertheless unwise, when a new class of compound is being considered, to have all subjects treated within a very short time of each other. Second, results need to be capable of being analyzed appropriately, whatever the outcome of the trial. Most trials, one hopes, will not end so disastrously, but one always needs to be in a position to analyze the results properly.
It would be pointless for me to repeat in great detail the recommendations of our report, which is available from the RSS or on the Web [2], but I will pick out two or three salient features.
First, we discovered that it is not at all easy to obtain reliable data on the frequency of adverse reactions in first-in-man studies. Of course, not all treatments will be judged equally risky in advance of their first use in man, yet to calibrate risk assessments it is surely necessary to know the background level of risk. Since sponsors will find it difficult to share results with each other, we believe that the regulatory authorities have to take the lead in setting up databases and systems for collecting information and for sharing data.
We also consider that the business of consent is inadequately handled at the moment. The standard we promote is that of open protocol, hidden allocation. That is to say, sponsors must be willing in principle to share every detail of the intended design with human subjects, including all details of the method of treatment allocation. Only the allocation itself can be hidden by mutual agreement. Of course, we are well aware that not all subjects will wish to read right through the protocol, and we are not suggesting that this should replace the more concise statement generally provided as part of the consent procedure. It is an attitude that needs changing, however.
The sort of attitude in Phase III trials that permits the scandal of placebo run-ins, whereby all patients are deliberately deceived as to what they are receiving, must be changed [3, 4]. Indeed, we recommend that a carefully researched and justified risk assessment report should be prepared prior to first-in-man studies and that this should be available to all parties: treating physicians, insurers, and, of course, volunteers.
What counts as an acceptable risk is, of course, a moot point, and here we have identified two criteria that have to be satisfied: risk to the individual and risk to society.
Suppose it were the case that an acceptable level of risk was 1 in 2000 of a severe adverse reaction for a single administration of a treatment in man. Suppose, in fact, our best risk assessment was 1 in 1000. This is clearly too high, but by having 50% of subjects given placebo, the risk could be reduced to 1 in 2000. Is this strategy acceptable? We think not. Such a strategy would not reduce the expected number of adverse reactions per trial.
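The two perspectives in this example can be made concrete with a small calculation. This is only a sketch under stated assumptions: the cohort of six subjects on active treatment is hypothetical, placebo is taken to carry no risk of the reaction, and the number of subjects on active treatment is held fixed while placebo subjects are added to reach a 50% allocation.

```python
from fractions import Fraction

def individual_risk(p_active, n_active, n_placebo):
    """Risk to a volunteer before allocation is known.

    Assumes placebo carries no risk, so the risk is the chance of being
    allocated to active treatment times the risk on active treatment.
    """
    return p_active * Fraction(n_active, n_active + n_placebo)

def expected_reactions(p_active, n_active):
    """Expected number of severe reactions in the trial (risk to society)."""
    return p_active * n_active

p_active = Fraction(1, 1000)   # best risk assessment from the example
n_active = 6                   # hypothetical number given active treatment

# Everyone on active treatment versus an equal number of placebo subjects added.
for n_placebo in (0, 6):
    print(
        n_placebo,
        individual_risk(p_active, n_active, n_placebo),
        expected_reactions(p_active, n_active),
    )
# prints:
# 0 1/1000 3/500
# 6 1/2000 3/500
```

Under these assumptions, adding placebo subjects halves the risk faced by each volunteer at entry, but the expected number of reactions per trial stays at 3 in 500 because the same number of people still receive the treatment.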
We recommend, therefore, that this dual perspective be employed for first-in-man studies: The risk must be acceptable to an individual entering the trial, and the risk must be acceptable to society. This may require that for certain treatments, unlike the case of the trial of TGN 1412, a proper interval between administrations to successive subjects must be justified and observed.
There are also many features of trial design that need to be thought about more carefully. For example, the protocol of the trial of TGN 1412 specified dose cohorts of six active plus two placebo, presumably because at the end it was anticipated that the placebo results would be pooled. However, this only makes sense if one believes the trial will reach the final dose step and might make it more difficult to judge whether it should (what about interim analyses?). In any case, it presupposes that trends over time can be ignored.
This may or may not be a reasonable assumption, but any protocol should address this, and as far as we are aware, the protocol of TGN 1412 was not in the least exceptional in that it did not.
Hindsight is an exact science. We are well aware that everybody can be wise in retrospect about what happened at Northwick Park. Unfortunately for the six young men involved, such hindsight is too late. However, it behooves all of us who plan, run or analyze trials to learn the lessons and to plan better in the future. It is in this spirit that we have prepared our report and hope that others will find it useful.
I thank all of my colleagues on the working party for their contributions to producing the report. This article, however, gives my personal account of some features of the report. For the authoritative account of the views of the working party, the report itself should be consulted.
Stephen Senn, PhD, is Professor of Statistics at the University of Glasgow and a member of ACT's Editorial Advisory Board, email: firstname.lastname@example.org
1. S. Pocock, D. Altman, P. Armitage, D. Ashby, M. Bland et al., "Statistics and Statisticians in Drug Regulation in the United Kingdom," Journal of the Royal Statistical Society, Series A—Statistics in Society, 154, 413–419 (1991).
2. Working Party on Statistical Issues in First-in-Man Studies, "Report of the Working Party on Statistical Issues in First-in-Man Studies," Royal Statistical Society, London (2007), http://www.rss.org.uk/first-in-man-report.
3. S.J. Senn, "Are Placebo Run Ins Justified?" British Medical Journal, 314, 1191–1193 (1997).
4. S.J. Senn, "The Misunderstood Placebo," Applied Clinical Trials, October 2001, 40–46.