Authors comment on Kirby et al.'s remedies for placebo response.
In their article, "Reducing Placebo Response: Triple Blinding & Setting Expectations" (ACT, November 2005), Kirby and coauthors identify factors that increase placebo response. Such factors decrease the efficacy of signal detection in placebo-controlled trials. Fewer than half of all registration trials for new antidepressants and anxiolytics between 1985 and 1999 separated drug from placebo, and placebo responses in these trials were often large.1 Kirby et al. focus on interactions between subjects and site personnel that can bias assessments and influence study outcomes by increasing placebo response. Nonspecific therapeutic effects of interactions with caring staff, bilateral expectation of benefit, functional unblinding of raters, and ratings inflation are recognized factors that contribute to placebo response rates.
As remedies, Kirby et al. suggest the use of blinded raters and the provision of standardized instructions to subjects. They suggest that raters should not interact with subjects before or after the rating process and that raters should be blinded to study specifics, including randomization ratios and the likely side effects of the investigational drug. They also advocate that raters not be aware of study qualification criteria and that patient expectancies be managed during informed consent procedures and prior to each assessment.
Factors contributing to elevated placebo response rates are widely recognized as methodological confounds that diminish signal detection efficacy in clinical trials.2 Evidence of rater inflation to qualify subjects for study participation is striking3 and grows as required severity increases. Rating inflation may be counterbalanced, in a pernicious way, when identification of subject response, defined by low scores, is required for participation in relapse prevention trials. Feltner et al. found evidence that the same raters who inflated scores to initially enroll subjects in a study also deflated scores at the end of open-label treatment to randomize apparent "responders" into the relapse prevention phase of the trial.4 Such inflation and deflation of rating scores likely reflects confirmation bias on the part of the raters, best defined by Yogi Berra as: "If I hadn't believed it, I wouldn't have seen it."
The remedies proposed by Kirby et al. are theoretically attractive, but they face practical limitations. First, sites and investigators are paid for enrolling subjects, not for excluding them. Second, human raters are fundamentally limited in their ability to make reliable and unbiased assessments. Computer interviews are not subject to such limitations, providing procedurally consistent and reliable interactions with perfect repeatability. Computer assessments possess all the characteristics advocated by Kirby et al.
Assessment fidelity is not aided by the warm, fuzzy inconsistencies produced by subject–rater interaction. More than 30,000 Hamilton Depression Rating Scale assessments have been obtained by interactive voice response (IVR) technology, which the FDA accepts as a primary efficacy measure for adult outpatient antidepressant registration trials. IVR assessments were also the primary efficacy measures that supported FDA approval of Lunesta.
IVR uses touch-tone telephones to administer fully standardized interviews and scoring algorithms that cannot be influenced by treatment unblinding, both attractive attributes for clinical assessments. Convenience and constant availability make daily (or more frequent) assessment possible, an impractical goal for human rating procedures. Error checking of responses occurs at the time of data collection, and immediate storage to electronic files expedites subsequent analyses. Auditory interaction largely overcomes functional illiteracy, currently estimated to affect 22% of the U.S. population. Sensitive topics are disclosed more openly in nonhuman interviews. Subjects are familiar and comfortable with computer interactions over the telephone.
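The standardization described above can be illustrated with a minimal sketch. This is a hypothetical illustration, not Healthcare Technology Systems' actual software: it shows how fixing an item's prompt, valid touch-tone keys, and scoring in data makes every administration identical, rejects out-of-range keypresses at the moment of collection, and stores the score electronically at once. The item wording and key mapping below are invented for illustration.

```python
import json

# A single fixed interview item: prompt text and valid keypad responses
# are data, so the "interview" cannot vary between administrations.
ITEM_1 = {
    "prompt": ("Over the past week, how depressed have you felt? "
               "Press 0 for not at all, 1 for mild, 2 for moderate, "
               "3 for severe, 4 for very severe."),
    "valid_keys": {"0", "1", "2", "3", "4"},
}

def administer(item, get_keypress):
    """Repeat the identical prompt until a valid key is pressed."""
    while True:
        key = get_keypress(item["prompt"])
        if key in item["valid_keys"]:
            return int(key)  # scored immediately, with no rater judgment

# Simulated subject presses an invalid key ("9"), then a valid one ("2").
presses = iter(["9", "2"])
score = administer(ITEM_1, lambda prompt: next(presses))
record = json.dumps({"item": 1, "score": score})  # immediate electronic storage
```

The point of the sketch is that error checking and scoring happen inside the collection loop itself, so no out-of-range or missing value can reach the dataset, and the stored record is available for analysis the moment the call ends.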
IVR assessments are more convenient, reliable, and consistent than human raters, presenting instructions and information exactly as programmed at every assessment. Turnover, vacations, and variability in staff training and experience make such consistency impossible for humans to achieve. IVR assessments also cost less than human assessments.
Lest we worry this problem is unique to mental health or clinical trial raters, idealistic medicine residents were found to comply with self-defined practice "protocols" only 22% of the time. Given computer reminders of their own rules for practice, compliance improved to 51%, justifying McDonald's subtitle: "The Non-perfectability of Man."5
Humans are wonderfully varied, creative, adaptive, and idiosyncratic. These are admirable attributes for solving many problems in accordance with desires and expectations. Computers are repetitive and consistent, and reliably execute tasks as programmed. Computer autopilots actually perform better airplane landings than humans in low visibility. Computer interviews are more consistent and maintain structured interview techniques better than humans can. They cannot be influenced by factors outside the interview process. Repetitive clinical assessments in randomized clinical trials are not the forte of human raters, even when they are well paid for doing them.
John Greist,* MD, is director and James Mundt, PhD, is a research scientist with Healthcare Technology Systems, Inc., 7617 Mineral Point Road, Suite 300, Madison, WI 53717, (608) 827-2440, fax (608) 827-2444, email: email@example.com.
*To whom all correspondence should be sent.
1. A. Khan, S. Khan, W.A. Brown, "Are Placebo Controls Necessary to Test New Antidepressants and Anxiolytics?" International Journal of Neuropsychopharmacology, 5, 193–197 (2002).
2. J.H. Greist, J.C. Mundt, K. Kobak, "Factors Contributing to Failed Trials of New Agents: Can Technology Prevent Some Problems?" Journal of Clinical Psychiatry, 63 (suppl 2) 8–13 (2002).
3. D.J. DeBrota, M.A. Demitrack, R. Landin et al., "A Comparison Between Interactive Voice Response System-administered HAM-D and Clinician-administered HAM-D in Patients with Major Depressive Episode," NCDEU, Boca Raton, FL, June 1–4, 1999.
4. D.E. Feltner, K. Kobak, J. Crockatt et al., "Interactive Voice Response (IVR) for Patient Screening of Anxiety in a Clinical Drug Trial," NCDEU, Phoenix, AZ, May 28, 2001.
5. C.J. McDonald, "Protocol-Based Computer Reminders, the Quality of Care and the Non-perfectability of Man," New England Journal of Medicine, 295, 1351–1355 (1976).