Innovative ePRO: Tapping into the Potential

June 1, 2006
Bill Byrom, PhD

Applied Clinical Trials

Electronic solutions such as IVR systems, PDAs, and digital pens exhibit advantages over paper and pencil in PRO data collection.

In clinical trials, we ask patients to self-report data relating to a number of areas including efficacy, side effects, quality of life, health economics, medication adherence, and treatment satisfaction. Recent draft guidance from the FDA defines a patient-reported outcome (PRO) as a measurement of any aspect of a patient's health status that comes directly from the patient, without interpretation of the patient's responses by a physician or anyone else.1 Electronic solutions present significant advantages in the collection of PRO data compared to paper and pencil. In particular, electronic patient diaries can eliminate conflicting or ambiguous data inherent in paper diaries and, importantly, can measure and maintain the required diary reporting schedule so that trialists can demonstrate the contemporaneous nature of their PRO data. In fact, studies of paper diary completion have shown that many patients complete their diaries retrospectively, often just prior to a clinic visit; some even complete their diaries ahead of time.2 Both of these behaviors call into question the validity of PRO data collected using paper and are cited as a concern in the draft FDA guidance.1

In addition to enhancing the quality and integrity of data collected, the use of technology [which includes, for example, personal digital assistants (PDAs), interactive voice response (IVR) systems, mobile phone applications, and digital pens] can provide additional features that enable us to take new and enhanced measurements from patients. In fact, ePRO provides us with the potential to do far more than we ever could do using pencil and paper. The research community is actively exploring new possibilities and approaches that may become tomorrow's gold standards in assessing patients in both clinical trials and routine care. In this article, I review a small selection of these new areas of innovation and comment on their potential application to clinical trials.

Cognitive function testing

Cognitive function testing in clinical trials is conventionally performed at the site using dedicated hardware and software. This can make testing large numbers of patients prohibitively expensive, especially if cognition is not the primary measure of interest. However, ePRO solutions in mobile phone, PDA, and IVR applications have been successful in delivering cognitive testing batteries. This affords the possibility of measuring cognition remotely without requiring patients to attend a clinic, and can provide a cost-effective means of testing very large samples of patients.

The examples I present in this article use an IVR test battery that has been used in a number of published studies.3-5 This battery comprises four tests measuring aspects of attention and memory (see Table 1).

Table 1. Cognitive function test battery delivered using interactive voice response system.

An early validation study with this system3 investigated the comparability of the IVR test battery with the corresponding tests delivered conventionally via a PC and the ability of telephone testing to detect the known effects of aging on cognition. A sample of 138 volunteers aged 11 to 78 years were tested at their homes on a single occasion and performed both sets of tests in a randomized cross-over study design. In all cases, volunteers were able to follow the requirements of tests delivered using both the PC and IVR systems, indicating that the test battery can be successfully administered over the telephone. In addition, results from each volunteer showed high concordance between tests administered over the telephone and the corresponding tests delivered via the PC-based system. Finally, the IVR test battery was seen to detect the same pattern of age-related effects on cognition as observed using conventional testing in this study and others. For example, as illustrated in Figure 1, volunteers in higher age bands showed slower reaction times in the simple reaction time test,3 a known phenomenon associated with the aging population.

Figure 1. Effects of aging on simple reaction times measured using a telephone test battery.3

Another study used the IVR test battery to explore the sedative effects of midazolam in patients with dental phobia.4 In the United Kingdom, dental phobia affects between 7% and 15% of the population, and as a result many sufferers attend a dental hospital for dental surgery because in this environment they are able to receive midazolam to help them undergo the procedures. In this study, 18 patients requiring a molar extraction on two occasions were treated with midazolam prior to surgery and, in a randomized cross-over design, received flumazenil or placebo following surgery on each occasion. Patients used the clinic telephone to undergo cognitive tests pre-midazolam and at 1 and 2 hours postdose. They continued telephone tests from home at 3, 4, 5, 6, and 7 hours postdose. Midazolam was shown to slow reaction times and decrease attention in each of the four tests administered. Effects did not return to baseline by 7 hours. Despite the small sample size, the IVR test battery was able to show that these deteriorations were statistically significant. In addition, when subjects received flumazenil after surgery, the telephone tests detected significant partial reversal of the effects of midazolam, as illustrated in Figure 2 for the digit recognition test. These results suggest that telephone testing is sensitive enough to detect drug-related changes in cognition and their reversal, even when used in the less controlled environment of the patient's home.

Figure 2. Effects of midazolam and its temporary reversal using flumazenil on numeric working memory speed/accuracy in dental phobia patients using a telephone test battery.4

These studies indicate that the measurements required by computerized cognitive tests can be made using simple electronic solutions other than site-based hardware. This opens up greater opportunities for clinical trials where beneficial changes in cognition can add great additional value to the primary information collected. These approaches are perhaps most applicable to normal and mildly impaired populations, including, for example, patients with depression, epilepsy, chronic heart failure, hypertension, diabetes, insomnia, and cancer.

Voice and handwriting analysis

Most of us are taught our handwriting by a teacher who teaches us a specific script and writing style. However, although taught, we all develop individual differences in the way we write. Some would say that these individual expressions reflect aspects of our personality. For example, handwriting analysts would argue that backward-sloping handwriting suggests introversion, words close together suggest sociability, and missing i-dots and t-crosses suggest forgetfulness or carelessness. Where is this leading? Interestingly, handwriting effects are observed in some neurological conditions. For example, some Parkinson's disease patients exhibit micrographia (i.e., very small handwriting).

Metrics taken from handwriting have, in fact, been used as endpoints in clinical studies. The digital pen provides an effective means of recording not only the output of a handwriting exercise but also information on speed and writing technique. For example, studies in patients with schizophrenia have shown that reduction in handwriting area is correlated with, and could even be a predictor of, dopamine (D2) receptor occupancy.6

Handwriting tests have also been used to study the effects of alcohol. In one study,7 20 healthy female volunteers aged 19 to 20 years were randomized to receive a large vodka and orange juice cocktail or placebo on two separate occasions according to a cross-over design. On each occasion, volunteers were subjected to a battery of performance tests, including a handwriting test in which they were required to quickly write down a number of defined words, with each word being written a number of times. The study showed the usual performance deterioration associated with alcohol, and also that similar effects were observed in the handwriting test. On average, the length of written words increased after consuming alcohol—mean word length increasing by 23% at 45 minutes postdose. This is illustrated in Figure 3, which shows the increase in length of the written word "black" for the median subject.

Figure 3. Mean word length increased 23% at 45 minutes postdose.7 (Reprinted with permission from Dr. Tiplady, University of Edinburgh.)

Interestingly, as well as having an influence on handwriting, drugs and conditions also have an effect on the way we speak. Voice acoustic measures have been shown to be sensitive indicators of disease severity and therapeutic response in many CNS disorders, including Parkinson's disease, depression, and schizophrenia. This is well illustrated by one study that investigated the relationship between voice metrics and depression severity.8 In this study, researchers captured and digitized voice samples from seven depression rating interview videos (one patient with mild depression, six with moderate/severe depression). These videos are commonly used in Investigator training events for depression clinical trials to enable the Investigators to standardize the way they rate patients using the Hamilton Depression Rating Scale (HAM-D). Voice samples comprised the first 10 seconds of uninterrupted speech, and voice acoustic measures were made on the middle 5-second interval starting at the onset of a word, using commercially available software. The measures investigated were speaking rate (number of syllables spoken in the 5-second interval), percent pause time (sum duration of pauses over 250 ms in duration, expressed as a percentage of the 5-second interval), and pitch variation (a measure of how highly intonated or monotonic the voice is). Results showed strong correlations between the HAM-D score and both speaking rate and pitch variation. Speech slowed and became more monotonic with increasing severity of depression (Figure 4).

Figure 4. Correlation between voice acoustics measures and Hamilton Depression Rating Scale scores.8
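
To make the definitions of speaking rate and percent pause time concrete, the short Python sketch below computes both from a hand-labelled 5-second interval. It is an illustration only and not the commercial software used in the cited study: the function names, the input format (lists of voiced-segment times and syllable onsets measured from the start of the interval), and the example values are all assumptions made for this sketch. Pitch variation is left out because the text does not define how it was calculated.

```python
# Illustrative sketch only: not the commercial software used in the cited study.
# The input format and all names below are assumptions made for this example.

WINDOW_S = 5.0       # length of the analysed interval, in seconds
MIN_PAUSE_S = 0.250  # only pauses longer than 250 ms are counted


def speaking_rate(syllable_onsets, window_s=WINDOW_S):
    """Count the syllables whose onset falls inside the analysis window."""
    return sum(1 for t in syllable_onsets if 0.0 <= t < window_s)


def pause_durations(voiced_segments, window_s=WINDOW_S):
    """Return the silent gaps (in seconds) between voiced segments.

    voiced_segments is a list of (start, end) times, measured from the
    start of the window and assumed to lie inside it. Silence between
    the last segment and the end of the window also counts as a gap.
    """
    pauses = []
    cursor = 0.0  # end of the speech seen so far
    for start, end in sorted(voiced_segments):
        if start > cursor:
            pauses.append(start - cursor)
        cursor = max(cursor, end)
    if window_s > cursor:
        pauses.append(window_s - cursor)
    return pauses


def percent_pause_time(voiced_segments, window_s=WINDOW_S,
                       min_pause_s=MIN_PAUSE_S):
    """Sum of pauses longer than min_pause_s, as a percentage of the window."""
    long_pauses = [p for p in pause_durations(voiced_segments, window_s)
                   if p > min_pause_s]
    return 100.0 * sum(long_pauses) / window_s


if __name__ == "__main__":
    # Made-up labels for one 5-second interval (not data from the study).
    segments = [(0.0, 1.0), (1.1, 2.0), (2.5, 4.0), (4.5, 5.0)]
    onsets = [0.0, 0.3, 0.6, 1.1, 1.5, 2.5, 2.9, 3.3, 3.7, 4.5, 4.8]
    print("Speaking rate (syllables per interval):", speaking_rate(onsets))
    print("Percent pause time:", percent_pause_time(segments))
```

In this made-up interval the 0.1-second gap falls below the 250 ms threshold and is ignored, while the two 0.5-second gaps together make up 20% of the 5-second interval.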

Normally, these kinds of voice acoustics measures are made in a specialist voice laboratory, but a recent study showed that voice samples suitable for voice acoustics analysis could be reliably captured remotely via the telephone using an IVR system.9 This research indicates that in the future it may be possible to make these measurements from patients in a simple and cost-effective manner in large clinical studies. More work is underway in validating and understanding these endpoints for use in future clinical trials.

Change from baseline

One challenge identified in the recent FDA PRO draft guidance1 is that PRO instruments that require patients to rely upon memory, especially recall over a period of time, may threaten the accuracy of the data collected. Try to remember what you ate for dinner a week ago. Unless it was a special occasion, you are unlikely to be able to. However, in clinical trials we often ask patients to rate their condition relative to a pretreatment baseline condition that occurred weeks earlier. This is the basis of the patient global impression of improvement (PGI-I) score in which patients rate themselves on a 7-point scale from 1 (very much better) to 7 (very much worse). A new approach using an electronic instrument developed by Healthcare Technology Systems called MERET (Memory Enhanced Retrospective Evaluation of Treatment) helps to address these concerns regarding the accuracy of recall. At baseline, patients describe their feelings and experiences related to their condition in their own voice and words. This is captured using an IVR system. At further time points this baseline recording can be played back to the patient, enabling them to accurately recall their pretreatment condition and rate themselves relative to it using the same 7-point scale as the conventional PGI-I. Interestingly, not only does the content of the recorded message help the patient to anchor recall of their baseline status, but the emotion, intonation, choice of words, and hesitation provide additional cues about their condition at baseline. As we saw with the voice acoustics study in depression, aspects of voice such as pitch variability and speaking rate correlate well with depression severity, and so these additional measures, subconsciously interpreted by the patient when listening to their pretreatment recording, provide enhanced insight into their baseline condition.


Clinical trials using MERET have provided encouraging results. One study in 74 depressed patients10 indicated that MERET was more sensitive to detecting treatment-related improvements than the standard PGI-I. In this study, depressed patients received up to four weeks of treatment with duloxetine or placebo. MERET recordings were made prior to entering the double-blind phase, and patients were asked to provide improvement ratings four weeks later. Patients self-rated changes on the PGI-I first, then listened to baseline MERET recordings before providing a second rating. Although in this study both PGI-I and MERET ratings showed significant improvements on active treatment compared to placebo, MERET scores showed greater separation and effect size compared to the conventional PGI-I. Using pre-treatment experiential anchors in this way appears to enhance the ability of patients to perceive the magnitude of their change in clinical condition. Unlike formal voice acoustics analysis, MERET and similar concepts are applicable for use in today's clinical trials and could apply to studies in any therapy area where patient-rated change in condition is measured.


Many clinical trials today collect self-report data from patients. Electronic methods for collecting this PRO data provide quality and integrity enhancements over paper, and enable additional measurements to be made from patients that have been impossible or impractical using pencil and paper. As a consequence, the research community is actively investigating new ways of collecting complex data that may provide us with greater insight into disease states. A few of these have been illustrated in this article, but many more are in development and testing in the academic community. Some of this research may in fact unearth tomorrow's gold standards.

Bill Byrom is product strategy director with ClinPhone Group Limited, Lady Bay House, Meadow Grove, Nottingham NG2 7EW, United Kingdom, +44 115 955 7333, email:


1. Food and Drug Administration. Draft Guidance for Industry. Patient-reported outcome measures: use in medical product development to support labeling claims. (FDA, Rockville, MD, Feb 2006).

2. A.A. Stone, S. Shiffman, J.E. Schwartz, J.E. Broderick, M.R. Hufford, "Patient Non-compliance with Paper Diaries," British Medical Journal, 324, 1193–1194 (2002).

3. K.A. Wesnes, T. Ward, G. Ayre, C. Pincock, "Development and Validation of a System for Evaluating Cognitive Functioning over the Telephone for Use in Late Phase Drug Development," European Neuropsychopharmacology, 9 (Suppl 5), S368 (1999).

4. N.M. Girdler, J.P. Lyne, R. Wallace, N. Neave, A. Scholey, K.A. Wesnes, C. Herman, "A Randomized, Controlled Trial of Cognitive and Psychomotor Recovery from Midazolam Sedation Following Reversal with Oral Flumazenil," Anaesthesia, 57, 868–876 (2002).

5. P. McCue, A.B. Scholey, C. Herman, K.A. Wesnes, "Validation of a Telephone Cognitive Assessment Test Battery for Use in Chronic Fatigue Syndrome," Journal of Telemedicine and Telecare, 8, 337–343 (2002).

6. U. Kuenstler, U. Juhnhold, W.H. Knapp, H.J. Gertz, "Positive Correlation Between Reduction of Handwriting Area and D2 Dopamine Receptor Occupancy During Treatment with Neuroleptic Drugs," Psychiatry Research, 90, 31–39 (1999).

7. K. Farquhar, K. Lambert, B. Tiplady, P. Wright, "Handwriting as an Index of Ethanol Effects," Journal of Psychopharmacology, 15, A34 (2001).

8. M. Cannizzaro, B. Harel, N. Reilly, P. Chappell, P. Snyder, "Voice Acoustical Measurement of the Severity of Major Depression," Brain and Cognition, 56, 30–35 (2004).

9. M. Cannizzaro, N. Reilly, J. Mundt, P. Snyder, "Remote Capture of Human Voice Acoustical Data by Telephone: A Methods Study," Clinical Linguistics and Phonetics, 19, 649–658 (2005).

10. J.C. Mundt, D.J. DeBrota, H.K. Moore, J.H. Greist, "Memory Enhanced Retrospective Evaluation of Treatment (MERET): Anchoring Patients' Perceptions of Clinical Change in the Past," 45th Annual Meeting of the New Clinical Drug Evaluation Unit Program, Boca Raton, FL USA. Abstract I-30 (2005).
