Evaluating the Effects of Therapy Masking

March 13, 2015
Ruth McBride

Ruth McBride, BS, is chief technology officer with Axio Research Corporation, a division of Solutia Pharmaceutical Services Division, 2601 4th Avenue, Suite 200, Seattle, WA, 98121, (206) 577-0217, fax (206) 547-4671, email: davidk@axioresearch.com.

Barbara A Ricker

Alexander R Arslan

Katherine M Seymour

April Slee

Jeffrey L Probstfield, MD

Applied Clinical Trials



An important criterion for a well-conducted clinical trial is minimization of bias in endpoint assessment. When possible, treatment assignments should be masked (or “blinded”) so that the assignment is unknown to both the study participants and the healthcare professionals treating them. Strategies for masking range from the very elaborate, such as sham surgeries,1,2 to the most common, which is simply providing a placebo pill that is visually identical to the active drug. Although masking is a common strategy to reduce bias in randomized clinical trials (RCTs), the adequacy of the treatment mask is seldom evaluated. Schulz reported that trials that were not double-blind yielded larger treatment effects, with odds ratios exaggerated by as much as 17%.3 The 2001 CONSORT Statement4 and the International Committee of Medical Journal Editors5 both recommend disclosure of the methods used to blind therapy. The 2001 CONSORT Statement suggested that clinical trials assess the strength of blinding, but made very few recommendations about how this should be accomplished. The updated 2010 CONSORT Statement6 recognized the difficulty of assessing blinding and the dubious nature of common assessment techniques. In theory, an adequate blind means that participants, healthcare providers, and outcome assessors can correctly guess study assignment no more often than by chance alone. The 2010 CONSORT Statement describes interpretational and measurement difficulties that raise concerns about the usefulness of testing blinding, but it recommends detailing how trials are blinded and precisely who was blinded.
Bang7 argues that “…empirical (quantitative) evidence is almost always superior to ignorance in any decision-making process and understanding/characterizing scientific phenomena.” Preservation of the blind has well-recognized importance for endpoints potentially influenced by patient perception, such as relapse in alcoholism or assessment of post-surgical pain.1,2,8 However, it is also conceivable that patient knowledge of assignment could result in differential treatment and study visit compliance, which might introduce bias even for endpoints that are considered more objective. James8 argued that data should be reanalyzed when the blind is compromised. In summary, there is consensus within the research community that blinding is important and should be evaluated, but no consensus about how to assess, in practice, whether a specific strategy within a specific trial was adequate.

When the active treatment has a distinctive side effect, masking it against a placebo can be difficult. Niacin, in doses sufficient to affect lipid metabolism (> 1,000 mg), causes a noticeable flush and, frequently, itching. This side effect can be mitigated by taking 325 mg of aspirin approximately 30 minutes before taking the niacin. Proprietary extended-release formulations, such as the one used in AIM-HIGH (Niaspan™, AbbVie, Inc.), can also decrease the flushing response. Several earlier studies involving niacin used a placebo “spiked” with 50 mg of immediate-release crystalline niacin, an amount sufficient to induce flushing but insufficient to have an effect on lipids. In the ADMIT study,11 50 mg of free niacin was added to the placebo. In another study, conducted by Brown12 to test simvastatin plus niacin, antioxidant vitamins, or the combination of the two for the prevention of coronary disease, the placebo tablets were also spiked with niacin. The technique used in the Brown study was successfully repeated by Garg13 and by Zhao.14 Zhao reported flushing of any severity in 30% of the simvastatin-niacin groups and 23% of the simvastatin-placebo group (p = 0.35). In the ARBITER 2 study,15 the placebo did not contain any active niacin. At the conclusion of that study, the reported rate of skin flushing was significantly higher in the niacin group than in the placebo group (69.2% vs. 12.7%, p < 0.001), pointing to the difficulty of blinding participants and caregivers when testing a drug such as niacin.

We examined the relative success of masking therapy in the Atherothrombosis Intervention in Metabolic Syndrome with Low HDL/high triglycerides Impact on Global Health Outcomes (AIM-HIGH) trial, a randomized controlled clinical trial testing statin + “spiked” placebo versus statin + extended release niacin. The well-known flushing side effect of niacin was described to participants during the informed consent process, so participants in either arm who experienced flushing might suspect that their flushing was caused by niacin.


Overview of AIM-HIGH

The design and results of AIM-HIGH have been described elsewhere.16,17 AIM-HIGH was a randomized, double-blind trial to assess the effect of extended-release niacin (Niaspan™, AbbVie, Inc.) compared with placebo on the rate of important cardiovascular events (time to first occurrence of coronary artery disease death, non-fatal myocardial infarction, hospitalization for acute coronary syndrome, ischemic stroke, or symptom-driven coronary or cerebral revascularization) in participants with a history of cardiovascular disease, low HDL, and high triglycerides. All participants were treated (open label) with a statin, with the possible addition of ezetimibe, to achieve a target LDL-C of 40 to 80 mg/dL. During the planning phase of AIM-HIGH, the investigators recognized the difficulty of blinding caregivers and participants to niacin therapy because of the distinctive side-effect profile associated with high doses of niacin: extreme flushing and itching. The extended-release formulation chosen for the trial was designed to minimize these symptoms; however, the package label warns of them, and the trial included an open-label run-in period that gradually increased the dose from 500 mg/day to 2,000 mg/day to determine whether each participant could tolerate the medication. During the open-label run-in, all participants were educated about these potential side effects and about ways to minimize the flushing expected with niacin. Participants who tolerated at least 1,500 mg per day of Niaspan were randomized to continue with Niaspan or to take a matching placebo. In an attempt to mask physicians, research coordinators, and participants, placebo tablets were manufactured to contain 50 mg of immediate-release (crystalline) niacin. This small amount of niacin has been shown to be sub-therapeutic for modifying lipids but sufficient to induce a flushing response.
At baseline, participants were asked about their prior experience with niacin or niacin products.


Assessing study drug masking

The study was stopped at the recommendation of the Data and Safety Monitoring Board (DSMB) prior to its planned completion because of a demonstrated lack of efficacy of niacin on the primary outcome.17 At the close-out visit, participants were asked to make their best guess regarding the therapy they were assigned, with the choices being: “Active Niaspan,” “Placebo,” and “No Idea.” Research coordinators were also asked to guess their participants’ treatment and were offered the same choices. Physicians were not surveyed because research coordinators conducted the majority of visits and spent the most time with the participants. The blind was not intentionally broken for any participants or research coordinators during the trial.


Statistical methods

The primary objective was to compare participant and research coordinator treatment guesses with actual treatment assignment. The null hypothesis was that guesses are correct 50% of the time (as expected by chance), versus the two-sided alternative hypothesis that the percentage of correct guesses differs from 50%. P-values and confidence intervals are based on the exact binomial distribution for participant guesses and on weighted linear regression of the percentage of correct guesses by clinical site for research coordinator guesses. A correct guess was defined as guessing active Niaspan when the participant had been randomized to Niaspan, or guessing placebo when the participant had been randomized to placebo. The proportion of correct guesses was defined as the number of correct guesses divided by the total number of responses, including “No Idea” responses.

Secondary analyses included calculation and comparison of two blinding indices7,8 and the use of logistic regression models to determine whether the presence of side effects, prior niacin use, and gender were associated with correct guesses.

For research coordinators, odds ratios were computed using generalized estimating equations to account for correlation within sites. We assumed that one coordinator at each site completed all the close-out visit forms for participants at that site, so that guesses from the same research coordinator about different participants could be correlated, whereas guesses about participants at different sites were independent.

Two blinding indices (BI) were calculated.7,8 These two indices were chosen because they interpret “No Idea” responses differently. The James BI is a variant of the kappa statistic that ranges from 0, indicating complete unblinding, to 1, indicating complete blinding; it can thus be viewed as a measure of the disagreement between random assignment and treatment guess. The index assumes that “No Idea” responses are the most indicative of successful blinding and assigns them a weight of 1. Correct responses, which indicate unblinding, are assigned a weight of 0. The variance of the James index is computed using the jackknife procedure.

The Bang BI can be used to assess studies with more than two treatment groups or more complex study designs. It yields, for each treatment group, the proportion of correct guesses beyond what would be expected by chance. It assigns less weight to “No Idea” responses and is more sensitive to the balance of correct and incorrect responses. The Bang index ranges from -1, indicating that all responses were incorrect, to +1, indicating that all responses were correct (complete unblinding); a Bang index of 0 indicates random guessing. The Bang index and its confidence interval are computed for each treatment group, with the variance computed using exact methods based on a trinomial distribution.
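For reference, the Bang index for one treatment arm can be written compactly. The notation here is ours, not the paper's: an arm has n respondents, of whom c guessed correctly, w guessed incorrectly, and the remaining n - c - w answered “No Idea”, with p_c = c/n and p_w = w/n. The sketch assumes the usual two-arm form of Bang's index, and the variance shown is the trinomial variance of a difference of two cell proportions, which we assume is what the exact trinomial method above refers to:

```latex
\mathrm{BI} = \left(2\,\frac{c}{c+w} - 1\right)\frac{c+w}{n}
            = \frac{c-w}{n} = p_c - p_w,
\qquad
\operatorname{Var}\bigl(\widehat{\mathrm{BI}}\bigr)
  = \frac{p_c(1-p_c) + p_w(1-p_w) + 2\,p_c\,p_w}{n}
  = \frac{p_c + p_w - (p_c - p_w)^2}{n}
```

The first factor rescales the proportion correct among those who ventured a guess to the range -1 to +1, and the second is the proportion who ventured a guess. With random guessing (c = w) the index is 0; if every respondent guesses correctly it is +1, and if every respondent guesses incorrectly it is -1, matching the range described above.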

Analyses and graphs were produced using SAS version 9.2 (SAS Institute, Cary, NC). Unless otherwise specified, all confidence intervals and p-values are two-sided.

Results

Responses were available from 3,160 of 3,414 participants, 1,577 randomized to active Niaspan and 1,583 to placebo. Overall, 1,116 (35%, 95% CI: 34, 37%, p < 0.001 for comparison to 50% null hypothesis) correctly guessed their treatment assignment, while 997 (32%) guessed incorrectly and 1,047 (33%) chose “No Idea” (See Below).
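The exact binomial calculation described in the statistical methods can be sketched for the participant counts above using only the Python standard library. This is an illustrative sketch, not the SAS code used for the analysis: the function names are hypothetical, the two-sided p-value doubles the smaller tail, and the interval is the Clopper-Pearson exact interval, which we assume corresponds to the exact binomial interval reported. The research coordinator interval was based on site-level weighted regression and is not reproduced here.

```python
import math


def log_pmf(i, n, p):
    """log P(X = i) for X ~ Binomial(n, p); lgamma avoids overflowing factorials."""
    return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p))


def prob_sum(indices, n, p):
    """Sum P(X = i) over the given indices, using the log-sum-exp trick for stability."""
    logs = [log_pmf(i, n, p) for i in indices]
    if not logs:
        return 0.0
    m = max(logs)
    return min(1.0, math.exp(m) * sum(math.exp(x - m) for x in logs))


def exact_binomial_test(k, n, p0=0.5):
    """Two-sided exact test of H0: p = p0, computed by doubling the smaller tail."""
    lower_tail = prob_sum(range(0, k + 1), n, p0)  # P(X <= k)
    upper_tail = prob_sum(range(k, n + 1), n, p0)  # P(X >= k)
    return min(1.0, 2 * min(lower_tail, upper_tail))


def clopper_pearson(k, n, alpha=0.05, iters=50):
    """Exact (Clopper-Pearson) interval, found by bisection on the binomial tails."""
    def bisect(f, target):
        # f must be increasing in p on (0, 1); returns p with f(p) close to target
        lo, hi = 0.0, 1.0
        for _ in range(iters):
            mid = (lo + hi) / 2
            if f(mid) < target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower limit solves P(X >= k | p) = alpha/2; upper limit solves P(X <= k | p) = alpha/2
    lower = 0.0 if k == 0 else bisect(lambda p: prob_sum(range(k, n + 1), n, p), alpha / 2)
    upper = 1.0 if k == n else bisect(lambda p: 1 - prob_sum(range(0, k + 1), n, p), 1 - alpha / 2)
    return lower, upper


# Participants: 1,116 correct guesses among 3,160 respondents ("No Idea" kept in the denominator)
k, n = 1116, 3160
lo, hi = clopper_pearson(k, n)  # roughly 34% to 37%
print(f"correct: {k / n:.1%}, 95% CI {lo:.1%} to {hi:.1%}, p = {exact_binomial_test(k, n):.1e}")
```

Keeping “No Idea” responses in the denominator follows the definition of the proportion of correct guesses given in the statistical methods.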

For participants randomized to Niaspan, 60% guessed correctly; for participants randomized to placebo, 10.5% guessed correctly. Of the 3,217 guesses from the research coordinators, 901 (28%, 95% CI: 24%, 32%, p < 0.001) were correct and 767 (24%) were incorrect. The most common answer from research coordinators, given 1,549 times (48%), was “No Idea” (See Above).

Flushing was reported by 91% of placebo participants and 88% of niacin participants. Within the placebo group, 57% of participants reporting flushing guessed active niacin, compared with 34% of those without flushing (p < 0.0001). Twenty percent of placebo participants with no flushing correctly guessed placebo, compared with 10% of those who experienced flushing (p = 0.0003). Within the active niacin group, 62% of participants reporting flushing correctly guessed active niacin, compared with 41% of those without flushing (p < 0.0001). The presence or absence of the expected side effects of flushing and itching was significantly associated with correct guesses of therapy by the participant (p < 0.01, Figure 1). Nausea, GI symptoms, changes in eyesight, and gout may also have an effect on correct guesses, but these comparisons did not reach nominal significance. Experience with niacin prior to AIM-HIGH was significantly associated with correct guesses among participants (p = 0.005). Gender had no effect on whether the participant correctly guessed the treatment assignment. When participants presented with itching but not flushing, the research coordinators more frequently guessed the correct therapy (p = 0.005, Figure 2).
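To make the direction of these associations concrete, crude odds ratios can be computed directly from the percentages above. This is a simplified sketch with hypothetical helper names: it uses the rounded percentages reported in the text, ignores the covariate adjustment in the logistic regression models, and gives no confidence interval because the underlying counts are not reported. For a logistic model with a single binary predictor, exp(β) equals this cross-product ratio.

```python
def odds(p):
    """Convert a proportion to odds."""
    return p / (1 - p)


def crude_odds_ratio(p_exposed, p_unexposed):
    """Odds ratio comparing two proportions (the 2x2 cross-product ratio).

    Equals exp(beta) from a logistic model with one binary predictor;
    multivariable models such as those in the paper will give different values.
    """
    return odds(p_exposed) / odds(p_unexposed)


# Niaspan arm: correct guess with flushing (62%) vs without flushing (41%)
print(round(crude_odds_ratio(0.62, 0.41), 2))  # 2.35
# Placebo arm: correct guess with flushing (10%) vs without flushing (20%)
print(round(crude_odds_ratio(0.10, 0.20), 2))  # 0.44
```

An odds ratio above 1 in the Niaspan arm and below 1 in the placebo arm is consistent with flushing pushing participants in both arms toward guessing active drug.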

Figure 1: Odds ratios for a correct guess by participants. An odds ratio greater than 1 indicates a higher likelihood of guessing correctly.


The James BI was 0.645 (95% CI 0.637, 0.663) for guesses by the study participants and 0.722 (95% CI 0.707, 0.737) for guesses by research coordinators. These values indicate successful blinding for both groups, since the lower limit of each 95% confidence interval is greater than 0.5 (See Below).

The Bang BI was 0.53 (95% CI 0.49, 0.56) for participants assigned to Niaspan and -0.45 (95% CI -0.48, -0.41) for participants assigned to placebo. For guesses by research coordinators, the Bang BI was 0.22 (95% CI 0.18, 0.25) for participants assigned to niacin and -0.13 (95% CI -0.17, -0.099) for those assigned to placebo. Responses by both participants and research coordinators indicate moderate success in blinding participants assigned to niacin, and more incorrect responses than expected from random guessing among participants assigned to placebo.
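The placebo-arm participant index can be approximately reproduced from reported percentages. This sketch assumes that the correct and incorrect proportions are fractions of all respondents in the arm (10.5% of placebo participants guessed placebo, and the discussion reports that 55% believed they were taking the active drug), and it uses a normal-approximation interval built from the trinomial variance; the function name bang_bi is hypothetical. Because the inputs are rounded, the result matches the reported -0.45 (-0.48, -0.41) only approximately.

```python
import math


def bang_bi(p_correct, p_incorrect, n, z=1.96):
    """Bang blinding index for one arm, with a normal-approximation CI.

    p_correct and p_incorrect are proportions of ALL n respondents in the arm,
    so "No Idea" answers make up the remainder. The variance is the trinomial
    variance of p_correct - p_incorrect.
    """
    bi = p_correct - p_incorrect
    var = (p_correct + p_incorrect - bi ** 2) / n
    half_width = z * math.sqrt(var)
    return bi, (bi - half_width, bi + half_width)


# Placebo arm (n = 1,583): about 10.5% guessed placebo, about 55% guessed Niaspan
bi, (lo, hi) = bang_bi(0.105, 0.55, 1583)
print(f"Bang BI = {bi:.3f}, 95% CI {lo:.2f} to {hi:.2f}")
# Bang BI = -0.445, 95% CI -0.48 to -0.41
```

Working from proportions rather than counts keeps the sketch usable with the rounded figures in the text; with the original counts, the same function applies after dividing each count by the arm size.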

There were 92 clinical sites in AIM-HIGH. At the time the study was stopped, research coordinators had been working on the study for 0.25 to 5.75 years. Those who had been with the study longer made correct guesses more frequently than those who had joined the study more recently (mean duration with the study for those guessing correctly, 4.07 years [95% CI 3.97, 4.17]; mean duration for those guessing incorrectly, 3.36 years [95% CI 3.28, 3.43]; p < 0.001).

Discussion

Overall, both participants and research coordinators correctly guessed treatment assignment less often than would be predicted by chance, which we interpret as evidence that the blind was satisfactory. More than 90% of participants either guessed that they were taking the active drug or had no idea which treatment they received. Both the Bang and James BIs support this observation. The presence of flushing was associated with guessing active drug in both arms, and the absence of flushing was associated with correctly guessing placebo. Most participants reported flushing at least once during the study. The majority of participants, both those assigned to Niaspan (60%) and those assigned to placebo (55%), believed that they were taking the active drug, though a substantial proportion (33% overall) had no idea which drug they were assigned. The high percentage guessing active treatment may be due to the spiked placebo dose of niacin, which has been shown to induce a flush12 but not to affect lipids. Alternatively, it could have been due to participants’ desire to be on the active treatment (wishful thinking). In treatment-adjusted models, the presence of flushing and itching was associated with correct identification of treatment, though the number of participants guessing placebo was quite low. Taken together, these findings suggest that participants noticed side effects and may have used this information to guess their treatment assignment, which underscores the importance of efforts to blind participants and research coordinators to a drug with known and pronounced side effects. In the absence of a distinctive side effect, a difference in the proportion of correct guesses by treatment may raise concerns about blinding.
However, the proportions of correct guesses are expected to differ by treatment assignment when a characteristic side effect of active treatment is simulated in the placebo group, and neither of the blinding indices we calculated was developed specifically for this situation. The spiked placebo strategy leads participants in both arms to believe they are taking active drug; by this logic, those assigned to active drug will guess correctly and those assigned to placebo will guess incorrectly. The observed results from AIM-HIGH support this: among participants who guessed either placebo or niacin (i.e., those who had some idea of what treatment they received), 86% overall (84% in the placebo arm and 89% in the niacin arm) guessed active niacin. Without the spiked placebo, the number of participants who correctly guessed niacin might have been similar to what we observed, but the number who correctly guessed placebo would likely have been higher. Clearly, the ideal for both participants and the integrity of the experiment is to eliminate differential side effects. When this is not possible, and the risk to participants is low, a technique such as this spiked placebo can strengthen the design of a study by preserving the blind. The appropriate technique for retrospectively assessing blinding in this setting is an important topic for future research.

At the time that participants and research coordinators were asked to guess therapy, the overall results of the trial had been presented to them and announced in a press release by the sponsor. So, while participants may have hoped that they were on an effective therapy, the overall results were null; that is, extended-release niacin showed no benefit compared with placebo on the primary or any secondary endpoint. Some have recommended assessing the success of study masking before study results are announced (Sackett DL, letter to the editor, BMJ 328, 8 May 2004), since the results may affect guesses. AIM-HIGH was stopped before its planned conclusion by its DSMB because of the observed lack of efficacy. This experience highlights the importance of assessing the blind soon after randomization, so that the assessment is not compromised if a trial must be stopped early.

It should be noted that the results from AIM-HIGH were based on blinded central review of medical records documenting the elements of the clinical endpoint; thus, it is unlikely that the trial results were affected by any prior notion of study drug assignment on the part of participants or research coordinators.

Changes in clinical symptoms (flushing/itching) after the switch from open-label Niaspan at the maximally tolerated dose at the end of run-in to randomly assigned treatment may have influenced the accuracy of participants’ guesses, if the symptoms disappeared or decreased substantially between open-label and randomized treatment despite the small amount of immediate-release niacin in the placebo tablets. While this is a possibility, no data are currently available to support it. Regardless, prior experience with niacin was associated with more accurate guesses. Gouty symptoms and visual impairment were relatively infrequent, and participants may not have associated these symptoms with Niaspan. Finally, while gastrointestinal symptoms were relatively frequent, they occurred in similar proportions in the two treatment groups. Since participants were taking multiple drugs for cardiovascular prevention and other indications, it is possible that such symptoms were attributed to other medications or causes and did not lead participants to think that they were on one therapy or the other.

Because the James BI weights “No Idea” (“do not know”) responses more heavily than incorrect guesses, the 33% of AIM-HIGH participants who answered “do not know” affected this score. In the null case, with no “do not know” responses and random guessing, the James BI is 0.5. For AIM-HIGH, the James BIs of 0.65 for participants and 0.72 for research coordinators indicate successful blinding, since the lower limit of each 95% confidence interval is greater than 0.5 (See Below).

The Bang BI de-emphasizes “do not know” responses in favor of the balance between correct and incorrect guesses. The Bang BIs for AIM-HIGH indicate that the magnitude of blinding was similar in the two treatment groups, but in opposite directions. This observation is consistent with the design objective of making placebo-assigned participants unsure whether or not they were taking Niaspan. The positive Bang BI of 0.53 for participants on niacin indicates that they were more frequently correct in their guesses. The placebo group’s Bang BI was of about the same magnitude but negative (-0.45), indicating opposite guessing, that is, guessing that they were on active treatment when they were on placebo. The research coordinators’ Bang BIs were also positive for participants on active drug and negative for participants on placebo, but much closer to zero (random guessing) than the participants’ Bang BIs. The intention in AIM-HIGH was for all participants and related clinical staff to be unsure about participant assignment. Bang acknowledges that interpreting a BI of -1 can be “subjective because this may represent complete blinding or complete unblinding in the opposite direction.”7

The James BI may provide a more appropriate measure for the AIM-HIGH data because “spiking the placebo” was designed to confuse participants. Participants and coordinators gave incorrect or “No Idea” responses in similar proportions. By design, the Bang BI assumes a situation in which participants have no idea what they received. Because of the known side-effect profile associated with Niaspan, it cannot be assumed that participants had no idea which study drug they received.

In a systematic review of reports of randomized, placebo-controlled clinical trials, only 7% provided any data on the success of therapy masking,18 and fewer gave any data on the success of masking among both study participants and outcome assessors. Fergusson et al18 recommend that the CONSORT statement be modified to require assessment of blinding for all double-blind randomized trials.

Limitations

The proportion of participants who had “no idea” which therapy they were taking may have been a polite response, in the sense that participants realized that remaining blinded was desirable. The question was asked neutrally by the research coordinators at the final in-clinic visit; we have no way of assessing whether participants were responding honestly. Similarly, the research coordinators fully realized the importance of blinding in a clinical trial and may not have responded honestly. The high proportion of “No Idea” responses from research coordinators (48% of their answers) may reflect their professional neutrality. There is no way to assess whether research coordinators’ guesses were influenced by the participants’ responses.

Conclusions

By “spiking” the placebo tablets with 50 mg of crystalline niacin, the AIM-HIGH investigators hoped to blind participants to their randomly assigned study drug. Blinding indices such as those proposed by Bang and by James provide useful statistics for measuring the success of treatment masking in different situations. We concluded that “spiking” the placebo was effective in blinding a large percentage of participants to their assigned treatment.

Trials designed to mask therapy assignment from participants and treating healthcare professionals should consider assessing the success of their attempts to mask therapy in all cases, and doing so before the trial results are known to either clinical staff or participants.


Barbara A Ricker, MPH, BSN, University of Washington; Katherine M Seymour, BA, University of Washington; Alexander R Arslan, BS, Axio Research LLC; April Slee, MS, Axio Research LLC; Ruth McBride, ScB, Axio Research LLC; and Jeffrey L Probstfield, MD, University of Washington

References

1.     Moseley JB, O'Malley K, Petersen NJ, et al. A controlled trial of arthroscopic surgery for osteoarthritis of the knee. N Engl J Med 2002;347:81-88.

2.     Sihvonen R, Paavola M, Malmivaara A, Itälä A, Joukainen A, Nurmi H, Kalske J, Järvinen TL; Finnish Degenerative Meniscal Lesion Study (FIDELITY) Group. Arthroscopic partial meniscectomy versus sham surgery for a degenerative meniscal tear. N Engl J Med 2013;369:2515-2524.

3.     Schulz KF, Chalmers I, Hayes RJ, Altman DG. Empirical evidence of bias. JAMA 1995;273:408-412.

4.     Moher D, Schulz KF, Altman DG; CONSORT Group. The CONSORT Statement: revised recommendations for improving the quality of reports of parallel-group randomized trials. JAMA 2001;285:1987-1991.

5.     International Committee of Medical Journal Editors. Uniform requirements for manuscripts submitted to biomedical journals. Med Educ 1999;33:66-78.

6.     Schulz KF, Altman DG, Moher D; CONSORT Group. CONSORT 2010 statement: updated guidelines for reporting parallel group randomized trials. PLoS Med 2010;7:e1000251.

7.     Bang H, Flaherty SP, Kolahi J, Park J. Blinding assessment in clinical trials: a review of statistical methods and a proposal of blinding assessment protocol. Clin Res Regul Aff 2010;27:42-51.

8.     James K, Lee K, Kraemer H, Fuller R. An index for assessing blindness in a multi-center clinical trial: disulfiram for alcohol cessation – a VA cooperative study. Stat Med 1996;15:1421-1434.

9.     Montori VM, Bhandari M, Devereaux PJ, Manns BJ, Ghali WA, Guyatt GH. In the dark: the reporting of blinding status in randomized controlled trials. J Clin Epidemiol 2002;55:787-790.

10.  Colagiuri B. Participant expectancies in double-blind randomized placebo-controlled trials: potential limitations to trial validity. Clin Trials 2010;7:246-255.

11.  Egan DA, Garg R, Wilt TJ, Pettinger MB, et al; for the ADMIT Investigators. Rationale and design of the Arterial Disease Multiple Intervention Trial (ADMIT) pilot study. Am J Cardiol 1999;83:569-575.

12.  Brown BG, Zhao XQ, Chait A, et al. Simvastatin and niacin, antioxidant vitamins, or the combination for the prevention of coronary disease. N Engl J Med 2001;345:1583-1592.

13.  Garg R, Elam MB, Crouse JR, et al. Effective and safe modification of multiple atherosclerotic risk factors in patients with peripheral arterial disease. Am Heart J 2000;140:792-803.

14.  Zhao XQ, Morse JS, Dowdy AA, et al. Safety and tolerability of simvastatin plus niacin in patients with coronary artery disease and low high-density lipoprotein cholesterol (The HDL Atherosclerosis Treatment Study). Am J Cardiol 2004;93:307-312.

15.  Taylor AJ, Sullenberger LE, Lee HJ, Lee JK, Grace KA. Arterial Biology for the Investigation of the Treatment Effects of Reducing Cholesterol (ARBITER) 2. Circulation 2004;110:3512-3517.

16.  The AIM-HIGH Investigators. The role of niacin in raising high-density lipoprotein cholesterol to reduce cardiovascular events in patients with atherosclerotic cardiovascular disease and optimally treated low-density lipoprotein cholesterol: rationale and study design. Am Heart J 2011;161:471-477.e2.

17.  The AIM-HIGH Investigators. Niacin in patients with low HDL cholesterol levels receiving intensive statin therapy. N Engl J Med 2011;365:2255-2267.

18.  Fergusson D, Glass KC, Waring D, Shapiro S. Turning a blind eye: the success of blinding reported in a random sample of randomized, placebo controlled trials. BMJ 2004; 328:432.

