Journal of Postgraduate Medicine, Vol. 54, No. 3, July-September, 2008, pp. 214-216
A trial design that generates only 'positive' results
Ernst E, Lee MS
Complementary Medicine, Peninsula Medical School, Universities of Exeter and Plymouth, 25 Victoria Park Road, Exeter EX2 4NT
Code Number: jp08073
Abstract
In this article, we test the hypothesis that randomized clinical trials of acupuncture for pain with a certain design feature ('A + B versus B') are likely to generate false positive results. Based on electronic searches of six databases, we found 13 studies that met our inclusion criteria. All of them suggested that acupuncture is effective (one showed only a positive trend; all others had significant results). We conclude that the 'A + B versus B' design is prone to false positive results, and we discuss the design features that might prevent or exacerbate this problem.
Keywords: Acupuncture, bias, clinical trial, false positive, trial design
Randomized clinical trials (RCTs) are designed to minimize bias in comparative tests of therapeutic effectiveness. However, they are not always free of bias. In recent years, we have seen a plethora of RCTs adopting a design in which patients are randomized to receive either usual care (the control group) or usual care plus the experimental treatment. Schematically, this design can be depicted as 'A + B versus B'. At first glance, such comparisons may seem reasonable. On closer inspection, however, doubts emerge as to whether such RCTs are fair scientific tests of the experimental intervention. These doubts originate from the theoretical view that 'A plus B' will invariably amount to more than 'B' alone. Even when treatment A is a pure placebo, its placebo and other nonspecific effects could lead to a better outcome in the experimental group than in the control group. This would be particularly likely if i) the experimental treatment is associated with sizable nonspecific effects, ii) a subjective outcome measure is used, and iii) the experimental intervention 'A' does not cause a deterioration of the condition being treated.
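The argument can be made explicit with a simple additive model. This is an illustrative sketch in our own notation, not taken from the trials reviewed: let \(\mu_B\) be the expected outcome under usual care B, \(\tau_s\) the specific effect of A, and \(\tau_n \ge 0\) its nonspecific effects (placebo response, extra attention), all expressed so that higher values mean better outcomes.

```latex
\begin{aligned}
\Delta &= \mathbb{E}[Y \mid A + B] - \mathbb{E}[Y \mid B]
        = (\mu_B + \tau_s + \tau_n) - \mu_B
        = \tau_s + \tau_n, \\
\tau_s = 0 \ (\text{A is a pure placebo}) &\;\Rightarrow\; \Delta = \tau_n \ge 0, \\
\Delta \le 0 &\;\iff\; \tau_s \le -\tau_n .
\end{aligned}
```

In expectation, a 'negative' result can therefore arise only if A is harmful enough to cancel its nonspecific gain. In this sketch, criteria i) and ii) make \(\tau_n\) large, and criterion iii) rules out a negative \(\tau_s\) that could offset it.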
Here we evaluate RCTs with the 'A + B versus B' design to determine whether any of these trials produced 'negative' findings, i.e. a result in which the experimental group experiences outcomes that are either the same as or worse than those of the control group. As hundreds of such studies exist, we focus on one particular situation that fulfils all three criteria listed above: acupuncture as a treatment of pain.
Materials and Methods
The following electronic databases were searched from their respective inceptions up to 10 October 2007: Medline, AMED, British Nursing Index, CINAHL, EMBASE and PsycInfo. The search terms used were: [acupuncture AND pain AND (usual care OR routine care OR standard care OR wait-list)] OR [acupuncture AND pain AND random]. In addition, we screened the reference lists of all located articles and reviews and hand-searched our departmental files.
Articles were considered if they reported an RCT in which human patients with any type of pain were treated with any type of acupuncture. Specifically, we included studies that fulfilled the following criteria: randomized patient allocation; pain as one outcome measure; control group receiving usual care; experimental group receiving usual care plus acupuncture. No language restrictions were imposed. We excluded feasibility studies, trials testing the effectiveness of TENS or laser treatment and studies that failed to include sufficient detail for assessment.
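The eligibility rules above amount to a conjunction of inclusion criteria minus a set of exclusions. The following is a minimal sketch of that logic, assuming a hypothetical `TrialRecord` structure whose field names and string labels are our own illustration; in the review itself, both authors made these judgements by reading the full texts.

```python
from dataclasses import dataclass

# Interventions the review explicitly excluded (compared case-insensitively).
EXCLUDED_INTERVENTIONS = {"tens", "laser"}


@dataclass
class TrialRecord:
    """Hypothetical summary of one screened article; field names are illustrative."""
    randomized: bool
    human_patients_with_pain: bool
    pain_outcome: bool
    control_arm: str               # e.g. "usual care"
    experimental_arm: str          # e.g. "usual care + acupuncture"
    feasibility_study: bool = False
    intervention: str = "acupuncture"
    sufficient_detail: bool = True


def meets_criteria(r: TrialRecord) -> bool:
    """Return True if the record passes every inclusion rule and no exclusion rule.

    Language is deliberately not checked: no language restrictions were imposed.
    """
    included = (
        r.randomized
        and r.human_patients_with_pain
        and r.pain_outcome
        and r.control_arm == "usual care"
        and r.experimental_arm == "usual care + acupuncture"
    )
    excluded = (
        r.feasibility_study
        or r.intervention.lower() in EXCLUDED_INTERVENTIONS
        or not r.sufficient_detail
    )
    return included and not excluded


example = TrialRecord(
    randomized=True,
    human_patients_with_pain=True,
    pain_outcome=True,
    control_arm="usual care",
    experimental_arm="usual care + acupuncture",
)
print(meets_criteria(example))  # True
```

Exact string matching on the arm labels is a simplification; real screening requires judging whether a described control regimen counts as usual care.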
Hard copies of all articles were obtained and read in full by both authors. When an RCT had more than two treatment groups, we evaluated only the two groups ('A + B' and 'B') mentioned above. Both authors independently extracted key data [Table 1]. The methodological quality of all RCTs was assessed using the modified Jadad score.
Results
We found 200 RCTs, 13 of which met our inclusion/exclusion criteria [Table 1]. Eleven RCTs had a parallel-group design and two followed a partial crossover design. Nine trials included two groups, three had three groups and one had four groups of patients [Table 1].
The RCTs differed in many respects: they recruited patients with different types of pain, used a range of methods for quantifying pain, applied different types of acupuncture according to different treatment schedules, employed different types of usual care, and varied in Jadad score and sample size. Despite these differences, all studies generated results suggesting significantly better outcomes in the experimental ('A + B') group than in the control ('B') group. The only exception was the RCT by Helms et al., which showed only a non-significant trend in that direction, most likely because of a Type II error caused by its very small sample size (n = 12).
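Why a sample of 12 makes a Type II error likely can be illustrated with a rough power calculation. This is a sketch under stated assumptions: the 12 patients are split equally into two groups of 6, the true standardized effect size is a hypothetical 0.5, and a normal approximation to a two-sided two-sample test is used in place of the exact t-test (which would give slightly lower power at such small n).

```python
from statistics import NormalDist


def approx_power(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Normal-approximation power of a two-sided, two-sample comparison of means.

    d is the standardized effect size (difference in means / common SD),
    and both groups are assumed to have n_per_group patients.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    ncp = d * (n_per_group / 2) ** 0.5  # expected z statistic for equal group sizes
    # Probability that the test statistic falls in either rejection region.
    return z.cdf(ncp - z_crit) + z.cdf(-ncp - z_crit)


# Hypothetical effect size 0.5: 6 per group (n = 12) versus 50 per group.
for n in (6, 50):
    print(n, round(approx_power(0.5, n), 2))
```

Under these assumptions, the trial would detect a moderate true effect only about one time in seven, so a non-significant trend is the expected outcome rather than evidence of no effect.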
Discussion
Randomized clinical trials with the 'A + B versus B' design are currently popular, particularly for pragmatic studies. Our findings suggest that, in the realm of acupuncture for pain, this design is likely to generate false positive results. Alternatively, acupuncture could, of course, be a highly effective intervention for reducing pain. However, recent RCTs that rigorously control for placebo effects by employing non-penetrating sham devices as control interventions suggest that acupuncture is not superior to sham acupuncture for pain. Moreover, several RCTs in patients with conditions other than pain (for which acupuncture has not been proven to be effective) indicate that this phenomenon may not be confined to studies of pain.
The reason why the 'A + B versus B' design cannot generate negative results (under the three conditions stated above) seems obvious: even in the absence of any specific therapeutic effect, the results of such studies would be positive because of nonspecific effects such as the placebo effect, the additional care given to patients, the therapist-patient relationship or social desirability. A further contributor could be the disappointment experienced by patients in the control groups who do not receive the experimental treatment they may have hoped for.
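The mechanism described above can be checked with a small Monte Carlo simulation. This is a sketch under our own assumptions, not data from the reviewed trials: outcomes are improvement scores in standard-deviation units, A has no specific effect, and any benefit of A + B comes from a hypothetical nonspecific shift. Significance is judged with a large-sample z approximation to Welch's test.

```python
import random
from statistics import NormalDist, fmean, stdev


def simulate_positive_rate(nonspecific: float, specific: float = 0.0,
                           n_per_group: int = 50, n_trials: int = 1000,
                           alpha: float = 0.05, seed: int = 1) -> float:
    """Fraction of simulated 'A + B versus B' trials that are significantly positive.

    Control patients (B only) have mean improvement 0; experimental patients
    (A + B) have mean improvement specific + nonspecific; both have SD 1.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    positives = 0
    for _ in range(n_trials):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(specific + nonspecific, 1.0) for _ in range(n_per_group)]
        diff = fmean(treated) - fmean(control)
        se = (stdev(treated) ** 2 / n_per_group + stdev(control) ** 2 / n_per_group) ** 0.5
        # Count only significant results that favour A + B (a 'positive' trial).
        if diff / se > z_crit:
            positives += 1
    return positives / n_trials


print("no nonspecific effect:", simulate_positive_rate(nonspecific=0.0))
print("nonspecific effect of 0.5 SD:", simulate_positive_rate(nonspecific=0.5))
```

With no nonspecific effect, only a few per cent of simulated trials come out positive, as expected from chance alone; with a purely nonspecific shift of 0.5 SD, most of them do, even though A itself does nothing.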
The design issues highlighted here are well known to research methodologists. They are, however, not always appreciated by clinicians who read published articles and who may tend to equate RCTs with the highest level of reliability. The authors of these articles often fail to discuss the drawbacks of the 'A + B versus B' design in sufficient detail. Certainly, these drawbacks were not emphasized by the authors of the studies included in this review.
Studies with three or more groups are usually designed to answer more than one research question. Therefore, such RCTs might be valuable even if they include an 'A + B versus B' comparison. Conclusions based specifically on any 'A + B versus B' comparison within such trials can nevertheless be criticized on the basis of the above arguments. Similarly, 'pragmatic' trials that adopt a different design (for instance 'A versus B') are not affected by our criticism.
It would, of course, be an over-interpretation of our results to state categorically that RCTs with the 'A + B versus B' design can, in principle, generate only positive results. We have shown this to be likely only for acupuncture as a treatment for pain. In other therapeutic areas, negative trials with that design may exist. Crucially, however, such RCTs would not fulfil the three criteria outlined in the introduction of this article. Interventions that are less prone to generating false positive findings when tested in an 'A + B versus B' study include those that worsen rather than improve the condition in question and treatments that do not generate sizable placebo effects. Also, studies that employ objectively measurable endpoints might, in some cases, offer protection against such false positive findings.
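How each criterion changes the chance of a positive 'A + B versus B' trial can be sketched analytically. This assumes the same additive model as above (expected difference = specific effect + nonspecific effect), a hypothetical 50 patients per group, a normal approximation, and effect sizes in SD units chosen purely for illustration; none of these numbers come from the reviewed trials.

```python
from statistics import NormalDist


def prob_positive(specific: float, nonspecific: float, n_per_group: int = 50,
                  alpha: float = 0.05) -> float:
    """Approximate probability that an 'A + B versus B' trial is significantly
    in favour of A + B (one tail of a two-sided test at level alpha)."""
    z = NormalDist()
    shift = (specific + nonspecific) * (n_per_group / 2) ** 0.5
    return z.cdf(shift - z.inv_cdf(1 - alpha / 2))


# Hypothetical (specific, nonspecific) effects illustrating the three criteria.
scenarios = {
    "inert A, large nonspecific effect, subjective outcome": (0.0, 0.5),
    "inert A, small nonspecific effect or objective outcome": (0.0, 0.05),
    "A that worsens the condition, offsetting its nonspecific effect": (-0.5, 0.5),
}
for label, (spec, nonspec) in scenarios.items():
    print(f"{label}: {prob_positive(spec, nonspec):.2f}")
```

Only the first scenario, which matches the three conditions met by acupuncture for pain, makes a positive result likely; in the other two, the probability falls to roughly the chance level.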
In conclusion, our systematic review of RCTs of acupuncture for pain control with the 'A + B versus B' design suggests that this trial methodology is likely to produce false positive results. Such studies may therefore not be adequate scientific tests of the effectiveness of therapeutic interventions.
Copyright 2008 - Journal of Postgraduate Medicine