Control Groups in Pharmacotherapy and Psychotherapy Evaluations
Donald F. Klein, MD
Department of Psychiatry, Columbia University
Department of Therapeutics, College of Physicians and Surgeons and the New York State Psychiatric Institute
Whether placebos should be included in evaluative therapeutic research remains controversial. Some argue that placebos should be abandoned on pragmatic and ethical grounds and replaced with comparative trials against known effective agents (Rothman & Michels, 1994). However, placebo controls are essential if causal attribution regarding characteristic features is to be made. This methodology also assures that pragmatic therapeutic goals are met. An algebraic model is presented to demonstrate the critical importance of placebos in controlling for demand characteristics and spontaneous remission. The inherent difficulty in designing psychological placebos by fiat and the inadequacy of wait-list controls are addressed. The model demonstrates that inclusion of a pill-placebo group may be the optimal controlled design in psychotherapy as well as pharmacotherapy research and argues for collaborative psychotherapy-pharmacotherapy research programs.
Supported in part by PHS Grant MH-30906, MHCRC-New York State Psychiatric Institute.
I thank Lauren Smith and Donald Ross for editorial and substantive contributions and Jon Stewart, Michael Liebowitz, and Ned Nunes for comments on early drafts.
Correspondence concerning this article may be sent via email to [email protected]
The relative value of naturalistic, quasi-experimental, and experimental methods has led to enormous controversy. Here we focus on a narrower issue: Does a pill-placebo case management group provide a useful pragmatic comparison in the experimental evaluation of psychotherapy? To approach this, some basic issues are reviewed.
Naturalistic studies, until recently, have been the mainstay of therapeutic advance. Ill people were given a treatment, and if it was reported that many became well, the treatment was considered effective. This conclusion actually rests on an implicit historical control: the course of those with the illness who had gone untreated. Because many patients who came to treatment appeared to have a long-standing difficulty, the presumption was understandable that the illness would continue unless effectively treated.
This post hoc ergo propter hoc (after and therefore because of) logical fallacy led to the adoption of many interventions, some pharmacologically useful (e.g., aspirin, quinine, digitalis) and some (particularly those that stemmed from the primitive theories of the day) that were positively harmful (e.g., emetics, cathartics, bleeding, and purging). In fact, most such remedies yielded no pharmacologically specific benefit but nevertheless often produced relief, due to the so-called placebo effect. This was demonstrated experimentally by showing that these putative treatments were no more efficacious than pharmacologically inert substances. It also became apparent that many illnesses improved or completely remitted on their own, without formal treatment.
Therefore, there were two hypotheses in competition with the hypothesis of efficacy, supported by historical control: (a) the patients got better on their own (spontaneous remission) or (b) the treatment benefits were only equivalent to those produced by any credible treatment approach (placebo effect). This last was often referred to as suggestion.
Much early discussion about psychoanalytic efficacy revolved about whether the patient’s insights or benefits were due to suggestion. The attempt was made to argue that the objective psychoanalytic method, amplified by the analyst’s own analysis, provided a sufficient safeguard. However, warring psychoanalytic schools, all claiming superior results based on opposing theories, were not reassuring.
Eysenck (1952) provided an even more pointed critique by arguing that spontaneous remission produced as good or better results than psychotherapy and psychoanalysis. This led to extended controversy. The argument revolved about attempts to arrive at firm estimates of spontaneous remission (which are not uniform across patient groups) as well as firm estimates of improvement during treatment, which were of dubious objectivity and reliability and usually did not incorporate into the final accounting those who failed to complete treatment, as this was defined by the therapist in the context of an indefinitely prolonged treatment. It was not possible to distinguish treatment failures from treatment dropouts.
The major logical point of Eysenck’s critique is that it is a mistake to rely on the assumption that the patient would not have become well in the absence of intervention. An historical control is of value under quite restricted circumstances. Pasteur’s rabies vaccine did not require a controlled trial because the antecedent was unmistakable (being bitten by a rabid animal) and the disease was invariably fatal. Survival demonstrated the treatment’s efficacy because neither placebo effect nor spontaneous remission was a reasonable alternative hypothesis.
The development of psychotropic agents in the 1950s met with considerable skepticism. It was argued that apparent benefits could be due to suggestion or spontaneous remission. This led to the rapid adoption of the randomized, placebo-controlled trial. Both treatment groups were, within chance sampling fluctuations, equivalent with regard to placebo effect and spontaneous remission, providing an elegant control. Other alternative hypotheses such as experimenter or patient bias or a confound of patient differences with treatment method were countered, respectively, by double-blinding and randomization.
The remarkable success of this procedure confronted psychotherapy advocates with a dilemma, since the alternative hypotheses of suggestion and spontaneous remission could apply equally to the benefits of psychotherapy. Further, Jerome Frank (1961) compared short-term psychotherapy with pill placebo and found similar symptomatic benefits. However, this was discounted as an inadequate design because a classical control group incorporates all of the putative treatment aspects except for the proposed active ingredient, whereas Frank’s comparative treatment groups differed in many ways. Thus, his uncomfortable findings were largely ignored. It will be shown below that inappropriate explanatory criteria were used to discount pragmatically useful information.
Instead, comparative psychotherapeutic research focused on so-called wait-list controls, between-treatment comparisons, and contrasts with “psychological placebos.” The problems with these designs are discussed below, but first a model of therapy will be developed. It is intended to clarify the issues and, in particular, address the allowable inferences if a pill-placebo case management group is included in a psychotherapeutic trial.
This particular design problem achieved recent prominence in the controversies over the National Institute of Mental Health (NIMH) Treatment of Depression Collaborative Research Program (TDCRP). This study was first contemplated in the late 1970s to compare interpersonal psychotherapy and cognitive-behavioral therapy in the treatment of major depression. However, the NIMH Scientific Advisory Committee to the TDCRP concluded that this simple comparison did not disconfirm the alternative explanatory hypotheses that all benefits were the result of placebo effects or spontaneous remission. Therefore, it was agreed that a standard reference treatment was necessary. Imipramine, which had been confirmed by many placebo-controlled trials, as well as clinical practice, was chosen. Because this treatment was to represent pharmacological care, the provision of medication was embedded in a case management format that eschewed any formal psychotherapeutic intervention. This yielded the opportunity to assess the relative merits, and perhaps the comparative indicators, of these different classes of treatment. This was clearly of public health importance.
However, imipramine had not, in all trials, been uniformly superior to placebo, and it was suspected that patients with atypical depression might be imipramine refractory. Therefore, to establish that the sample was actually imipramine responsive (which allows generalization of psychotherapy effects to the relevant population in which psychotherapy and medication may be fruitfully compared), it was necessary to include a pill-placebo case management treatment arm. This served as an internal calibration demonstrating good medication practices, as well as a medication treatable sample, and thus allowed generalization of the psychotherapy benefits to a medication-appropriate population.
However, there was an immediate conflict as to the legitimacy of comparing psychotherapeutic effects to pill-placebo case management outcome. How could these results be interpreted when it was evident that the treatments differed in multiple ways? Some argued that the only legitimate contrast with the placebo group was the imipramine group and that the placebo versus psychotherapy contrasts were invalidated by multiple confounds.
Explanatory Versus Pragmatic Trials
The role of pill-placebo case management can also be framed in terms of Schwartz and Lellouch’s (1967) distinction between explanatory and pragmatic trials. The objective of the explanatory trial is to gain knowledge of whether a pharmacological benefit exists and to acquire an estimate of the effect size attributable directly to the active ingredient in this experimental setting. In pragmatic trials, the intent is simply to make a cost-benefit decision. This last approach is emphasized in outcomes research. The outcomes approach argues that if complex Treatment A is better (cost-benefit) than complex Treatment B in apparently similar groups of patients, why should there be concern about narrow questions of causal allocation? Let’s use A and discard B.
The term pragmatic has a reassuring, practical quality. Those involved in actually delivering services are continually reminded of the need to be practical, that is, to minimize cost. There are different strategies adoptable in pragmatic interests. For instance, researchers can relentlessly pursue a series of dismantling studies, progressively paring down the expenses of the treatment while investigating whether comparable outcome quality is achieved. This is neither simple nor quick.
A psychotherapeutic comparison with pill placebo in this framework might be considered the most radical sort of dismantling pragmatic trial. It results in the same causal ambiguities, but if the null hypothesis cannot be disconfirmed in a sufficiently powerful trial, then the expensive, lengthy procedures of a series of progressive dismantling trials can be avoided.
It is extraordinarily difficult, in the context of psychotherapy, to develop true explanatory clinical trials. This may be the reason that researchers invested in understanding the mechanisms of inducing psychotherapeutic benefit often pursue so-called process analyses. The intention is to demonstrate, on an individual longitudinal basis, that certain interventions, or attitudes, or interactions regularly precede therapeutic progress. The explanatory goal is to gain causal information with the hope of eventually improving practice. The major problem, of course, is that a heavy investment in understanding process, without an assurance that one is studying a treatment that contains characteristic effective therapeutic ingredients, puts the cart before the horse.
We have the peculiar situation in which the pill-placebo case management control, in the context of a pharmacotherapeutic clinical trial, yields an explanatory test with pragmatic implications. In contrast, the pill-placebo case management control in the context of a psychotherapeutic clinical trial does not allow a simple causal inference but nonetheless is an ultrapragmatic strategy.
Model of Placebo, Characteristic Treatment, and Incidental Variables
Following Basham (1986) and Grunbaum (1986), let E represent the sum of all extraneous factors, balanced across conditions through the use of random assignment, and let T represent the sum of all factors present in the formal treatment. The natural course of the illness, including spontaneous improvement or deterioration, is subsumed in E, so that the absence of treatment equals E. The total effect of treatment equals E + T. D represents the demand characteristics associated with treatment or placement in a control condition. Demand characteristics, as defined by Orne (1962), consist of the patient’s knowledge, expectations, and intentions regarding an experimental situation and, in this case, the process of psychotherapy. This results in an artifact that does not reflect the efficacy of the hypothesized treatment factors.
Then let us expand the notation to encompass patients’ reaction to being placed in the experimental treatment and wait-list control settings. The deferring of treatment during a wait-list control equals E + DC, where DC represents all demand artifacts (patient reactions and expectancies) resulting from being placed in the wait-list condition. In the active treatment, it is noted that T now includes DT, which represents the analogous treatment condition artifact. Both DC and DT can be either negative or positive. Table 1 provides a summary of the variables used in the model.
Table 1. Summary of Variables Used in Model
Notation    Definition
E           All extraneous factors
T           All factors in formal treatment
TS          Characteristic (specific) treatment factors
TNS         Nonspecific treatment factors
D           Demand characteristics
DC          Demand characteristics of the wait-list condition
Basham (1986, p. 89) critiqued the all too common use of the so-called waiting-list control in psychotherapy as follows:
At the start of the study, subjects read and sign a consent form telling them they will be participating in a study and will be randomly assigned to either a delayed or immediate treatment condition. Having thus been informed they are in an experiment, subjects are likely to develop condition-specific expectancies about whether or not they will improve. Most likely, the subjects in the immediate treatment condition develop an increased expectancy of improving, whereas subjects in the waiting-list condition develop a reduced expectancy of improving and may be further demoralized by having received the less preferred condition.
Therefore, DC does not equal DT, and the net effect of these two artifacts is fully confounded with the effects of treatment. If a treatment is superior to wait list, it may simply indicate that DT is superior to DC, as seems very likely.
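The confound can be made explicit by subtraction. The following is a sketch in the model's own notation with the subscripts written out; T' is an assumed symbol, not used in the text, standing for the treatment factors other than the demand artifact (so that T = DT + T').

```latex
\begin{align*}
\text{treatment} &= E + T = E + D_T + T' \\
\text{wait list} &= E + D_C \\
\text{treatment} - \text{wait list} &= T' + (D_T - D_C)
\end{align*}
```

Only the sum on the last line is observed, so a positive difference is compatible with T' = 0 whenever DT exceeds DC.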
In the pharmacological situation, DC and DT are equated by the use of pill placebo and double blinding. With double blinding, neither subjects nor evaluators are aware of what is taken, thus equating subjects’ and evaluators’ expectancies among the various treatments. If the measure is objective (e.g., the number of dollars earned), the need for evaluator blinding decreases. When the evaluator is also the therapist, the blind is often broken. Therefore, independent blinded evaluations and objective measures are extremely desirable in both pharmacotherapy and psychotherapy trials.
In the pharmacological placebo-controlled design, in contrast to the wait-list design, we subdivide T by defining DP as the demand artifact associated with receiving a pill, TNS as the incidental treatment factors, and TS as the sought-for characteristic treatment factor. We then have placebo effect = E + DP + TNS, whereas treatment = E + DP + TNS + TS. Therefore, one can, by subtraction, allocate the treatment effect to TS, which produces the differential outcome, if any.
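The subtraction can be illustrated with a small numerical sketch of the additive model. The class name, field names, and all numbers below are hypothetical and chosen only for illustration; they are not estimates from any trial.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Components:
    """Additive components of a group outcome (arbitrary units, illustrative)."""
    e: float     # E: extraneous factors, including the natural course of illness
    d: float     # demand artifact of the condition (here DP for both arms)
    t_ns: float  # TNS: incidental (nonspecific) treatment factors
    t_s: float   # TS: characteristic treatment factor (0 for placebo)

    def outcome(self) -> float:
        # The model is purely additive: the outcome is the sum of its parts.
        return self.e + self.d + self.t_ns + self.t_s


# Assumed values for illustration only.
E, DP, TNS, TS = 2.0, 1.5, 1.0, 3.0

# Double blinding gives both arms the same pill demand artifact DP.
placebo_arm = Components(e=E, d=DP, t_ns=TNS, t_s=0.0)
drug_arm = Components(e=E, d=DP, t_ns=TNS, t_s=TS)

difference = drug_arm.outcome() - placebo_arm.outcome()
print(difference)  # 3.0, i.e. exactly TS
```

Because E, DP, and TNS enter both arms identically, they cancel, and the observed difference recovers TS regardless of the values assumed for the shared terms.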
Much of the debate over placebos in psychotherapy research is the result of theoretical and terminological ambiguity. Critelli and Neumann (1984) stated that “placebo” is not clearly defined. Commonly, following pharmacological practice, it is defined as an inert agent, but in the psychotherapeutic context this is misleading. The use of placebos, both pharmacological and psychological, rests on the fact that they are not inert, because they affect subjects’ perceptions, reactions, and expectancies; however, they are also not specific in the sense of requiring a particular component. (One might argue that they are specific in the sense that their efficacy is uniformly due to alleviating a particular antecedent patient characteristic: demoralization.) Critelli and Neumann argued that the concept of placebo is best understood in terms of the common factors associated with various types of therapy, such as expectancy, contact with a therapist, and therapeutic alliance.
Grunbaum (1986) criticized Critelli and Neumann (1984) for their conceptualization of placebo as the common factors in therapy because different therapies can share characteristic causal treatment components (including unrecognized ones). He distinguished characteristic factors, which are the hypothesized effective treatment components, from incidental factors, which include demand characteristics.
The control group in an experimental design is an intentional placebo controlling for demand characteristics and spontaneous worsening or improvement. Two different therapies compared with a placebo (composed of incidental factors) could both prove effective even if active ingredients are common to both therapies. The placebo is crucial in demonstrating that the improvement is not the result of the incidental aspects of treatment. Thus, the claim that placebos are only necessary to delineate particular specific mechanisms is untrue (Horvath, 1988; Wilkins, 1986) because with complex treatments the therapeutic mechanisms often remain obscure.
In psychotherapy research, the psychological placebo equivalent would be a psychological intervention that was inactive with regard to the theoretically defined active ingredients of the treatment (e.g., relaxation for Pavlovian desensitization). Pill placebos and psychological placebos seem to estimate the same effect, but pill placebos have an advantage because psychological placebos are only declared placebos by dubious, theoretical fiat (Horvath, 1988). Further, placebo = E + DP + TNS, but treatment = E + DT + TNS + TS. The difference equals (DT – DP) + TS. Unfortunately, the demand characteristics cannot be equated by double blinding. Therefore, comparison of a psychotherapy with a supposed psychotherapy placebo does not unequivocally estimate a characteristic benefit, despite assumptions to this effect.
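Why this difference does not pin down TS can be shown with a brief sketch. The function name and the two scenarios are hypothetical, and the numbers are illustrative rather than empirical; the point is only that different combinations of TS and (DT – DP) yield the same observed difference.

```python
def observed_difference(t_s: float, d_t: float, d_p: float) -> float:
    """Psychotherapy minus placebo under the additive model.

    E and TNS appear in both arms and cancel, leaving TS + (DT - DP).
    """
    return t_s + (d_t - d_p)


# Scenario 1: a real characteristic benefit, with equal demand artifacts.
real_benefit = observed_difference(t_s=3.0, d_t=1.5, d_p=1.5)

# Scenario 2: no characteristic benefit, but a poorly credible placebo (low DP).
demand_only = observed_difference(t_s=0.0, d_t=3.5, d_p=0.5)

print(real_benefit, demand_only)  # 3.0 3.0
```

The two scenarios are indistinguishable from group outcomes alone, which is the sense in which the comparison fails to estimate a characteristic benefit unequivocally.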
The demand characteristics associated with treatment may well differ from the demand characteristics associated with the placebo treatment, especially if the placebo treatment is of poor credibility, thus lowering DP. This confounds the detection of a characteristic treatment effect. Further, there is the lingering suspicion that the constructed psychological placebo may include theoretically irrelevant and unrecognized components that are actually beneficial or even toxic, which cannot be the case with pill placebo. The construction of a psychotherapy placebo (including only incidental factors) is no mean feat. As Adair, Sharpe, and Huynh (1990) showed, one can classify different groups of supposed placebos and show a range of effects relative to no treatment. Heimberg and Liebowitz (unpublished data) found that pill placebo appeared more effective than what was considered a psychological placebo on measures other than self-report, but the reverse was true for patient self-ratings.
Basham (1986, p. 88) stated that
it is crucial in terms of the experiment’s internal validity that the treatment factors contained in the placebo group are strictly a subset of the factors in the total treatment. If such a component control condition is not used, the placebo group is no longer a formal control group, and valid statements about the causal role of specific treatment factors can no longer be made.
This is entirely correct with regard to explanatory trials.
Therefore, Basham (1986) recommended moving to comparing (pragmatically) putatively active treatments. This changes the question “Does it work?” to relative questions such as “Which works best?” or “How do they differ?” or “Which should I use?” Including demand artifacts and differentiating nonspecific from specific treatment factors, we have the following: Treatment A = E + DT + TNS + TS(A) and Treatment B = E + DT + TNS + TS(B). This amounts to estimating the relative difference between the two active agents TS(A) and TS(B). However, this assumes, probably incorrectly, equal demand characteristics, that is, DT(A) = DT(B).
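Dropping the assumption of equal demand characteristics, the comparison can be written out as follows. This is a sketch in the model's notation with subscripts restored; like the equations in the text, it assumes that E and TNS are the same for both treatments.

```latex
\begin{align*}
A &= E + D_{T(A)} + T_{NS} + T_{S(A)} \\
B &= E + D_{T(B)} + T_{NS} + T_{S(B)} \\
A - B &= \bigl(T_{S(A)} - T_{S(B)}\bigr) + \bigl(D_{T(A)} - D_{T(B)}\bigr)
\end{align*}
```

The observed difference therefore estimates the relative characteristic benefit only when the second bracket is zero.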
Basham’s (1986) argument for comparative studies is precisely the argument made by Klein and Rabkin (1984). Given the difficulty in constructing a psychological placebo, one should first find a difference between two credible, putatively active therapies and then pursue dismantling studies in an attempt to arrive at causal mechanisms (or the least costly treatment that yields equivalent benefits).
However, this may be unsatisfactory because for many psychotherapies the entire motor of change may be the common antidemoralizing aspects of therapy (e.g., providing a prestigious ally, a framework for understanding, prescribing what are claimed to be effective activities, and in general raising hope). That is, there is no TS. Therefore, even if several studies show that Treatment A is equivalent to Treatment B, there is still no basis to assert that they are doing anything beyond antidemoralization (E + DT + TNS). Worse, even if Psychotherapy A is better than Psychotherapy B, it may be that in this sample, Psychotherapy B has toxic components (DT(A) is not equal to DT(B)) such as relaxation for panic disorder (Heide & Borkovec, 1984). There is no assurance that the superior Treatment A possesses any specific benefits (TS(A)), although the pragmatist may discard Treatment B.
Comparison of Psychotherapy to Pill Placebo
This brings us back to pill placebo, which has no pharmacological activity, either toxic or beneficial. As before, pill placebo = E + DP + TNS, and Psychotherapy A = E + DT + TNS + TS(A). The difference between psychotherapy and pill placebo equals TS(A) + (DT – DP). If psychotherapy is better than pill placebo, we cannot attribute this difference solely to TS(A) because it might be due to the difference between DT and DP.
In fact, this is exactly the same equation developed for the difference between psychotherapy and a putative psychotherapy placebo. Comparative psychotherapeutic studies are no more causally stringent than comparisons of pill placebo with psychotherapy.
I believe that both logic and concern for the public health suggest that psychotherapeutic evaluations be conducted jointly with pharmacologic evaluations. The advantage is two-fold. First, pill placebo cannot contain any unsuspected active factors (positive or negative). Second, for patients who accept the utility of pharmacotherapy, credibility is assured.
The scientific goal of unequivocal causal attribution is not met, but what about the pragmatic, evaluative goal? It would be reassuring if a psychotherapy were clearly superior to pill placebo: because it is unlikely that DT is much superior to DP, such superiority implies a beneficial psychotherapy effect over and above placebo effects.
However, what if psychotherapy proves no better than pill placebo? What can be said then? Again, no unequivocal causal allocation can be made, but if one wants to maintain that TS(A) was actually effective, one would have to unreasonably maintain that DP – DT approximates TS(A). At any rate, who would be comfortable in promulgating a psychotherapy that could not beat pill-placebo case management? Therefore, in the context of comparing pharmacotherapy and psychotherapy, pill-placebo case management provides a useful, pragmatic, understandable benchmark.
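The condition for a null result follows directly from the difference equation. In the sketch below, the symbol Δ for the observed group difference is an assumed label, not notation from the text.

```latex
\begin{align*}
\Delta &= \text{psychotherapy} - \text{pill placebo} = T_{S(A)} + (D_T - D_P) \\
\Delta = 0 &\iff T_{S(A)} = D_P - D_T
\end{align*}
```

To claim a real characteristic benefit in the face of a null result, one must therefore hold that the demand artifact of pill placebo exceeds that of psychotherapy by the full size of that benefit.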
Dush (1986) objected to additive models, such as this one, that include main effects for the therapeutic context posited to exist for both placebo and active treatment groups and that estimate main effects by taking the difference in group outcomes. Dush argued that treatment × situation interactions, such as a placebo × setting effect, should be included in the model. Because it is impossible to separate treatment main effects from the treatment × situation interactions, one cannot properly analyze the data. (This is not a logical necessity but an ethical problem because the surreptitious administration of medication would eliminate this confound in pharmacotherapy trials.)
However, treatment × situation interactions can be reasonably regarded as part of the treatment effects because the therapeutic situation is a component of the treatment. Crucially, neither the main effects nor the interactions would appear without an active agent. If it can be concluded that the treatment is superior to placebo, what pragmatic or explanatory difference does it make if drug superiority is part main effect and part interaction?
Just as in the case of a two-way analysis of variance with one observation per cell, in which row × column interactions and within-cell error are subsumed under error because they cannot be separated, it seems reasonable to subsume both treatment main effects and interactions under the concept of treatment effects. Therefore, Dush’s (1986) critique does not invalidate the present model. It does point out that treatment context may be of consequence and is subject to further study. Stratifying for varying treatment contexts may provide useful information.
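The analysis-of-variance analogy can be illustrated with a minimal sketch of the sum-of-squares decomposition for a table with one observation per cell. The function name and the scores are hypothetical; the example only shows that, with a single value per cell, no degrees of freedom remain for a separate within-cell error term, so interaction and error fall into one residual.

```python
def two_way_one_obs(table):
    """Sum-of-squares decomposition for an r x c table, one value per cell."""
    r, c = len(table), len(table[0])
    n_per_cell = 1
    grand = sum(sum(row) for row in table) / (r * c)
    row_means = [sum(row) / c for row in table]
    col_means = [sum(table[i][j] for i in range(r)) / r for j in range(c)]

    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_rows = c * sum((m - grand) ** 2 for m in row_means)
    ss_cols = r * sum((m - grand) ** 2 for m in col_means)
    # Whatever rows and columns leave unexplained: interaction plus error.
    ss_resid = ss_total - ss_rows - ss_cols

    return {
        "ss_rows": ss_rows,
        "ss_cols": ss_cols,
        "ss_resid": ss_resid,
        "df_resid": (r - 1) * (c - 1),
        # One observation per cell leaves no within-cell degrees of freedom.
        "df_within_cell": r * c * (n_per_cell - 1),
    }


# Hypothetical outcomes: rows are treatment arms, columns are settings.
scores = [[4.0, 5.0, 6.0],
          [7.0, 9.0, 8.0]]
result = two_way_one_obs(scores)
print(result["df_resid"], result["df_within_cell"])  # 2 0
```

With replication inside each cell, the within-cell degrees of freedom would be positive and the interaction could be tested separately; without it, the two sources can only be reported together.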
Simply comparing two psychotherapies does not address the alternative explanatory hypotheses that all apparently equivalent improvement is due to spontaneous remission or placebo effect. This discussion affirms that psychotherapeutic evaluation against a so-called wait-list control is grossly misleading. Further, pragmatic differences between psychotherapies may be due to different demand characteristics rather than differences in characteristic efficacies.
Simply comparing a new treatment to a so-called standard treatment is so vulnerable to sampling definitions and variability that it is quite possible that the sample selected is unsuitable for the standard treatment. If the new putative treatment was equivalent to the “standard” treatment in this sample, it would simply indicate equivalent inefficacy. But this would be untestable in this experimental design. (This is discussed elsewhere; see Klein, 1995; Rothman & Michels, 1994.) Internal placebo calibration is necessary.
The proliferation of psychotherapy studies against wait-list or other incredible controls should be stemmed or at best considered hypothesis generating rather than hypothesis testing. The TDCRP design (psychotherapy, pharmacotherapy, pill-placebo case management) should serve as a minimum contemporary standard for definitive trials in the important area of comparative therapies. It may be amplified by a specified psychotherapy placebo. Deviations from this design require explanation and justification rather than a bland assumption that the alternative hypotheses to characteristic treatment effect have been adequately met. Amplifying this design to include a combination psychotherapy and medication treatment, various sequential treatments, or well described treatments as usually provided in clinical practice would be of great pragmatic value.
Adair, J. G., Sharpe, D., & Huynh, C. L. (1990). The placebo control group: An analysis of its effectiveness in educational research. Journal of Experimental Education, 59, 67–86.
Basham, R. B. (1986). Scientific and practical advantages of comparative design in psychotherapy outcome research. Journal of Consulting & Clinical Psychology, 54, 88–94.
Critelli, J. W., & Neumann, K. F. (1984). The placebo: Conceptual analysis of a construct in transition. American Psychologist, 39, 32–39.
Dush, D. M. (1986). The placebo in psychosocial outcome evaluations. Evaluation & The Health Professions, 9, 421–438.
Eysenck, H. J. (1952). The effects of psychotherapy: An evaluation. Journal of Consulting Psychology, 16, 319–324.
Frank, J. D. (1961). Persuasion and healing. Baltimore, MD: Johns Hopkins University Press.
Grunbaum, A. (1986). The placebo concept in medicine and psychiatry. Psychological Medicine, 16, 19–38.
Heide, F. J., & Borkovec, T. D. (1984). Relaxation-induced anxiety: Mechanisms and theoretical implications. Behaviour Research & Therapy, 22, 1–12.
Horvath, P. (1988). Placebos and common factors in two decades of psychotherapy research. Psychological Bulletin, 104, 214–225.
Klein, D. F. (1995). Response to Rothman and Michels on placebo-controlled clinical trials. Psychiatric Annals, 25, 401–403.
Klein, D. F., & Rabkin, J. G. (1984). Specificity and strategy in psychotherapy research and practice. In J. Williams & R. Spitzer (Eds.), Psychotherapy research: Where are we and where should we go? New York: Guilford Press.
Orne, M. T. (1962). Implications for psychotherapy derived from current research on the nature of hypnosis. American Journal of Psychiatry, 118, 1097–1103.
Rothman, K. F., & Michels, K. D. (1994). The continuing unethical use of placebo controls. The New England Journal of Medicine, 331, 394–398.
Schwartz, D., & Lellouch, J. (1967). Explanatory and pragmatic attitudes in therapeutical trials. Journal of Chronic Diseases, 20, 637–648.
Wilkins, W. (1986). Placebo problems in psychotherapy research: Social-psychological alternatives to chemotherapy concepts. American Psychologist, 41, 551–556.