JAMA. Vol. 298 No. 4, July 25, 2007
Review

Data Extraction Errors in Meta-analyses That Use Standardized Mean Differences

Peter C. Gøtzsche, MD, DrMedSci; Asbjørn Hróbjartsson, MD, PhD; Katja Maric, MSc; Britta Tendal, MSc

JAMA. 2007;298:430-437.

ABSTRACT

Context  Meta-analysis of trials that have used different continuous or rating scales to record outcomes of a similar nature requires sophisticated data handling and data transformation to a uniform scale, the standardized mean difference (SMD). It is not known how reliable such meta-analyses are.

Objective  To study whether SMDs in meta-analyses are accurate.

Data Sources  Systematic review of meta-analyses published in 2004 that reported a result as an SMD, with no language restrictions. Two trials were randomly selected from each meta-analysis. We attempted to replicate the results in each meta-analysis by independently calculating SMD using Hedges adjusted g.

Data Extraction  Our primary outcome was the proportion of meta-analyses for which our result differed from that of the authors by 0.1 or more, either for the point estimate or for its confidence interval, for at least 1 of the 2 selected trials. We chose 0.1 as the cut point because many commonly used treatments have an effect of 0.1 to 0.5, compared with placebo.

Results  Of the 27 meta-analyses included in this study, we could not replicate the result for at least 1 of the 2 trials within 0.1 in 10 of the meta-analyses (37%), and in 4 cases, the discrepancy was 0.6 or more for the point estimate. Common problems were erroneous number of patients, means, standard deviations, and sign for the effect estimate. In total, 17 meta-analyses (63%) had errors for at least 1 of the 2 trials examined. For the 10 meta-analyses with errors of at least 0.1, we checked the data from all the trials and conducted our own meta-analysis, using the authors' methods. Seven of these 10 meta-analyses were erroneous (70%); 1 was subsequently retracted, and in 2 a significant difference disappeared or appeared.

Conclusions  The high proportion of meta-analyses based on SMDs that show errors indicates that although the statistical process is ostensibly simple, data extraction is particularly liable to errors that can negate or even reverse the findings of the study. This has implications for researchers and implies that all readers, including journal reviewers and policy makers, should approach such meta-analyses with caution.



INTRODUCTION

Results from trials that have measured the same outcome on the same scale, eg, diastolic blood pressure in mm Hg, can readily be combined in a meta-analysis by calculating the weighted mean difference.1 Sometimes, trials have used outcomes of a similar nature but that were measured on different scales, eg, pain on a 5-point ranking scale or on a 100-mm visual analog scale, or depression on a clinician-rated scale such as the Hamilton Rating Scale for Depression2 or a self-rating scale such as the Beck Depression Inventory.3 In such cases, it is necessary to standardize the measurements on a uniform scale before they can be pooled in a meta-analysis. This is done by calculating the standardized mean difference (SMD) for each trial, which is the difference in means between the 2 groups, divided by the pooled standard deviation of the measurements.1 By this transformation, the outcome becomes dimensionless and the scales become uniform, eg, for the same degree of pain, values measured on a 100-mm analog scale would be expected to be 20 times larger than values measured on a 5-point ranking scale, but the standard deviation would also be expected to be 20 times larger.
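The transformation described above can be sketched in Python. This is a minimal illustration, not the code used in this study: the function name, the small-sample correction factor 1 - 3/(4df - 1), the standard error approximation, and all example numbers are assumptions chosen for illustration.

```python
import math

def hedges_g(mean_exp, sd_exp, n_exp, mean_ctl, sd_ctl, n_ctl):
    """Return (g, standard error, 95% CI) for 2 independent groups.

    Assumption: the small-sample correction and the variance formula
    below are commonly used approximations, not taken from this article.
    """
    df = n_exp + n_ctl - 2
    # Pooled standard deviation of the 2 groups
    pooled_sd = math.sqrt(
        ((n_exp - 1) * sd_exp ** 2 + (n_ctl - 1) * sd_ctl ** 2) / df
    )
    d = (mean_exp - mean_ctl) / pooled_sd      # unadjusted SMD
    g = (1 - 3 / (4 * df - 1)) * d             # small-sample adjustment
    n = n_exp + n_ctl
    se = math.sqrt(n / (n_exp * n_ctl) + g ** 2 / (2 * (n - 3.94)))
    return g, se, (g - 1.96 * se, g + 1.96 * se)

# Hypothetical trials: the same pain difference recorded on a 100-mm
# analog scale and on a scale whose values are 20 times smaller.
vas = hedges_g(40.0, 20.0, 30, 50.0, 20.0, 30)
small_scale = hedges_g(2.0, 1.0, 30, 2.5, 1.0, 30)
print(round(vas[0], 3), round(small_scale[0], 3))
```

Both hypothetical trials give the same dimensionless value (about -0.49), because the means and the standard deviations scale by the same factor.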

Although simple in principle, it is not known how reliable this method is in practice. In contrast to a meta-analysis of binary data, which usually involves only the extraction of the number of patients and events from the trial reports, a meta-analysis using SMDs requires much more sophisticated data handling, and there are many pitfalls. Standard errors may be mistaken for standard deviations, which will inflate the estimates substantially, and standard deviations may need to be calculated or estimated from P values or other data. Some trials may have used changes from baseline instead of values after treatment but may have failed to report data that allow the calculation of within-patient standard deviations. Data extractors also need to know the direction of the scales, which is not always clear in the trial reports. When a high value on one scale means a poor effect, eg, on a depression scale, but a good effect on another scale, eg, a mood scale, it is necessary to change the sign of those values that mean the opposite. Adding to this complexity is that trial authors often give changes from baseline as positive values when they should have been negative, eg, when the average value after treatment is lower than the baseline value, or they say they have used changes from baseline when in reality they have used values after treatment. In 1 case, the review authors used the wrong sign for some of the estimates, which led to an erroneous conclusion of harm and retraction of the review; when corrected and republished, the review concluded that the intervention was beneficial.4
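Two of these pitfalls, mistaking a standard error for a standard deviation and mixing scales that run in opposite directions, can be illustrated with a short Python sketch. The helper names and the trial numbers are hypothetical and serve only to show the size of the effect of such mistakes.

```python
import math

def sd_from_se(se, n):
    """Convert the standard error of a mean to a standard deviation."""
    return se * math.sqrt(n)

def smd(mean_exp, mean_ctl, sd_exp, sd_ctl, n_exp, n_ctl):
    """Unadjusted SMD: mean difference divided by the pooled SD."""
    pooled = math.sqrt(
        ((n_exp - 1) * sd_exp ** 2 + (n_ctl - 1) * sd_ctl ** 2)
        / (n_exp + n_ctl - 2)
    )
    return (mean_exp - mean_ctl) / pooled

def align_sign(value, high_is_worse):
    """Flip the sign for scales on which a high score is a good effect,
    so that a negative SMD always favors the experimental group."""
    return value if high_is_worse else -value

# Hypothetical report: 40 (4) vs 50 (4), 25 patients per group,
# where the number in parentheses is a standard error, not an SD.
n = 25
wrong = smd(40.0, 50.0, 4.0, 4.0, n, n)              # SE used as SD
correct = smd(40.0, 50.0, sd_from_se(4.0, n), sd_from_se(4.0, n), n, n)
print(wrong, correct)

# A mood scale (high = good) must be sign-flipped before pooling
# with a depression scale (high = poor).
print(align_sign(0.5, high_is_worse=False))
```

In this hypothetical case the mistaken estimate is 5 times the correct one, ie, inflated by the square root of the group size.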

We studied whether trial SMDs in published meta-analyses are accurate and described the frequency and nature of any data extraction errors and their impact on the meta-analysis result.


METHODS

We performed a PubMed search on March 3, 2005, for meta-analyses that had used the SMD and that were published in 2004. We used the search strategy (effect size or standardised mean difference or standardized mean difference or SMD) and (systematic review [title and abstract {tiab}] or meta-analysis [publication type {pt}] or review [pt]). There were no language restrictions.

We included meta-analyses with abstracts that reported an SMD or indicated that there was such a result in the article. Our index result was the first SMD result reported in the abstract or, if the abstract reported none, the first one reported in the results section.

We excluded meta-analyses if (1) the index result was clearly not based exclusively on randomized trials; (2) the index result was based on crossover trials; (3) the index result was not based on at least 2 trials; (4) the authors had used Bayesian statistics; (5) the authors had performed an individual patient data meta-analysis; (6) the meta-analysis had been performed by ourselves; or (7) the meta-analysis was not restricted to humans.

For each meta-analysis, the intervention that appeared to be the authors' primary interest was labeled the experimental intervention. It was easy to determine from the title, introduction, graphs, statistical advice, or grants which intervention was experimental. The other intervention, whether active or inactive, was defined as control. We noted the SMD and its timing for the index result, interventions, disease, any explicit statements about methods for selection of 1 of several possible outcomes or time points in a trial, statistical methods used for pooling, whether values after treatment or changes from baseline had been used, source of funding, and conflicts of interest.

We randomly selected 2 trials from each meta-analysis by using a random numbers table, starting at a new place in the table for every new trial. In one case, the selected trial report could not be retrieved, so we randomly selected another. We extracted outcome data from the trial reports, ensuring that the data extractor on a trial report was different from the one on the corresponding meta-analysis. The trial data extractor was provided with a data sheet with information on the experimental intervention, disease and measurement scale, including any timing if available in the meta-analysis, eg, Hamilton depression score after 6 weeks. Furthermore, the data extractor was informed about the trial result, with its 95% confidence interval (CI), and the group sizes, means and standard deviations for the particular trial's outcome if available, the statistical method used for pooling, and whether final values or changes had been used.

The reason for the lack of blinding was that we wished to see whether we could replicate the published results. We therefore focused on what the authors of the meta-analysis had done and not on what they could have done instead, eg, selected another, perhaps more appropriate, scale when several had been used for measuring the same outcome. Trial data extractors retrieved the necessary information for calculating the SMD from each trial report, including the direction of the effect in relation to the scale used, and could write comments.

Two persons extracted data independently and disagreements (which were mainly caused by simple oversight) were resolved by discussion. We contacted the authors of the meta-analyses for clarification when we could not replicate their data, or when essential data in the trial report for the calculations were missing, ambiguous, or appeared to be erroneous. When the authors had received unpublished data from the trial authors, we used the same unpublished data for our calculations.

Our main outcome was the proportion of meta-analyses for which 1 or both of our 2 trial SMDs differed from that of the authors by 0.1 or more, either for the point estimate or for its CI. We chose 0.1 as the cut point because many commonly used treatments have an effect of 0.1 to 0.5 compared with placebo. For example, the effect of acetaminophen on pain in patients with osteoarthritis is SMD –0.13 (95% CI, –0.22 to –0.04),5 the effect of antidepressants on mood in trials with active placebos is SMD 0.17 (95% CI, 0.00-0.34),6 the effect of physical and chemical methods to reduce house dust mite allergens on asthma symptoms is SMD –0.01 (95% CI, –0.10 to 0.13),7 whereas the effect of inhaled corticosteroids on asthma symptoms is relatively large, SMD –0.49 (95% CI, –0.56 to –0.43).8 Furthermore, an error of 0.1 can be important when 2 active treatments have been compared, for there is usually little difference between active treatments.
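The comparison rule can be written as a small Python check. This is a sketch under stated assumptions: the function name is hypothetical, the example triples are invented for illustration, and the rounding step is our own guard against floating-point noise at the boundary.

```python
def exceeds_cut_point(ours, theirs, cut=0.1):
    """Return True if any paired value (point estimate, lower CI limit,
    upper CI limit) differs by `cut` or more.

    Differences are rounded to 6 decimals so that, eg, 0.31 vs 0.21
    counts as a difference of exactly 0.1 despite binary rounding.
    """
    return any(round(abs(a - b), 6) >= cut for a, b in zip(ours, theirs))

# Hypothetical trial results as (SMD, lower limit, upper limit)
replicated = exceeds_cut_point((-0.30, -0.75, 0.15), (-0.35, -0.80, 0.10))
ci_differs = exceeds_cut_point((-0.30, -0.75, 0.15), (-0.30, -0.62, 0.02))
print(replicated, ci_differs)
```

The second hypothetical trial is flagged although the point estimates agree, because the rule also applies to the limits of the CI.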

We used Microsoft Excel for our initial calculations of Hedges adjusted g, and Review Manager9 and Comprehensive Meta Analysis10 for our final estimates.


RESULTS

We identified 148 potentially eligible reviews. Fifty-five were excluded based on the abstracts, another 61 after reading the full text, and 5 after reading the 2 randomly selected trial reports (Figure 1). The main reasons for exclusion were lack of a reported pooled SMD in the meta-analysis (n = 35) or for the individual trials (n = 16) and that the reviews were clearly not based solely on randomized trials (n = 29).


Figure 1. Flowchart for Selection of Meta-analyses

RCT indicates randomized controlled trial; SMD, standardized mean difference.


We included 27 reviews,11-37 of which 16 were Cochrane reviews. Two reviews had industry funding, 18 nonindustry funding, 1 had no funding, 5 had no statements about funding, and 1 was unclear. All 16 Cochrane reviews had a conflict of interest statement, which is a standard heading, whereas 9 of the other 11 reviews had no such declaration.

The outcome in our index meta-analysis result was a clinical or functional score in 10 reviews, depression in 5, pain in 4, and other in 8. It was unclear whether the calculations were preferentially based on change from baseline or on final values in 15 meta-analyses; in 7, change from baseline was used; in 4, final values; and in 1, both approaches. In 22 reviews, the statistical method used for meta-analysis was Hedges adjusted g; in 3, Cohen d; and in 2, the method was not stated. Five reviews explicitly reported use of unpublished data in relation to one or both trials we selected.

Accuracy of the Published Data

In 10 of the 27 meta-analyses (37%), we could not replicate the result or its 95% CI within our predefined cut point of 0.1 for at least 1 of the 2 randomly selected trials38-49 (Figure 2). Seven meta-analyses (26%) had a trial with a discrepancy of 0.2 or more in the point estimate, and 4 (15%) a discrepancy of 0.6 or more, with a maximum of 1.45 (reference 48).


Figure 2. Cases for Which Our Calculated Standardized Mean Differences (SMDs) Differed From That of the Authors

The meta-analyses differed by 0.1 or more for the point estimate or its 95% confidence interval (CI) for at least 1 of 2 selected trials. (For den Boer et al,19 we used the original signs for differences.) The size of the data markers indicates the relative weight of the data. IQR indicates interquartile range.


Common errors were that the authors' number of patients, means, standard deviations, and sign for the effect estimate were wrong (after we had taken into account that some authors had reversed the sign for all trials, for convenience, to obtain a positive value for a beneficial effect; Figure 2).

We also found errors that led to a discrepancy of less than 0.1 in the SMD, eg, wrong standard deviation,30, 50 the use of number of patients and standard deviations at baseline rather than after treatment,27, 51 wrong time point,24, 52 and double counting of the control group when there were 2 treatment groups.26, 53 In total, we found 17 meta-analyses (63%) with errors for at least 1 of the 2 trials examined.

Other Problems

Multiplicity of Available Data. The authors of a meta-analysis of osteoporosis had based their calculations on exact P values, although means and standard deviations were available, but we found that the P values in both trials were seriously wrong.36 We replicated the authors' SMDs from the P values, but when we used means and standard deviations for the same outcome, we found an SMD of 0.34 vs the authors' 0.55 for the first trial,54 and 1.42 vs 0.60 for the second.55 In the second trial,55 there were 12 different data sets to choose from: intact or hemiplegic side, 2 measurement methods for bone mineral content, and values after treatment or changes, and 4 sets of P values. The SMDs for these 12 possibilities varied between –0.02 and 1.42.
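To show why an SMD derived from a P value is sensitive to errors in that P value, the following Python sketch converts a 2-sided P value from a 2-group comparison into an approximate SMD. It is not the method used by the authors of the meta-analysis or by us: it assumes a normal approximation in place of the t distribution (to stay within the standard library), the sign must be supplied by the data extractor, and the group sizes and P values are hypothetical.

```python
import math
from statistics import NormalDist

def smd_from_p(p_two_sided, n_exp, n_ctl, sign=1):
    """Approximate SMD from a 2-sided P value.

    Assumption: the test statistic is treated as a z value; for small
    trials a t distribution with n_exp + n_ctl - 2 degrees of freedom
    would be more appropriate.
    """
    z = NormalDist().inv_cdf(1 - p_two_sided / 2)
    return sign * z * math.sqrt(1 / n_exp + 1 / n_ctl)

# Hypothetical trial with 50 patients per group
as_reported = smd_from_p(0.05, 50, 50)
if_misprinted = smd_from_p(0.005, 50, 50)
print(round(as_reported, 2), round(if_misprinted, 2))
```

A P value that is wrong by a factor of 10 changes the derived SMD considerably, which is one reason to prefer reported means and standard deviations when they are available.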

Ten meta-analyses (37%) described methods for selection of 1 of several possible outcomes in a trial. In 4, however, the selected outcome was the most frequently reported one, which suggests that it might have been a post hoc decision rather than having been stated in a review protocol. Two meta-analyses had pooled the reported outcomes for each trial,21, 31 but pooling was inappropriate for one trial in which psychometric scales had been pooled with number of visits to the infirmary for psychiatric prison inmates21, 46 (if a person is mentally disturbed, he may score high on a psychometric scale but low on visits to a physician because his problems keep him from making an appointment; in fact, the SMD was 0.67 for 1 of the psychometric scales and –0.70 for 1 of the visit outcomes).

Eight meta-analyses (30%) had statements about the selection of 1 of several possible time points in a trial, but they were often unclear or appeared to have been post hoc decisions. One meta-analysis stated that "Day three clinical score was most often reported,"32 another that it had "trial durations of at least 6 weeks and for most 12 or more weeks, which is sufficient time for antidepressant effects to occur."31 In a third meta-analysis, the length varied between 2 and 8 weeks and the 2-week data were used because they included all study participants in both trials.14 A fourth meta-analysis selected "results obtained during the whole circumcision procedure,"16 but in 1 of the trials,43 there were 9 different data sets, corresponding to various time points. In a fifth meta-analysis,19 the authors had used 8-week data for one of the trials but 20-week data for the other when only half of the patients in the experimental group remained, although data were reported for each of the 20 weeks separately. Over these 20 weeks, the SMD varied substantially, between –0.73 and 0.41 (Figure 3).45


Figure 3. Standardized Mean Differences and 95% Confidence Intervals (CIs)

A trial of depression using the Beck Depression Inventory score and comparing self-directed bibliotherapy with cognitive behavioral therapy.45 There were 40 patients at baseline and 28 remained after 20 weeks. (We changed the sign for the effect as the author did.) The size of the data markers indicates the relative weight of the data.


Adjusted Data. In a meta-analysis of nursing care, the authors had used statistically adjusted data and found an SMD of 0.31, whereas we found an SMD of 0.21, based on unadjusted data.22, 56 Because we could replicate the authors' result with adjusted data, we did not consider this a discrepancy but nevertheless believe that one should use unadjusted data in meta-analyses since trial authors are more prone to use adjustment when it results in smaller P values than unadjusted analyses.57

In another meta-analysis, the authors had "adjusted" their data by subtracting baseline values from values after treatment.27, 50 Because of dropouts and missing data, there were more patients at baseline. Our calculated SMDs differed from those the authors reported, and we believe such corrections should be avoided because the patients at baseline are different from those after treatment.

Non-Gaussian Distributions. The data were often not normally distributed, and in some cases, the deviations from normality were substantial. In 6 meta-analyses, the standard deviation was larger than half the mean for at least 1 of the 2 trials, although the scale did not allow values below 0. In 3 meta-analyses, the SD even exceeded the mean, and in one case, the average number of sick days was 5.5 while the SD was 25 (references 26, 53). Calculation of the SMD may be questionable in such cases.
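The screening rule used in this paragraph can be sketched as a simple Python check for outcomes that cannot be negative. The function name and the wording of the flags are hypothetical; only the example of 5.5 sick days with an SD of 25 comes from the text.

```python
def skew_warning(mean, sd):
    """Flag likely non-normal data on a scale that cannot go below 0.

    An SD above half the mean implies that a normal distribution would
    place a noticeable share of values below 0, which is impossible.
    """
    if mean <= 0:
        raise ValueError("mean must be positive for this check")
    ratio = sd / mean
    if ratio > 1:
        return "SD exceeds the mean"
    if ratio > 0.5:
        return "SD exceeds half the mean"
    return "no flag"

print(skew_warning(5.5, 25))   # sick-days example from the text
```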

Replication of Full Meta-analyses. For the 10 meta-analyses with important errors in 1 or both of our 2 selected trial results, we checked the data from all the trials and did our own meta-analysis, using the authors' methods. We shared our results with the authors, including those for the individual trials and asked them whether they could explain the differences.

For 7 (70%) of these meta-analyses,11, 13, 18, 21, 25, 32, 35 we could not replicate the authors' pooled result within our cut point of 0.1 in SMD for the point estimate or its CI, and for 5 of them, the discrepancy exceeded 0.2 (Figure 4). Because of our findings, 1 of these 7 meta-analyses was retracted by the editor, who was also an author of the meta-analysis11; in another, the authors reported a significant effect we could not reproduce21; and in a third, we found a significant effect in contrast to the authors.32


Figure 4. Replication of the 10 Meta-analyses for Which Our Calculated Standardized Mean Differences for at Least 1 of 2 Selected Trials Differed From That of the Authors

Seven meta-analyses differed by 0.1 or more for the point estimate or its 95% confidence interval (CI). (For den Boer et al,19 we used the original signs for differences.) The size of the data markers indicates the relative weight of the data.



COMMENT

We found erroneous SMD estimates of potential clinical relevance for at least 1 of our 2 selected trials in 10 of 27 meta-analyses (37%). When we tried to replicate the 10 full meta-analyses by including all the trials, we found erroneous pooled estimates in 7 of them (70%).

Our choice of 0.1 as a cut point for errors can be discussed, but there were also many errors that were larger than 0.2, and several were larger than 0.6. Because it can be difficult for readers to grasp what a certain SMD means, we suggest that authors of meta-analyses use the pooled SMD to back-calculate what the effect corresponds to on a commonly used scale, eg, an analog scale for pain or Hamilton scale for depression.
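Such a back-calculation amounts to multiplying the pooled SMD by a standard deviation that is typical for the familiar scale. A minimal Python sketch follows; the function name is hypothetical, and the SD of 25 mm for a 100-mm analog scale is an assumed value for illustration, not a figure from this study. The SMD of -0.13 is the acetaminophen estimate cited in the Methods section.

```python
def smd_to_scale(smd, typical_sd):
    """Re-express an SMD in the units of a familiar scale."""
    return smd * typical_sd

# Assumption: a typical between-patient SD of 25 mm on a 100-mm scale
pain_difference_mm = smd_to_scale(-0.13, 25.0)
print(round(pain_difference_mm, 2))
```

Under this assumed SD, an SMD of -0.13 corresponds to a difference of roughly 3 mm on the 100-mm scale.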

Although the error rates were high, they are very likely underestimates. First, we only checked a single outcome in only 2 randomly selected trials in each meta-analysis. Second, we did not check the full meta-analyses in the majority of cases for which we did not find errors of at least 0.1 in the SMDs in the 2 selected trials. But we could not avoid finding errors even in those meta-analyses. For example, we noted incidentally that in 1 of them,37 there was extreme heterogeneity for some of the trials that we had not selected; in one trial, SMD was –1.38 (95% CI, –2.07 to –0.68), corresponding to a large, significantly beneficial effect, and in another, the SMD was 0.80 (95% CI, 0.02-1.57), corresponding to a large, significantly harmful effect, with a distance of 0.70 between the borders of the 2 nonoverlapping CIs. This suggests that 1 of the estimates is highly likely to be wrong. Third, when we checked the full meta-analyses in the remaining cases, we found many additional errors. Of the 40 new trials for this analysis, we found errors in 16 (40%); in 12 of these, the discrepancy in SMD exceeded 0.2, and in 6, it exceeded 0.6. Some errors were extremely large but tended to neutralize each other as they went in both directions, eg, in 1 meta-analysis, the 4 largest discrepancies were 0.47, –1.35, 1.33, and –1.45 (reference 32); in another, the 3 largest discrepancies were –0.79, 0.64, and 0.65 (reference 35).
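The distance between the 2 nonoverlapping CIs mentioned above can be reproduced with a few lines of Python. The function name is hypothetical; the interval limits are those quoted in the text.

```python
def ci_gap(ci_a, ci_b):
    """Distance between 2 confidence intervals given as (lower, upper).

    Returns 0.0 when the intervals overlap or touch.
    """
    first, second = sorted([ci_a, ci_b])   # order by lower limit
    return max(0.0, second[0] - first[1])

gap = ci_gap((-2.07, -0.68), (0.02, 1.57))
print(round(gap, 2))
```

A screen of this kind over all trials in a forest plot could point to pairs of estimates that deserve a second look at the extracted data.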

It should be noted that the use of SMD in meta-analyses is far more common than our results suggest. We had narrow inclusion criteria and excluded many meta-analyses because they were not based solely on randomized trials, or because there were insufficient data for our analyses (Figure 1). Furthermore, our PubMed search must have missed many meta-analyses because authors quite often do not indicate in their abstract that they have used the SMD. It is therefore likely that our sample consisted of meta-analyses that were relatively well done, well reported, and therefore well indexed, and that the problems could be more pronounced than we have described. We also note that our search technique may have led to oversampling of Cochrane reviews because the abstracts and methods of these reviews are standardized.1

Our study was small and needs to be replicated. It is also a limitation that we were primarily interested in detecting and discussing the possible consequences of obvious errors in published meta-analyses. The persons who extracted data from the trial reports were therefore aware of the data that had been used in the corresponding meta-analysis in order to focus on what the authors of the meta-analysis had done and not on what they could have done, sometimes with better justification, as illustrated in our examples.

There are only a few previous studies on the accuracy of continuous and ordinal-scale data in meta-analyses. A statistician with experience in systematic reviews found errors in 20 of 34 published Cochrane reviews in cystic fibrosis and other genetic disorders.58 This study was not limited to checking continuous data, but for these, some of the same types of errors were reported as those we found. The authors gave no data on the discrepancies but only noted that they did not lead to "substantial changes in any conclusion." In another study, we tried to replicate a meta-analysis of analgesic effects of placebos59-60 but found many serious errors, and after correcting for them, we rejected the authors' claim of large effects of placebo.61

Our study suggests that statistical expertise and considerable attention to detail are required to get SMDs right. We found examples from which it was necessary to extract information from analysis of variance tables, results of F tests and graphs with a nonzero origin; to combine baseline and follow-up data; and to judge whether results reported as medians and percentiles could be used with reasonable approximations. We also found examples of errors made by the trial authors, eg, an asymmetric CI, which is impossible for an untransformed continuous outcome; grossly erroneous P values; and apparently erroneous unpublished data (Figure 2).

It is usually recommended that 2 observers extract trial data independently and compare their findings,1 and a study based on 30 trials of the effect of melatonin for sleep disorders showed that single-data extraction with verification by a second observer led to more errors than double data extraction.62 However, in 1 of our meta-analyses,11 the data were very different from those in the trial reports for both included trials, although the review reported having used 2 independent observers (Figure 2). This suggests that this precaution may not have taken place as reported or that the observers may not have checked what was entered in a statistical program and what was published.

Because data handling can so easily go wrong, it was unfortunate that it was rarely clear what the meta-analysts had done. Although we consulted the "Methods" sections and knew which estimates the meta-analysts had arrived at when we tried to replicate them, we often had to do extensive detective work in order to understand where they came from, for there was too little detail in the reviews. Cochrane reviews were the easiest to follow because graphs are always published that show, for each trial, the number of patients, the mean and standard deviation for each group, and the SMD and its CI. Other meta-analyses sometimes gave only the point estimate for the SMD.

The reporting could be improved if authors adhered to the Quality of Reporting of Meta-analyses (QUOROM) guidelines63 that are currently being updated under the name of PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-analyses).

We find it essential that meta-analysts report detailed data for each trial and detailed methods on why, how, and which trial data they extracted and whether decision rules on selection of outcomes and time points were prespecified in a review protocol and adhered to. Although our sample was limited, we found examples that SMDs in the same trial varied between –0.02 and 1.42 for the same type of outcome, and between –0.73 and 0.41 for the same outcome measured at different time points. These variations are extreme compared with the small effects some of our treatments have over placebo and the even smaller differences between most active treatments, and they suggest that the potential for error due to erroneous data handling and bias is far greater in meta-analyses of continuous and ordinal-scale outcomes than in those of binary data.

Further Research

Our study is the first step toward elucidating the reliability of the SMD when used in practice as a uniform outcome measure in meta-analyses. We will explore in another study the observer variability, when the same meta-analysis is performed by independent researchers using the same protocol. There is no tradition among statisticians for letting several people analyze the same set of raw data independently and comparing their results. However, observer variation studies among clinicians have shown that clinicians' diagnostic skills and mutual agreement are generally small and, indeed, much smaller than what they thought themselves before their beliefs were put on trial.64 It would be interesting to know whether the same applies to statisticians and other methodologists.

We will also explore whether meta-analyses using the weighted mean difference suffer from problems similar to those of meta-analyses using the SMD.


CONCLUSIONS

The high prevalence of errors that may potentially negate or even reverse the findings of the included studies implies that all readers, including journal reviewers and policy makers, should approach meta-analyses using SMDs with caution. Editors should be particularly careful when editing SMD meta-analyses.


AUTHOR INFORMATION

Corresponding Author: Peter C. Gøtzsche, MD, DrMedSci, Nordic Cochrane Centre, Rigshospitalet, Dept 3343, Blegdamsvej 9, DK-2100 Copenhagen Ø, Denmark (pcg@cochrane.dk).

Author Contributions: Dr Gøtzsche had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Study concept and design: Gøtzsche, Hróbjartsson.

Acquisition of data: Gøtzsche, Hróbjartsson, Maric, Tendal.

Analysis and interpretation of data: Gøtzsche, Hróbjartsson, Tendal.

Drafting of the manuscript: Gøtzsche.

Critical revision of the manuscript for important intellectual content: Gøtzsche, Hróbjartsson, Maric, Tendal.

Statistical analysis: Gøtzsche, Tendal.

Administrative, technical, or material support: Gøtzsche, Tendal.

Study supervision: Gøtzsche.

Financial Disclosures: None reported.

Funding/Support: The study was not funded.

Additional Contributions: We thank senior statistician Julian Higgins, MRC, Biostatistics Unit, University of Cambridge, England, for comments on the manuscript and the following authors for providing additional information on their meta-analyses: Ruth Barclay-Goddard, MHSc, University of Manitoba; Barbara Brady-Fryer, RN, University of Ottawa, Ottawa, Ontario; Peter den Boer, MD, University Hospital Groningen, Groningen, the Netherlands; Chen Junmin, MD, Australasian Cochrane Centre, Melbourne, Australia; Chris Deery, MD, Edinburgh Dental Institute, Edinburgh, Scotland; Pasquale Frisina, PhD, City University of New York, New York; Peter Gibson, MB, BS, FRACP, John Hunter Hospital, Newcastle, Australia; Peter Griffiths, MD, Florence Nightingale School of Nursing and Midwifery at King's College, London, England; Kåre B. Hagen, MD, Diakonhjemmet Hospital, Oslo, Norway; Lisa Hartling, BScPT, MSc, University of Alberta, Edmonton; Jan Kool, PhD, Klinik Valens Rehabilitationszentrum, Valens, Switzerland; Gert Kwakkel, MD, University Hospital Vrije Universiteit, Amsterdam, the Netherlands; Hugh McGuire, Trials Search Coordinator, Cochrane Depression, Anxiety and Neurosis Group, London, England; Colleen Murphy, International Medical Corps, Santa Monica, California; Dr Edward Nuñes, MD, New York State Psychiatric Institute, New York; Hema Patel, MD, MSc, FRCP, Montreal Children's Hospital, Montreal, Quebec; M. Florent Richy, MSc, University of Liège, Belgium; Natasha Wiebe, MMath, University of Alberta, Edmonton. None of those acknowledged received compensation for their contributions.

Author Affiliations: Nordic Cochrane Centre, Rigshospitalet, Copenhagen, Denmark.


REFERENCES

1. Higgins JPT, Green S, eds. Cochrane Handbook for Systematic Reviews of Interventions, 4.2.5. http://www.cochrane.org/resources/handbook/hbook.htm. Updated May 2005. Accessed May 31, 2005.
2. Hamilton M. A rating scale for depression. J Neurol Neurosurg Psychiatry. 1960;23:56-62.
3. Beck AT, Ward CH, Mendelson M, Mock J, Erbaugh J. An inventory for measuring depression. Arch Gen Psychiatry. 1961;4:561-571.
4. Murray E, Burns J, See TS, Lai R, Nazareth I. Interactive Health Communication Applications for people with chronic disease. Cochrane Database Syst Rev. 2005(4):CD004274.
5. Towheed TE, Maxwell L, Judd MG, Catton M, Hochberg MC, Wells G. Acetaminophen for osteoarthritis. Cochrane Database Syst Rev. 2006(1):CD004257.
6. Moncrieff J, Wessely S, Hardy R. Active placebos versus antidepressants for depression. Cochrane Database Syst Rev. 2004(1):CD003012.
7. Gøtzsche PC, Johansen HK, Schmidt LM, Burr ML. House dust mite control measures for asthma. Cochrane Database Syst Rev. 2004(4):CD001187.
8. Adams NP, Bestall JC, Lasserson TJ, Jones PW, Cates CJ. Fluticasone versus placebo for chronic asthma in adults and children. Cochrane Database Syst Rev. 2005(4):CD003135.
9. Review Manager [computer program]. Version 4.2 for Windows. Copenhagen, Denmark: The Nordic Cochrane Centre, The Cochrane Collaboration; 2003.
10. Comprehensive Meta Analysis [computer program]. Version 2.2.030. Englewood, NJ: Biostat Inc; July 2006.
11. Brosseau L, Welch V, Wells G, et al. Low level laser therapy (Classes I, II and III) for treating osteoarthritis. Cochrane Database Syst Rev. 2004(3):CD002046.
12. Barlow J, Coren E. Parent-training programmes for improving maternal psychosocial health. Cochrane Database Syst Rev. 2004(1):CD002020.
13. Edmonds M, McGuire H, Price J. Exercise therapy for chronic fatigue syndrome. Cochrane Database Syst Rev. 2004(3):CD003200.
14. Barclay-Goddard R, Stevenson T, Poluha W, Moffatt ME, Taback SP. Force platform feedback for standing balance training after stroke. Cochrane Database Syst Rev. 2004(4):CD004129.
15. Castro-Rodriguez JA, Rodrigo GJ. Beta-agonists through metered-dose inhaler with valved holding chamber versus nebulizer for acute exacerbation of wheezing or asthma in children under 5 years of age: a systematic review with meta-analysis. J Pediatr. 2004;145(2):172-177.
16. Brady-Fryer B, Wiebe N, Lander JA. Pain relief for neonatal circumcision. Cochrane Database Syst Rev. 2004(4):CD004217.
17. Deery C, Heanue M, Deacon S, et al. The effectiveness of manual versus powered toothbrushes for dental health: a systematic review. J Dent. 2004;32(3):197-211.
18. Chen J, Liu C. Methotrexate for ankylosing spondylitis. Cochrane Database Syst Rev. 2004(3):CD004524.
19. den Boer PC, Wiersma D, Van den Bosch RJ. Why is self-help neglected in the treatment of emotional disorders? a meta-analysis. Psychol Med. 2004;34(6):959-971.
20. Ekeland E, Heian F, Hagen KB, Abbott J, Nordheim L. Exercise to improve self-esteem in children and young people. Cochrane Database Syst Rev. 2004(1):CD003683.
21. Frisina PG, Borod JC, Lepore SJ. A meta-analysis of the effects of written emotional disclosure on the health outcomes of clinical populations. J Nerv Ment Dis. 2004;192(9):629-634.
22. Griffiths PD, Edwards MH, Forbes A, Harris RL, Ritchie G. Effectiveness of intermediate care in nursing-led in-patient units. Cochrane Database Syst Rev. 2004(4):CD002214.
23. Gross AR, Hoving JL, Haines TA, et al. Manipulation and mobilisation for mechanical neck disorders. Cochrane Database Syst Rev. 2004(1):CD004249.
24. Hagen KB, Hilde G, Jamtvedt G, Winnem M. Bed rest for acute low-back pain and sciatica. Cochrane Database Syst Rev. 2004(4):CD001254.
25. Hartling L, Wiebe N, Russell K, Patel H, Klassen TP. Epinephrine for bronchiolitis. Cochrane Database Syst Rev. 2004(1):CD003123.
26. Kool J, de Bie R, Oesch P, Knusel O, van den Brandt P, Bachmann S. Exercise reduces sick leave in patients with non-acute non-specific low back pain: a meta-analysis. J Rehabil Med. 2004;36(2):49-62.
27. Kwakkel G, van Peppen R, Wagenaar RC, et al. Effects of augmented exercise therapy time after stroke: a meta-analysis. Stroke. 2004;35(11):2529-2539.
28. Latham NK, Bennett DA, Stretton CM, Anderson CS. Systematic review of progressive resistance strength training in older adults. J Gerontol A Biol Sci Med Sci. 2004;59(1):48-61.
29. Merry S, McDowell H, Hetrick S, Bir J, Muller N. Psychological and/or educational interventions for the prevention of depression in children and adolescents. Cochrane Database Syst Rev. 2004(1):CD003380.