Contradicted and Initially Stronger Effects in Highly Cited Clinical Research
- Author Affiliations: Department of Hygiene and Epidemiology, University of Ioannina School of Medicine, Ioannina, Greece, and the Institute for Clinical Research and Health Policy Studies, Department of Medicine, Tufts-New England Medical Center, Boston, Mass.
- Corresponding Author: John P. A. Ioannidis, MD, Department of Hygiene and Epidemiology, University of Ioannina School of Medicine, Ioannina 45110, Greece (jioannid{at}cc.uoi.gr).
Abstract
Context Controversy and uncertainty ensue when the results of clinical research on the effectiveness of interventions are subsequently contradicted. Controversies are most prominent when high-impact research is involved.
Objectives To understand how frequently highly cited studies are contradicted or find effects that are stronger than in other similar studies and to discern whether specific characteristics are associated with such refutation over time.
Design All original clinical research studies published in 3 major general clinical journals or high-impact-factor specialty journals in 1990-2003 and cited more than 1000 times in the literature were examined.
Main Outcome Measure The results of highly cited articles were compared against subsequent studies of comparable or larger sample size and similar or better controlled designs. The same analysis was also performed comparatively for matched studies that were not so highly cited.
Results Of 49 highly cited original clinical research studies, 45 claimed that the intervention was effective. Of these, 7 (16%) were contradicted by subsequent studies, 7 others (16%) had found effects that were stronger than those of subsequent studies, 20 (44%) were replicated, and 11 (24%) remained largely unchallenged. Five of 6 highly cited nonrandomized studies had been contradicted or had found stronger effects vs 9 of 39 randomized controlled trials (P = .008). Among randomized trials, studies with contradicted or stronger effects were smaller (P = .009) than replicated or unchallenged studies, although there was no statistically significant difference in their early or overall citation impact. Matched control studies did not have a significantly different share of refuted results than highly cited studies, but they included more studies with “negative” results.
Conclusions Contradiction and initially stronger effects are not unusual in highly cited research of clinical interventions and their outcomes. The extent to which high citations may provoke contradictions and vice versa needs more study. Controversies are most common with highly cited nonrandomized studies, but even the most highly cited randomized trials may be challenged and refuted over time, especially small ones.
- KEYWORDS:
- PERIODICALS
- RANDOMIZED TRIALS
- RESEARCH
Clinical research on important questions about the efficacy of medical interventions is sometimes followed by subsequent studies that either reach opposite conclusions or suggest that the original claims were too strong. Such disagreements may upset clinical practice and attract publicity both in scientific circles and in the lay press. Several empirical investigations have tried to address whether specific types of studies are more likely to be contradicted and to explain observed controversies. For example, evidence exists that small studies may sometimes be refuted by larger ones.1-2
Similarly, there is some evidence on disagreements between epidemiological studies and randomized trials.3-5 Prior investigations have focused on a variety of studies without any particular attention to their relative importance and scientific impact. Yet, most research publications have little impact while a small minority receives most attention and dominates scientific thinking and clinical practice. Impact is difficult to measure in all its dimensions. However, the number of citations received by a publication is a surrogate of the attention it has received in the scientific literature and its influence on scientific debate and progress. Citations are readily and objectively counted in established databases.6 High citation count does not necessarily mean that these studies are accepted; citations may sometimes be critical of an article. Nevertheless, citation count is a measure of how much a study has occupied the thinking of other scientists and has drawn attention—for good or bad.
It is important to evaluate the replication of clinical research studies that have the highest citation impact. How frequently are such studies eventually contradicted by other research or are found to have too strong results compared with subsequent evidence? Is this more common for specific types of studies? Answering these questions would be useful for interpreting the results of influential clinical research.
METHODS
Eligible Original Studies
Eligible original studies for this analysis included all publications that had received more than 1000 Institute for Scientific Information (ISI)–indexed6 citations; had been published between 1990 and 2003 in the 3 general medical journals with the highest current impact factors (New England Journal of Medicine, JAMA, Lancet) or in medical specialty journals with impact factor exceeding 7.0 (according to the Journal Citation Reports 2003) that are likely to publish clinical research (including, in decreasing order of impact factor, the Journal of the National Cancer Institute, Gastroenterology, Annals of Internal Medicine, Circulation, Journal of Clinical Oncology, Archives of General Psychiatry, Blood, Hepatology, American Journal of Respiratory and Critical Care Medicine, Diabetes, Brain, Annals of Neurology, Journal of the American College of Cardiology, Diabetes Care, Journal of the American Society of Nephrology, Arthritis and Rheumatism, and the American Journal of Psychiatry); addressed the efficacy of therapeutic or preventive interventions; and pertained to primary data (excluding reviews and meta-analyses).
Citation counts for articles published between January 1, 1990, and December 31, 2003, in these journals were downloaded from ISI. Citation counts were censored on August 20, 2004. All articles with more than 1000 citations were screened further. Studies with group authorship may be cited in various ways; therefore, I summed citations cataloged under different entries for the same article (using the first author name, group abbreviations, and anonymous entries).7 The total citation count does not capture the few citations in which a wrong name, journal, volume, or page was given. Since citation counts depend on the time elapsed since publication, a separate citation count was limited to the first 3 years after the publication year.
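The pooling of citations across catalog entries, and the separate early-citation window, can be sketched in Python as follows. The record layout, the entry names, and the counts are invented for illustration, and the exact window boundaries are an assumption. ISI exports are not structured exactly like this.

```python
# Hypothetical catalog records: (catalog entry, citing year, citations).
# Entry strings and counts are invented for illustration only.
records = [
    ("SMITH J 1995", 1995, 10),
    ("SMITH J 1995", 1996, 80),
    ("TRIAL GRP 1995", 1997, 120),
    ("ANONYMOUS 1995", 1998, 40),
    ("ANONYMOUS 1995", 2001, 300),
    ("UNRELATED 1995", 1996, 999),  # a different article; must be ignored
]


def total_and_early_citations(records, entries_for_article, pub_year, window=3):
    """Pool citations filed under different catalog entries for one article.

    Assumption: the early count covers calendar years pub_year + 1 through
    pub_year + window; the text does not say whether the publication year
    itself is included.
    """
    total = early = 0
    for entry, year, count in records:
        if entry not in entries_for_article:
            continue  # citation belongs to some other article
        total += count
        if pub_year < year <= pub_year + window:
            early += count
    return total, early


# All variant entries that refer to the same (hypothetical) group-authored trial.
entries = {"SMITH J 1995", "TRIAL GRP 1995", "ANONYMOUS 1995"}
total, early = total_and_early_citations(records, entries, pub_year=1995)
print(total, early)
```

Here the total pools all 3 variant entries (550 citations). The early count keeps only the 1996 through 1998 citations (240), so the 10 citations from the publication year itself are excluded under this assumed window.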
Other Clinical Research on the Same Questions
For each eligible original study, a search was performed to identify whether there had been any other concurrently or subsequently published clinical research addressing the same question. Other research was considered eligible only if the sample size was close to or larger than that of the highly cited original study or if it used a theoretically better controlled design. Thus, for highly cited randomized trials, I perused all randomized trials having at least 30% of the sample size of the eligible highly cited original study. Whenever available, quantitative meta-analyses of trials were used as summaries of trial results. Whenever several pertinent meta-analyses were available, the one including the largest number of studies was preferred. For highly cited nonrandomized studies, subsequently published pertinent randomized trials and meta-analyses thereof were eligible regardless of sample size; nonrandomized evidence was also considered if randomized trials were not available.
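The eligibility rules for comparison research can be expressed as a small filter. This is a sketch, assuming a hypothetical `Study` record. The fallback threshold for nonrandomized evidence is also an assumption, because the text says only "close to or larger" without quantifying it.

```python
from dataclasses import dataclass


@dataclass
class Study:
    label: str          # hypothetical study name
    randomized: bool
    sample_size: int


def eligible_comparators(original, candidates, min_percent=30):
    """Select other research eligible for comparison with a highly cited study."""
    if original.randomized:
        # Randomized trials with at least min_percent of the original sample size
        # (integer arithmetic avoids floating-point edge cases at the threshold).
        return [c for c in candidates
                if c.randomized
                and 100 * c.sample_size >= min_percent * original.sample_size]
    # Nonrandomized original: randomized evidence qualifies regardless of size.
    trials = [c for c in candidates if c.randomized]
    if trials:
        return trials
    # Assumption: with no trials available, fall back to nonrandomized studies
    # that pass the same size threshold.
    return [c for c in candidates
            if 100 * c.sample_size >= min_percent * original.sample_size]


candidates = [Study("trial A", True, 60), Study("trial B", True, 59),
              Study("cohort C", False, 5000)]
rct = Study("highly cited trial", True, 200)
cohort = Study("highly cited cohort", False, 800)
print([c.label for c in eligible_comparators(rct, candidates)])
print([c.label for c in eligible_comparators(cohort, candidates)])
```

For the hypothetical 200-patient trial, only trial A (60 patients, exactly 30%) qualifies. For the hypothetical cohort, both trials qualify regardless of size, and the larger cohort C is not used because randomized evidence exists.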
Concurrently or subsequently published evidence was identified in PubMed using searches that combined terms pertaining to the tested interventions, disease and outcome, and terms pertinent to the search of randomized trials and meta-analyses. Searches followed the Cochrane algorithms for finding meta-analyses and randomized trials.8
Data Extraction and Classification of Studies
For each eligible original study, I recorded the study name, intervention, disease and outcomes of interest, study design, sample size, main conclusions, and citation counts. For the articles presenting or summarizing other relevant research, I recorded the study design, total sample size, and the findings as compared with those of the original highly cited study.
Highly cited studies were classified as negative (when they claimed the tested experimental intervention was ineffective, harmful, or no better than the control intervention), unchallenged (when no other clinical research of eligible design and sample size was available to validate the claimed efficacy), contradicted, initially stronger effects, or replicated effects. The classification of studies into these categories was based on the final interpretation of the results by the authors in the “Abstract” and “Discussion” sections of their original publications. Highly cited articles were classified according to whether their authors suggested that an intervention was overall effective or ineffective. When both benefits and harms or caveats were presented, I focused on the net conclusion of whether the experimental intervention merits consideration for use in clinical practice. Subsequent research was classified in the same manner. Contradiction was declared when the original highly cited study claimed the intervention to be effective, while subsequent research showed it to be ineffective. When both original and subsequent research claimed the intervention was effective, studies were compared further regarding the effect size for the major clinical outcome, the durability of the treatment effect, and the generalizability and applicability to various settings. Initially stronger effects were defined when the relative risk reduction for the main outcome in the subsequent research was half or less of that proposed by the original highly cited study (regardless of whether confidence intervals might overlap or not), or when the subsequent research showed that the originally proposed benefit was of short duration or that its applicability and generalizability were limited. Classification of the studies independently by another investigator yielded a highly similar profile (weighted Cohen κ = 0.92).
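The decision rules above can be written as a function. This is only a sketch: the actual classification relied on reading each paper's conclusions, and the flags `short_lived` and `limited_applicability` stand in for judgments that cannot be computed. Relative risk reductions are given in percent. The example figures are taken from Box 1.

```python
def classify(original_effective, subsequent_effective=None,
             rrr_original=None, rrr_subsequent=None,
             short_lived=False, limited_applicability=False):
    """Sketch of the classification rules; rrr_* are relative risk reductions in %.

    subsequent_effective is None when no eligible subsequent research exists.
    """
    if not original_effective:
        return "negative"
    if subsequent_effective is None:
        return "unchallenged"
    if not subsequent_effective:
        return "contradicted"
    # Both effective: compare effect size, durability, and applicability.
    halved = (rrr_original is not None and rrr_subsequent is not None
              and rrr_subsequent <= rrr_original / 2)
    if halved or short_lived or limited_applicability:
        return "initially stronger"
    return "replicated"


print(classify(True, True, 42, 10))            # stents: 42% vs about 10%
print(classify(True, True, 68, 20))            # flavonoids: 68% vs 20%
print(classify(True, True, short_lived=True))  # zidovudine: benefit lost by 18 months
print(classify(True, False))                   # vitamin E: contradicted
print(classify(True, True, 58, 30))            # angioplasty, halving rule alone
```

The last call shows the limits of a mechanical rule. The angioplasty meta-analysis (30% vs 58%) falls just short of halving the original relative risk reduction. Box 1 treats the effect as initially stronger partly because the largest trial, which involved both specialized and nonspecialized centers, found no sizeable benefit. That judgment corresponds here to passing `limited_applicability=True`.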
Correlates of Contradicted or Initially Stronger Effects
Among original highly cited studies with efficacy claims, analyses examined whether those with contradicted or initially stronger effects differed from the replicated and unchallenged ones in study design, publication year, sample size, type of disease (heart disease vs other), journal of publication, citation count, early citation count, and average citations per year after publication. Comparisons used the Mann-Whitney U test for continuous variables and Fisher exact test for binary variables.
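The Mann-Whitney U comparison can be sketched from scratch as follows. The sketch uses the large-sample normal approximation with a tie correction and no continuity correction. The exact SPSS settings are not stated, so P values from this sketch may differ slightly from those reported. The sample sizes in the example are hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns U for sample x (pairs where an x value exceeds a y value, ties
    counting one half) and the two-sided P value.
    """
    n1, n2 = len(x), len(y)
    pooled = sorted((v, g) for g, sample in ((0, x), (1, y)) for v in sample)
    ranks = [0.0] * len(pooled)
    tie_term = 0
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    n = n1 + n2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u1 - mean_u) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical sample sizes of refuted vs replicated/unchallenged trials.
refuted = [200, 395, 410, 520]
others = [800, 1000, 2500, 4000, 12000]
u, p = mann_whitney_u(refuted, others)
print(u, round(p, 3))
```

For samples this small, an exact permutation P value (as StatXact would provide) is preferable to the normal approximation used here.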
Comparison of Highly Cited Articles Against Less Cited Articles
To evaluate whether highly cited studies differ from other studies that are not so highly cited in their findings and potential for contradiction, a control group of articles pertaining to the assessment of interventions was also assembled. Control-group articles were 1:1 matched for journal, year of publication, and design (randomized vs nonrandomized) against each of the highly cited articles. Control articles were selected by screening chronologically the contents of the pertinent journals for each pertinent year starting July 1 (to ensure approximately similar follow-up for citations with the highly cited articles against which they were matched). Other research was searched and the control articles were categorized in a similar fashion as described for the highly cited articles above. Differences between highly cited and control articles were examined with conditional logistic regression to account for matching.
Analyses
Analyses were performed in SPSS version 12.0 (SPSS Inc, Chicago, Ill) and StatXact (Cytel Corp, Boston, Mass). P values are 2-tailed, and P<.05 was considered statistically significant.
RESULTS
Eligible Studies
One hundred fifteen articles published between 1990 and 2003 had received more than 1000 citations (major general clinical journals, n = 91; specialty journals, n = 24). Of those, 66 were excluded (nonsystematic reviews or editorials, n = 20; meta-analyses, n = 7; case-control studies of risk factors, n = 12; prevalence or incidence studies, n = 8; cohort studies of risk factors, n = 3; recommendations, n = 3; prognostic models, n = 4; time-trend analysis, n = 1; case series, n = 1; presentations of interviews, instruments, or assays, n = 3; classification criteria, n = 4). The remaining 49 articles were eligible (Table 1),9-57 of which 47 had appeared in major general medical journals. They included 43 randomized trials, 4 prospective cohorts, and 2 case series. In recent years (1998 through 2003), the 3 general journals have published an almost equal number of highly cited articles (New England Journal of Medicine, n = 4; JAMA, n = 3; Lancet, n = 3). A smaller proportion of highly cited articles published in specialty journals than of those published in general journals were eligible for the analysis (2/24 vs 47/91, P<.001), because highly cited articles in specialty journals were most often nonsystematic reviews or editorials (10/24), classification criteria (4/24), or descriptions of standardized interviews, instruments, and assays (3/24). Many diverse disciplines were represented, but the most common topic was heart disease (n = 27).
Table 1. Eligible Highly Cited Studies
Four eligible highly cited studies showed no efficacy for the tested interventions. They contradicted prior claims for potential efficacy of vitamin E, beta carotene, and retinol for lung cancer and/or coronary artery disease and showed an increased risk of coronary artery disease with hormone therapy in postmenopausal women (Table 2).
Table 2. Other Research and Current State of Knowledge
Of the 45 eligible highly cited studies with efficacy claims (Table 2), 7 (16%) were contradicted by subsequent research, and another 7 (16%) were found to have initially stronger effects. In all these 14 cases (Box 1), subsequent studies were either larger or better controlled (randomized vs a nonrandomized original study). The findings of 20 highly cited articles (44%) were replicated (also with a larger sample size in subsequent research compared with the original highly cited study) and 11 (24%) had remained largely unchallenged.58-78
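As an arithmetic check, the flow of articles from screening to classification can be tallied in a few lines. All counts come from the Results section. The dictionary labels are paraphrased. The check confirms that the exclusions leave 49 eligible articles, that the 4 outcome categories partition the 45 studies with efficacy claims, and that the rounded percentages match those reported.

```python
# Counts from the Results section; labels are paraphrased.
screened = 115
excluded = {
    "nonsystematic reviews or editorials": 20,
    "meta-analyses": 7,
    "case-control studies of risk factors": 12,
    "prevalence or incidence studies": 8,
    "cohort studies of risk factors": 3,
    "recommendations": 3,
    "prognostic models": 4,
    "time-trend analysis": 1,
    "case series": 1,
    "interviews, instruments, or assays": 3,
    "classification criteria": 4,
}
eligible = screened - sum(excluded.values())
with_claims = eligible - 4  # the 4 highly cited studies with "negative" results
outcomes = {"contradicted": 7, "initially stronger": 7,
            "replicated": 20, "unchallenged": 11}
assert sum(outcomes.values()) == with_claims  # categories are exhaustive

percent = {label: round(100 * n / with_claims) for label, n in outcomes.items()}
print(eligible, with_claims, percent)
```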
Box 1. Contradicted and Initially Stronger Effects in Highly Cited Studies
Contradicted Findings
The Nurses’ Health Study,13 a prospective cohort, found a 44% relative risk reduction in coronary artery disease events in women receiving hormone therapy. A small randomized trial42 found major beneficial effects of this intervention on surrogate markers of coronary artery disease (lipoprotein and fibrinogen levels), claiming that this should translate to a major clinical benefit. Although the latter trial was not refuted at the level of surrogate outcomes, inferences for the anticipated effects on clinical outcomes were contradicted. The Women’s Health Initiative,46 a large randomized trial, found that estrogen and progestin significantly increased the relative risk of coronary events by 29% among postmenopausal women, and refuting results were also seen in another large randomized trial, the Heart and Estrogen/progestin Replacement Study (HERS).44
Two large prospective cohort studies, the Health Professionals Follow-Up study20 and the Nurses’ Health Study,21 found that vitamin E was significantly associated with a decreased risk of coronary artery disease and a trial of 2002 patients also suggested a 47% relative risk reduction for cardiovascular deaths or nonfatal myocardial infarction with vitamin E.51 However, an even larger randomized trial66 subsequently showed absolutely no beneficial effect for vitamin E on coronary artery disease (relative risk 1.05 for cardiovascular deaths and 1.02 for myocardial infarction).
A small randomized trial (n = 200) suggested that the human IgM monoclonal antibody to endotoxin could almost halve mortality due to gram-negative sepsis.15 A subsequent randomized trial of more than 10-fold larger sample size62 found a nonsignificant 11% relative risk increase for mortality.
Finally, a small series of 9 patients22 proposed that nitric oxide inhalation is very effective in patients with respiratory distress syndrome by improving oxygenation. However, 5 randomized trials involving 535 patients67 failed to show any clinical benefit.
Initially Stronger Effects
The early results of a trial on zidovudine monotherapy in asymptomatic patients with human immunodeficiency virus infection9 showed a significant 60% relative risk reduction against disease progression in the first year. The short-term benefit was not exaggerated. Yet this effect was short-lived and the benefit was lost after 18 months both in the same trial and also as shown in a subsequent meta-analysis.58
A randomized trial of 395 patients18 showed that immediate angioplasty was superior to thrombolysis with tissue plasminogen activator in acute myocardial infarction, achieving a 58% relative risk reduction for death or reinfarction. However, a subsequent meta-analysis with more than 2500 patients65 suggested that the benefit is probably much smaller (relative risk reduction 30%) and the largest and most recent trial that involved both specialized and nonspecialized centers had not shown any sizeable benefit of angioplasty (nonsignificant 20% risk reduction for death and nonsignificant 33% risk reduction for reinfarction).
Two randomized trials of 410 and 520 patients, respectively,26-27 showed that stents were superior to balloon angioplasty for management of coronary artery disease, with 31% and 42% relative risk reductions, respectively, in the need for revascularization. Current evidence, as summarized by a meta-analysis of almost 10 000 patients, suggests that the benefit is probably much smaller than originally thought (approximately 10% relative risk reduction), and unblinding may have led to an increased effect on repeat angioplasty in these trials.69
Another trial suggested a prime role for tissue plasminogen activator in acute ischemic stroke.29 However, subsequent evidence has narrowed indications and the intervention is considered effective mostly when given very early after symptom onset.70
Carotid endarterectomy was initially reported to achieve a 5.9% absolute risk reduction for stroke or death, projected at 5 years,43 in patients with asymptomatic stenosis of the carotid artery exceeding 60%. A meta-analysis of several trials suggested a more modest benefit with 2% absolute risk reduction at 3.1 years.75
Finally, a cohort study of 805 people found a 68% adjusted relative risk reduction for coronary artery disease with flavonoids,48 while a meta-analysis of prospective cohorts with total sample size exceeding 100 000 suggests only a 20% relative risk reduction in the top vs bottom third of flavonoid intake.76
Comparison of Contradicted or Initially Stronger vs Replicated or Unchallenged Findings
Five of 6 highly cited nonrandomized studies had been contradicted or had initially stronger effects, while this was seen in only 9 of 39 highly cited randomized trials (P = .008). Table 3 shows that trials with contradicted or initially stronger effects had significantly smaller sample sizes and tended to be older than those with replicated or unchallenged findings. There were no significant differences in the type of disease. The proportion of contradicted or initially stronger effects did not differ significantly across journals (P = .60). There was also no significant difference between these 2 groups in the number of citations received in the first 3 years or in the overall number of citations over time, although the citations per year tended to be nonsignificantly fewer in trials with contradicted or initially stronger effects.
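The design comparison can be reproduced with a two-sided Fisher exact test. The 2 × 2 table crosses design (nonrandomized vs randomized) with outcome (contradicted or initially stronger vs not). The sketch below follows the common convention of summing the hypergeometric probabilities of all tables with the same margins that are no more probable than the observed one. The analyses themselves were run in SPSS and StatXact, whose implementations may differ in detail.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(k):
        # Hypergeometric probability that the top-left cell equals k.
        return comb(col1, k) * comb(n - col1, row1 - k) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Small relative tolerance so floating-point noise does not drop ties.
    return sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-7))


# Rows: nonrandomized (5 of 6 affected), randomized (9 of 39 affected).
p_design = fisher_exact_two_sided(5, 1, 9, 30)
# Eligible vs not: specialty journals 2 of 24, general journals 47 of 91.
p_journal = fisher_exact_two_sided(2, 22, 47, 44)
print(round(p_design, 3), p_journal < 0.001)
```

With these counts, the design comparison gives approximately .008, consistent with the reported value. The specialty vs general journal comparison of eligibility gives a value below .001, also consistent with the text.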
Table 3. Comparison of Characteristics and Citation Counts of Randomized Trials With Contradicted or Initially Stronger Effects vs Those With Replicated or Unchallenged Findings
Comparison of Highly Cited Articles Against Less-Cited Control Articles
Of the 49 articles in the control group79-127 (with a median of 157 citations, range 38-815, through 2004), the findings of 2 articles91, 119 were contradicted128-129 and 8 studies82, 90, 92, 95-96,109-110,117 had initially stronger effects130-137 (Box 2); 20 articles79-81,83, 86-89,101, 103-104,106, 108, 111-112,118, 123, 125-127 contained “positive” findings that were replicated,68, 138-155 8 studies93, 97-98,102, 107, 114-115,120 remained unchallenged, and 11 studies84-85,94, 99-100,105, 113-114,119-120,122 did not have any “positive” results; in 7 articles with some “positive” finding,79, 87, 91, 98, 108, 112, 120 there were also other interventions evaluated that had “negative” results, although this mixture of “positive” and “negative” results had not been observed in any of the highly cited articles. The control articles had a larger number of “negative” findings compared with the highly cited articles (matched odds ratio [OR], 8; 95% confidence interval [CI], 1.8-34; P = .006 for any “negative” finding; and matched OR, 3.3; 95% CI, 0.92-12.0; P = .07 for exclusively “negative” findings). The highly cited articles did not have a smaller proportion of contradicted or initially stronger effects than the control articles; if anything, there was a trend for more contradicted or initially stronger effects in the highly cited articles (matched OR, 1.6; 95% CI, 0.6-4.0; P = .35; matched OR, 6.0; 95% CI, 0.7-50; P = .10 when limited to contradicted findings).
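For 1:1 matched pairs and a single binary characteristic, the conditional logistic regression estimate has a closed form. The conditional maximum likelihood odds ratio is the ratio of the 2 kinds of discordant pairs, b/c, and its log has standard error sqrt(1/b + 1/c). The sketch below adds a Wald interval and P value.

The text does not report the pair counts. The values b = 6 and c = 1 are an assumption, chosen because they are compatible with 7 contradicted highly cited articles and 2 contradicted controls (1 concordant pair). The text also does not state whether SPSS used exactly this interval.

```python
import math


def matched_pair_or(b, c, z=1.959964):
    """Conditional ML odds ratio for 1:1 matched pairs with a binary finding.

    b: pairs in which only the highly cited article has the finding
    c: pairs in which only the matched control has it
    Concordant pairs carry no information about the odds ratio.
    """
    log_or = math.log(b / c)
    se = math.sqrt(1 / b + 1 / c)
    lower = math.exp(log_or - z * se)
    upper = math.exp(log_or + z * se)
    p = math.erfc(abs(log_or / se) / math.sqrt(2))  # two-sided Wald P
    return b / c, lower, upper, p


# Assumed discordant-pair counts for contradicted findings (not reported).
or_, lower, upper, p = matched_pair_or(6, 1)
print(f"OR {or_:.1f}; 95% CI {lower:.1f}-{upper:.0f}; P = {p:.2f}")
```

Under this assumption, the sketch returns an OR of 6.0 with a 95% CI of about 0.7 to 50 and P of about .10, matching the reported estimate for contradicted findings.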
Box 2. Contradicted and Initially Stronger Effects in Control Studies
Contradicted Findings
In a prospective cohort,91 vitamin A was inversely related to breast cancer (relative risk in the highest quintile, 0.84; 95% confidence interval [CI], 0.71-0.98), and vitamin A supplementation was associated with a reduced risk (P = .03) in women in the lowest quintile; in a randomized trial128 exploring further the retinoid-breast cancer hypothesis, fenretinide treatment of women with breast cancer for 5 years had no effect on the incidence of second breast malignancies.
A trial (n = 51) showed that cladribine significantly improved the clinical scores of patients with chronic progressive multiple sclerosis.119 In a larger trial of 159 patients, no significant treatment effects were found for cladribine in terms of changes in clinical scores.129
Initially Stronger Effects
A trial (n = 28) of aerosolized ribavirin in infants receiving mechanical ventilation for severe respiratory syncytial virus infection82 showed significant decreases in the duration of mechanical ventilation (4.9 vs 9.9 days) and hospital stay (13.3 vs 15.0 days). A meta-analysis of 3 trials (n = 104) showed a decrease of only 1.8 days in the duration of mechanical ventilation and a nonsignificant decrease of 1.9 days in duration of hospitalization.130
A trial (n = 406) of intermittent diazepam administered during fever to prevent recurrence of febrile seizures90 showed a significant 44% relative risk reduction in seizures. The effect was smaller in other trials, and the overall risk reduction was no longer formally significant131; moreover, the safety profile of diazepam was deemed too unfavorable to recommend routine preventive use.
A case-control and cohort study evaluation92 showed that the increased risk of sudden infant death syndrome among infants who sleep prone is further increased by use of natural-fiber mattresses, swaddling, and heating in bedrooms. Several observational studies have been done since, and they have provided inconsistent results on these interventions; in particular, they disagree on the possible role of overheating.132
A trial of 54 children95 showed that the steroid budesonide significantly reduced the croup score by 2 points at 4 hours and significantly decreased readmissions by 86%. A meta-analysis (n = 3736)133 showed a significant improvement in the Westley score at 6 hours (1.2 points) and 12 hours (1.9 points), but not at 24 hours. Fewer return visits and/or (re)admissions occurred in patients treated with glucocorticoids, but the relative risk reduction was only 50% (95% CI, 24%-64%).
A trial (n = 55) showed that misoprostol was as effective as dinoprostone for termination of second-trimester pregnancy and was associated with fewer adverse effects than dinoprostone.96 A subsequent trial134 showed equal efficacy but a higher rate of adverse effects with misoprostol (74%) than with dinoprostone (47%).
A trial (n = 50) comparing botulinum toxin vs glyceryl trinitrate for chronic anal fissure concluded that both are effective alternatives to surgery but botulinum toxin is the more effective nonsurgical treatment (1 failure vs 9 failures with nitroglycerin).109 In a meta-analysis135 of 31 trials, botulinum toxin compared with placebo showed no significant efficacy (relative risk of failure, 0.75; 95% CI, 0.32-1.77), and was also no better than glyceryl trinitrate (relative risk of failure, 0.48; 95% CI, 0.21-1.10); surgery was more effective than medical therapy in curing fissure (relative risk of failure, 0.12; 95% CI, 0.07-0.22).
A trial of acetylcysteine (n = 83) showed that it was highly effective in preventing contrast nephropathy (90% relative risk reduction).110 There have been many more trials and many meta-analyses on this topic. The latest meta-analysis136 shows a nonsignificant 27% relative risk reduction with acetylcysteine.
A trial of 129 stunted Jamaican children found that both nutritional supplementation and psychosocial stimulation improved the mental development of stunted children; children who got both interventions had additive benefits and achieved scores close to those of nonstunted children.117 With long-term follow-up, however, it was found that the benefits were small and the 2 interventions no longer had additive effects.137
COMMENT
Original highly cited articles about medical interventions are published almost exclusively in 3 general medical journals. Actually, there has been an approximately equal share of very highly cited articles among these 3 journals since 1998, as the differences in their impact factors have diminished. Articles in specialty journals that reach such high numbers of citations are usually review articles or articles describing tools useful to specific diseases rather than original data. Contradicted and potentially exaggerated findings are not uncommon in the most visible and most influential original clinical research: 16% of the top-cited clinical research articles on postulated effective medical interventions that have been published within the last 15 years have been contradicted by subsequent clinical studies, and another 16% have been found to have initially stronger effects than subsequent research. Contradiction or initially stronger effects have been encountered in 5 of 6 cases for which nonrandomized designs were used, but even randomized trials have not escaped controversy. More than a third of the top-cited randomized trials published from 1990 through 1995 have already been affected, while for more recent trials, the time frame is still early and more may be contradicted in the future. Sample size seems to be important, with smaller sample sizes in trials that have met controversy vs those that have not.
The classification of studies in this analysis involves many judgments pertaining to the complexity of studying a given research question with somewhat different populations, interventions, durations, and outcomes. However, these studies are widely known for their inferences, and this is also supported by the high interrater agreement. Nevertheless, it should also be acknowledged that although the classification was performed in duplicate, the searches were performed by only 1 investigator. It is unavoidable that some other investigators may feel differently about the categorization of specific studies, especially for topics that are also surrounded by intense debate. However, this is unlikely to change the aggregate picture about refutation rates.
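The interrater agreement was summarized with a weighted Cohen κ. The paper does not state the weighting scheme or how the 5 categories were ordered. The sketch below therefore assumes linear disagreement weights over a fixed, arbitrary category order. With only 2 categories, it reduces to the unweighted κ. The counts in the example are invented.

```python
def weighted_kappa(matrix):
    """Cohen's weighted kappa with linear disagreement weights |i - j| / (k - 1).

    matrix[i][j] counts items that rater 1 placed in category i and rater 2
    placed in category j (categories assumed to be in a fixed order).
    """
    k = len(matrix)
    n = sum(map(sum, matrix))
    rows = [sum(r) for r in matrix]
    cols = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    observed = expected = 0.0
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)
            observed += w * matrix[i][j] / n
            expected += w * rows[i] * cols[j] / n**2
    return 1 - observed / expected


# Invented 2-category example: agreement on 35 of 50 items.
print(round(weighted_kappa([[20, 5], [10, 15]]), 2))
# Perfect agreement across 3 ordered categories gives kappa = 1.
print(weighted_kappa([[5, 0, 0], [0, 5, 0], [0, 0, 5]]))
```

Because the 5 study categories are not naturally ordinal, the value of a weighted κ depends on the chosen order and weights. Using a library routine such as scikit-learn's `cohen_kappa_score` with `weights="linear"` would be a reasonable cross-check.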
The examination of contradictions and refutations offers a fascinating look at the process of science. Four of the highly cited articles examined herein were refuting investigations with “negative” results. However, in a sense, even the other highly cited articles with “positive” results refuted prior knowledge and practice by introducing new concepts and proposing new interventions. We should acknowledge that there is no proof that the subsequent studies and meta-analyses were necessarily correct. A perfect gold standard is not possible in clinical research, so we can only interpret results of studies relative to other studies. Whenever new research fails to replicate early claims for efficacy or suggests that efficacy is more limited than previously thought, it does not necessarily follow that the original studies were totally wrong and the newer ones are correct simply because they are larger or better controlled. Alternative explanations for these discrepancies may include differences in disease spectrum, eligibility criteria, or the use of concomitant interventions.156 Different studies on the same question are typically not replicas of each other. In fact, discrepancies may be interesting on their own because they require careful scrutiny of the data and reappraisal of our beliefs. Thus, it is probably not surprising that the citation rate of these refuted studies did not seem to be much affected. Nevertheless, the controversy generates considerable uncertainty for clinical practice, and none of the contradicted interventions is currently recommended by practice guidelines.
The mere fact that a study is highly cited suggests that there is a strong active interest in the questions addressed from a clinical or research perspective. This may increase the chances that other, larger trials may eventually be conducted. However, for most clinical questions of interest, no large trials are ever conducted and evidence is based only on small trials or nonrandomized studies.157 Small trials or meta-analyses thereof may often be refuted subsequently by large trials1-2 when such large trials are performed. Small studies using surrogate markers may also sometimes lead to erroneous clinical inferences.158 There were only 2 studies with typical surrogate markers among the highly cited studies examined herein, but both were subsequently contradicted in their clinical extrapolations about the efficacy of nitric oxide22 and hormone therapy.42 In the case of initially stronger effects, the differences in the effect sizes could often be within the range of what would be expected based on chance variability. This reinforces the notion that results from clinical studies, especially early ones, should be interpreted using not only the point estimates but also the uncertainty surrounding them. However, besides differences in effect sizes, most initially stronger effects pertained also to issues of durability, generalizability, or applicability of the proposed effects, as discussed above. Thus, clinicians should be aware that these important aspects may not be fully settled when an important treatment breakthrough is announced.
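One way to check whether an initially stronger effect is still compatible with chance is to compare the 2 relative risks on the log scale, recovering each standard error from its 95% CI. This is a standard approach for 2 independent estimates. The paper does not describe such a test, and the function and numbers below are hypothetical. The approach also assumes the estimates are independent, which does not hold when a meta-analysis includes the original trial.

```python
import math

Z95 = 1.959964  # standard normal quantile for a 95% CI


def compare_relative_risks(rr1, ci1, rr2, ci2):
    """Return the ratio rr1/rr2 and a two-sided P for H0: equal relative risks.

    Each standard error is recovered from its 95% CI on the log scale:
    SE = (ln(upper) - ln(lower)) / (2 * 1.96).
    """
    se1 = (math.log(ci1[1]) - math.log(ci1[0])) / (2 * Z95)
    se2 = (math.log(ci2[1]) - math.log(ci2[0])) / (2 * Z95)
    diff = math.log(rr1) - math.log(rr2)
    z = diff / math.sqrt(se1**2 + se2**2)
    return math.exp(diff), math.erfc(abs(z) / math.sqrt(2))


# Hypothetical: early trial RR 0.42 (0.20-0.88), a 58% relative risk reduction,
# vs later pooled RR 0.70 (0.55-0.90), a 30% relative risk reduction.
ratio, p = compare_relative_risks(0.42, (0.20, 0.88), 0.70, (0.55, 0.90))
print(f"ratio of RRs {ratio:.2f}, P = {p:.2f}")
```

In this invented example, a drop in relative risk reduction from 58% to 30% corresponds to a P value of about .20. An apparently large difference in point estimates from a small early study can thus lie within chance variation.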
A third of the most-cited clinical research seems to have replication problems, and this proportion seems to be as large as, if not larger than, that of the vast majority of other, less-cited clinical research. The current analysis found that matched studies that were not so highly cited had a greater proportion of “negative” findings and similar or smaller proportions of contradicted results compared with the highly cited ones. Publication bias159-160 and time-lag bias161-162 favoring the rapid and prominent publication of “positive” findings may underlie some of the observed phenomena. Highly cited articles are already a selected sample with underrepresentation of “negative” findings compared with the average article on interventions published in major journals. It is possible that high-profile journals may tend to occasionally publish very striking findings and that this may lead to some difficulty in replicating some of these findings.163 Poynard et al164 evaluated the conclusions of hepatology-related articles published between 1945 and 1999 and found that, overall, 60% of these conclusions were considered to be true in 2000 and that there was no difference between randomized and nonrandomized studies or high- vs low-quality studies. Allowing for somewhat different definitions, the higher rates of refutation and the generally worse performance of nonrandomized studies in the present analysis may stem from the fact that I focused on a selected sample of the most noticed and influential clinical research. For such highly cited studies, the turnaround of “truth” may be faster; in particular, highly cited nonrandomized studies may be more likely to be probed and challenged than nonrandomized studies published in the general literature.
Finally, a certain proportion of highly cited trials may remain unchallenged. Sometimes the evidence from the original study may seem so overwhelming that further similar studies are deemed unethical to perform. The original study may be widely considered a milestone for clinical practice and may provide the gold standard for testing new interventions. However, sometimes other, validating research may be in the works. Clinical research is time-consuming, and challenging results may take several years to generate and publish. Therefore, evidence from recent trials, no matter how impressive, should be interpreted with caution when only one trial is available. It is important to know whether other similar or larger trials are still ongoing or being planned. Transparent and thorough trial registration is thus of paramount importance165 in order to limit premature claims for efficacy.
Author Contributions: Dr Ioannidis had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Financial Disclosures: None reported.
Acknowledgment: I thank Dr Tom Trikalinos for classifying independently the status of the highly cited articles.
REFERENCES
- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
- 8.
- 9.
- 10.
- 11.
- 12.
- 13.
- 14.
- 15.
- 16.
- 17.
- 18.
- 19.
- 20.
- 21.
- 22.
- 23.
- 24.
- 25.
- 26.
- 27.
- 28.
- 29.
- 30.
- 31.
- 32.
- 33.
- 34.
- 35.
- 36.
- 37.
- 38.
- 39.
- 40.
- 41.
- 42.
- 43.
- 44.
- 45.
- 46.
- 47.
- 48.
- 49.
- 50.
- 51.
- 52.
- 53.
- 54.
- 55.
- 56.
- 57.
- 58.
- 59.
- 60.
- 61.
- 62.
- 63.
- 64.
- 65.
- 66.
- 67.
- 68.
- 69.
- 70.
- 71.
- 72.
- 73.
- 74.
- 75.
- 76.
- 77.
- 78.
- 79.
- 80.
- 81.
- 82.
- 83.
- 84.
- 85.
- 86.
- 87.
- 88.
- 89.
- 90.
- 91.
- 92.
- 93.
- 94.
- 95.
- 96.
- 97.
- 98.
- 99.
- 100.
- 101.
- 102.
- 103.
- 104.
- 105.
- 106.
- 107.
- 108.
- 109.
- 110.
- 111.
- 112.
- 113.
- 114.
- 115.
- 116.
- 117.
- 118.
- 119.
- 120.
- 121.
- 122.
- 123.
- 124.
- 125.
- 126.
- 127.
- 128.
- 129.
- 130.
- 131.
- 132.
- 133.
- 134.
- 135.
- 136.
- 137.
- 138.
- 139.
- 140.
- 141.
- 142.
- 143.
- 144.
- 145.
- 146.
- 147.
- 148.
- 149.
- 150.
- 151.
- 152.
- 153.
- 154.
- 155.
- 156.
- 157.
- 158.
- 159.
- 160.
- 161.
- 162.
- 163.
- 164.
- 165.








