Peer Review
JAMA. 2002;287(21):2786-2790. doi: 10.1001/jama.287.21.2786

Measuring the Quality of Editorial Peer Review

Tom Jefferson, MD; Elizabeth Wager, MA; Frank Davidoff, MD

Author Affiliations: Health Reviews Ltd, Rome, Italy (Dr Jefferson); Sideview, Princes Risborough, England (Ms Wager); Annals of Internal Medicine, Philadelphia, Pa (Dr Davidoff).

Abstract

Context  The quality of a process can only be tested against its agreed objectives. Editorial peer review is widely used, yet there appears to be little agreement about how to measure its effects or processes.

Methods  To identify outcome measures used to assess editorial peer review as performed by biomedical journals, we analyzed studies identified from 2 systematic reviews that measured the effects of editorial peer review on the quality of the output (ie, published articles) or of the process itself (eg, reviewers' comments).

Results  Ten studies used a variety of instruments to assess the quality of articles that had undergone peer review. Only 1 study, which was nonrandomized, compared the quality of articles published in peer-reviewed and non–peer-reviewed journals. The others measured the effects of variations in the peer-review process or used a before-and-after design to measure the effects of standard peer review on accepted articles. Eighteen studies measured the quality of reviewers' reports under different conditions, such as blinding, or after training. One study compared the time and cost of different review processes.

Conclusions  Until we have properly defined the objectives of peer review, it will remain almost impossible to assess or improve its effectiveness. The research needed to understand the broader effects of peer review poses many methodologic problems and would require the cooperation of many parts of the scientific community.

A fundamental tenet of all scientific and scholarly work is that every aspect of it must be subjected to critical appraisal; only those findings and principles that withstand such appraisal become established. Although much appraisal occurs as work is in progress (and some after it has been published), work that is submitted for publication undergoes critical appraisal, known as peer review, as part of the editorial process.

Editorial peer review is therefore an extension of the basic principles of science and scholarship. It has existed for more than 200 years1 and has achieved near universal application for assessing research reports before publication. Despite its wide acceptance, peer review has been subjected to a variety of criticisms,2 and, indeed, surprisingly little is known about its effects on the quality and utility of published information,3 much less about its beneficial or adverse social, psychological, or financial effects.

The same can be said about critical appraisal in scholarly work generally. However, uncertainty about the effects of peer review is not simply a matter for academic concern. Clinical decisions must be made on the best available evidence, usually systematic reviews and meta-analyses, but these can be misleading if they are based on invalid, incomplete, inaccurate, or duplicate information, or if the review articles themselves are poorly done. Any process affecting the assessment and dissemination of clinical evidence therefore has a direct bearing on patient care.

In this article we review the criteria used by others to measure the effects of peer review, consider what this implies about the aims of peer review, especially in relation to clinical evidence, and suggest ways in which its effects might be measured more rigorously.

METHODS

In 2 systematic reviews of the effects of editorial review and technical editing, we identified published articles that evaluated the peer review process and identified the criteria used in those studies to evaluate peer review. Our first review considered processes that occur between submission of a paper and a decision on publication; the second considered the processes that occur between acceptance and publication. Both systematic reviews were performed using Cochrane methodology. The methods and primary findings of the reviews are published elsewhere.3-6

RESULTS

We included 19 studies in our systematic review of the effects of peer review; these are described separately.3 Two studies were identified from our review on technical editing because they included information about changes that occur to papers between submission and acceptance or did not distinguish the preacceptance and postacceptance processes.5 We identified 8 other studies that measured the quality of papers or reviews but did not compare peer-review processes. The outcome measures used in these studies are shown in online Table A (available in PDF format).7-35 Brief descriptions of the 8 studies not described in companion papers are also shown.

Ten studies measured various aspects of the quality of papers that had undergone peer review. Only one study8 compared the quality of articles published in peer-reviewed and non–peer-reviewed journals, but it used a nonrandomized design and the findings may have been confounded by other factors, such as differences in the quality of studies submitted to the different journals. The other studies measured the effects of variations in the peer-review process or used a before-and-after design to measure the effects of standard peer review in a particular journal. A major limitation of most studies is that they assessed the quality only of accepted papers and measured the changes that took place between submission or acceptance and publication. Only the studies of economic submissions12 and statistical quality9 included papers that were rejected by the target journal. Of the 10 studies, only 2 used journal readers to assess the quality of papers13,14; the others were based on editors' assessments. Virtually every study used its own rating instrument. These instruments included between 7 and 36 items rated using 2- to 10-point scales. Most scales appeared to be unvalidated, and in the 1 case in which the scoring system was tested, it was found to have low reliability.10 The 2 studies of readability7,15 used published scales that have not been validated for use in this setting.

Eighteen studies measured the quality of reviewers' reports under different conditions such as blinding or training.17-34 Three of these included an assessment by the authors whose work had been reviewed.27-28,33 The others used editors to judge the quality of reviews. Instruments used to rate review quality ranged from 2- to 10-item scales; most were rated using a 2- to 5-point system, but 1 used a visual analog scale and 1 used ratings from 1 to 100. One of these scales had a published validation.36 Four studies examined the amount of agreement among reviewers,30-31 between reviewers and editors,19 or between reviewers and readers.27 One study compared the time and cost of different review processes.35
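To illustrate what measuring "the amount of agreement among reviewers" can involve, the following is a minimal sketch of one common chance-corrected agreement statistic, Cohen's kappa, for 2 reviewers who each give a categorical recommendation on the same manuscripts. The source does not state which agreement statistic the cited studies used, so the choice of kappa is an assumption, and the function name and the reviewer recommendations are hypothetical illustration data, not results from any study.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two raters on the same items.

    Assumption: each rater assigns exactly one category per manuscript.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(ratings_a)
    # Proportion of manuscripts on which the two raters agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical recommendations from two reviewers on 8 manuscripts.
reviewer_1 = ["accept", "reject", "accept", "accept",
              "reject", "reject", "accept", "reject"]
reviewer_2 = ["accept", "reject", "reject", "accept",
              "reject", "accept", "accept", "reject"]

kappa = cohens_kappa(reviewer_1, reviewer_2)
print(f"kappa = {kappa:.2f}")
```

In this made-up example the reviewers agree on 6 of 8 manuscripts (75%), but half of that agreement would be expected by chance alone, which is why a chance-corrected measure gives a more modest figure than raw percentage agreement.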

The aspects of reviews most commonly rated were those relating to the methodological soundness of the reviewed study, its importance, originality and presentation. Several studies also attempted to assess the tone or courteousness of the review. One study measured the number of errors that a review detected.25 Three considered the speed of review.11, 23, 35 Aspects of articles examined were more wide ranging, including quality assessments of each section (introduction, methods, results, and discussion) and also subjective measures of the article's relevance, overall quality, readability, and comprehensibility.

COMMENT

Analysis of published studies on editorial peer review reveals the diversity of study questions and end points. This suggests that peer review is expected to have a wide range of effects, that its true effects have not been determined, or that the aims of peer review have not been identified properly. Our review also showed that the term peer review is used to describe a number of processes, most commonly gathering opinions from external experts, but also review by in-house editors, and that it may not always be possible to make a clear distinction between peer review and technical editing.

Based on our reviews of studies and the larger literature of opinion about peer review, we suggest that its aims may be categorized as (1) selecting submissions for publication (by a particular journal) and rejecting those with irrelevant, trivial, weak, misleading, or potentially harmful content, and (2) improving the clarity, transparency, accuracy, and utility of the selected submissions.

The selection of submissions depends on assessment of their quality and how well they match the journal's scope and aims. The quality criteria may be categorized as the importance, relevance, usefulness, and methodological and ethical soundness of the research and the clarity, accuracy, and completeness of the report.

The main purpose of medical research is to improve health or the delivery of health care. If peer review is regarded as one stage in this process, it might be expected to have measurable effects on health status. However, outcomes such as this are difficult to assess because they are affected by numerous other factors. Surrogate outcomes, such as process measures, are much easier to assess, but may not provide a reliable measure of more meaningful indicators of success.

In the Table we summarize the possible effects of editorial peer review on the quality of reports of clinical research, provide definitions for these, and suggest indicators that could be used to assess them.

Table. Indicators of the Quality of the Output of Editorial Peer-Review of Clinical Studies and Methods to Assess Them

Research so far has measured only certain aspects of peer review, largely focused on variations in the processes used rather than comparing the effects of peer review with those of other systems. Given the resources spent on peer review and the importance placed on it, this is unsatisfactory even though it may reflect the fact that no part of the process of scientific evaluation has been rigorously studied. Ironically, however, the fact that peer review is so well entrenched makes it harder to study, since scientists and editors may be unwilling to take part in randomized studies if they believe that the current system serves them well. How, therefore, should the scientific community proceed in its evaluation of peer review?

Ideally, the effects of peer review would be assessed by large-scale, long-term research into 2 cohorts of studies, randomized to undergo either peer review or an alternative method of assessment, such as random selection for publication. Given the complexity of factors at play, a multivariate analysis may be necessary. However, researchers might not be prepared to accept such randomization, and knowledge of the trial could bias the results. It would be important to ensure that both groups of studies were of equal average quality, for example by examining submissions to a single journal. The follow-up period would have to be lengthy to allow for changes in health status or health care delivery to occur as a consequence of publication.
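The allocation step of such a study could be sketched as follows. This is only an illustration of simple randomization of a single journal's submissions into 2 equal-sized groups; the function name, the arm labels, the manuscript identifiers, and the fixed seed are all hypothetical, and a real trial would also need concealed allocation, consent procedures, and a prespecified analysis plan, none of which is modeled here.

```python
import random


def allocate_submissions(submission_ids, arms=("peer_review", "alternative"), seed=0):
    """Randomly allocate submissions to study arms in near-equal numbers.

    Assumption: all submissions come from one journal, so the arms should
    be of similar average quality before any assessment takes place.
    """
    rng = random.Random(seed)  # fixed seed so the allocation is reproducible
    shuffled = list(submission_ids)
    rng.shuffle(shuffled)
    # Deal the shuffled submissions out to the arms in turn.
    return {sid: arms[i % len(arms)] for i, sid in enumerate(shuffled)}


# Hypothetical manuscript identifiers for 10 submissions.
submissions = [f"MS-{number:03d}" for number in range(1, 11)]
allocation = allocate_submissions(submissions)

for manuscript, arm in sorted(allocation.items()):
    print(manuscript, arm)
```

Shuffling first and then dealing the submissions out in turn keeps the group sizes balanced, which plain coin-flip assignment does not guarantee in a small sample.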

Another difficulty in studying the effects of peer review is that the quality components of a manuscript are often interlinked, and it is meaningless to study them in isolation. For example, a methodologically flawed study or incomplete report will detract from the publication's usefulness.

If the scientific community could agree on the objectives of peer review and collaborate in the assessment of its effects, we could start identifying some of the practices for which evidence of effect is better than that of controls. We propose that the following questions should be tested collaboratively across journal settings:

  • Does peer review identify submissions of higher quality than other selection methods (or chance)?

  • Does peer review improve the clarity, transparency, accuracy, and utility of published papers meaningfully beyond that of the submitted version?

This would involve assessing the quality of both published and unpublished submissions using well-validated instruments. We suggest that measures of quality should include the importance, relevance, usefulness, and methodological and ethical soundness of clinical studies. Such research would also involve tracking submissions between journals since submissions rejected by one journal often go on to be published by another.37

Although there is some evidence that peer review and editing improve articles between submission and publication, the effectiveness of its selection and filtering functions remains virtually untested.3, 5 Yet, despite this lack of evidence, peer review is well established in most academic disciplines. It is therefore possible that peer review is retained for reasons other than those stated. For example, it may serve to protect journals' reputations or to provide acceptability for commercially funded studies. Using unpaid reviewers probably reduces some aspects of the work of in-house editors, although it also carries administrative costs. Peer review is also so well established that it has become part of the system for assessing academic merit in appointments and promotions. The broader functions of peer review, including its social and psychological effects such as increasing the credibility and prestige of published work, are rarely acknowledged and have not, to our knowledge, ever been seriously studied.

CONCLUSIONS

Given the widespread use of peer review, it is surprising that so little is known of its aims or effects, although the same might be said of several other well-established processes of scientific appraisal. The financial costs of peer review to the scientific community are difficult to estimate but should not be ignored.38 There is also anecdotal evidence that peer review has shortcomings and may even have harmful effects.2 Yet, until we have properly defined the aims of peer review, it will remain almost impossible to estimate the effectiveness of the process or to improve it systematically.

The research needed to evaluate the effects of peer review poses many methodological problems and would require the cooperation of large numbers of authors and editors. The growth of electronic publishing has increased the urgency of establishing an effective and efficient system for evaluating scientific information but may also offer opportunities to explore alternatives to the current peer-review system.39 Until such research is undertaken, the ability of peer review to improve the quality of published research and, ultimately, improve the dissemination of reliable health information will remain uncertain.

Acknowledgments

Author Contributions: Study concept and design: Jefferson.

Acquisition of data: Jefferson, Wager.

Analysis and interpretation of data: Jefferson, Wager, Davidoff.

Drafting of the manuscript: Wager, Davidoff.

Critical revision of the manuscript for important intellectual content: Jefferson, Davidoff.

Study supervision: Jefferson.

Disclosure: Dr Tom Jefferson co-edited the book Peer Review in Health Sciences, Ms Wager wrote 2 chapters in the book, and they are co-authors of a book entitled How to Survive Peer Review. Dr Davidoff was the editor of a peer-reviewed journal. All authors are active peer reviewers and have published articles in peer-reviewed journals.

Acknowledgment: We thank Philip Alderson and Philippa Middleton for their contributions to the original systematic reviews, Iain Chalmers and John Overbeke for helpful comments on early drafts of the manuscript, and Fiona Godlee for her review of the submitted version.

Corresponding Author and Reprints: Ms Liz Wager, Sideview, Station Road, Princes Risborough, HP27 9DE, UK (e-mail: liz@sideview.demon.co.uk).
