Evaluating peer reviews. Pilot testing of a grading instrument
I. D. Feurer, G. J. Becker, D. Picus, E. Ramirez, M. D. Darcy and M. E. Hicks
Journal of Vascular and Interventional Radiology, Nashville, TN.
OBJECTIVE--To measure the reliability and preliminary validity of a grading
instrument for editors to evaluate the quality of peer reviews. DESIGN--The
consecutive sample design included 53 reviews of 23 manuscripts. Reviews
were systematically assigned to an interrater reliability subsample (n = 41;
power greater than 0.90 to detect a difference of more than one point) or a
preliminary criterion-related validity subsample (n = 12). Content
validity was closely examined. SETTING--Nonclinical. PARTICIPANTS--Three
graders evaluated reliability. One individual examined content validity and
two editors tested preliminary criterion-related validity. INTERVENTION
(INSTRUMENT)--Attributes reflecting two basic dimensions, review content
and format, were identified and scored (values are maximum points/percentage
of the 14-point total): timeliness, 3/21%; grade sheet, 1/7%; etiquette, 1/7%;
sectional narratives, 3/21%; citations, 2/14%; narrative summary, 2/14%;
and insights, 2/14%. A scoring guide was provided. MAIN OUTCOME
MEASURES--Statistical analyses used to test the interrater reliability of
the total score included the intraclass correlation coefficient and
analysis of variance (ANOVA); the ANOVA was expected to uphold the null
hypothesis of no difference among the graders' mean scores.
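A minimal sketch of how the 14-point total score and its reliability statistics could be computed. The attribute maxima come from the instrument described above; everything else is an assumption. The abstract does not say which form of the intraclass correlation coefficient was used, so the sketch takes ICC(2,1) (two-way random effects, absolute agreement, single rater, after Shrout and Fleiss) and tests the grader effect with a two-way ANOVA F statistic. The helper names and the score matrix are hypothetical, not study data.

```python
# Sketch only: ICC(2,1) and a two-way ANOVA are assumed choices, not
# necessarily the forms used in the study. All names are hypothetical.

# Maximum points per attribute, as listed for the instrument (total = 14).
RUBRIC_MAX = {
    "timeliness": 3,
    "grade sheet": 1,
    "etiquette": 1,
    "sectional narratives": 3,
    "citations": 2,
    "narrative summary": 2,
    "insights": 2,
}


def percent_contribution(rubric):
    """Each attribute's share of the total possible score, rounded to 1%."""
    total = sum(rubric.values())
    return {name: round(100 * pts / total) for name, pts in rubric.items()}


def anova_mean_squares(scores):
    """Two-way ANOVA (reviews x graders, one score per cell).

    scores[i][j] is the total score grader j gave review i.
    Returns (n, k, MS_reviews, MS_graders, MS_error).
    """
    n, k = len(scores), len(scores[0])
    grand = sum(x for row in scores for x in row) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    return (
        n,
        k,
        ss_rows / (n - 1),
        ss_cols / (k - 1),
        ss_error / ((n - 1) * (k - 1)),
    )


def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater."""
    n, k, msr, msc, mse = anova_mean_squares(scores)
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def grader_f_test(scores):
    """F statistic and (numerator, denominator) df for a grader effect."""
    n, k, _, msc, mse = anova_mean_squares(scores)
    return msc / mse, (k - 1, (n - 1) * (k - 1))


if __name__ == "__main__":
    print(percent_contribution(RUBRIC_MAX))
    # Hypothetical total scores: 4 reviews (rows) x 3 graders (columns).
    scores = [[10, 11, 10], [6, 6, 7], [12, 12, 13], [8, 9, 8]]
    print(icc_2_1(scores))
    print(grader_f_test(scores))
```

For the hypothetical matrix, ICC(2,1) works out to 59/62 (about 0.95) and F is about 1.0 on (2, 6) degrees of freedom. A P value could then be obtained from the F distribution, for example with `scipy.stats.f.sf(F, 2, 6)`; the sketch stays in the standard library to remain self-contained.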
Kendall's coefficient of concordance was used to test preliminary
criterion-related validity. RESULTS--The intraclass correlation coefficient
was .84 (P < .001), and analysis of variance showed no significant difference
among the graders' mean scores (P = .46). Content validity was
confirmed and preliminary criterion-related validity was indicated
(Kendall's coefficient of concordance = .94, P = .038). CONCLUSIONS--The
instrument is reliable. Content validation has been completed, and further
criterion-related validation is warranted.
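The preliminary criterion-related validity test reported above used Kendall's coefficient of concordance (W). A minimal sketch of that statistic follows, assuming each rater supplies a score for the same set of reviews; the abstract does not describe how the rankings were formed, so the function names and data are hypothetical. Tied scores receive average ranks and the usual tie correction is applied to the denominator. The chi-square statistic m(n - 1)W is the standard large-sample approximation; with a subsample as small as 12 reviews, an exact or permutation test may be more appropriate.

```python
from collections import Counter


def average_ranks(values):
    """Rank values in ascending order (1 = lowest); ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value equals this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def kendalls_w(ratings):
    """Kendall's W for m raters scoring the same n items.

    ratings[r][i] is the score rater r gave item i. Returns (W, chi2, df),
    where chi2 = m(n - 1)W is the large-sample approximation with n - 1 df.
    """
    m, n = len(ratings), len(ratings[0])
    rank_rows = [average_ranks(row) for row in ratings]
    rank_sums = [sum(row[i] for row in rank_rows) for i in range(n)]
    expected = m * (n + 1) / 2  # mean rank sum per item
    s = sum((r - expected) ** 2 for r in rank_sums)
    # Tie correction: sum of (t^3 - t) over each rater's groups of tied scores.
    ties = sum(t ** 3 - t for row in ratings for t in Counter(row).values())
    w = 12 * s / (m ** 2 * (n ** 3 - n) - m * ties)
    return w, m * (n - 1) * w, n - 1


if __name__ == "__main__":
    # Hypothetical scores from 3 raters for 5 reviews.
    ratings = [[9, 7, 12, 5, 10], [8, 7, 13, 6, 11], [10, 6, 12, 5, 9]]
    print(kendalls_w(ratings))
```

For the hypothetical ratings, the rank sums are 10, 6, 15, 3, and 11, giving W = 43/45 (about 0.96). W ranges from 0 (no agreement) to 1 (identical rankings).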