JAMA. Vol. 296 No. 9, September 6, 2006
Review

Instruments for Evaluating Education in Evidence-Based Practice

A Systematic Review

Terrence Shaneyfelt, MD, MPH; Karyn D. Baum, MD, MSEd; Douglas Bell, MD, PhD; David Feldstein, MD; Thomas K. Houston, MD, MPH; Scott Kaatz, DO; Chad Whelan, MD; Michael Green, MD, MSc

JAMA. 2006;296:1116-1127.

ABSTRACT

Context  Evidence-based practice (EBP) is the integration of the best research evidence with patients' values and clinical circumstances in clinical decision making. Teaching of EBP should be evaluated and guided by evidence of its own effectiveness.

Objective  To appraise, summarize, and describe currently available EBP teaching evaluation instruments.

Data Sources and Study Selection  We searched the MEDLINE, EMBASE, CINAHL, HAPI, and ERIC databases; reference lists of retrieved articles; EBP Internet sites; and 8 education journals from 1980 through April 2006. For inclusion, studies had to report an instrument evaluating EBP, contain sufficient description to permit analysis, and present quantitative results of administering the instrument.

Data Extraction  Two raters independently abstracted information on the development, format, learner levels, evaluation domains, feasibility, reliability, and validity of the EBP evaluation instruments from each article. We defined 3 levels of instruments based on the type, extent, methods, and results of psychometric testing and suitability for different evaluation purposes.

Data Synthesis  Of 347 articles identified, 115 were included, representing 104 unique instruments. The instruments were most commonly administered to medical students and postgraduate trainees and evaluated EBP skills. Among EBP skills, acquiring evidence and appraising evidence were most commonly evaluated, but newer instruments evaluated asking answerable questions and applying evidence to individual patients. Most behavior instruments measured the performance of EBP steps in practice but newer instruments documented the performance of evidence-based clinical maneuvers or patient-level outcomes. At least 1 type of validity evidence was demonstrated for 53% of instruments, but 3 or more types of validity evidence were established for only 10%. High-quality instruments were identified for evaluating the EBP competence of individual trainees, determining the effectiveness of EBP curricula, and assessing EBP behaviors with objective outcome measures.

Conclusions  Instruments with reasonable validity are available for evaluating some domains of EBP and may be targeted to different evaluation needs. Further development and testing are required to evaluate EBP attitudes, behaviors, and more recently articulated EBP skills.



INTRODUCTION

Physicians often fail to implement clinical maneuvers that have established efficacy.1-2 In response, professional organizations have called for increased training in evidence-based practice (EBP) for all health care professions and at all levels of education.3-6 Evidence-based practice may be defined as the integration of the best research evidence with patients' values and clinical circumstances in clinical decision making.7

As educators implement EBP training, they need instruments to evaluate the programmatic impact of new curricula and to document the competence of individual trainees. Prior systematic reviews of EBP training summarized the effectiveness of educational interventions,8-13 but only 1 that was conducted in 1999 also included a detailed analysis of evaluation instruments.8 Although there are multiple components of EBP (Box), as of 1998 the published instruments focused on critical appraisal to the exclusion of other EBP steps, measured EBP knowledge and skills but did not objectively document behaviors in actual practice, and often lacked established validity and reliability.8 In 2002, Hatala and Guyatt17 noted that "ironically, if one were to develop guidelines for how to teach [evidence-based medicine] based on these results, they would be based on the lowest level of evidence." Since then, instruments have been developed to try to address the deficits in evaluation. In addition, EBP has become more sophisticated, requiring additional skills. For example, in identifying evidence, practitioners must be able to appraise, select among, and search emerging electronic secondary "preappraised" information resources.18 In applying evidence to decision making, they must explicitly integrate patient preferences and clinical context.7


Box. Definitions of Variables and Terminology Used in This Study

Description: Format of instrument; choices include written or Web-based test, self-report survey, OSCE with standardized patients, other OSCE, portfolio, audiotape of teaching sessions, record audit, chart-stimulated recall, direct observation (clinical evaluation exercise), rating scale, and other
Development: Free-text description of development
EBP domains
Knowledge: Knowledge about EBP
Skills: EBP skills are distinguished from knowledge by participants applying their knowledge by performing EBP steps in some type of clinical scenario, such as with a standardized patient, written case, computer simulation, OSCE, or direct observation.
       Ask: Converting the need for information (about prevention, diagnosis, prognosis, therapy, causation, etc) into an answerable question
       Acquire: Tracking down the best evidence with which to answer that question
       Appraise: Critically appraising that evidence for its validity (closeness to the truth), impact (size of the effect), and applicability (usefulness in one's own clinical practice)
       Apply: Applying the evidence in clinical decision making (includes both individualizing the evidence [such as recasting number needed to treat for the patient's baseline risk] and integrating the evidence with the patient's preferences and particular clinical circumstances)
Attitude: Attitudes toward EBP
Behaviors: Actual performance of EBP in practice
       Enacting EBP steps in practice: Actually enacting EBP steps (such as identifying clinical questions) in the course of patient care activities
       Performing evidence-based clinical maneuvers: Performing evidence-based maneuvers in trainee's actual practice, such as prescribing angiotensin-converting enzyme inhibitors for congestive heart failure with depressed left ventricular function or checking hemoglobin A1c in patients with diabetes
       Affecting patient outcomes: Trainee's patients experience improved or favorable outcomes, such as lower blood pressure

Feasibility: Documentation of some measure of ease of implementation; choices include time required to administer instrument, time required to score instrument, expertise required to score instrument, cost to administer and score, administrative support required, other
Interrater reliability: Statistical test (κ or correlation coefficient) of the agreement among 2 or more raters' scoring of the responses. Applied only to instruments that required some level of judgment to score, such as free-text responses. In contrast, reliability testing was deemed not applicable for instruments that required no rater judgment to score, such as multiple-choice tests. Credited as "tested" if a quantitative assessment was done. Credited as "established" if the corresponding statistical test was significant.
Participants (number, discipline, and level): Participants in whom the instrument was tested; options include undergraduate medical students (year), residents (specialty), fellows (specialty), faculty physicians, practicing physicians, nurses in training, practicing nurses, allied health professionals, and other health care professionals
Validity: For all types except content validity, credited as "tested" if a quantitative assessment of a particular type of validity was done; credited as "established" if the corresponding statistical test was significant*
Based on content: External review of the instrument by experts in EBP
Based on internal structure
       Internal consistency: Statistical test to establish the relationship between items within either the entire instrument or a prespecified section of the instrument
       Dimensionality: Factor analysis to determine if the instrument measured a unified latent construct or, if specified in advance, discrete subthemes
Based on relationship to other variables
       Responsive: Ability to detect the impact of an EBP educational intervention; requires statistical comparison of same participant's scores before and after an EBP educational intervention
       Discriminative: Ability to discriminate between participants with different levels of EBP expertise; requires statistical comparison of instrument scores among participants of different levels of EBP ability
       Criterion: Statistical test of the relationship between the instrument scores and participants' scores on another instrument with established psychometric properties

Abbreviations: EBP, evidence-based practice; OSCE, objective structured clinical examination.

*Classification of validity is based on the Standards for Educational and Psychological Testing of the Joint Committee on Standards for Educational and Psychological Testing of the American Educational Research Association, the American Psychological Association, and the National Council on Measurement in Education14 and other recommendations.15-16



Because of these changes, we performed a systematic review of EBP evaluation instruments and strategies, documenting their development, format, learner levels, EBP evaluation domains, psychometric properties, and feasibility. Our 2 goals were to provide guidance for EBP educators by highlighting preferred instruments based on evaluation needs and to make recommendations for EBP education research based on the current state of the EBP evaluation science.


METHODS

Identification of Studies

To identify evaluation instruments, we searched the MEDLINE, EMBASE, Cumulative Index to Nursing and Allied Health Literature (CINAHL), Health and Psychosocial Instruments (HAPI), and Educational Resources Information Center (ERIC) databases from 1980 through April 2006. Search terms included evidence-based medicine; critical appraisal; clinical epidemiology; journal club; clinical question; medical informatics; medical informatics applications; information storage and retrieval; databases, bibliographic; integrated advanced information management systems; MEDLARS; education; clinical trials; controlled clinical trials; multicenter studies; and program evaluation. We also manually searched the reference lists of retrieved articles, tables of contents of 8 major medical education journals (Academic Medicine, Medical Education, Teaching and Learning in Medicine, Medical Teacher, Advances in Health Sciences Education, Medical Education OnLine, Journal of Continuing Education in the Health Professions, and BioMed Central Medical Education), several EBP Internet sites,4, 19-23 and the authors' personal files. The Internet sites were chosen based on author experience as loci that might contain instruments not identified by other strategies.

We included studies that (1) reported an instrument or strategy that evaluated EBP knowledge, skills, attitudes, behaviors, or patient outcomes; (2) contained a sufficient description of the instrument or strategy to permit analysis; and (3) presented results of testing the performance of the instrument or strategy. We did not exclude any articles based on study design. Given the breadth of our review and the large number of articles initially captured by our search strategy, it was not feasible to translate the non–English-language articles to determine their suitability for inclusion. Thus, we limited our analysis to studies published in English. For 1 study, we contacted the authors for clarification. Studies that reported only satisfaction with a curriculum were excluded. Two authors (T.S. and M.G.) independently evaluated each article in the preliminary list for inclusion, and disagreements were resolved by consensus.

Data Extraction

We developed and piloted a standardized data form to abstract information from the included articles. A randomly assigned set of 2 raters, representing all permutations of the 6 raters, independently abstracted information from each of the included articles. In this process and in the article inclusion process, raters were not blinded to any portion of articles. After submitting their original abstraction forms to a central location, the pairs of raters resolved their differences by consensus. The abstraction variables included description and development of the EBP evaluation instrument; number, discipline, and training levels of participants; EBP domains evaluated; feasibility assessment; and type, method, and results of validity and reliability assessment14 (see Box for definitions). We determined interrater reliability for the article inclusion process and for the data abstraction process based on data from all included articles. κ statistics were calculated and interpreted according to the guidelines of Landis and Koch.24
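As an illustration of the agreement statistic named above, the following is a minimal sketch of Cohen's κ for 2 raters, with the commonly cited Landis and Koch descriptive bands. It is not the authors' analysis (which was run in Stata); the function names and the sample ratings are hypothetical and chosen only to show the arithmetic.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each label the same items.

    kappa = (observed agreement - chance agreement) / (1 - chance agreement)
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(rater_a)
    # Proportion of items on which the two raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal label counts.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


def landis_koch_label(kappa):
    """Descriptive band for a kappa value (Landis and Koch cut points)."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical include (1) / exclude (0) decisions by 2 raters on 10 articles.
rater_1 = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
rater_2 = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
example_kappa = cohen_kappa(rater_1, rater_2)
example_label = landis_koch_label(example_kappa)
```

Under these bands, a κ between 0.61 and 0.80 reads as substantial agreement and one between 0.41 and 0.60 as moderate agreement.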

Quality Categorization of Studies

We did not use restrictive inclusion criteria related to study quality. However, we did define 3 levels of instruments, based on (1) the type, extent, methods, and results of psychometric testing and (2) suitability for different evaluation purposes. For use in the summative evaluation of individual trainees, we identified instruments with the most robust psychometric properties generally and, in particular, the ability to distinguish between participants of different levels of EBP experience or expertise (level 1). These instruments had to be supported by established interrater reliability (if applicable), objective (non–self-reported) outcome measures, and multiple (≥3) types of established validity evidence (including evidence of discriminative validity).

For use in evaluating the programmatic effectiveness of an EBP educational intervention, we identified a second group of instruments supported by established interrater reliability (if applicable) and "strong evidence" of responsive validity, established by studies with a randomized controlled trial or pre-post controlled trial design and an objective (non–self-reported) outcome measure (level 2). These instruments generally have less robust psychometric properties than level 1 instruments, which must be supported by 3 or more different types of validity evidence. However, level 2 instruments must be supported by higher-level ("strong") evidence for responsive validity in particular. The criteria for "strong evidence" are stricter than the definition of responsive validity (Box) used for the general classifications in this review. Instruments meeting all of the criteria for level 1 may also have "strong evidence" for responsive validity (as indicated in the table footnotes) but this is not required for this designation.

Finally, considering the evaluation of EBP behaviors, we anticipated that few of the instruments would meet either of the preceding thresholds. Therefore, we used a single criterion of an objective (non–self-reported) outcome to distinguish a group of relatively high-quality measures in this domain (level 3).
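The 3 quality levels described above can be read as a small set of rules. The sketch below encodes one plausible reading of them; the `Instrument` record, its field names, and the validity-type labels are hypothetical conveniences and not part of the review's data form. "Strong evidence" of responsive validity is represented as a single flag for responsive validity shown in a randomized or pre-post controlled design.

```python
from dataclasses import dataclass, field


@dataclass
class Instrument:
    # Hypothetical fields standing in for the criteria named in the text.
    needs_rater_judgment: bool                # is interrater reliability applicable?
    interrater_reliability_established: bool
    objective_outcome: bool                   # non-self-reported outcome measure
    established_validity_types: set = field(default_factory=set)
    strong_responsive_evidence: bool = False  # controlled design showed responsiveness
    evaluates_behaviors: bool = False


def reliability_ok(inst):
    """Interrater reliability is required only when scoring needs rater judgment."""
    return (not inst.needs_rater_judgment) or inst.interrater_reliability_established


def quality_levels(inst):
    """Return the set of levels (1, 2, 3) whose criteria the instrument meets."""
    levels = set()
    # Level 1: reliability, objective outcome, >=3 validity types incl. discriminative.
    if (reliability_ok(inst) and inst.objective_outcome
            and len(inst.established_validity_types) >= 3
            and "discriminative" in inst.established_validity_types):
        levels.add(1)
    # Level 2: reliability plus strong responsive evidence with an objective outcome.
    if reliability_ok(inst) and inst.objective_outcome and inst.strong_responsive_evidence:
        levels.add(2)
    # Level 3: behavior instruments judged on the single objective-outcome criterion.
    if inst.evaluates_behaviors and inst.objective_outcome:
        levels.add(3)
    return levels


# Hypothetical instruments, not drawn from the review's tables.
free_text_test = Instrument(
    needs_rater_judgment=True,
    interrater_reliability_established=True,
    objective_outcome=True,
    established_validity_types={"content", "internal consistency", "discriminative"},
)
self_report_survey = Instrument(
    needs_rater_judgment=False,
    interrater_reliability_established=False,
    objective_outcome=False,
    evaluates_behaviors=True,
)
```

Returning a set rather than a single level reflects the text's note that an instrument meeting the level 1 criteria may also have strong evidence of responsive validity.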

In cases in which an instrument included 2 distinct pieces (with different formats) intended to evaluate 2 distinct EBP domains, we applied the quality criteria separately to each. For descriptive purposes, we included both subinstruments in the tables and indicated if one or both met the psychometric threshold.

We calculated descriptive statistics for the characteristics and psychometric properties of the evaluation instruments. Analyses were performed using Stata Special Edition version 9.0 (Stata Corp, College Station, Tex).


RESULTS

Inclusion criteria were met by 115 articles25-60, 61-100, 101-140 representing 104 unique assessment strategies (8 instruments were used in >1 study, and 1 study was reported in 2 articles) (Figure). There was substantial interrater agreement for the article inclusion process (κ = 0.68; 95% confidence interval [CI], 0.47-0.89), as well as for the assessments of validity based on content (κ = 0.70; 95% CI, 0.49-0.91) and based on internal structure (κ = 0.71; 95% CI, 0.47-0.95). There was moderate agreement on the assessment of validity based on relationships to other variables (κ = 0.52; 95% CI, 0.35-0.70).


Figure. Search for and Selection of Articles for Review

EBP indicates evidence-based practice.
*Articles could be found in more than 1 database (see "Methods" section of text for details of search strategies, databases, and names of the 8 journals whose tables of contents were searched).
†Reasons for exclusion not mutually exclusive.


Characteristics of EBP Evaluation Instruments

The participants' health care professions discipline and training level and the evaluated EBP domains are shown in Table 1 (see Box for definitions). The majority of instruments targeted students and postgraduate trainees, while nonphysicians were rarely evaluated. The instruments most commonly evaluated EBP skills (57%), followed by knowledge and behaviors (both 38%), followed by attitudes (26%). Among the EBP skills, critical appraisal of evidence was included in the greatest proportion of instruments.


Table 1. Characteristics of EBP Evaluation Instruments*


Thirty (86%) of the 35 evaluation approaches for the "acquire" step related exclusively to skills in searching MEDLINE or similar bibliographic databases for original articles. Of the 5 instruments considering alternative electronic information sources, 4 specifically evaluated awareness, preference for, or skills in searching specific secondary evidence-based medical information resources (including the Cochrane Library, Database of Abstracts of Reviews of Effectiveness, ACP Journal Club, and Clinical Evidence)25, 33, 43, 135 while the remaining one42 merely referred to "Web sites." Similarly, among the instruments evaluating the "apply" step, only 5 (38%) of 13 went beyond the ability to consider research evidence to also assess the ability to integrate the evidence with the patient's particular clinical context and preferences. Evaluation approaches included standardized patient ratings of students explaining a therapeutic decision after reviewing research evidence,38-39 scoring of residents' free-text justification of applying results of a study to a "paper case,"28 and documenting decision making before and after access to a research abstract41 or MEDLINE search.49

Most of the instruments evaluating EBP behaviors measured the use of EBP steps in practice. Of these, only 6 (18%) of 34 used objective outcome measures31, 52-54,113, 137 with the remaining relying on retrospective self-reports. Only 3 instruments measured the performance of evidence-based clinical maneuvers in practice,57-58,140 and 2 evaluated the effect of an EBP teaching intervention on patient outcomes.57-58

Feasibility and Psychometric Testing

Feasibility of implementation was reported for 19 (18.3%) of the 104 instruments. Among these, 13 reported the time required to administer or score the instrument,40, 45, 60-61,64, 66, 69, 72, 88, 99, 113, 123, 126 4 described the expertise required for scoring,37, 47, 65, 72 and 4 estimated the financial costs of implementation.28, 54, 100, 114 Investigators performed interrater reliability testing on 21 (41.2%) of the 51 instruments for which it was appropriate, most commonly using κ statistics and correlation coefficients.

Investigators conducted at least 1 type of validity testing in 64% and established it in 53% of the 104 EBP evaluation instruments (Table 2). However, multiple (≥3) types of validity evidence were established for only 10% of the instruments. Investigators most commonly sought (57%) and established (44%) evidence for validity based on relationships to other variables. Among these, responsive validity was most commonly tested and established, followed by discriminative and criterion validity.


Table 2. Psychometric Characteristics of Evidence-Based Practice Evaluation Instruments*


Eight instruments were used in subsequent studies, either for further validation or to evaluate programmatic impact of an EBP curriculum. One instrument60 was used in 3 later studies61-63; 1 instrument140 was used in 2 later studies55-56; and 6 instruments33, 41, 59, 64, 133, 137 were used in 1 subsequent study each.32, 65, 87, 134, 136, 139

Quality Categorization of Instruments

Level 1 Instruments. Table 3 summarizes the EBP evaluation domains, format, and psychometric properties of the instruments supported by established interrater reliability (if applicable), objective (non–self-reported) outcome measures, and multiple (≥3) types of established validity evidence (including evidence for discriminative validity). These instruments are distinguished by the ability to discriminate between different levels of expertise or performance and are therefore suited to document the competence of individual trainees. Furthermore, the robust psychometric properties in general support their use in formative or summative evaluations. The Fresno Test25 and Berlin Questionnaire59 represent the only instruments that evaluate all 4 EBP steps. In taking the Fresno Test, trainees perform realistic EBP tasks, demonstrating applied knowledge and skills. However, more time and expertise are required to grade this instrument. The multiple-choice format of the Berlin Questionnaire restricts assessment to EBP applied knowledge but also makes it more feasible to implement. The other instruments in Table 3 evaluate a narrower range of EBP.


Table 3. Level 1 Instruments (Individual Trainee Formative or Summative EBP Evaluation)*


Level 2 Instruments. In addition to 4 of the instruments in Table 3,26, 59-60,64 9 instruments fulfilled the criteria for strong evidence of responsive validity (Table 4). These are appropriate to consider for evaluating programmatic (rather than individual) impact of EBP interventions. Six evaluated EBP knowledge and skills.27-31,37 Among these, only one27 measured all 4 EBP steps. Residents articulated clinical questions, conducted MEDLINE searches, performed calculations, and answered free-text questions about critical appraisal and application of the evidence. In this study, gains in skills persisted on retesting at 6 months, indicating both concurrent and predictive responsive validity. The instrument described by Green and Ellis28 required free-text responses about the appraisal of a redacted journal article and application of the results to a patient. The 3 multiple-choice tests29-31 detected improvements in trainees' EBP knowledge. However, in 2 of the studies, this gain did not translate into improvements in critical appraisal skills as measured with a test article29 or the incorporation of literature into admission notes.30 Finally, in Villanueva et al,37 librarians identified elements of the patient-intervention-comparison-outcome (PICO) format141 in clinical question requests, awarding 1 point for each included element. In a randomized controlled trial of instruction in clinical question construction, this instrument detected improvements in this skill.


Table 4. Level 2 Instruments (Programmatic EBP Curriculum Evaluation)*


Four EBP behavior instruments met the criteria for strong evidence of responsive validity and an objective outcome measure.31, 52, 57, 113 Among these, 3 measured the enactment of EBP steps in practice.31, 52, 113 Ross and Verdieck31 analyzed audiotapes of resident-faculty interactions, looking for phrases related to literature searching, clinical epidemiology, or critical appraisal. Family practice residents' "evidence-based medicine utterances" increased from 0.21 per hour to 2.9 per hour after an educational intervention. Stevermer et al113 questioned residents about their awareness and knowledge of findings in recent journal articles relevant to primary care practice. Residents exposed to academic detailing recalled more articles and correctly answered more questions about them. Focusing on the "acquire" step, Cabell et al52 electronically captured trainees' searching behaviors, including number of log-ons to databases, searching volume, abstracts or articles viewed, and time spent searching. These measures were responsive to an intervention including a 1-hour didactic session, use of well-built clinical question cards, and practical sessions in clinical question building. One EBP behavior instrument in this category evaluated EBP practice performance and patient outcomes using medical record audits. Langham et al57 evaluated the impact of an EBP curriculum, documenting improvements in practicing physicians' documentation, clinical interventions, and patient outcomes related to cardiovascular risk factors.

Although 3 controlled studies demonstrated the responsive validity of having librarians score MEDLINE search strategies34, 36 or clinical question formulations43 according to predetermined criteria, these did not meet the criteria for interrater reliability testing.

Level 3 Instruments. In addition to the 5 EBP behavior instruments included in levels 1 and 2,31, 52, 57, 113, 137 4 others used objective outcome measures but did not demonstrate strong evidence of responsive validity or multiple sources of validity evidence (Table 5).53-54,58, 140 Two of these consisted of electronic learning portfolios that allowed trainees to document their enactment of EBP steps.53-54


Table 5. Level 3 Instruments (EBP Behavior Evaluation)*


The remaining 2 instruments measured the performance of evidence-based maneuvers or patient outcomes. Ellis et al140 devised a reliable method for determining the primary therapeutic intervention chosen by a practitioner and classifying the quality of evidence supporting it. In this scheme, interventions are (1) supported by individual or systematic reviews of randomized controlled trials, (2) supported by "convincing nonexperimental evidence," or (3) lacking substantial evidence. This instrument was subsequently used in 2 pre-post (but uncontrolled) studies of EBP educational interventions.55-56 Finally, Epling et al58 performed a record audit before and after residents developed and implemented a diabetes clinical guideline.


COMMENT

We found that instruments used to evaluate EBP were most commonly administered to medical students and postgraduate trainees and evaluated skills in searching for and appraising the evidence. At least 1 type of validity evidence was demonstrated in 53% of instruments (most commonly based on relationship to other variables), but multiple types of validity evidence were established for very few.

Educators need instruments to document the competence of individual trainees and to evaluate the programmatic impact of new curricula. Given the deficits of instruments previously available, it is not surprising that in 2000 only a minority of North American internal medicine programs objectively evaluated the effectiveness of their EBP curricula.143 Currently, there is a much wider selection of instruments, some of which are supported by more robust psychometric testing. While, like their predecessors, the currently available instruments most commonly evaluate critical appraisal, many more also measure the other important EBP steps. Among the instruments evaluating EBP behaviors, most continue to measure the performance of EBP steps by self-report. However, new instruments objectively document EBP steps and document the performance of evidence-based clinical maneuvers.

The choice of an EBP evaluation instrument should be guided by the purpose of the evaluation and the EBP domains of interest. The instruments in Table 3 are appropriate for evaluating the competence of individual trainees. Although they have reasonably strong psychometric properties, we believe that in the absence of well-defined passing standards for different learner levels, they should not yet be used for high-stakes evaluations, such as academic promotion or certification.

To evaluate the programmatic impact of EBP educational interventions, educators may turn to instruments with strong evidence of responsive validity (Table 4) and whose evaluation domains correspond with the objectives of their curricula. A conceptual framework for evaluating this aspect of EBP teaching has been developed by the Society of General Internal Medicine Evidence-Based Medicine Task Force.144 It recommends considering the learners (including their level and particular needs), the intervention (including the curriculum objectives, intensity, delivery method, and targeted EBP steps), and the outcomes (including knowledge, skills, attitudes, behaviors, and patient-level outcomes). With the exception of the instruments also included in Table 3, educators should use caution in using these instruments to assess the EBP competence of individual trainees because they were developed to evaluate the effectiveness of specific curricula and lack evidence for discriminative validity.

Only 5 EBP behavior instruments met the 2 highest quality thresholds in our analysis. Notwithstanding the psychometric limitations, however, it is important to document that trainees apply their EBP skills in actual practice. Our review identified several studies that documented EBP behaviors through retrospective self-report. However, this approach may be extremely biased, as physicians tend to underestimate their information needs and overestimate the degree of their pursuit.145 We recommend that educators restrict their selection of instruments to those with objectively measured outcomes.

Regarding the enactment of EBP steps in practice, analyzing audiotapes of teaching interactions31 and electronically capturing searching behavior52 showed responsive validity. However, we believe that these approaches fail to capture the pursuit and application of information in response to particular clinical questions, rendering them poor surrogates for EBP behaviors. Evidence-based practice learning portfolios,53-54 which serve as both an evaluation strategy and an educational intervention, may represent the most promising approach to document the performance of EBP steps. However, their use in any assessment with more serious consequences than formative evaluation must await more rigorous psychometric testing.

In addition to documenting the performance of EBP steps, educators are charged with documenting behavioral outcomes of educational interventions further downstream, such as performance of evidence-based clinical maneuvers and patient-level outcomes.146 The reliable approach of rating the level of evidence supporting clinical interventions has been widely used.140 In 2 studies, this approach detected changes following an EBP curriculum55 or supplying physicians with a literature search56 but, in the absence of controlled studies, did not meet our threshold for strong evidence of responsive validity. This system appears most suited to evaluating changes in EBP performance after an educational intervention or over time. To use it to document an absolute threshold of performance would require knowing the "denominator" of evidence-based therapeutic options for each trainee's set of patients, making it impractical on a programmatic scale. The performance of evidence-based maneuvers may also be documented by auditing records for adherence to evidence-based guidelines or quality indicators. Hardly a new development, this type of audit is commonly performed as part of internal quality initiatives or external reviews. Our review found 2 examples of quality audits used to evaluate the impact of EBP training.57-58

Assessing EBP attitudes may uncover hidden but potentially remediable barriers to trainees' EBP skill development and performance. However, while several instruments contain a few attitude items, few instruments assess this domain in depth.33,50-51,134 Moreover, no attitude instruments in this review met our quality criteria for establishment of validity. One instrument demonstrated responsive validity in an uncontrolled study51 and another demonstrated criterion validity in comparison with another scale.50

Several limitations should be considered in interpreting the results of this review. As in any systematic review, it is possible that we failed to identify some evaluation instruments. However, we searched multiple databases, including those containing unpublished studies, using a highly inclusive search algorithm. Because our search was limited to English-language journals, we did not capture EBP instruments described in other languages. This might introduce publication bias if such instruments differ systematically from those appearing in English-language journals. Our exclusion of insufficiently described instruments may have biased our analysis if these differed systematically from the others. Our abstraction process showed good interrater reliability, but the characteristics of some EBP evaluation instruments could have been misclassified, particularly in determining validity evidence based on relationship to other variables. In 2 similar reviews of professionalism instruments, there was considerable inconsistency among experts in assigning types of validity evidence.147-148
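As an illustration of how agreement between 2 independent raters might be quantified, the following minimal sketch computes Cohen's kappa, a chance-corrected agreement statistic. This passage does not state which statistic was used, so the choice of kappa, the function name `cohens_kappa`, and the example ratings are assumptions for illustration only and are not data from this review.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters (hypothetical helper)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must classify the same non-empty set of items")

    n = len(rater_a)

    # Observed agreement: share of items given the same category by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Expected agreement by chance, from each rater's marginal category counts.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical category; agreement is complete.
        return 1.0
    return (observed - expected) / (1 - expected)


# Invented example: two raters classify the validity evidence of 6 instruments.
rater_1 = ["criterion", "responsive", "criterion",
           "discriminative", "responsive", "criterion"]
rater_2 = ["criterion", "responsive", "responsive",
           "discriminative", "responsive", "criterion"]

kappa = cohens_kappa(rater_1, rater_2)
print(f"kappa = {kappa:.2f}")
```

In this invented example the raters disagree on 1 of 6 instruments. Because kappa discounts agreement expected by chance, it is lower than the raw agreement of 5/6, which is one reason it is often preferred over simple percent agreement for categorical abstraction.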

Our findings, which identified some gaps in EBP evaluation, have implications for medical education research. First, it must be determined whether the current generation of evaluation approaches can be validly used to evaluate a wider range of clinicians, such as nurses and allied health professionals. This need is underscored by the Institute of Medicine's call for interdisciplinary training.3 Second, there is a need for development and testing of evaluation approaches in 2 content areas of EBP knowledge and skills. Within the "acquire" step, approaches are needed to document trainees' ability to appraise, select, and search secondary electronic medical information resources to find syntheses and synopses of original research studies.18 There is also a need to evaluate trainees' competence in applying evidence to individual patient decision making, considering the evidence (customized for the patient), clinical circumstances, and patient preferences.7 Finally, the science of evaluating EBP attitudes and behaviors continues to lag behind the evaluation of knowledge and skills. Medical education researchers should continue to explore approaches that balance psychometric robustness with feasibility.


AUTHOR INFORMATION

Corresponding Author: Terrence Shaneyfelt, MD, MPH, University of Alabama School of Medicine, Veterans Affairs Medical Center, 700 S 19th St, Birmingham, AL 35233 (terry.shaneyfelt@med.va.gov).

Author Contributions: Dr Shaneyfelt had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Study concept and design: Shaneyfelt, Green.

Acquisition of data: Shaneyfelt, Baum, Bell, Feldstein, Kaatz, Whelan, Green.

Analysis and interpretation of data: Shaneyfelt, Houston, Green.

Drafting of the manuscript: Shaneyfelt, Green.

Critical revision of the manuscript for important intellectual content: Shaneyfelt, Baum, Bell, Feldstein, Houston, Kaatz, Whelan, Green.

Statistical analysis: Shaneyfelt, Houston, Green.

Administrative, technical, or material support: Shaneyfelt, Bell, Green.

Study supervision: Shaneyfelt, Green.

Financial Disclosures: None reported.

Disclaimer: Drs Baum and Green have published articles that were included as part of this review. Neither abstracted data from their own published works.

Acknowledgment: This study was conducted as a charge of the Society of General Internal Medicine Evidence-Based Medicine Task Force, of which 3 authors (Drs Shaneyfelt, Whelan, and Green) are members. We thank Heather Coley, MPH, Department of Medicine, University of Alabama School of Medicine, for her assistance in compiling data on the citation of articles in various electronic databases.

Author Affiliations: Department of Medicine, University of Alabama School of Medicine, and Department of Veterans Affairs Medical Center, Birmingham (Drs Shaneyfelt and Houston); Department of Medicine, University of Minnesota Medical School, Minneapolis (Dr Baum); Department of Medicine, Division of General Internal Medicine, David Geffen School of Medicine at University of California, Los Angeles (Dr Bell); Department of Medicine, University of Wisconsin School of Medicine and Public Health, Madison (Dr Feldstein); Henry Ford Hospital, Detroit, Mich (Dr Kaatz); Department of Medicine, University of Chicago, Chicago, Ill (Dr Whelan); and Department of Medicine, Yale University School of Medicine, New Haven, Conn (Dr Green).


REFERENCES

1. McGlynn EA, Asch SM, Adams J, et al. The quality of health care delivered to adults in the United States. N Engl J Med. 2003;348:2635-2645.
2. Hayward RA, Asch SM, Hogan MM, Hofer TP, Kerr EA. Sins of omission: getting too little medical care may be the greatest threat to patient safety. J Gen Intern Med. 2005;20:686-691.
3. Institute of Medicine. Health Professions Education: A Bridge to Quality. Washington, DC: National Academies Press; 2003.
4. Accreditation Council for Graduate Medical Education Outcome Project: general competencies. http://www.acgme.org/outcome/assess/compList.asp. Accessed April 2006.
5. Association of American Medical Colleges. Contemporary Issues in Medicine, II: Medical Informatics and Population Health. Washington, DC: Association of American Medical Colleges; 1998.
6. American Board of Internal Medicine Self-Evaluation of Practice Performance. http://www.abim.org/moc/sempbpi.shtm. Accessed November 2005.
7. Haynes RB, Devereaux PJ, Guyatt GH. Clinical expertise in the era of evidence-based medicine and patient choice. ACP J Club. 2002;136:A11-A14.
8. Green ML. Graduate medical education training in clinical epidemiology, critical appraisal, and evidence-based medicine: a critical review of curricula. Acad Med. 1999;74:686-694.
9. Norman GR, Shannon SI. Effectiveness of instruction in critical appraisal (evidence-based medicine) skills: a critical appraisal. CMAJ. 1998;158:177-181.
10. Taylor R, Reeves B, Ewings P, Binns S, Keast J, Mears R. A systematic review of the effectiveness of critical appraisal skills training for clinicians. Med Educ. 2000;34:120-125.
11. Ebbert JO, Montori VM, Schultz HJ. The journal club in postgraduate medical education: a systematic review. Med Teach. 2001;23:455-461.
12. Parkes J, Hyde C, Deeks J, Milne R. Teaching critical appraisal skills in health care settings. Cochrane Database Syst Rev. 2001;(3):CD001270.