JAMA, Vol. 287 No. 10, March 13, 2002
Original Contribution

Evaluation of a Consumer-Oriented Internet Health Care Report Card

The Risk of Quality Ratings Based on Mortality Data

Harlan M. Krumholz, MD; Saif S. Rathore, MPH; Jersey Chen, MD, MPH; Yongfei Wang, MS; Martha J. Radford, MD

JAMA. 2002;287:1277-1287.

ABSTRACT

Context  Health care "report cards" have attracted significant consumer interest, particularly publicly available Internet health care quality rating systems. However, the ability of these ratings to discriminate between hospitals is not known.

Objective  To determine whether hospital ratings for acute myocardial infarction (AMI) mortality from a prominent Internet hospital rating system accurately discriminate between hospitals' performance based on process of care and outcomes.

Design, Setting, and Patients  Data from the Cooperative Cardiovascular Project, a retrospective systematic medical record review of 141 914 Medicare fee-for-service beneficiaries 65 years or older hospitalized with AMI at 3363 US acute care hospitals during a 4- to 8-month period between January 1994 and February 1996 were compared with ratings obtained from HealthGrades.com (1-star: worse outcomes than predicted, 5-star: better outcomes than predicted) based on 1994-1997 Medicare data.

Main Outcome Measures  Quality indicators of AMI care, including use of acute reperfusion therapy, aspirin, β-blockers, and angiotensin-converting enzyme inhibitors; 30-day mortality.

Results  Patients treated at higher-rated hospitals were significantly more likely to receive aspirin (admission: 75.4% 5-star vs 66.4% 1-star, P for trend = .001; discharge: 79.7% 5-star vs 68.0% 1-star, P = .001) and β-blockers (admission: 54.8% 5-star vs 35.7% 1-star, P = .001; discharge: 63.3% 5-star vs 52.1% 1-star, P = .001), but not angiotensin-converting enzyme inhibitors (59.6% 5-star vs 57.4% 1-star, P = .40). Acute reperfusion therapy rates were highest for patients treated at 2-star hospitals (60.6%) and lowest for 5-star hospitals (53.6% 5-star, P = .008). Risk-standardized 30-day mortality rates were lower for patients treated at higher-rated than lower-rated hospitals (21.9% 1-star vs 15.9% 5-star, P = .001). However, there was marked heterogeneity within rating groups and substantial overlap of individual hospitals across rating strata for mortality and process of care; only 3.1% of comparisons between 1-star and 5-star hospitals had statistically lower risk-standardized 30-day mortality rates in 5-star hospitals. Similar findings were observed in comparisons of 30-day mortality rates between individual hospitals in all other rating groups and when comparisons were restricted to hospitals with a minimum of 30 cases during the study period.

Conclusion  Hospital ratings published by a prominent Internet health care quality rating system identified groups of hospitals that, in the aggregate, differed in their quality of care and outcomes. However, the ratings poorly discriminated between any 2 individual hospitals' process of care or mortality rates during the study period. Limitations in discrimination may undermine the value of health care quality ratings for patients or payers and may lead to misperceptions of hospitals' performance.



INTRODUCTION

Increasing interest in the quality of health care has led to the development of "report cards" to grade and compare the quality of care and outcomes of hospitals,1 physicians,2 and managed care plans.3 The organizations that produce these evaluations span the spectrum of popular periodicals, federal and state agencies, nonprofit accreditation organizations, consulting companies, and for-profit health care information companies.4 In addition, the Centers for Medicare and Medicaid Services (formerly called the Health Care Financing Administration) has recently expressed interest in developing a public performance report for hospitals.5

One of the most prominent organizations involved in providing health care quality ratings is HealthGrades.com, Inc. This company has developed "Hospital Report Cards" as part of an effort to provide comparative information about quality of health care providers via the Internet.6-8 The company's Web site indicates that as "the healthcare quality experts," it is "creating the standard of healthcare quality."9 Using primarily publicly available Medicare administrative data to calculate risk-adjusted mortality rates for a variety of conditions, HealthGrades.com claims to provide "accurate and objective ratings" for hospitals to enable patients to make "well-informed decisions about where to receive their care." As a free service, public interest in the Web site is substantial, with over 1 million visitors in 2001 and discussion of the company's rating system in publications such as Yahoo! Internet Life10 and in print stories in USA Today and the Los Angeles Times.11-12 HealthGrades.com is publicly traded on NASDAQ and reported over $7 million in revenue in 2000, with a 640% increase in ratings revenue over the fourth quarter of 1999.13 With ratings soon appearing for nursing homes, hospices, home health agencies, fertility clinics, linkages to data concerning individual health plans and providers, and a recently announced partnership with The Leapfrog Group,14 this is one of the most ambitious health ratings resources available online today.

While hospital ratings are widely disseminated to the public, little information is available about their validity. The HealthGrades.com rating system uses publicly available Medicare Part A billing data for many of its ratings, but its statistical methods have not been published in the peer-reviewed literature, nor has any published study, to our knowledge, evaluated its performance. By providing ready access to ratings for all US hospitals via a free, public-access Web site, this rating system offers consumers, who may be unfamiliar with the limitations of rating systems, an option that no other rating system today provides—the opportunity to directly compare 2 individual hospitals' "performance" for a variety of conditions. Use of such ratings may have substantial benefit if it encourages hospitals to compete on quality, but may have significant, unintended, and potentially deleterious consequences if the ratings do not accurately discriminate between individual hospitals' performance. Accordingly, we sought to determine if these ratings could discriminate between hospitals based on their quality of care and outcomes.

For this evaluation we used data from the Cooperative Cardiovascular Project (CCP), a national initiative to improve quality of care for Medicare beneficiaries hospitalized with acute myocardial infarction (AMI). The CCP involved the systematic abstraction of clinically relevant information from more than 200 000 hospitalizations for AMI nationwide. As a highly prevalent condition with significant morbidity and mortality and established quality of care and outcomes measures, AMI is well suited to an assessment of hospital performance. We compared hospitals' ratings with process-based measures of the quality of AMI care and risk-standardized 30-day mortality based on medical record review. Since the public is expected to be particularly interested in comparisons between individual hospitals, we determined how often individual higher-rated hospitals performed better than lower-rated hospitals in head-to-head comparisons.


METHODS

The CCP

The CCP, a Centers for Medicare and Medicaid Services project developed to improve the quality of care provided to Medicare beneficiaries hospitalized with AMI,15 included a sample (n = 234 769) of fee-for-service patients hospitalized with a principal discharge diagnosis code of AMI (International Classification of Diseases, 9th Revision, Clinical Modification [ICD-9-CM] code 410, excluding 410.x2) at 4834 hospitals between January 1994 and February 1996. Identified hospital medical records were forwarded to 1 of 2 clinical data abstraction centers and abstracted for predefined variables including demographics, previous medical history, clinical presentation, electrocardiographic reports, laboratory test results, in-hospital treatments, complications, and vital status. Data quality was ensured through the use of trained abstractors, clinical abstraction software, and random record reabstraction.

Study Sample

We excluded patients younger than 65 years (n = 17 593), those in whom a clinical diagnosis of AMI was not confirmed (n = 31 186), and those who were readmitted for AMI (n = 23 773). Patients who transferred into a hospital (n = 34 409) were excluded, as we could not ascertain their clinical characteristics at initial admission. We also excluded patients with a terminal illness (documentation of anticipated life expectancy <6 months) or metastatic cancer (n = 5496) since the focus of their treatment may not have been targeted toward improved survival. Patients admitted to the 1059 hospitals that averaged fewer than 10 patients annually (n = 4724) were also excluded to replicate the minimal volume requirements used in the development of the Internet rating system. Patients admitted to the 66 hospitals for which American Hospital Association data were unavailable (n = 2363) or the 1170 hospitals for which hospital quality ratings were not available (n = 17 162), and patients with unverified mortality from linkage with the Medicare Enrollment Database and the Social Security Administration's Master Beneficiary Record or death outside of the study period (n = 402) were excluded. In total, 92 855 cases (1471 hospitals) met 1 or more of the above exclusion criteria; the remaining 141 914 patients (3363 hospitals) comprised the study cohort.

Hospital Quality Ratings

We collected individual hospital ratings for AMI outcomes directly from the HealthGrades.com Web site in summer 1999.9 Using publicly available Medicare Part A billing data for the period of October 1994 to September 1997 inclusive, the company used a proprietary formula to predict mortality rates during hospitalization and the 30 days following discharge for each hospital incorporating demographic, clinical, and procedural information.9 Each hospital's predicted mortality rate was then compared with its observed mortality rate over the same time period. Hospitals were given a 3-star rating if their "actual performance (observed mortality) was not significantly different from what was predicted."9 Hospitals with statistically significant differences between their observed and expected mortality rates were divided into 2 groups: those hospitals that exceeded predicted performance (ie, observed mortality lower than predicted) and those with poorer performance (ie, higher observed mortality than expected). Among those hospitals that exceeded predicted performance, up to 10% (of the overall population) with the greatest difference between their observed and predicted mortality rates were assigned a 5-star rating to indicate "actual performance was better than predicted and [that] the difference was statistically significant"9; all remaining hospitals that exceeded predicted performance were assigned a 4-star rating. Similarly, among those hospitals in which performance was significantly worse than predicted, up to 10% (of the overall population) with the greatest difference between their observed and predicted mortality rates were assigned a 1-star rating to indicate "actual performance was worse than predicted and the difference was statistically significant."9 Due to a skewed left-shifted distribution in the hospital ratings, no hospitals received a 4-star rating in the period we surveyed. The 2-star and 4-star ratings have since been eliminated and only 1-star or 5-star ratings are now used to identify hospitals whose performance significantly exceeds or fails to meet predicted levels (3-star).
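
As an illustration only, the assignment rule described above can be sketched in Python. The company's prediction formula and significance test are proprietary, so this sketch takes each hospital's observed rate, predicted rate, and a significance flag as given inputs. The names `Hospital` and `assign_stars` are hypothetical, and the 2-star rule for the remaining worse-than-predicted hospitals is an assumption made by symmetry with the 4-star rule.

```python
from dataclasses import dataclass


@dataclass
class Hospital:
    name: str
    observed: float     # observed mortality rate
    predicted: float    # model-predicted mortality rate (formula not public)
    significant: bool   # observed differs significantly from predicted (test not public)


def assign_stars(hospitals, cap_fraction=0.10):
    """Return {name: stars} under the rating rule as described in the text."""
    # "up to 10% (of the overall population)" per extreme rating
    cap = int(cap_fraction * len(hospitals))
    # Default: performance not significantly different from predicted.
    stars = {h.name: 3 for h in hospitals}

    better = [h for h in hospitals if h.significant and h.observed < h.predicted]
    worse = [h for h in hospitals if h.significant and h.observed > h.predicted]

    # Rank by the size of the gap between observed and predicted mortality.
    better.sort(key=lambda h: h.predicted - h.observed, reverse=True)
    worse.sort(key=lambda h: h.observed - h.predicted, reverse=True)

    for rank, h in enumerate(better):
        stars[h.name] = 5 if rank < cap else 4
    for rank, h in enumerate(worse):
        # Assumption: hospitals beyond the 10% cap receive 2 stars.
        stars[h.name] = 1 if rank < cap else 2
    return stars
```

With 20 hospitals the cap is 2 per extreme rating, so a third significantly better hospital would receive 4 stars rather than 5.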

Process of Care Measures and Outcomes

Six process of care measures, drawn from clinical guidelines for the management of AMI,16 were used in our evaluation: (1) use of acute reperfusion therapy (thrombolytic agents or primary angioplasty within 12 hours of admission for patients with ST-segment elevation or left bundle branch block), (2) aspirin within 48 hours of admission, (3) β-blockers within 48 hours of admission, (4) aspirin at discharge, (5) β-blockers at discharge, and (6) angiotensin-converting enzyme inhibitors at discharge. Criteria used to identify patients who were considered "ideal" candidates for each treatment are listed in the Box. Mortality at 30 days' postinfarction was determined from the Medicare Enrollment Database and the Social Security Administration's Master Beneficiary Record.17


Box. Treatment Exclusion Criteria to Classify Patients as Ideal Candidates

Acute Reperfusion Therapy
Absence of ST-segment elevation or left bundle branch block on admission electrocardiogram
Transferred into the hospital
Chest pain of more than 12 hours in duration
Bleeding before or at time of admission
Increased risk of bleeding or hemorrhage
Stroke on admission or history of cerebrovascular disease
Warfarin use before admission
Malignant hypertension
Age older than 80 years
Patient or physician refused thrombolytics

Aspirin Within 48 Hours of Admission
Bleeding before or at time of admission
Increased risk of bleeding or hemorrhage
History of allergy to aspirin
Transferred into the hospital

β-Blockers Within 48 Hours of Admission
Heart failure at time of admission or history of heart failure
Shock or hypotension at time of admission
Second- or third-degree heart block
History of asthma or chronic obstructive pulmonary disease
Bradycardia at time of admission (unless taking a β-blocker)
History of allergy to β-blockers
Transferred into the hospital

Aspirin at Discharge
Died during hospitalization
Bleeding during hospitalization
Increased risk of bleeding or hemorrhage
History of allergy to aspirin or reaction to aspirin during hospitalization
History of peptic ulcer disease
Warfarin prescribed at discharge
Transferred out of the hospital

β-Blockers at Discharge
Died during hospitalization
Heart failure at time of admission, during hospitalization, or left ventricular ejection fraction (LVEF) less than 35%
Shock or hypotension during hospitalization
Second- or third-degree heart block
History of asthma or chronic obstructive pulmonary disease
Peripheral vascular disease
Bradycardia during hospitalization (unless taking a β-blocker)
History of allergy to β-blockers or reaction to β-blockers during hospitalization
Transferred out of the hospital

Angiotensin-Converting Enzyme (ACE) Inhibitors at Discharge
Died during hospitalization
LVEF 40% or greater or LVEF unknown
Aortic stenosis
Creatinine level greater than 3 mg/dL at time of admission or during hospitalization
Hypotension (unless taking an ACE inhibitor)
History of allergy to ACE inhibitors or reaction to ACE inhibitors during hospitalization
Transferred out of the hospital



Statistical Analysis

Patient characteristics, performance on process of care measures, in-hospital outcomes, and 30-day mortality rates were compared between hospitals with different ratings using global and test of trend χ² analyses for categorical variables and analysis of variance for continuous variables.

Hospital ratings were evaluated for their association with each of the 6 process of care measures using a multivariable logistic regression analysis among the cohort of patients classified as ideal candidates for each specific therapy. Analyses were adjusted for patient demographic characteristics; illness severity as assessed by the Medicare Mortality Prediction System, a disease-specific model for predicting 30-day mortality in elderly patients with AMI18; findings on admission; and comorbid conditions. Separate analyses were conducted for each process of care measure, comparing performance in all hospitals relative to the performance of 5-star (top-rated) hospitals.

Hospitals' expected mortality rates were calculated using the mean Medicare Mortality Prediction System predicted probability of 30-day mortality for all patients treated in that hospital. Hospitals' risk-standardized mortality rates were calculated by dividing each hospital's observed 30-day mortality rate by its predicted 30-day mortality rate and multiplying this ratio by the entire cohort's 30-day mortality rate (18.2%). The resulting 30-day mortality rate is standardized to the overall CCP population and provides an estimate for each hospital, assuming that hospital had the same patient composition as the entire sample. To determine the independent association of hospital rating groups with patient survival at 30 days, a multivariable logistic regression analysis was conducted adjusting for patient demographic characteristics, illness severity, admission findings, and comorbid conditions.
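
The risk-standardization arithmetic described above can be sketched as follows. This is a minimal illustration, not the study's code: the function name and the example hospital are hypothetical, and the per-patient predicted probabilities are assumed to come from an already fitted prediction model. Only the cohort rate of 18.2% is taken from the text.

```python
def risk_standardized_rate(deaths, predicted_probs, cohort_rate=0.182):
    """Observed rate divided by expected rate, scaled to the cohort's 30-day mortality.

    deaths          -- number of patients at the hospital who died within 30 days
    predicted_probs -- per-patient predicted probabilities of 30-day death
                       (assumed to be supplied by a separate prediction model)
    cohort_rate     -- 30-day mortality of the whole cohort (18.2% in the text)
    """
    n = len(predicted_probs)
    if n == 0:
        raise ValueError("hospital has no patients")
    observed = deaths / n                 # hospital's observed 30-day mortality rate
    expected = sum(predicted_probs) / n   # mean predicted probability for its patients
    return observed / expected * cohort_rate


# Hypothetical hospital: 50 patients, 10 deaths, mean predicted risk of 16%.
example = risk_standardized_rate(10, [0.16] * 50)
```

In this hypothetical case the observed rate (20%) is 1.25 times the expected rate (16%), giving a risk-standardized rate of about 22.8%.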

To examine variations in treatment and outcomes within different hospital rating groups, we plotted the distribution of risk-adjusted treatment rates and risk-standardized mortality rates for each hospital rating group using "box and whisker" graphs. Hospital rating groups with less variation will have both "shorter" boxes and whiskers, while groups with a broader distribution of rates will have "longer" boxes and whiskers. Box and whisker plots of treatment rates were restricted to those hospitals with 20 or more patients classified as ideal for each therapy.
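
For readers unfamiliar with these plots, the quantities they display can be sketched in Python, using the definition given in the figure legends (whiskers at the most extreme rates within 1.5 times the interquartile range of the box). The function name and sample rates are hypothetical, and the quartile interpolation method is an assumption, since the statistical packages used may compute quartiles slightly differently.

```python
import statistics


def box_summary(rates):
    """Quartiles and whisker ends (adjacent values) for a list of hospital rates."""
    # Assumption: "inclusive" quartile interpolation; other packages may differ.
    q1, median, q3 = statistics.quantiles(rates, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers end at the most extreme observed rates inside the fences.
        "lower_adjacent": min(r for r in rates if r >= low_fence),
        "upper_adjacent": max(r for r in rates if r <= high_fence),
    }


# Hypothetical mortality rates (%) for nine hospitals in one rating group.
summary = box_summary([10, 12, 14, 16, 18, 20, 22, 24, 60])
```

Here the box spans 14 to 22, and the upper whisker stops at 24 because the rate of 60 lies beyond the upper fence of 34.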

To evaluate the discrimination provided by the ratings for individual hospitals, we compared risk-standardized mortality rates between individual hospitals within each rating group. If the ratings provided perfect or near perfect discrimination, then all, or nearly all, hospitals in higher rating groups would have lower mortality rates than hospitals in lower rating groups. Thus, all hospitals with 1-star ratings were compared with all hospitals with 5-star ratings to determine the proportion of comparisons in which a 5-star hospital had a significantly lower risk-standardized mortality rate than the 1-star hospital to which it was compared. Similar comparisons were made between 2-star and 5-star hospitals, 3-star and 5-star hospitals, 1-star and 2-star hospitals, 1-star and 3-star hospitals, and 2-star and 3-star hospitals.
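
The pairwise comparison procedure can be sketched as below. The text does not state which significance test was applied to each pair, so the two-sided z-test on the difference between two rates is an assumption chosen for illustration, as are the function name, the standard errors, and the example values.

```python
import math
from itertools import product


def compare_groups(low_rated, high_rated, z_crit=1.96):
    """Compare every lower-rated hospital with every higher-rated hospital.

    Each hospital is a (rate, standard_error) tuple. Returns the proportion of
    pairs in which the higher-rated hospital has a significantly lower rate,
    the rates are not statistically different, or the lower-rated hospital has
    a significantly lower rate.
    """
    total = len(low_rated) * len(high_rated)
    if total == 0:
        raise ValueError("both groups must contain at least one hospital")
    counts = {"high_rated_lower": 0, "not_different": 0, "low_rated_lower": 0}
    for (r_low, se_low), (r_high, se_high) in product(low_rated, high_rated):
        # Assumed test: z statistic for the difference between two rates.
        z = (r_low - r_high) / math.sqrt(se_low ** 2 + se_high ** 2)
        if z > z_crit:
            counts["high_rated_lower"] += 1
        elif z < -z_crit:
            counts["low_rated_lower"] += 1
        else:
            counts["not_different"] += 1
    return {key: value / total for key, value in counts.items()}


# Hypothetical 1-star and 5-star hospitals as (rate, standard error).
one_star = [(0.22, 0.04), (0.30, 0.02)]
five_star = [(0.16, 0.04), (0.15, 0.02)]
proportions = compare_groups(one_star, five_star)
```

Larger standard errors, as found at low-volume hospitals, push more pairs into the "not different" category even when the point estimates differ.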

Secondary analyses were conducted incorporating hospital characteristics, physician specialty, geographic location, and AMI therapies to determine if these characteristics may have accounted for variations in treatment and outcomes between the rating groups. In addition, comparisons of risk-standardized mortality rates between hospitals in different rating groups were also repeated, restricting analysis to the 1738 hospitals with 30 or more cases and evaluating mortality at 60 days' postadmission. Because the time periods of the CCP cohort and the Internet-based ratings did not exactly overlap, we repeated our analyses by restricting our evaluation of the rating system to those patients admitted after October 1, 1994. We similarly repeated our analyses, including cases that had been excluded as readmissions. Huber-White variance estimates19 were used in all models to provide robust estimates of variance and to adjust for clustering of patients by individual hospitals. All models demonstrated appropriate discrimination and calibration. Odds ratios were converted to relative risk ratios using the conversion formula specified by Zhang and Yu.20 Statistical analyses were conducted using SAS 6.12 (SAS Institute Inc, Cary, NC) and STATA 6.0 (STATA Corp, College Station, Tex).
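
The Zhang and Yu conversion mentioned above is commonly written as RR = OR / ((1 - P0) + P0 × OR), where P0 is the incidence of the outcome in the reference group. A minimal sketch follows; the function name and example values are hypothetical.

```python
def odds_ratio_to_relative_risk(odds_ratio, p0):
    """Convert an odds ratio to an approximate relative risk.

    p0 is the incidence of the outcome in the reference (unexposed) group.
    """
    if not 0.0 <= p0 <= 1.0:
        raise ValueError("p0 must be a proportion between 0 and 1")
    return odds_ratio / ((1.0 - p0) + p0 * odds_ratio)


# Hypothetical values: odds ratio of 2.0 with 20% incidence in the reference group.
rr = odds_ratio_to_relative_risk(2.0, 0.20)
```

When the outcome is rare (P0 near 0) the relative risk approaches the odds ratio; the correction matters for common outcomes such as 30-day mortality after AMI in this cohort.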


RESULTS

Hospital and Patient Characteristics

Of the 3363 hospitals studied, 10.6% were classified as 5-star hospitals, 74.0% as 3-star hospitals, 7.8% as 2-star hospitals, and 7.6% as 1-star hospitals. Hospitals with higher ratings had a higher AMI volume and were more likely to be teaching hospitals, to be not-for-profit in ownership, and to have invasive cardiac care facilities (Table 1).


Table 1. Hospital Characteristics*


Patients were elderly, predominantly male, and white, and a significant number had comorbid conditions. Patients were mostly treated at 3-star hospitals (n = 98 725, 69.6%), a smaller group at 5-star hospitals (n = 23 944, 16.9%), and even fewer at 1-star (n = 5089, 3.6%) or 2-star (n = 14 156, 10.0%) hospitals. Differences in patient characteristics across hospital rating groups were small, although many of these small differences were statistically significant because of the large sample (Table 2).


Table 2. Patient Characteristics*


Process of Care Measures

A graded association was observed between hospital rating and use of aspirin and β-blockers, both on admission and at discharge. There was no apparent trend for greater use of angiotensin-converting enzyme inhibitors or acute reperfusion therapy in higher-rated hospitals (Table 3). Multivariable analysis of AMI treatment indicated lower rates of aspirin (admission and discharge) and β-blocker use on admission in 1-, 2-, and 3-star hospitals, while only 1- and 2-star hospitals were less likely to provide β-blockers on discharge (Table 4). Patients at 2- and 3-star hospitals were more likely to receive acute reperfusion therapy than patients at 5-star hospitals. In addition, there was significant heterogeneity in the use of treatments within each hospital rating group (Figure 1). Findings were similar in secondary analyses except for 3-star hospitals, which were comparable to 5-star hospitals for use of all therapies and 2-star hospitals' use of β-blockers on admission.


Table 3. Process of Care Measures and In-Hospital Outcomes According to Hospital Rating*



Table 4. Association Between Hospital Rating and Process of Care Measures*




Figure 1. Risk-Adjusted Rates of Therapy Use Among the Rating Groups

The outer lines of each "box" correspond to the 25th and 75th percentiles, and the middle line corresponds to the 50th percentile in the distribution of treatment rates. The upper horizontal line or "whisker" represents upper adjacent values or treatment rates above the 75th percentile that fall within the range of rates defined by the 75th percentile plus 1.5 times the interquartile range (25th-75th percentile). The lower horizontal line or "whisker" represents lower "adjacent" values or treatment rates below the 25th percentile that fall within the range of rates defined by the 25th percentile minus 1.5 times the interquartile range (25th-75th percentile). ACE indicates angiotensin-converting enzyme.


In-Hospital Outcomes and Mortality

Patients at 5-star hospitals had lower in-hospital mortality rates and higher total charges than patients at lower-rated hospitals; no clear trend was observed for length of stay (Table 3). Crude 30-day mortality rates were highest for patients treated at 1-star hospitals (23.0%), lower in 2- and 3-star hospitals, and lowest among patients treated at 5-star hospitals (15.4%). Risk-standardized mortality rates were nearly identical for patients in 1-star and 2-star hospitals, but higher than those for patients in 3-star and 5-star hospitals, with a 6.0% absolute difference in 30-day mortality between 1-star and 5-star hospitals. Multivariable analysis also indicated a higher 30-day mortality risk among patients treated at 1-star and 2-star hospitals and a slightly lower, but still increased, mortality risk for patients treated at 3-star hospitals compared with 5-star hospitals (Table 5).


Table 5. Association Between Hospital Rating and 30-Day Mortality*


While lower-rated (1-star and 2-star) hospitals had a higher average mortality risk compared with that of 5-star hospitals, there was marked intragroup variation in individual hospitals' 30-day mortality rates. Discrimination in individual hospitals' risk-standardized mortality rates between rating groups was poor, as indicated by the box and whisker plots (Figure 2). Pairwise comparisons of hospitals with 1-star ratings and those with 5-star ratings found that in 92.3% of comparisons, 1-star hospitals had a risk-standardized mortality rate that was not statistically different than that of a 5-star hospital and a lower risk-standardized mortality rate in 4.6% of comparisons. Similarly, 95.9% of 2-star hospital comparisons and 94.6% of 3-star hospital comparisons had risk-standardized mortality rates that were not statistically different or lower than those of the 5-star hospitals to which they were compared. The proportion of comparisons in which mortality rates were statistically comparable between hospitals in different rating groups was similarly high in the comparison of 1-star and 2-star hospitals, 1-star and 3-star hospitals, and 2-star and 3-star hospitals (Table 6).



Figure 2. Risk-Standardized 30-Day Mortality Rates Among the Rating Groups

The outer lines of the "box" correspond to the 25th and 75th percentiles, and the middle line corresponds to the 50th percentile in the distribution of 30-day mortality rates. The upper horizontal line or "whisker" represents upper adjacent values or 30-day mortality rates above the 75th percentile that fall within the range of rates defined by the 75th percentile plus 1.5 times the interquartile range (25th-75th percentile). The lower horizontal line or "whisker" represents lower "adjacent" values or 30-day mortality rates below the 25th percentile that fall within the range of rates defined by the 25th percentile minus 1.5 times the interquartile range (25th-75th percentile).



Table 6. Comparison of Risk-Standardized Mortality Rates Between Hospital Rating Groups*


Secondary Analyses

Our findings were similar in secondary analyses evaluating hospitals with 30 or more cases, assessing mortality at 60 days' postadmission, restricting the cohort to patients admitted after October 1, 1994, and repeating analyses including readmissions.


COMMENT

In our evaluation of a popular Web-based hospital report card for AMI, we found a gradient in the care and outcomes of patients in hospitals in different rating categories. In general, patients who received care in higher-rated hospitals were, on average, more likely to receive aspirin and β-blockers and had lower risk-standardized mortality rates than patients treated in lower-rated hospitals. This finding would seem to validate the use of ratings derived from a proprietary model using administrative data. However, we also found substantial heterogeneity in performance within rating categories. In addition, when hospitals assigned to any 2 different rating groups were considered individually instead of in aggregated categories, risk-standardized mortality rates were either comparable or even better in the lower-rated hospital in more than 90% of the comparisons. These findings suggest that these ratings do convey some important information in aggregate, but provide little meaningful discrimination between individual hospitals' performance in a manner sufficient for a public interested in making informed hospital choices.

This rating system's performance at the group and individual hospital level highlights a discrepancy common to hospital rating and evaluation systems. While such ratings may differentiate between groups of hospitals in the aggregate when sample sizes are large enough to produce stable estimates, they do not capture well the differences in quality and outcomes between individual hospitals, where sample sizes are much smaller. Although evaluating more cases at each hospital would increase the precision of estimates associated with any individual hospital's performance and the likelihood of detecting differences when comparing 2 hospitals, the patient volume at many hospitals is insufficient to produce precise estimates. Even when analyses were restricted to hospitals with an annual volume of 30 or more cases (a large number given the volumes of smaller centers), the proportion of comparisons in which hospitals in 2 different ratings groups were statistically comparable was relatively unchanged. Alternatively, multilevel regression analyses may facilitate comparisons incorporating centers with small volumes. In the absence of this approach, invalid classifications resulting in mislabeling may have significant unintended consequences by providing consumers with an inaccurate perception of an individual hospital's performance. For example, the publication by the then Health Care Financing Administration of statistical outliers for mortality quickly became known as the government's hospital "death list."21-22

Misclassification of hospitals also may be due to the performance of the predictive model. Due to the proprietary nature of the HealthGrades.com model, we were unable to evaluate it directly. Nevertheless, even without information about this model, it is likely that these ratings are limited by their reliance on administrative data. Administrative data are subject to significant limitations, including insufficient clinical information, the inevitable inclusion of substantial numbers of patients who did not experience an AMI because of administrative diagnosis imprecision,23 and confusion concerning complications and preexisting conditions.24-27 Inconsistencies in coding ("overcoding" in "low" mortality hospitals and "undercoding" in "high" mortality hospitals) are also problematic and often explain some of the difference between hospitals' ratings.28 Risk models based on administrative data can lead to substantial misclassification of hospitals when compared with models based on higher-quality data.29 Because of issues of patient selection, either as a result of location, ownership, membership in health plans, or teaching status, hospitals may differ in the kinds of patients treated in a way that is not accounted for in risk-adjustment models. Administrative data are far easier and less expensive to obtain than more clinically detailed information that can be derived from medical records, but they may have limited utility in publicly reported ratings. Concerns about data quality, adequacy of methods, issues of selection bias in patient populations, inadequate risk adjustment, and reliable identification of outlier hospitals were some of the reasons why the then Health Care Financing Administration abandoned its decision to publicly release hospital mortality statistics after 1993.6 The repackaging of Medicare hospital mortality data in this rating system does not address the fundamental limitations of administrative data. This is particularly problematic given that such rating data are provided, with minimal explanation of design concerns, to health care consumers unfamiliar with basic statistical concepts or the limitations of administrative data and administrative data-based rating systems.

Publicly reported hospital ratings based on patient mortality may result in poorer net clinical outcomes than observed prior to public reporting.30 Even if mortality ratings were based on high-quality data and comprehensive risk-adjustment models, mortality has limited utility as a measure of quality. Although mortality is an important measure, it does not identify specific processes that require improvement31 and often correlates poorly with quality of care.32 Mortality results may be best used for internal quality audits in which other supplementary information can be obtained and evaluated. A more accurate evaluation of hospital quality for the public may be achieved by the use of process measures. Comparisons of hospitals' processes of care (eg, the use of β-blockers during hospitalization for AMI) would directly demonstrate whether a hospital's care is in compliance with national treatment guidelines.

The Joint Commission on the Accreditation of Healthcare Organizations is developing such a process-based evaluation,33 and the Centers for Medicare and Medicaid Services is currently evaluating process-based care measures with the goal of reducing preventable mortality.34 Such an approach has its own limitations, notably how to develop standards for reporting and measuring process of care. However, this approach may represent an improvement in the measurement of hospitals' performance by providing quantifiable measures of quality that can be of benefit to both hospitals and consumers.

Several issues should be considered in evaluating our methods. Although we sought to replicate HealthGrades.com's rating approach, there are several differences between our cohort and the one it evaluated. First, the period of the rating system's data (October 1994 to September 1997) overlapped with only half of our study period (January 1994 to February 1996). A perfect overlap was not feasible because this rating system first began reporting data (in 1999) for the 1994-1997 period; thus, no ratings were available for the entire CCP period. Lack of a precise temporal overlap, however, would only be of concern if hospitals' ratings markedly changed between March 1996 and September 1997. This would raise even further concerns as to the stability and validity of these ratings because they are based on admissions that occurred 2 to 5 years earlier.

Second, the rating system was based on patients admitted with a principal ICD-9-CM diagnosis code of 410 or a diagnosis related group code of 121, 122, or 123, while CCP data only include patients admitted with a principal ICD-9-CM discharge diagnosis of AMI. We believe the use of the principal discharge diagnosis is the most appropriate method of identifying AMIs (which are subsequently clinically confirmed) as it identifies the condition chiefly responsible for a patient's admission to the hospital.35

Third, the rating system retained patients' readmissions in its evaluation cohort. Because multiple admissions for the same patient may violate independence assumptions required for regression analyses, we only included patients' first admissions in our main analysis. However, findings were similar when analyses were repeated incorporating cases that had previously been excluded as readmissions.

Fourth, the rating system includes hospitalizations of patients who arrived by means of interhospital transfer in its hospital evaluation, while we excluded these patients from our analysis. Patients with AMI who are admitted by interhospital transfer are generally healthier than those who arrive by direct admission.36 Including these patients would result in a bias toward lower estimates of mortality in large, urban, and advanced cardiac care hospitals that receive patients by transfer. Furthermore, risk adjustment for "admission" characteristics for patients who arrive by transfer would reflect their clinical status several days postinfarction as opposed to peri-infarction characteristics for patients who arrive by direct admission.

Fifth, the rating system excluded hospitalizations of patients who are transferred out of a hospital while we retained these patients in our analysis. Given that patients who leave a hospital by transfer are generally healthier than those not transferred, the rating system's exclusion of these patients results in a systematically biased higher estimate of mortality for smaller hospitals, hospitals in rural areas, hospitals without cardiac care facilities, and others more likely to transfer patients to other centers.37
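The direction of this bias can be shown with simple arithmetic. The following is a minimal sketch using entirely hypothetical counts for a single small hospital; it does not reproduce any calculation from the rating system or from our analysis.

```python
def mortality_rate(deaths, patients):
    """Crude (unadjusted) mortality proportion."""
    return deaths / patients

# Hypothetical hospital: all counts below are illustrative assumptions.
retained_patients, retained_deaths = 80, 16        # not transferred out (20% mortality)
transferred_patients, transferred_deaths = 20, 2   # transferred out, healthier (10% mortality)

# Rate when transferred-out patients are retained in the cohort.
rate_all = mortality_rate(
    retained_deaths + transferred_deaths,
    retained_patients + transferred_patients,
)

# Rate when transferred-out patients are excluded from the cohort.
rate_excluding = mortality_rate(retained_deaths, retained_patients)

print(f"Including transfers out: {rate_all:.2f}")
print(f"Excluding transfers out: {rate_excluding:.2f}")
```

Because the excluded patients have lower mortality than those who remain, removing them raises the hospital's estimated mortality, and the effect grows with the share of patients transferred out.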

Finally, we compared hospitals' ratings, based on mortality during hospitalization and the 30 days following discharge, with mortality at 30 days and 60 days postadmission. We used a slightly different follow-up period to ensure hospitals' outcomes reflected standardized ascertainment of mortality, not influenced by variations in length of stay or discharge practices, thus avoiding the documented biases associated with using in-hospital mortality to assess hospitals' performance.38

Several possible limitations should be considered in interpreting these data. We only considered a single disease in our evaluation of the rating system, so our findings may not necessarily be generalizable to ratings for other conditions. Nonetheless, AMI is a common, high-volume condition at many hospitals with a clear base of evidence to support recommended treatments. In addition, our study was limited to data concerning Medicare fee-for-service patients hospitalized with AMI and may not be relevant to the care of younger patients or those hospitalized with other conditions. However, the hospital ratings were also derived from data related to this group, and thus should be ideally suited for producing hospital ratings for the treatment of this population. Also, we evaluated only 1 "report card" system. These results may not be generalizable to other ratings systems, although it is unlikely that a ratings system focused on outcomes, using the same data source and the same methods, would achieve different results.

The increase in the number of publicly available hospital report cards such as HealthGrades.com reflects the public's desire for comparative data on quality and outcomes. However, the necessary and often overlooked caveat associated with such report cards is that the public (and health care professionals) often become focused on identifying "winners and losers" rather than using these data to inform quality improvement efforts. Our evaluation of an Internet hospital rating system highlights the importance of this message. Although the ratings we evaluated accurately differentiated between large groups of hospitals, they inadequately classified individual hospitals, with significant potential consequences for perceptions of an individual institution's quality of care, particularly if directly released to a public unfamiliar with the design and limitations of administrative data-derived rating systems. As such, current outcome-based report card efforts are better used as a tool for quality improvement, rather than as a publicly reported means of discriminating between hospital performance.


AUTHOR INFORMATION

Author Contributions: Study concept and design: Krumholz, Rathore, Chen, Radford.

Acquisition of data: Krumholz, Chen, Radford.

Analysis and interpretation of data: Krumholz, Rathore, Chen, Wang, Radford.

Drafting of the manuscript: Krumholz, Rathore, Chen.

Critical revision of the manuscript for important intellectual content: Krumholz, Rathore, Chen, Wang, Radford.

Statistical expertise: Krumholz, Rathore, Chen, Wang.

Obtained funding: Krumholz.

Administrative, technical, or material support: Chen, Wang.

Study supervision: Krumholz.

Funding/Support: The analyses upon which this article is based were performed under Contract 500-99-CT01 entitled "Utilization and Quality Control Peer Review Organization for the State of Connecticut," sponsored by the Health Care Financing Administration, US Department of Health and Human Services.

Disclaimer: The content of this publication does not necessarily reflect the views or policies of the US Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the US government. The authors assume full responsibility for the accuracy and completeness of the ideas presented. This article is a direct result of the Health Care Quality Improvement Project initiated by the Health Care Financing Administration, which has encouraged identification of quality improvement projects derived from analysis of patterns of care, and therefore required no special funding on the part of the contractor.

Acknowledgment: The authors thank Christopher Puglia, BS, for assistance in data collection and Maria Johnson, BA, for editorial assistance.

Corresponding Author: Harlan M. Krumholz, MD, Yale University School of Medicine, 333 Cedar St, PO Box 208025, New Haven, CT 06520-8025.

Author Affiliations: Section of Cardiovascular Medicine, Department of Medicine (Drs Krumholz, Chen, and Radford, and Messrs Rathore and Wang), and Section of Health Policy and Administration, Department of Epidemiology and Public Health (Dr Krumholz), Yale University School of Medicine, New Haven, Conn; Yale-New Haven Hospital Center for Outcomes Research and Evaluation, New Haven, Conn (Drs Krumholz and Radford); and Qualidigm, Middletown, Conn (Drs Krumholz and Radford). Dr Chen is currently affiliated with the Department of Radiology, Hospital of the University of Pennsylvania, Philadelphia.


REFERENCES

1. America's Best Hospitals: 2001 Hospital Guide. US News and World Report; 2001. Available at: http://www.usnews.com/usnews/nycu/health/hosptl/tophosp.htm. Accessed February 12, 2002.
2. Green J, Wintfeld N. Report cards on cardiac surgeons: assessing New York State's approach. N Engl J Med. 1995;332:1229-1232.
3. National Committee for Quality Assurance. NCQA's Health Plan Report Card. Washington, DC: National Committee for Quality Assurance; 2000.
4. Health Care Report Cards 1998-1999. 4th ed. Washington, DC: Atlantic Information Services Inc; 1998.
5. Pear R. Medicare shift towards HMOs is planned. New York Times. June 5, 2001:A19.
6. Morrissey J. Internet company rates hospitals. Mod Healthc. 1999;29:24-25.
7. Schifrin M, Wolinsky M. Use with care. Forbes Best of the Web, June 25, 2001. Available at: http://www.forbes.com/bow/. Accessed February 12, 2002.
8. Prager LO. Criteria to identify "leading physicians" yield a long list. American Medical News. September 6, 1999. Available at: http://www.ama-assn.org/sci-pubs/amnews/pick_99/prl10906.htm. Accessed February 12, 2002.
9. Healthgrades.com: The Healthcare Quality Experts. Available at: http://www.healthgrades.com. Accessed June 18, 2001.
10. Butler R. Fifty most incredibly useful sites. Yahoo! Internet Life. July 2001. Available at: http://www.yil.com/features/feature.asp?volume=07&issue=07&keyword=usefulsites. Accessed February 12, 2002.
11. Appleby J, Davis R. Is your doctor bad? USA Today. October 11, 2000:B1.
12. Carey B. Say "aah": your health online. Los Angeles Times. July 2, 2001:S2.
13. HealthGrades, Inc announces fourth quarter and year-end results; 2001. Available at: http://www.healthgrades.com. Accessed February 12, 2002.
14. HealthGrades announces partnership agreement with the Leapfrog Group; 2002. Available at: http://www.healthgrades.com. Accessed February 12, 2002.
15. Marciniak TA, Ellerbeck EF, Radford MJ, et al. Improving the quality of care for Medicare patients with acute myocardial infarction: results from the Cooperative Cardiovascular Project. JAMA. 1998;279:1351-1357.
16. Ryan TJ, Anderson JL, Antman EM, et al. ACC/AHA guidelines for the management of patients with acute myocardial infarction: a report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines (Committee on Management of Acute Myocardial Infarction). J Am Coll Cardiol. 1996;28:1328-1428.
17. Fleming C, Fisher ES, Chang CH, Bubolz TA, Malenka DJ. Studying outcomes and hospital utilization in the elderly: the advantages of a merged data base for Medicare and Vet