At the Intersection of Health, Health Care and Policy

Cite this article as: J. Matthew Austin, Ashish K. Jha, Patrick S. Romano, Sara J. Singer, Timothy J. Vogus, Robert M. Wachter, and Peter J. Pronovost, "National Hospital Ratings Systems Share Few Common Scores And May Generate Confusion Instead Of Clarity," Health Affairs 34, no. 3 (2015): 423–430. doi: 10.1377/hlthaff.2014.0201

The online version of this article, along with updated information and services, is available at: http://content.healthaffairs.org/content/34/3/423.full.html


Health Affairs is published monthly by Project HOPE at 7500 Old Georgetown Road, Suite 600, Bethesda, MD 20814-6133. Copyright © 2015 by Project HOPE - The People-to-People Health Foundation. As provided by United States copyright law (Title 17, U.S. Code), no part of Health Affairs may be reproduced, displayed, or transmitted in any form or by any means, electronic or mechanical, including photocopying or by information storage or retrieval systems, without prior written permission from the Publisher. All rights reserved.

Not for commercial use or unauthorized distribution Downloaded from content.healthaffairs.org by Health Affairs on March 26, 2015 at University of Sussex

Patient Engagement

By J. Matthew Austin, Ashish K. Jha, Patrick S. Romano, Sara J. Singer, Timothy J. Vogus, Robert M. Wachter, and Peter J. Pronovost

doi: 10.1377/hlthaff.2014.0201
HEALTH AFFAIRS 34, NO. 3 (2015): 423–430
©2015 Project HOPE—The People-to-People Health Foundation, Inc.

National Hospital Ratings Systems Share Few Common Scores And May Generate Confusion Instead Of Clarity

J. Matthew Austin (jausti17@jhmi.edu) is an assistant professor at the Armstrong Institute for Patient Safety and Quality, Johns Hopkins Medicine, in Baltimore, Maryland.

ABSTRACT Attempts to assess the quality and safety of hospitals have proliferated, including a growing number of consumer-directed hospital rating systems. However, relatively little is known about what these rating systems reveal. To better understand differences in hospital ratings, we compared four national rating systems. We designated "high" and "low" performers for each rating system and examined the overlap among rating systems and how hospital characteristics corresponded with performance on each. No hospital was rated as a high performer by all four national rating systems. Only 10 percent of the 844 hospitals rated as a high performer by one rating system were rated as a high performer by any of the other rating systems. The lack of agreement among the national hospital rating systems is likely explained by the fact that each system uses its own rating methods, has a different focus to its ratings, and stresses different measures of performance.

Assessing the quality and safety of care delivered in hospitals is increasingly important for patients, payers, and providers, as evidenced by increased public reporting of hospital quality and safety data and by payment changes that reimburse providers for value, rather than volume.1–3 Consequently, the marketplace has seen a surge in consumer-directed hospital rating systems that assess and compare the relative quality and safety of hospitals. However, relatively little is known about what these rating systems reveal.

To date, research examining relationships between a hospital's rating and other measures has shown mixed results. Some research indicates that being named to U.S. News & World Report's "Best Hospitals" list is associated with lower thirty-day mortality,4 but other studies have found no association between the U.S. News list and readmissions,5 wide variation on a number of indicators,6 and discrepancies with other rating systems such as the Centers

Ashish K. Jha is a professor of health policy and management at the Harvard T.H. Chan School of Public Health, in Boston, Massachusetts. Patrick S. Romano is a professor of medicine and pediatrics in the Division of General Medicine at the University of California, Davis, School of Medicine, in Sacramento.

for Medicare and Medicaid Services' (CMS's) Hospital Compare.7

Hospital rating systems use a variety of methods for distinguishing "high" performers from "low" performers, often creating the paradox of hospitals' simultaneously being considered best and worst, depending on the rating system used.8–10 For example, 43 percent of hospitals classified as having below-average mortality by one risk-adjustment method were classified as having above-average mortality by another method.11

These contradictions have created challenges for stakeholders concerned with hospital quality. For patients, differences across hospital ratings add complexity to ascertaining a hospital's actual quality. For payers, conflicting ratings make it difficult to recognize and reward hospitals for high quality.12 For hospital leadership, differences across rating systems complicate decisions regarding the focus of their improvement efforts.

Four well-known national entities released


Sara J. Singer is an associate professor of health care management and policy in the Department of Health Policy and Management at the Harvard T.H. Chan School of Public Health. Timothy J. Vogus is an associate professor at the Owen Graduate School of Management at Vanderbilt University, in Nashville, Tennessee. Robert M. Wachter is a professor and associate chair in the Department of Medicine at the University of California, San Francisco, where he holds the Benioff Endowed Chair in Hospital Medicine.



Peter J. Pronovost is a professor of anesthesiology and critical care medicine, surgery, and health policy and management at the Johns Hopkins University, in Baltimore, Maryland. He is also the director of the Armstrong Institute for Patient Safety and Quality at Johns Hopkins Medicine.


hospital ratings in 2012 and early 2013. In each case, the hospitals were rated at no cost to the hospital. U.S. News, a for-profit company known for its magazine and ratings of universities and graduate programs, has issued its Best Hospitals list for twenty-three years. HealthGrades, a for-profit company that develops and markets quality and safety ratings of health care providers in addition to offering consulting services, has rated hospitals since 1998, releasing its annual Top 50 and Top 100 hospital lists, among many other ratings.13 The Leapfrog Group, a nonprofit purchaser-based coalition advocating for improved transparency, quality, and safety in hospitals, has supplemented its annual hospital survey since 2012 by assigning letter grades (A, B, C, D, or F) to hospitals, reflecting how well they kept patients free from harm.14 And Consumer Reports, a nonprofit organization known for rating and comparing consumer products, has issued a hospital safety rating (rating hospitals on a 0–100 scale) since 2012.15

In addition to these formal rating systems, the CMS Hospital Compare website, reporting programs sponsored by states and regional quality collaboratives (for example, Chartered Value Exchanges), web-based consumer-driven rating systems (for example, Yelp), and hospital systems' self-reported performance (including their own websites and paid media) serve as additional inputs into consumers' decision making.16–18

Aside from differences in their for-profit status, the four national rating systems also differ in how they finance the ratings. Independently funding the work of rating hospitals (that is, acquiring, analyzing, and presenting the data) is essential because such work is rarely underwritten by grants or other public funding sources. Consequently, Leapfrog, U.S. News, and HealthGrades finance their ratings, in part, by allowing hospitals to use their ratings in advertisements and promotional materials for a fee.
In contrast, Consumer Reports does not allow hospitals to use its ratings promotionally and instead releases its ratings only to paid subscribers. Online Appendix Exhibit A1 details the business models of the four rating organizations that we evaluated.19

To better understand the differences across the four rating systems, we explored how differences in rating methodologies may help explain differing conclusions regarding whether a hospital is a high or low performer. Specifically, we reviewed the methodologies underlying the ratings, examined whether the existing ratings provide a convergent or divergent picture of high- and low-performing hospitals, and explored whether hospital characteristics make a hospital more or less likely to be a high or low performer.


Overview Of The Rating Methodologies

Each of the four rating systems articulates a distinct objective and uses a unique methodology for determining its ratings. Appendix Exhibit A2 compares the ratings' methodologies across three attributes that we discuss further below: the focus of the score, the types of measures used, and the transparency of calculations.19

Focus Of Rating Both Leapfrog and Consumer Reports focus on safety for rating hospitals, although each defines safety differently. Leapfrog defines it as "freedom from harm,"20 while Consumer Reports refers to "a hospital's commitment to the safety of their patients." HealthGrades' Top 50 and Top 100 ratings stress quality, highlighting hospitals that consistently perform well on patient outcomes, as measured by mortality and complication rates. U.S. News focuses on identifying the "best medical centers for the most difficult patients," with the goal of helping consumers determine which hospitals provide the best care for serious or complicated medical conditions and procedures.21

Hospital Eligibility In addition to differences in focus, each of the ratings assesses a different number of hospitals. That is, each has its own inclusion and exclusion criteria. Leapfrog and Consumer Reports both start with all general acute care hospitals and then exclude hospitals that do not have sufficient data. For Leapfrog's rating, this excludes hospitals that do not participate in the CMS inpatient prospective payment system (IPPS), including critical-access hospitals; free-standing children's hospitals; specialty hospitals; federal hospitals; and hospitals located in Maryland, Puerto Rico, and Guam. In addition, Leapfrog does not rate hospitals participating in the IPPS if the hospitals are missing significant amounts of data. As a result, 2,514 hospitals were eligible for Leapfrog's rating.
Consumer Reports excludes hospitals that do not have at least some data available for all six of the measure sets that constitute its composite score (for example, avoiding complications, avoiding readmissions). Consumer Reports had 2,040 hospitals eligible for its rating. U.S. News limits eligibility for its rating to teaching hospitals, hospitals affiliated with medical schools, hospitals with 200 or more beds, and hospitals with 100 or more beds that also have at least four of eight “key” medical technologies. As a result, 1,928 hospitals were eligible for the rating. Within a hospital, U.S. News evaluates sixteen specialties, twelve of which are “data driven” (that is, do not rely exclusively on a reputational survey). In addition to U.S. News’s general eligibility requirements, eligibility in a specialty requires hospitals to meet a discharge


volume or be nominated by 1 percent of physicians in that specialty's reputational survey. A total of 15,447 hospital data–driven specialty combinations were eligible for a rating.

Eligibility for HealthGrades' rating rests on two criteria. First, hospitals must have had thirty or more cases during the past three years in at least nineteen of the twenty-seven common procedures and conditions evaluated by HealthGrades using Medicare inpatient data from the Medicare Provider Analysis and Review database. Five of the thirty cases must be from the most recent year of analysis. Second, the hospital must have been given a Distinguished Hospital Award for Clinical Excellence by HealthGrades for at least the past four years. Although HealthGrades begins with more than 4,000 hospitals in its analysis, based on the two eligibility criteria, only 262 hospitals were eligible for HealthGrades' America's Best Hospital designation.22

Measures Included The Leapfrog rating includes multiple measures of hospital structures, processes, and outcomes that have been linked to patient safety. Structures include use of computerized physician order entry systems for ordering medications; processes include adherence to the CMS/Joint Commission Surgical Care Improvement Project measures; and outcomes include rates of certain hospital-acquired infections. Hospital performance on the structural and process measures constitutes 50 percent of the rating, with performance on outcome measures accounting for the remaining 50 percent. Consumer Reports uses process (for example, overuse of double chest computed tomography [CT] scans) and outcome measures (for example, thirty-day readmissions for common acute conditions), which account for 78 percent of the rating. The remaining 22 percent of the rating consists of a subset of patient experience measures (that is, communication about discharge instructions and new medications).
The HealthGrades rating is based exclusively on hospital outcome measures—specifically, risk-adjusted mortality and complication measures—for twenty-seven conditions and procedures. The outcome measures and risk adjusters are derived from Medicare data (patients ages sixty-five and older). U.S. News specialty ratings include structure (30.0 percent of score), process (32.5 percent of score), and outcome measures (37.5 percent of score). U.S. News, however, uses a reputational survey as a proxy for measuring processes of care, rather than hospital adherence to clinical guidelines.21

Approaches To Missing Data Both the Consumer Reports and HealthGrades ratings handle missing data by establishing minimum data thresholds for a hospital to receive a rating. If the data threshold is not met, a rating is not calculated. The Leapfrog rating uses two approaches for missing data. In most instances, a measure is ignored if data are missing, and the relative weight assigned to that measure is reallocated to those measures for which data are available. The exception to this rule is for two structural measures—computerized physician order entry and intensive care unit physician staffing—for which Leapfrog imputes a score based on other data (that is, hospitals' responses to the American Hospital Association [AHA] annual survey). For U.S. News, there is no explicit discussion in its methodology of how it approaches missing data; however, hospitals that are missing data from at least one of the two most recent years of the AHA annual survey do not receive any credit for the measures that are derived from the AHA data.

Risk Adjustment All four rating systems use outcome measures in calculating their ratings but with varying degrees of risk adjustment and transparency. Outcome measures that are not adjusted for risk include the hospital-acquired infection measures used by Leapfrog and Consumer Reports; these measures are instead stratified by risk. Leapfrog also does not risk adjust the CMS-based hospital-acquired condition measures (for example, foreign object retained), as they are considered by many to represent "never events," minimizing the need for risk adjustment. The risk-adjustment methodologies used with the outcome measures incorporated into the Leapfrog and Consumer Reports ratings are fully transparent, and outside parties can replicate their hospital ratings. The U.S. News and HealthGrades ratings use proprietary methods such that outsiders are not able to replicate the risk adjustment or calculate a hospital's final rating.

Communication Of Ratings The organizations also vary in how they communicate their ratings to the public.
Leapfrog communicates a hospital’s rating as one of five letter grades, A–F. Consumer Reports and U.S. News issue scores from 0 to 100 for each hospital and rated specialty, respectively. U.S. News also reports a ranked Honor Roll (with number one being the best performer). For its Best Hospitals list, HealthGrades identifies the top 50 and top 100 hospitals but does not rank these hospitals or provide any rating information on hospitals outside the top 100.
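The missing-data rule described above for Leapfrog, in which a missing measure's weight is reallocated to the measures that are present, can be sketched as follows. The measure names and weights here are hypothetical, chosen only for illustration; Leapfrog's actual measure set, weights, and scoring rules are defined in its published methodology.

```python
def composite_score(scores, weights):
    """Weighted composite that reallocates the weight of missing
    measures (score is None) to the measures that have data.

    scores, weights: dicts keyed by measure name; weights sum to 1.
    This is an illustrative sketch of the reallocation rule described
    in the text, not Leapfrog's actual algorithm.
    """
    present = {m: s for m, s in scores.items() if s is not None}
    if not present:
        raise ValueError("no data: hospital would not be rated")
    observed_weight = sum(weights[m] for m in present)
    # Renormalizing by observed_weight spreads the missing measures'
    # weight proportionally across the measures that were reported.
    return sum(weights[m] / observed_weight * s for m, s in present.items())

# Hypothetical measures, each weighted 25 percent of the composite.
weights = {"cpoe": 0.25, "scip": 0.25, "clabsi": 0.25, "pressure_ulcer": 0.25}
full = composite_score(
    {"cpoe": 80, "scip": 90, "clabsi": 70, "pressure_ulcer": 60}, weights)
partial = composite_score(
    {"cpoe": 80, "scip": 90, "clabsi": 70, "pressure_ulcer": None}, weights)
print(full, partial)  # 75.0 and approximately 80.0
```

Renormalizing the observed weights is equivalent to distributing the missing weight proportionally, so a hospital is neither penalized nor rewarded simply for lacking data on a measure.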

Study Data And Methods

To compare hospital performance on four national hospital rating systems—U.S. News's Best Hospitals, HealthGrades' America's 100 Best Hospitals, Leapfrog's Hospital Safety Score,


and Consumer Reports' Health Safety Score—we stratified hospitals by their letter grade (Leapfrog) or numerical score in decile ranges (Consumer Reports and U.S. News). The release dates of the ratings used in the analyses covered the period from July 2012 to July 2013. For U.S. News, we used the distribution of scores from the twelve data-driven specialties that are not exclusively based on a reputational survey, because U.S. News issues scores by specialty and does not provide an overall numerical score for a hospital. HealthGrades does not publicly release a grade or score for each hospital but rather lists the top 100 hospitals without distinguishing among hospitals on the list. Therefore, we were not able to quantitatively examine a distribution for HealthGrades.

For purposes of identifying the overlap among hospitals rated as high performers, hospitals in all four ratings were included. Appendix Exhibit A3 summarizes the number and percentage of hospitals that were rated as high performers, medium performers, and low performers on each rating system.19 For Leapfrog, we defined a high performer as a hospital given the grade A, representing the top 31 percent of rated hospitals (780 hospitals). For Consumer Reports, there was no official definition of a high performer, so the study authors, through group discussion, set the definition as a hospital with a score of 65 or greater. Such a score means that a hospital was approximately two standard deviations above the mean, or in the top 3.6 percent of rated hospitals (seventy-three hospitals). For U.S. News, we defined a high performer as a hospital listed on the U.S. News Honor Roll, representing the top 0.9 percent of rated hospitals (eighteen hospitals). Lastly, for HealthGrades, we defined a high performer as a hospital included on its Top 100 hospital list, representing the top 38.2 percent of rated hospitals (100 hospitals).
For identifying the overlap in hospitals rated as low performers, we included only three ratings, as HealthGrades lists only top hospitals. For Leapfrog, we defined a low performer as a hospital assigned a grade of D or F, representing 6.5 percent of rated hospitals (164 hospitals). For Consumer Reports, there was no official definition of a low performer, so the study authors, through group discussion, set the definition as a hospital with a score of 30 or lower. Such a score means that a hospital was approximately two standard deviations below the mean, or in the bottom 3.4 percent of eligible hospitals (seventy hospitals). And for U.S. News, we defined a low performer as a hospital that received a score of 10 or lower in one or more of the twelve data-driven specialties, representing 4.2 percent of rated hospitals (eighty-one hospitals). This threshold was also set by the study authors through group discussion. Hospital scores between the high- and low-performer thresholds for each rating were considered to be medium performers.

We included all hospitals that appeared in at least one rating in the high- and low-performer analyses. We did so because differences in eligibility criteria and their influence were of substantive interest.

To better understand how hospital characteristics correspond with each rating system, we used the 2011 AHA Annual Survey of Hospitals to examine hospital performance according to a set of characteristics, including region of the country, number of beds, ownership type, teaching status, urban or rural status, membership in a hospital system, the percentage of patients who have Medicare as their primary insurance, and the percentage of patients who have Medicaid as their primary insurance. The 2011 AHA survey data were the most current data that overlapped with the reporting periods for each rating (2009–13). We used Fisher's exact test to test whether particular hospital characteristics are statistically overrepresented or underrepresented in each scoring stratum of the Consumer Reports, Leapfrog, and U.S. News ratings.

Limitations Our findings should be considered in light of the study's limitations. First, our analysis included only one round of ratings from each of the four rating systems. Repeating the analysis with a different round of ratings may produce different conclusions. However, preliminary analysis suggests that high performers tend to maintain their ratings. Second, we included only four national rating systems in our analysis. It is possible that greater agreement might exist between other national ratings or state- or region-level ratings. However, we chose these national ratings because they are the most visible and readily available.
Third, when a rating system did not provide clear performance thresholds, we had to make decisions about what defined a high-performing or low-performing hospital. Different decisions about those thresholds could lead to different results, although our results appear to be robust to the thresholds chosen. Finally, none of the four rating systems explicitly defines what constitutes a hospital. This is potentially problematic because Medicare provider numbers identifying hospitals can occasionally encompass two or more physical entities that are actually separate hospitals with differing names and addresses. Our review of eligible hospitals indicates that the ratings generally define a hospital as a single physical entity, but each rating could be enhanced by clarifying its definition.
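The stratification described above, in which fixed cutoffs (set for Consumer Reports at roughly two standard deviations from the mean) divide hospitals into high, medium, and low performers, can be sketched as follows. The scores below are simulated, not the actual rating data.

```python
import statistics

def classify(score, low_cut, high_cut):
    """Label a hospital as a high, medium, or low performer using fixed
    cutoffs, e.g., >= 65 (high) and <= 30 (low) on Consumer Reports'
    0-100 scale, as in the study."""
    if score >= high_cut:
        return "high"
    if score <= low_cut:
        return "low"
    return "medium"

# Simulated 0-100 composite scores for a handful of hospitals.
scores = [48, 52, 45, 67, 29, 50, 55, 41, 72, 33, 47, 58]

# Cutoffs of roughly mean +/- 2 SD pick out the tails of the distribution.
mean, sd = statistics.mean(scores), statistics.stdev(scores)
print(round(mean - 2 * sd), round(mean + 2 * sd))

labels = [classify(s, low_cut=30, high_cut=65) for s in scores]
print(labels.count("high"), labels.count("medium"), labels.count("low"))  # 2 9 1
```

With roughly symmetric scores, two-standard-deviation cutoffs label only a few percent of hospitals at each extreme, which matches the small high- and low-performer counts reported for Consumer Reports.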


Study Results


Leapfrog graded 56 percent of hospitals as A or B (Exhibit 1). Consumer Reports scored 85 percent of hospitals in medium-performance categories, between 31 and 60 out of 100 (Exhibit 2). U.S. News scored 89 percent of hospitals in medium-performer categories, between 21 and 60 out of 100, in the twelve data-driven specialties (Exhibit 3).

Eighty-three hospitals were rated by all four rating systems, with no hospital rated as a high performer by all four (Exhibit 4). Only three hospitals were rated as high performers by three of the four systems. In aggregate, only 10 percent of the 844 hospitals rated as high performers by one of the rating systems were rated as high performers by any of the other systems. Of the eighty-eight hospitals rated as high performers on more than one rating system, Leapfrog and HealthGrades agreed most frequently (forty-eight hospitals, 55 percent); Consumer Reports and U.S. News agreed on none of the hospitals they rated as high performers.

No hospital was rated as a low performer on all three of the rating systems that rated low performance. Fifteen hospitals were rated as low performers on two of the three systems. Of the fifteen hospitals rated as low performers on more than one rating system, Leapfrog and Consumer Reports agreed most frequently (seven hospitals, 47 percent). The weakest agreement for low performers was between Consumer Reports and U.S. News, which agreed on only three hospitals (20 percent).

We also conducted supplementary analyses to compare high and low performers among hospitals rated by Consumer Reports, Leapfrog, and U.S. News (1,559 hospitals). We found similar levels of overlap in which hospitals were rated as high and low performers. We also found the distribution of grades and scores to be very similar to the original set of hospitals.

In addition to examining the level of agreement across ratings, we also examined disagreement: whether high performers in one rating were low performers in another.
We identified twenty-seven cases of such extreme cross-rating disagreement. Fourteen hospitals were rated as high performers by Leapfrog and low performers by U.S. News. Seven hospitals were rated as high performers by Leapfrog and low performers by Consumer Reports.23 In all other cases, high performers on one rating were rated as middle performers by the other ratings.

Exhibit 1: Distribution Of Letter Grades For Leapfrog's Hospital Safety Score. SOURCE: Leapfrog Group, released May 2013.

Exhibit 2: Distribution Of Scores For Consumer Reports' Health Safety Score. SOURCE: Consumer Reports Health, released July 2012.

Exhibit 3: Distribution Of Scores Of Data-Driven Specialties For U.S. News & World Report's Best Hospitals. SOURCE: U.S. News & World Report's Best Hospitals 2013–14, released July 2013.

Appendix Exhibit A4 illustrates the results of Fisher's exact tests examining whether a scoring stratum (high performing, low performing) is statistically overrepresented or underrepresented for a specific hospital characteristic.19 For Leapfrog, public hospitals were overrepresented among low-performing hospitals, while private nonprofit hospitals were underrepresented. In addition, for Leapfrog, hospitals that are members of a hospital system were overrepresented among higher-performing hospitals and underrepresented among lower-performing hospitals. Hospitals with the largest percentage of patients with Medicaid as their primary insurance were underrepresented among high-performing hospitals and overrepresented among lower-performing hospitals.

For Consumer Reports, hospitals located in the Midwest, small hospitals (1–99 beds), private nonprofit hospitals, and nonteaching hospitals were overrepresented among the high-performing hospitals. For-profit hospitals and hospitals that had the largest percentage of patients with Medicaid as their primary insurance were statistically overrepresented among low-performing hospitals.

For U.S. News, large hospitals (400 or more beds), major teaching hospitals, hospitals that are members of a system, and hospitals that treat the lowest percentage of patients with Medicare as their primary insurance were overrepresented among high-performing hospitals. Medium-size hospitals (100–399 beds), for-profit hospitals, minor teaching and nonteaching hospitals, hospitals that are not members of a hospital system, and hospitals with a higher percentage of patients with Medicare as primary insurance were underrepresented among high performers on U.S. News.
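The Fisher's exact tests behind these over- and underrepresentation findings compare a 2x2 table of a hospital characteristic against a scoring stratum. A minimal from-scratch sketch, computing the two-sided p-value from the hypergeometric distribution; the counts here are invented for illustration, not the study's data.

```python
from math import comb

def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table
    [[a, b], [c, d]]: the sum of probabilities of all tables (with the
    same margins) no more likely than the observed one."""
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d

    def p(k):  # hypergeometric probability of a table with top-left cell k
        return comb(r1, k) * comb(r2, c1 - k) / comb(n, c1)

    p_obs = p(a)
    kmin, kmax = max(0, c1 - r2), min(r1, c1)
    # Small tolerance guards against floating-point ties.
    return sum(p(k) for k in range(kmin, kmax + 1) if p(k) <= p_obs * (1 + 1e-9))

# Invented counts: teaching status vs. high-performer designation.
#                      high   not high
# major teaching         12        38
# non-teaching           20       180
p_value = fisher_exact_p(12, 38, 20, 180)
odds_ratio = (12 * 180) / (38 * 20)
print(odds_ratio, p_value)
```

An odds ratio above 1 (here 2160/760, about 2.8) indicates that teaching hospitals are overrepresented among the invented high performers; the exact test's p-value indicates whether that imbalance is plausibly due to chance given the small cell counts.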

Discussion

We examined four consumer-directed national hospital rating systems to identify whether they arrive at similar or different conclusions about how a hospital is performing and, if different, to explain why. There was disagreement across the four ratings, with each identifying different sets of high- and low-performing hospitals. In addition, Leapfrog rated a relatively large set of hospitals favorably compared to Consumer Reports, U.S. News, and HealthGrades. We described how differing rating methodologies likely caused the discrepancies among hospital ratings. For example, Leapfrog and Consumer Reports focused on hospital safety, although each defined safety differently. U.S. News focused strictly on the "best medical centers for the most difficult patients," whereas HealthGrades focused on general hospital quality over time. Leapfrog and Consumer Reports both used the whole hospital as their unit of analysis, but they included different types of measures in their ratings. U.S. News and HealthGrades assessed the performance of individual specialties within a hospital to calculate their overall hospital rating. In addition, each system emphasized different measures of performance.

While the lack of agreement among these rating systems is largely explained by their different foci and measures, these differences are likely unclear to most stakeholders. The complexity and opacity of the ratings are likely to cause confusion instead of driving patients and purchasers to higher-quality, safer care. Given our findings, maximum transparency regarding each rating's measures and methods is needed to help stakeholders understand any individual rating and to compare across ratings. There is also a need to better understand whether the disagreement in rankings serves as a motivation for hospitals to provide more complete information to consumers. It is essential for the organizations sponsoring the ratings to help patients interpret their ratings through use of media and other channels. Otherwise, patients may learn of the ratings only through hospital advertisements and promotional materials.

The four rating systems vary in the transparency of their methodologies. Leapfrog and Consumer Reports provide full methodological transparency, sharing all of the underlying details and methods needed to replicate a hospital's rating, while U.S. News and

Exhibit 4: Overlap Of Hospitals Rated As High Performers And Low Performers Across The Four Hospital Rating Systems

High performers (no. of hospitals; details of the overlap):
- On all four ranking systems: 0
- On three of four ranking systems: 3 (2 rated by LF, CR, and HG; 1 rated by LF, US, and HG)
- On two of four ranking systems: 85 (45 rated by LF and HG; 7 rated by LF and US; 29 rated by LF and CR; 0 rated by CR and US; 3 rated by CR and HG; 1 rated by US and HG)

Low performers(a) (no. of hospitals; details of the overlap):
- On all three ranking systems: 0
- On two of three ranking systems: 15 (7 rated by LF and CR; 5 rated by LF and US; 3 rated by US and CR)

SOURCE Leapfrog Group, Consumer Reports Health, U.S. News & World Report's Best Hospitals 2013–14, and HealthGrades America's Best Hospitals 2013 Report. NOTE LF is Leapfrog; CR is Consumer Reports; US is U.S. News & World Report; and HG is HealthGrades. (a) HealthGrades was excluded from the low performers because it lists only top hospitals and makes no distinctions among the lower-performing hospitals.
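Tallies like those in Exhibit 4 can be computed directly from each system's set of high performers. A minimal sketch with invented hospital identifiers (the real analysis matched actual hospitals across the four systems):

```python
from itertools import combinations

# Hypothetical high-performer sets for the four systems (IDs invented).
high = {
    "LF": {"H1", "H2", "H3", "H4"},
    "CR": {"H2", "H5"},
    "US": {"H6"},
    "HG": {"H2", "H3", "H6"},
}

# For each hospital rated high anywhere, count how many systems agree.
all_high = set().union(*high.values())
by_count = {}
for h in sorted(all_high):
    n = sum(h in s for s in high.values())
    by_count.setdefault(n, []).append(h)
print(by_count)  # hospitals grouped by how many systems rated them high

# Pairwise overlaps, as in the exhibit's detail column.
for (a, sa), (b, sb) in combinations(high.items(), 2):
    print(a, b, len(sa & sb))
```

Grouping by agreement count reproduces the exhibit's rows ("on two of four ranking systems," and so on), while the pairwise intersections reproduce its detail column.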


HealthGrades does not make its proprietary risk-adjustment models fully transparent. Full transparency allows providers to use the ratings as a trigger for improvement efforts, and it enables consumers and payers to incorporate the ratings into their decision making.

Still, the Leapfrog, Consumer Reports, and U.S. News composites use judgment-based weighting schemes as opposed to approaches that incorporate information about measure reliability and validity. This decision could be attributed to the unavailability of the denominator data needed to adjust a measure's weight for its reliability. A more empirically driven weighting scheme would ensure that more reliable and valid measures of performance receive greater weight. In addition, the weighting schemes used by all of the ratings are standardized for all patients; none dynamically accounts for individual patient preferences. This could be remedied by allowing patients to provide preferences for measures and domains through a web-based tool.

Future research should examine how the preferences of different health care stakeholder groups can be used to validate, refine, or weight the measures that compose the ratings. It would also be useful for future research to explore how hospitals and their administrators make use of the ratings. For example, what are the characteristics of hospitals that use the ratings primarily for advertising purposes, primarily for internal improvement efforts, or both? And how do these choices affect hospital ratings over time?

The methods used by Leapfrog and Consumer Reports to publicly communicate their rankings (letter grades and numerical scores, respectively) make it easier for patients to identify the low performers than the methods used by HealthGrades and, to a lesser degree, U.S. News, which emphasize identifying "top" hospitals. The approach of focusing on top hospitals may make the U.S. News and HealthGrades rankings less controversial among hospitals and may be beneficial for ratings that rely on advertising for their business model. However, these rankings may be less useful for patient and purchaser decision making, in part because top-ranked hospitals can provide care to only a small subset of the US population. Ratings that inform patients of the relative quality of the providers accessible to them hold more promise for broadly shaping consumers' and payers' behavior.

A trade-off exists between finely grained scores (for example, 0–100 scales) and coarsely grained summaries (for example, letter grades). Finely grained scores allow for greater differentiation in hospital performance, yet small differences may be overinterpreted. Higher-level summaries can mask variation within a category, yet they are easier to understand and potentially more actionable for stakeholders. In contrast, terms such as "top" and "best" may provide an aspirational target or a source of best practices for providers, as well as guidance for patients who have the means to travel to other markets for hospital care. However, such designations are less informative to the majority of stakeholders, as relatively few hospitals receive them.

We also identified some systematic differences across the ratings. Public hospitals and hospitals with the largest percentages of patients covered by Medicaid as their primary insurance were overrepresented at the lower end of Leapfrog's ratings. This could suggest that the Leapfrog rating is biased against these hospitals or that safety-net hospitals perform more poorly than others; either way, additional research is needed. Nonteaching and small hospitals (1–99 beds) are overrepresented as high performers in Consumer Reports' ratings, likely reflecting its decision to include Hospital Consumer Assessment of Healthcare Providers and Systems (HCAHPS) survey results. Earlier studies have found that nonteaching and smaller hospitals generally receive higher HCAHPS scores.24

Conclusion

The four rating systems that we evaluated varied in their foci, measures, methods, and transparency, characteristics that have the potential to confuse patients, providers, and purchasers. Only a very small number of hospitals were found to be either high or low performers on multiple ratings, which suggests that relatively few hospitals excel across all the measures. The differences across rating systems may reflect true differences in quality and safety across different indicators. However, the divergent ratings of hospitals may also reflect poorly defined concepts as well as idiosyncratic issues of measure selection or data quality (for example, measurement error or missing data).

Although the variety of and differences among hospital rating systems may be beneficial to patients, providers, and purchasers, these stakeholders would benefit if rating organizations could agree on standards for transparently reporting the key features of their ratings. Full transparency in how ratings are constructed is especially important to patients, hospitals, and researchers. When ratings can be replicated, hospitals can understand where to focus their improvement efforts, researchers can assess the antecedents and consequences of a hospital's rating, and consumers can better grasp the meaning of "best" or "safest" and make more informed decisions about their care. ▪


J. Matthew Austin received funding support from the Leapfrog Group (research on hospital performance measurement; Grant No. 114005). Patrick S. Romano, Sara J. Singer, Timothy Vogus, Robert M. Wachter, and Peter J. Pronovost all voluntarily serve on the Leapfrog Group's Hospital Safety Score Expert Panel and receive no compensation for their service. Wachter serves as the immediate past chair of the American Board of Internal Medicine (for which he received a stipend) and is a current member of the ABIM Foundation board; he received a stipend and stock options for serving on the board of directors of IPC-The Hospitalist Company; and he serves on the scientific advisory boards for Amino.com, PatientSafe Solutions, CRISI, QPID, and EarlySense (for which he receives stock options). The authors thank Robert Wild for his assistance with data analysis and Christine Holzmueller for her thoughtful review and editing of this article. They are grateful to Doris Peter from Consumer Reports and Leah Binder from the Leapfrog Group for providing Consumer Reports' Health Safety Score and the Hospital Safety Score data, respectively, for use in this study.

NOTES

1 Sinaiko AD, Eastman D, Rosenthal MB. How report cards on physicians, physician groups, and hospitals can have greater impact on consumer choices. Health Aff (Millwood). 2012;31(3):602–9.
2 Lindenauer PK, Remus D, Roman S, Rothberg MB, Benjamin EM, Ma A, et al. Public reporting and pay for performance in hospital quality improvement. N Engl J Med. 2007;356(5):486–96.
3 Centers for Medicare and Medicaid Services. Roadmap for implementing value driven healthcare in the traditional Medicare fee-for-service program [Internet]. Baltimore (MD): CMS; [cited 2015 Jan 27]. Available from: http://www.cms.gov/Medicare/Quality-Initiatives-Patient-Assessment-Instruments/QualityInitiativesGenInfo/Downloads/VBPRoadmap_OEA_1-16_508.pdf
4 Chen J, Radford MJ, Wang Y, Marciniak TA, Krumholz HM. Do “America’s Best Hospitals” perform better for acute myocardial infarction? N Engl J Med. 1999;340(4):286–92.
5 Mulvey GK, Wang Y, Lin Z, Wang OJ, Chen J, Keenan PS, et al. Mortality and readmission for patients with heart failure among U.S. News & World Report’s top heart hospitals. Circ Cardiovasc Qual Outcomes. 2009;2(6):558–65.
6 Wang OJ, Wang Y, Lichtman JH, Bradley EH, Normand SL, Krumholz HM. “America’s Best Hospitals” in the treatment of acute myocardial infarction. Arch Intern Med. 2007;167(13):1345–51.
7 Halasyamani LK, Davis MM. Conflicting measures of hospital quality: ratings from “Hospital Compare” versus “Best Hospitals.” J Hosp Med. 2007;2(3):128–34.
8 Jha AK. Hospital rankings get serious. An Ounce of Evidence/Health Policy [blog on the Internet]. 2012 Aug 14 [cited 2015 Feb 3]. Available from: http://blogs.sph.harvard.edu/ashish-jha/hospital-rankings-get-serious/
9 Rosenthal E. The hype over hospital rankings. New York Times [serial on the Internet]. 2013 Jul 27 [cited 2015 Jan 27]. Available from: http://www.nytimes.com/2013/07/28/sunday-review/the-hype-over-hospital-rankings.html?_r=0
10 Osborne NH, Nicholas LH, Ghaferi AA, Upchurch GR Jr, Dimick JB. Do popular media and Internet-based hospital quality ratings identify hospitals with better cardiovascular surgery outcomes? J Am Coll Surg. 2010;210(1):87–92.
11 Shahian DM, Wolf RE, Iezzoni LI, Kirle L, Normand SL. Variability in the measurement of hospital-wide mortality rates. N Engl J Med. 2010;363(26):2530–9.
12 Rau J. Hospital ratings are in the eye of the beholder. Kaiser Health News [serial on the Internet]. 2013 Mar 18 [cited 2015 Jan 27]. Available from: http://kaiserhealthnews.org/news/expanding-number-of-groups-offer-hospital-ratings/
13 HealthGrades. HealthGrades research reports, top hospitals, and methodologies [Internet]. Denver (CO): HealthGrades; 2013 [cited 2015 Jan 27]. Available from: http://www.healthgrades.com/quality/archived-reports
14 Leapfrog Group. About the score [Internet]. Washington (DC): Leapfrog Group; 2013 [cited 2015 Jan 27]. Available from: http://www.hospitalsafetyscore.org/about-the-score
15 How safe is your hospital? Our new ratings find that some are riskier than others. Consumer Reports [serial on the Internet]. 2012 Aug [cited 2015 Jan 27]. Available from: http://www.consumerreports.org/cro/magazine/2012/08/how-safe-is-your-hospital/index.htm
16 PwC Health Research Institute. Scoring healthcare: navigating customer experience ratings [Internet]. New York (NY): PricewaterhouseCoopers; 2013 [cited 2015 Jan 27]. Available from: http://www.pwc.com/us/en/health-industries/publications/scoring-patient-healthcare-experience.jhtml#consumer
17 Schauffler HH, Mordavsky JK. Consumer reports in health care: do they make a difference? Annu Rev Public Health. 2001;22:69–89.
18 Sofaer S, Crofton C, Goldstein E, Hoy E, Crabb J. What do consumers want to know about the quality of care in hospitals? Health Serv Res. 2005;40(6 Pt 2):2018–36.
19 To access the Appendix, click on the Appendix link in the box to the right of the article online.
20 Austin JM, D’Andrea G, Birkmeyer JD, Leape LL, Milstein A, Pronovost PJ, et al. Safety in numbers: the development of Leapfrog’s composite patient safety score for US hospitals. J Patient Saf. 2014;10(1):64–71.
21 Olmsted MG, Murphy J, Geisen E, Williams J, Bell D, Pitts A, et al. Methodology: U.S. News & World Report Best Hospitals 2013–14 [Internet]. Research Triangle Park (NC): RTI International; 2013 Aug 28 [cited 2015 Jan 27]. Available from: http://www.usnews.com/pubfiles/BH_2013_Methodology_Report_Final_28August2013.pdf
22 HealthGrades. America’s best hospitals 2013: navigating variability in hospital quality [Internet]. Denver (CO): HealthGrades; 2013 [cited 2015 Jan 27]. Available from: http://hg-article-center.s3-website-us-east-1.amazonaws.com/a9/7b/2954b09f4822bb81649a1f06a6cf/healthgrades-americas-best-hospitals-report-2013.pdf
23 The twenty-seven hospitals with extreme disagreement broke down as follows: fourteen Leapfrog high performers were U.S. News low performers, seven Leapfrog high performers were Consumer Reports low performers, two HealthGrades high performers were Leapfrog low performers, one Consumer Reports high performer was a Leapfrog low performer, one U.S. News high performer was a Leapfrog low performer, and two Consumer Reports high performers were U.S. News low performers.
24 Lehrman WG, Elliott MN, Goldstein E, Beckett MK, Klein DJ, Giordano LA. Characteristics of hospitals demonstrating superior performance in patient experience and clinical process measures of care. Med Care Res Rev. 2010;67(1):38–55.
