It's important to be constructively critical of efforts to rate health care providers. Such public reporting always has room for improvement. U.S. News receives its share of criticism in this regard, and our critics include other journalism organizations.

Last year, for instance, during a session on hospital ratings at a national meeting of health journalists in Denver, considerable time was spent critiquing the U.S. News Best Hospitals methodology and business model. One of the panelists was Marshall Allen, a senior reporter for ProPublica, an investigative journalism group. By all accounts, Allen had little positive to say about what we do to evaluate hospitals or how we do it.

Now flash forward to July of this year, when a ProPublica team led by Allen and Olga Pierce published a Surgeon Scorecard comparing about 17,000 U.S. surgeons' individual complication rates in a variety of procedures. The only uncontroversial thing that can be said about the Surgeon Scorecard is that the health care community found it to be massively controversial.

Given Allen's review of our work – not to mention the potential competitive threat his organization might be seen as posing to U.S. News – we might be expected to join the chorus that has been excoriating ProPublica for the many supposedly fatal flaws in the scorecard's design and execution.

We're not going to do that. What Allen and his colleagues have created is necessary and important. Incomplete, yes. Deeply imperfect, yes. And potentially unfair to some surgeons. But he and his group expressed high-minded objectives and pursued them earnestly and thoughtfully. Now that tempers have had a chance to cool, we're ready to offer our perspective.

What Allen's team created is, to our knowledge, the first purely objective public report card of health care quality at the individual physician level. While the result may be rough, the future should and will feature more polish. The nation needs more report cards like theirs and will get them, one way or another.

For that reason, it makes sense to examine ProPublica's ratings and the controversy around them more closely. What went right? What didn't? How can we use that information to further the discussion about patient safety and the quality of health care?

Much of the ire directed at the ratings has focused on ProPublica's reliance on Medicare claims data. On the popular physician website kevinmd.com, surgeon Jeffrey Parks took ProPublica to task. "…the entire analysis of 'surgeon quality' was based entirely on billing records," Parks wrote. "There was no case-specific analysis. No chart review. This may not have been possible given HIPAA and availability of data to journalists, but it is a critical weakness."

To the uninitiated, what ProPublica did may sound negligent, perhaps even crazy. That reaction is misguided. Thousands of credible studies based on claims data have been published in peer-reviewed scientific journals over the past several decades. A recent article in Statistics in Medicine called this type of information "the predominant resource in pharmacoepidemiology, health outcomes, and health services research."

It's worth considering why this is so. Parks mentions that health privacy legislation and restrictions on journalistic access to data may have forced ProPublica to use claims data. This is not the case. The reason was much simpler: No central repository of electronic medical charts exists that ProPublica or anyone else can use for this type of study. Claims data represent the most complete source of data on health care treatment and outcomes in the United States.

Other data sources, such as the surgical registries maintained by the Society of Thoracic Surgeons, may collect detailed clinical information about patients. The STS registries are well vetted and fairly complete. But many registries include only a fraction of practicing physicians and hospitals, which is not good enough for an organization that wishes to provide information on all the providers from which a patient might choose. Further, registries – and, for that matter, the chart data so many physicians wish ProPublica had used – do not always pick up complications, or even deaths, that occur after patients are discharged.

The Medicare dataset accounts for what happens to patients after they leave the hospital. When Medicare members die, even outside the hospital, the plan receives notification. When they seek treatment for complications after surgery, the plan logs a bill, even if treatment occurs weeks later at another hospital in a different state.

Not a gold standard
   
Even when clinical data are collected on large numbers of hospitals for the specific purpose of measuring quality of care, problems can crop up. The National Healthcare Safety Network is a case in point.

Operated by the federal government, NHSN was created to collect reliable information on hospital-acquired infections at facilities across the country. Specially designated nurses at most hospitals review patient charts and conduct other reconnaissance to identify patients with surgical-site and other infections.

This turns out to be a good way for a hospital to track its own infection rates from year to year, but not such a good way to compare infection rates between hospitals. Exactly what counts as a case of infection is interpreted differently by different hospitals, and the data suggest that hospitals with more resources dedicated to surveillance find more infections.

Dr. Michael Calderwood, an infectious-disease specialist at Brigham and Women's Hospital in Boston, has authored a series of papers showing that using claims data is superior to NHSN data for accurate comparisons between hospitals.

Claims data do come with important limitations. Billing codes submitted by hospitals are widely acknowledged to match up imperfectly with a given patient's clinical picture. Hospitals fill out billing forms because they want to get paid, not because they're trying to create an accurate scientific record. The result can be errors in determining whether a patient actually received a particular treatment or truly suffered a complication indicated on the hospital bill.

Epidemiologists call these measurement errors. They are an unavoidable part of doing science. Researchers have spent quite a bit of time studying measurement error and developing formulas to estimate how different levels of error affect results.

Reassuringly, random measurement error usually pushes results in a conservative direction – it blurs real differences rather than inventing them. But when such errors are large, or when they are correlated with other information (if hospitals with bad doctors are more likely to miscode outcomes, for example), they can create misleading results.
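To make that concrete, here is a toy simulation – our own illustration, not ProPublica's or anyone else's actual method – of how coding errors play out. When both good and bad surgeons' complications are miscoded at the same rates, the observed gap between them shrinks toward zero; when the worse performers' complications are also more likely to go uncoded, the gap can all but vanish or even reverse.

```python
import random

random.seed(0)
N = 200_000  # simulated operations per group

def observed_rate(true_rate, miss_prob, false_pos_prob):
    """Recorded complication rate when some true complications are never
    coded on the bill and some uncomplicated cases are miscoded."""
    recorded = 0
    for _ in range(N):
        if random.random() < true_rate:                    # a true complication...
            recorded += random.random() >= miss_prob       # ...counted only if not missed
        else:
            recorded += random.random() < false_pos_prob   # false-positive coding
    return recorded / N

good_true, bad_true = 0.03, 0.06  # true complication rates for two groups of surgeons

# Non-differential error: both groups are miscoded at the same rates.
good_obs = observed_rate(good_true, miss_prob=0.30, false_pos_prob=0.01)
bad_obs = observed_rate(bad_true, miss_prob=0.30, false_pos_prob=0.01)
print(f"true gap {bad_true - good_true:.3f}, observed gap {bad_obs - good_obs:.3f}")
# The observed gap is smaller than the true gap: random error is "conservative."

# Correlated error: the worse performers also under-code complications more often.
bad_obs_corr = observed_rate(bad_true, miss_prob=0.70, false_pos_prob=0.01)
print(f"observed gap with correlated error {bad_obs_corr - good_obs:.3f}")
# The gap can shrink to nothing or even flip sign, making bad look good.
```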

ProPublica's scorecard is not entirely off the hook in this regard. The news organization convened an expert panel of clinicians to identify billing codes that signify postoperative complications. While the list presumably is valid, it is not clear that the assembled clinicians considered whether the complications are accurately coded in claims data.

This choice of methodology is perplexing given that plenty of existing metrics (including the ones proposed by Calderwood) for postoperative complications are well-documented in the peer-reviewed literature. None is perfect, but most have detailed information available on accuracy, allowing researchers to determine whether using them would potentially lead to biased results. With ProPublica's outcomes, we're in the dark.

Another potential source of trouble is the doctor identified in the hospital bill as having treated the patient. The doctor named differs from the primary surgeon on the separate physician bill at least 28 percent of the time, according to researchers working for the Medicare program. Depending on the type of procedure, the errors can sometimes be much larger – potentially large enough to create situations in which good doctors are identified as bad and vice versa.

Obviously, this is an area that demands further study.

Critics have also lambasted ProPublica for failing to make adequate risk adjustment to account for differences in patient mix. They argue that some surgeons, especially those at academic medical centers and hospitals in poor neighborhoods, operate on sicker patients than those seen elsewhere, so their mortality and complication rates suffer by comparison.

This is true, and if left unaddressed it would be a major source of bias in a study like ProPublica's. But researchers have spent decades fine-tuning ways to adjust for those discrepancies, and those methods are now widely accepted in peer-reviewed studies. One of the most common methods – the one used by ProPublica – has been cited more than 2,800 times in the academic literature.
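To show what risk adjustment looks like mechanically, here is a minimal sketch – a simplified illustration using made-up data and a plain logistic regression, not ProPublica's actual model – that estimates each patient's expected risk from age and comorbidity burden and then compares each surgeon's observed complications with the total the model expected for that surgeon's patients.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5_000  # simulated operations

# Hypothetical claims extract: one row per operation.
age = rng.integers(65, 95, n)            # Medicare-age patients
comorbidities = rng.poisson(2, n)        # count of coded chronic conditions
surgeon = rng.integers(0, 20, n)         # 20 hypothetical surgeons

# Simulated truth: risk rises with age and comorbidities; the last two
# surgeons are genuinely worse than their peers.
logit = -7 + 0.04 * age + 0.3 * comorbidities + 0.8 * (surgeon >= 18)
complication = rng.random(n) < 1 / (1 + np.exp(-logit))

# Step 1: model expected risk from patient characteristics alone.
X = np.column_stack([age, comorbidities])
expected = LogisticRegression().fit(X, complication).predict_proba(X)[:, 1]

# Step 2: observed vs. expected complications for each surgeon.
for s in range(20):
    mask = surgeon == s
    o, e = complication[mask].sum(), expected[mask].sum()
    # An O/E ratio well above 1 means more complications than the surgeon's
    # patient mix would predict; the two "bad" surgeons should stand out.
    print(f"surgeon {s:2d}: observed {o:3.0f}, expected {e:5.1f}, O/E {o / e:.2f}")
```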

Risk adjustment is subject to measurement error, and missing details in the patient profiles might lead to incomplete adjustment for differences in patient mix between doctors. However, the evidence available to us, while not conclusive, suggests this is unlikely. The U.S. News Best Hospitals for Common Care project used similar risk-adjustment approaches.

We conducted extensive analysis to detect incomplete risk adjustment. We looked for instances in which types of hospitals with known differences in case mix – transplant centers, academic medical centers and so on – had mortality and readmission rates that differed systematically from those of other hospitals. We found nothing to suggest inadequate risk adjustment.
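The diagnostics themselves can be simple. The sketch below, which uses entirely hypothetical numbers rather than our actual data, compares risk-adjusted observed-to-expected ratios across hospital types; if adjustment is adequate, hospitals that treat sicker patients should not look systematically worse.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Hypothetical risk-adjusted O/E ratios (observed/expected deaths) by hospital type.
academic = rng.normal(1.00, 0.15, 120)    # teaching hospitals, transplant centers, etc.
community = rng.normal(1.00, 0.15, 480)   # everyone else

t_stat, p_value = stats.ttest_ind(academic, community, equal_var=False)
print(f"mean O/E academic {academic.mean():.2f} vs. community {community.mean():.2f}, "
      f"p = {p_value:.2f}")
# A large, systematic gap would suggest the model penalizes (or favors)
# hospitals that treat sicker patients, a sign of incomplete adjustment.
```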

Multiple measures to reduce error
   
One additional point about measurement error bears consideration. Rather than relying on a single measure of quality, like mortality or readmission (or a combined outcome including both), it's possible to use many. Using multiple indicators and applying certain complex statistical machinery can produce accurate results even when the indicators contain measurement error.

The process is analogous to locating someone by cell phone signal. With one tower, or even two, the estimate of the person's location will be imprecise (i.e., it will contain a lot of measurement error). With multiple towers, it's possible to triangulate and arrive at a much more accurate fix.

This is the approach we followed for our common care ratings, and it's one we hope other research organizations will incorporate into their work.
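A toy example – again our own illustration rather than the actual Best Hospitals model – shows why this works: average several independent, noisy indicators of the same underlying quality and the noise tends to cancel, so the composite tracks true quality more closely.

```python
import numpy as np

rng = np.random.default_rng(1)
n_hospitals, n_indicators = 500, 5

true_quality = rng.normal(0, 1, n_hospitals)
# Each indicator = true quality plus substantial independent measurement error.
indicators = true_quality[:, None] + rng.normal(0, 1.5, (n_hospitals, n_indicators))

for k in (1, 2, 5):
    composite = indicators[:, :k].mean(axis=1)
    r = np.corrcoef(true_quality, composite)[0, 1]
    print(f"{k} indicator(s): correlation with true quality = {r:.2f}")
# More independent "towers" means less noise in the composite and a sharper
# fix on the underlying quality signal.
```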

Many of the spears hurled at ProPublica take aim at the scorecard's answer to the question: How bad is too bad? Could a surgeon's high complication rate be explained by chance? Perhaps it was due to a run of especially challenging patients or an unlikely sequence of random events.

There are a few ways to address this sort of question. The one ProPublica used is to report the surgeon's complication rate along with a confidence interval that indicates a range of plausible values. Another would have been to report the point estimate along with an indicator of whether it is statistically significant – that is, unlikely to have been due to chance. The latter is a more traditional approach, while ProPublica's has gained a lot of favor in recent years.

One can argue that publishing an individual physician's high complication rate is unfair if that rate is likely due to chance. For this reason, we favor separating health care providers into groups (low, average and high performers, for example) based on the likelihood that their performance truly differs from average. Providers whose confidence intervals are too wide to distinguish them from average end up in the middle group and don't have to defend a high but statistically unreliable complication rate, while the public can be reasonably certain that those in the low and high groups perform as advertised.
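A stripped-down version of that grouping logic might look like the sketch below; the surgeons, counts, benchmark rate and use of a Wilson score interval are all hypothetical choices made for illustration, not ProPublica's or U.S. News' production methodology.

```python
import math

def wilson_interval(events, n, z=1.96):
    """95% Wilson score confidence interval for a complication rate."""
    p = events / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

NATIONAL_AVERAGE = 0.05  # hypothetical benchmark complication rate

# Hypothetical surgeons: (complications, operations).
surgeons = {"A": (3, 400), "B": (3, 40), "C": (40, 300)}

for name, (events, n) in surgeons.items():
    lo, hi = wilson_interval(events, n)
    if hi < NATIONAL_AVERAGE:
        band = "low complication rate"
    elif lo > NATIONAL_AVERAGE:
        band = "high complication rate"
    else:
        band = "average"  # interval straddles the benchmark: not distinguishable
    print(f"surgeon {name}: rate {events / n:.1%}, "
          f"95% CI ({lo:.1%}, {hi:.1%}) -> {band}")
# Surgeon B's raw rate (7.5%) is above the benchmark, but the interval is too
# wide to say so with confidence, so B lands in the middle group.
```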

Finally, it's important to ask the purpose of ProPublica's report card. Most experts would agree that in the arena of quality improvement, clinical registries are superior to claims-based reporting such as that done by ProPublica and U.S. News.

But public reporting on provider performance can serve at least two other purposes. It can help patients make better-informed choices among providers, and it can drive better accountability in health care. ProPublica's primary objective appears to be the former. The Surgeon Scorecard site says, "Use this database to know more about a surgeon before your operation." Yet the latter is more in keeping with the news organization's investigative focus.

One telling analytical decision ProPublica made was to adjust its results by removing the "hospital effect" – the positive or negative influence of a hospital on the results of the surgeons who work there. Removing the hospital effect is entirely appropriate if the sole aim is to identify bad (or good) surgeons and hold them accountable for their outcomes. In other words, it's a sound approach for investigative journalism. It's not ideal, however, if the goal is to help patients make decisions.

A patient choosing among different surgeons needs to consider not only how good each theoretically would be in isolation, but also what risks the surgeon's hospital might bring to the operating table.

We hold this much to be uncontroversial: Patients need better information about the quality of doctors and hospitals. That's why U.S. News is involved in public reporting – and why we're glad to see ProPublica getting involved.

With reporting from Denver by Evi Heilbrunn

Clarified on Aug. 26, 2015: This article has been updated to include Olga Pierce of ProPublica, who co-led the development of the Surgeon Scorecard.
