MOST PHYSICIANS LIKE TO THINK they’re taking good care of their patients while also running efficient businesses. Louisville family practitioner A. O’tayo Lalude is no exception. Two years ago, when Lalude heard about a project designed to develop “best practices” in diabetes care, he volunteered. The program, sponsored by Hoangmai Pham, a physician and senior health researcher at HSC. Pham notes that Medicare influences the entire health system, with virtually every commercial insurer following the government program’s lead. She predicts that if Medicare adopts P4P, choosing not to participate may cease to be an option for most physicians.

As the P4P juggernaut gains momentum, doctors worry about a range of issues. First among those is where quality measures should come from—self-appointed “experts,” employers, the government, physician specialty groups or broader physician organizations. Then there’s the expense and effort of collecting the data, particularly in the solo and small group practices that make up the bulk of medical providers in this country. Should performance information about individual physicians, doctor groups and hospitals be made public, or would such “report cards” only penalize those who take care of older, sicker populations? Should efficiency be part of the P4P equation? And most essentially: Does P4P really accomplish its dual goals of controlling costs and improving the quality of care?

In 1911, Ernest A. Codman, a surgeon at the Massachusetts General Hospital, opened his own private facility, which he called the End Result Hospital. The name reflected Codman’s obsession with knowing the long-range impact of each patient’s care and with learning from mistakes to improve medical quality. During the next five years, Codman tracked 337 patients admitted to his hospital, recording errors in diagnosis and treatment, and following the patients long after discharge to evaluate the ultimate results of their care. In the hospital’s annual reports, Codman tallied mistakes, and he even offered to refund doctors’ professional fees to patients who had an unsatisfactory result. Then he sent the reports to hospitals around the country, encouraging doctors to follow his lead. Now, almost a century after Codman’s earliest efforts, employers, health plans and consumers have taken up his quest, demanding to know what they’re getting for their health-care dollars and forcing the medical profession to get with the program.

At Park Nicollet Health Services in Minneapolis, tracking performance and results has become a daily concern. The group’s 600 doctors are involved with multiple P4P initiatives, including those sponsored by Medicare and Bridges to Excellence (BTE), and must report on a whopping 134 patient-care measures related to breast cancer screening, diabetes care, the management of cholesterol, heart attacks, pneumonia, hip and knee replacements and the use of antibiotics, among others.

“It has been a lot of slow, hard work to get the systems in place,” says Nancy Jarvis, director of informatics at Park Nicollet. “Our medical group must look at so many measures, it can be overwhelming.”

Since 2002 the 3,000-plus physicians of Rochester Individual Practice Association (RIPA) in New York have worked with Excellus BlueCross BlueShield to report on a dozen performance measures linked to quality, affordability and patient satisfaction in the care of diabetes, asthma and heart disease. While the group has recorded improvements in every category, Howard Beckman, medical director of RIPA, cautions against reading too much into this early success—and in particular, against using the performance measures to judge whether particular doctors are good at what they do. He especially opposes one of the primary goals of many P4P programs: making doctor report cards public. “I think the pretense of giving people reports from which they’re supposed to figure out who’s a good doctor—well, it’s impossible based on the measures available.”

Like Ernest Codman, many physicians routinely measure quality of care as a way to improve it. But today’s P4P experiments are largely being imposed from outside the profession, by insurers, government groups and quality watchdogs that cite as motivation two inescapable facts: Health care costs too much, and the quality of care ranges from outstanding to poor, depending upon who is providing it.


Spending on health care in the United States is expected to rise nearly 8% a year during the next 10 years, reaching $4 trillion by 2015—a staggering 20% of the nation’s projected gross domestic product (GDP)—according to a recent study by economists at the Centers for Medicare & Medicaid Services (CMS). “We’re paying 50% more than any other country per capita and as a percentage of GDP for health care,” says Robert Berenson, a physician and senior fellow at the Urban Institute in Washington, D.C. “Yet there are serious quality problems.” For example, researchers at Dartmouth Medical School in Hanover, N.H., who have been analyzing Medicare data for more than 20 years, recently estimated that as many as a third of Medicare dollars are wasted on unnecessary or inappropriate care. Other analysts put the figure as high as 40%—and the statistics don’t improve when you look beyond Medicare.

Meanwhile, corporate America, burdened by its promise to finance care for legions of past and present workers, is desperate to contain costs. That’s why General Motors (GM), which underwrites care for 1.1 million current and retired employees and in 2005 paid out $5.3 billion in medical expenses, has jumped on the P4P bandwagon.

GM is involved in P4P initiatives through contracts with more than 100 health plans, a signal that ensuring quality and improving care aren’t optional for insurers that want to work with the company. “We’ve built into our expectations that they show us how they reward providers for performance, and that gets to measurement, public disclosure and so on,” says Bruce Bradley, director of health care and public policy for the automaker.

François de Brantes, formerly of GE and now national coordinator for Bridges to Excellence, says the idea of P4P makes sense to many companies because it mirrors the market-driven approach they use in other parts of their businesses, providing financial incentives to improve quality and lower costs. Increasing numbers of employers, he says, think consumers can be motivated to seek out efficient, better-quality care. But that means finding appropriate ways to reward physicians for providing such care—and that’s problematic, for several reasons.

First, most doctors involved in P4P programs believe that too little money is at stake to justify the extra work and expense. Often, it seems, health plans aren’t putting additional dollars into P4P programs but, rather, cutting everyone’s compensation, then returning some of the lost money to those who meet quality requirements. (Opponents of pay for performance characterize such programs as “no pay for no performance.”) There’s also the question of who gets more—practices that consistently deliver high-quality care or those that show the greatest improvement—and whether to reward medical groups or individual physicians.

One long-term study of seven of the nation’s most advanced P4P programs suggests that to have the desired impact, P4P incentive programs need to account for at least 10% of a physician’s annual income. That’s about how much is at stake for achieving annual quality and efficiency targets at Partners HealthCare (parent of the Massachusetts General Hospital), where more than 5,000 physicians participate in some form of P4P. They’re motivated by around $90 million in annual hospital and physician payments that are withheld pending compliance with a long list of quality measures. “We have a lot of patients and a lot of money at risk,” says Thomas Lee, a cardiologist and CEO of Partners Community Healthcare, the network of physicians associated with Partners HealthCare. “Though the money ostensibly comes in the form of bonuses for good performance, it’s really the opposite—losses if we don’t perform.”

FOR PHYSICIANS IN ONE OF THE NATION’S LARGEST P4P EXPERIMENTS, it appears that fear of notoriety, not money, is the force motivating them to meet the program’s performance goals. Now up and running for five years, the Integrated Healthcare Association (IHA) program brings together six California health plans involving 215 medical groups and 35,000 physicians who, combined, provide care for 8 million enrollees. The $60 million in bonuses the plans have distributed to participating medical groups represents only about 1.5% of total compensation from those health plans. But each medical group also receives a consolidated scorecard, accessible on the Web, that bestows as many as four stars based on performance in three areas: clinical measures, patient satisfaction and use of information technology. “No one wants to see the name of his or her medical group up there with only one star,” says Stephen M. Shortell, dean of the School of Public Health at the University of California at Berkeley, who helped design the project.

Early measurements in the IHA pilot have shown impressive gains in such categories as childhood immunizations, cervical and breast cancer screenings, cholesterol screenings and patient satisfaction. And some IHA health plans have seen reduced hospitalizations, especially in patients with diabetes. Yet James Naughton, an internist in San Francisco’s East Bay with Alliance Medical Group, which has tracked several dozen P4P measures as an IHA participant, worries about hidden costs and data errors that can unfairly stigmatize physicians. He cites the example of a patient who has a family history of diabetes but who doesn’t have that condition himself. “If the tracking system wrongly identifies him as diabetic, it will appear as if he’s missing many required tests,” says Naughton. “You get the code wrong, and you look like a terrible doctor.

“Philosophically, I have no problems with P4P,” he says. “But the cost of participating has been grossly underestimated.” The bonus money the 15 physicians at Alliance have received hardly covers the expense of participation, according to Naughton, despite the group’s sophisticated information technology system. “The time I spend in front of my computer eyeballing numbers, the staff time it takes to make follow-up calls, that’s all pure overhead,” he says.

For physicians to score well on IHA’s performance measures, patients must be up to date on recommended tests and screenings, and making sure that happens is also time-consuming, says Naughton. Moreover, he finds there’s only so much physicians can do to prompt patient compliance. “If patients refuse to follow recommendations, how do you account for that?” he asks. “Our argument is that once we’ve informed them what they need to do, the quality event has occurred, even if they choose not to follow our advice. Either way, a patient’s noncompliance isn’t counted against us.”


WORKING OUT P4P ISSUES HAS TAKEN ON NEW URGENCY since the American Medical Association (AMA) struck a deal last December with key congressional leaders. In the agreement, the AMA promised to develop approximately 140 physician performance measures covering 34 clinical areas by the end of 2006, and to ask physicians who agree to participate to begin tracking performance as early as 2007.

The deal, which the AMA negotiated on its own, didn’t sit well with many physicians. Medical specialty groups, many of which are gaining members at the AMA’s expense, worried that the broad measures the AMA agreed to create would not reflect the realities of specialty practices. What’s more, officials representing 10 national medical societies and 200,000 physicians complained in a letter to congressional leaders that the AMA hadn’t consulted with them before signing the agreement.

Stuart Weinstein, an Iowa City pediatric orthopedic surgeon and former president of the 28,000-member American Academy of Orthopaedic Surgeons, thinks that the timetable endorsed by the AMA and congressional leaders is unrealistic. “Performance measures need to be developed by specialty societies, then tested and validated to confirm they really affect patient care in a positive way,” he says. “We don’t think these measures can be simply mandated.”

What’s more, says Weinstein, developing a good measure can be difficult, in part because there’s not always consensus. Medicare’s Physician Voluntary Reporting Program, for example, includes measures for preventing blood clots in surgical patients. But, says Weinstein: “Not everyone agrees what’s best.”

Many physician organizations, while apparently accepting the inevitability of complying with quality measures, nevertheless have been hoping for a relatively slow, deliberate transition involving government-sponsored pilot programs to see what works and what doesn’t. In August 2005, a few months before the AMA deal, the American College of Physicians (ACP) and 70 other national medical societies sent Congress a plan for a five-year phase-in of P4P for Medicare.

Dana Safran, director of the Health Institute at Tufts–New England Medical Center in Boston, who has spent the past 15 years researching how to measure the quality of patient experiences, also advises caution as the major players continue to experiment. “P4P runs the risk of pushing the measures beyond what they can do accurately, making unjustified inferences about physician or hospital performance,” Safran says. “You have to use these tools fairly and with precision, because of the risks involved in getting it wrong.”

There’s also the chance, says the Urban Institute’s Berenson, that the impact of P4P could be much less profound than many expect. Most patients of doctors in the pilot programs, Berenson says, are unlikely to detect much of a difference beyond, perhaps, getting more frequent calls to schedule follow-up appointments. “If P4P is done right, there might be small improvements in care, but it’s not going to solve all our problems—particularly not the spiraling costs,” he says. Yet for now, as baby boomers move into retirement and put the health-care system to a monumental test, all bets are on P4P. Says CMS’s Wilson: “The momentum is unprecedented in terms of people talking to each other, collaborating and building consensus to make this work.”