IN THE SPRING OF 1999, a financial analyst named Jason Maude took his three-year-old daughter Isabel to the hospital. She had chicken pox, and was listless and feverish; doctors assured him that was typical of her condition. Diarrhea and vomiting followed. Not to worry, a doctor said: chicken pox. By the following morning, the skin on her lower belly had swelled and turned blue, with small blisters. At the emergency room, the pediatrician told the family Isabel was just dehydrated but might need to stay overnight.

Only when Isabel collapsed in delirium and was taken to a pediatric ICU did doctors recognize the emergency for what it was—necrotizing fasciitis, in which an opportunistic bacterial infection eats away flesh. This was coupled with toxic shock syndrome caused by bacterial toxins. Isabel’s body went into multisystem failure, and she was put on life support. The doctors were able to quickly excise the infected skin, leaving her with extensive scars. It was nearly two months before she left the hospital; she is just now close to a full recovery.

Maude probably could have sued, but he didn’t. Isabel’s doctors, he reasoned, had made an honest mistake. Necrotizing fasciitis and toxic shock syndrome were uncommon complications of chicken pox. For clinicians who hadn’t come across those problems before, they would be easy to miss.

So rather than blaming Isabel’s doctors, Maude figured they needed help. Could a computer program aid them in reviewing a patient’s symptoms and suggest possibilities they might not have considered? With the help of programmers, and clinicians from Isabel’s PICU, Maude and his wife, Charlotte, created software to do just that, and named it after their daughter. If you type in a set of clinical features about a patient’s condition, the Isabel program lists possible diagnoses, drawn from some 6,000 diseases and disorders, from most to least probable.

Now used in hospitals across the United States and around the world, Isabel is part of the fast-growing field of medical decision-support tools based on artificial intelligence, or AI. These tools’ capabilities range from the simple to the fabulously complex, and many experts (not just Silicon Valley disruption mavens but also leading clinicians who find themselves confronting too much data with too little time) consider artificial intelligence to be one of medicine’s most exciting frontiers. They envision caregivers and patients aided by a web of sophisticated, near-instantaneous analyses, with intelligent computers generating diagnoses and treatment plans that supplement physicians’ overworked and fallible capacities. The next great breakthrough won’t come from new drugs, these AI proponents say, but from data.

Machines won’t replace doctors, but will make them hugely more effective. “To maximize the potential of both, let humans do what they do well, and let machines do what they do well,” says Casey Bennett, an artificial intelligence researcher at the Centerstone Research Institute in Tennessee, which helps patients with behavioral health disorders. Centerstone recently used the electronic records of nearly 6,000 patients to build an artificially intelligent model of clinical decision-making. “It’s not about robot doctors,” says Bennett. “It’s about empowering people.”

Technology, however, is always intertwined with the people who use it, and most of that happens in a messy world outside computer-lab cradles. You might think of medical AIs as adolescents, brilliant but impressionable. They have the potential to improve health care dramatically—but also, perhaps, to reflect human flaws.

TO DOCTORS WITH LONG MEMORIES or a historical bent, talk of artificial intelligence won’t be new. In 1967, British physician G.H. Hall wrote in The Lancet of an “approaching computer revolution” in diagnosis. By the mid-1970s, a program called INTERNIST-1, developed by researchers at the University of Pittsburgh, was making expert-level judgments for internal medicine, and Stanford University’s MYCIN program could recommend personalized antibiotic treatments. Yet by around 2000, as computers and networking became ubiquitous, the field seemed to have stalled. Those early programs failed to make a real-world dent, and medical AI was considered a disappointment. “The expectation was that AI would change everything,” says Martin Kohn, chief medical scientist at a predictive analytics company called Sentrian and a developer of health care applications for Watson, IBM’s celebrated artificial intelligence.

According to Kohn and others, early AIs were constricted by the computing power and approaches of their era. INTERNIST, for example, essentially contained the expertise of a single physician, Jack Myers, whose answers to hundreds of questions were encoded as a straightforward series of rules. It was necessarily inflexible and limited.

Today’s diagnostic programs, such as Isabel, also draw on expertise, but much, much more of it. At the core of both Isabel and DXplain, a diagnostic-support tool developed at Massachusetts General Hospital, are regularly updated databases containing information about thousands of diseases, each with clinical findings drawn from medical textbooks and records, scientific journals and expert opinion.
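To make the idea concrete, here is a minimal sketch of ranking candidate conditions against a findings database. It is not how Isabel or DXplain actually work; their methods are not described here. The knowledge base, the `rank_diagnoses` function and the overlap-based score are all assumptions made for illustration, and the listed findings are placeholders rather than clinical reference data.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy knowledge base: condition -> associated findings.
# Placeholder entries for illustration only, not clinical reference data.
KNOWLEDGE_BASE: Dict[str, Set[str]] = {
    "chicken pox": {"fever", "listlessness", "itchy rash"},
    "dehydration": {"listlessness", "vomiting", "diarrhea"},
    "necrotizing fasciitis": {
        "fever", "skin swelling", "skin discoloration", "blisters",
    },
}


def rank_diagnoses(
    findings: Set[str],
    kb: Dict[str, Set[str]] = KNOWLEDGE_BASE,
) -> List[Tuple[str, float]]:
    """Rank conditions by Jaccard overlap between entered and known findings.

    Conditions that share no finding with the input are left out.
    """
    scored: List[Tuple[str, float]] = []
    for condition, known in kb.items():
        shared = len(findings & known)
        if shared == 0:
            continue
        # Jaccard similarity: shared findings / all distinct findings.
        scored.append((condition, shared / len(findings | known)))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored


if __name__ == "__main__":
    entered = {"fever", "skin swelling", "skin discoloration", "blisters"}
    for condition, score in rank_diagnoses(entered):
        print(f"{condition}: {score:.2f}")
```

Even a crude score like this surfaces alternatives alongside the obvious answer, which is the nudge such tools are meant to provide. A real system would weight findings by how specific they are and by how common each disease is.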

The great value of such algorithmic diagnoses may be less in cracking elusive cases than simply nudging doctors to think again, says Mark Graber, head of the Society to Improve Diagnosis in Medicine and former Veterans Administration hospital chief. While it’s tempting to romanticize the art of diagnosis, the prosaic reality is that most mistakes are simple cognitive glitches. Doctors are prone to “premature certainty,” a tendency to base decisions on quick reads without fully considering reasonable alternatives. Isabel and DXplain short-circuit that certainty by listing many other possibilities.

Of course, such programs work best when there is solid, straightforward data on which to base models. Diagnosis, drawing on lists of symptoms linked to known conditions, is particularly well suited to that requirement. But diagnosis is just one aspect of care—and data can be nebulous, as researchers at Boston Children’s Hospital learned after tracking 10 pediatric cardiologists for more than a week, cataloguing nearly 1,200 clinically significant decisions they made during that time. It turned out that barely one in five of those choices was grounded in published evidence. Mostly the doctors worked from their own experience, training and common sense.

It might seem troubling that they depended on what amounted to clinical hunches. Yet consider who these physicians were. Members of a world-renowned cardiology department, the doctors tracked in the study had a combined 185 years of faculty experience and more than 1,100 publications in peer-reviewed journals. Their decisions may not have been overtly evidence based, but then again, these were among the doctors whose studies and observations become the evidence in their field. AI stands to learn a lot from that kind of decision-making.

To create a more systematic approach to diagnosis and treatment, Children’s Hospital created a series of standardized clinical assessment and management plans, or SCAMPs—computerized models of diseases and conditions built by researchers who analyze thousands of clinical anecdotes, published studies and hospital records. From that mass of data come algorithms embedded into software that, when prompted with patient data, return not just a diagnostic snapshot of the patient with regard to a specific condition but also a course of treatment. The models are reviewed regularly, with the outcomes of therapies incorporated into updated versions.

Every department at Children’s now runs at least one SCAMP, according to James Lock, chief of the cardiology department. There are now dozens of these decision-support modules, for everything from knee replacement to headache management, with hospitals in nine states having used them to treat more than 12,000 patients by late 2014. Lock thinks of SCAMPs as flexible, ever-evolving compilations of sound clinical practices.


SCAMPS DON’T TRY TO ENCOMPASS the whole of medical knowledge, which is added to every week by thousands of new journal articles. Instead they rely on human experts to curate the data that the programs will factor into their recommendations. That’s important, says Lock, because the methodologies of studies feeding into the medical canon are often questionable, and clinical findings frequently go unreplicated, or are overturned or prove more complicated than expected. Far more information is produced than is actually useful.

But sifting through everything that’s out there, analyzing so-called big data, is a key selling point of another major player in computer-aided medicine: IBM’s Watson, the artificially intelligent Jeopardy! champion. To programmers, Watson’s obliteration of the all-time best human players didn’t demonstrate Watson’s knowledge, but rather the program’s ability to understand everyday language, draw insight from vast bodies of text data, and present evidence-backed hypotheses. Shortly after that victory in 2011, IBM announced that Watson was available for medical applications, and versions of the technology have since been deployed at the Cleveland Clinic, Mayo Clinic, Memorial Sloan Kettering Cancer Center and health care giant WellPoint, along with a host of smaller institutions and companies.

Researchers are training those Watsons to analyze data for specialized purposes: cancer care, education, insurance claim review. There’s still a role for experts, in teaching the system through continuous interaction and providing the information that is fed into Watson: answers to thousands of yes-or-no questions that are used to calibrate algorithms. Eventually, though, human-directed instruction should be supplanted by machine learning, in which an AI improves its own performance by learning from data and from its own mistakes. The hope is that medical Watsons will transcend their trainers, generating insights beyond what the humans could have come up with on their own.

One big-data possibility involves what’s known as patient similarity analytics. Presented with a patient, an AI might query and analyze everything from electronic health documents of other patients with similar conditions to journal articles, clinical trial results and research databases. “It lets us say, ‘We’ve seen more than 200,000 patients with psoriasis, and we want to know, for a 45-year-old female, what have all the physicians in our system used to treat that kind of patient?’” says Dan Cane, co-founder of Modernizing Medicine, a company that provides information analytics to increase healthcare efficiency and improve results and is now integrating Watson into its systems.
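The kind of query Cane describes can be sketched in a few lines. This is a simplified illustration, not Modernizing Medicine’s or Watson’s implementation: the `PatientRecord` fields, the five-year age window, the `treatments_for_similar` function and the sample records (with placeholder treatment names) are all assumptions.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PatientRecord:
    """Hypothetical, heavily simplified patient record."""
    condition: str
    age: int
    sex: str
    treatment: str


# Made-up sample data; treatment names are placeholders, not real therapies.
SAMPLE_RECORDS: List[PatientRecord] = [
    PatientRecord("psoriasis", 44, "F", "treatment_a"),
    PatientRecord("psoriasis", 47, "F", "treatment_a"),
    PatientRecord("psoriasis", 46, "F", "treatment_b"),
    PatientRecord("psoriasis", 45, "M", "treatment_c"),
    PatientRecord("psoriasis", 70, "F", "treatment_b"),
    PatientRecord("eczema", 45, "F", "treatment_c"),
]


def treatments_for_similar(
    records: List[PatientRecord],
    condition: str,
    age: int,
    sex: str,
    age_window: int = 5,  # assumed definition of "similar age"
) -> List[Tuple[str, int]]:
    """Count treatments given to patients resembling the one described."""
    similar = [
        r for r in records
        if r.condition == condition
        and r.sex == sex
        and abs(r.age - age) <= age_window
    ]
    counts = Counter(r.treatment for r in similar)
    # Most frequently used treatment first; ties broken alphabetically.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    print(treatments_for_similar(SAMPLE_RECORDS, "psoriasis", 45, "F"))
```

A production system would compare far more attributes than age and sex, and it would also need outcome data to judge which treatments worked rather than just which were used.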

The horizons expand even further when considering genomics and the vast volumes of genetic information generated by patients and in research. Watson might connect a tumor’s genetic signatures with, say, biochemical information on proteins, cellular pathways and overall physiology. IBM’s collaboration with the New York Genome Center is taking that approach with a small group of patients who have glioblastoma, a malignant brain cancer, to try to identify drug targets based on mutations in tumor cells.

The prodigious amounts of information involved in such tasks expand still further when one factors in test results and clinical observations. Add to the mix data that come from home records and personal monitors, ranging from smartphone-based heartbeat and glucose readers to video systems that detect signs of impending disease. Soon the data become so big that computer synthesis is the only way to approach them.

Although excited about the future of such undertakings, Enrico Coiera, a medical informatics researcher at Australia’s Macquarie University, notes that researchers have yet to find the right mix of statistical methods to make sense of all that information, much less to test its clinical effectiveness. And there likely won’t be just one correct method. The future will probably contain many AIs, big and small, built for specific purposes and even specific institutions.

SOME OF THE RHETORIC ABOUT medical AI has verged on techno-topian, imagining a future in which the human intelligence of physicians is no longer crucial. Within 10 to 15 years, wrote venture capitalist and Sun Microsystems co-founder Vinod Khosla in 2012, people should be able to ask “Siri’s great-great grandchild…for an opinion far more accurate than the one I get today from the average physician.” His article’s title: “Do We Need Doctors or Algorithms?”

But that’s a minority view. “The machine is not always going to give you an answer,” says Coiera. “In many cases its best effort will be an overview of a case. You will still have to interpret that your own way.” The best doctors might be those who improve on what AIs recommend. Clinicians using SCAMPs, for example, are free to deviate from prescribed plans. They simply have to explain why they choose a different course. Lock says those human decisions prove right more than 70% of the time, and data about those choices, too, are incorporated into future iterations of the software. “Deviations almost always occur because there are individual patient characteristics that the SCAMP can’t possibly anticipate,” says Lock. “The heroes are the people who deviate and are right. There’s a premium placed on people who think about the patient.”

Looking at the history of technology, David Louis, chief of pathology at Massachusetts General Hospital, observes that previous advances in automation have hardly replaced doctors but rather have freed them to focus on other tasks. A physician helped by decision-support tools will have time to collect still more information through the very human capability of talking with patients, which is likely to generate a great deal of crucial additional data. “A good doctor,” says Jesse Hoey, a University of Waterloo computer scientist who develops AI-enabled assistance systems for people with cognitive and physical disabilities, “is one who asks the right questions in the right way.”

Compared with the crunching of big data, conversation might seem mundane. Yet coaxing a patient to admit that she hasn’t been taking her medicine, or reconstructing exactly what someone was doing right before fainting, can be essential. To take an especially sobering example, no amount of AI could compensate for failing to learn that Thomas Eric Duncan, the first person to die from Ebola in the United States, had recently been in West Africa.

Yet a nurse did ask Duncan about his travel, and one way to improve medical AIs will be to expand the role of non-physician clinicians, both to factor in their observations and to help them perform some of what physicians do now.

REGARDLESS OF HOW SOPHISTICATED and accomplished decision-support tools may become, there’s no guarantee that doctors, nurses and others will use them. “Twenty years ago, neural networks were developed that, in some studies, outperformed doctors at ruling out cardiac causes of chest pain,” says Pavel Roshanov, a Ph.D. candidate in clinical epidemiology and biostatistics at Canada’s McMaster University, referring to AIs patterned after the workings of biological brains. “But the technology has not been widely accepted. Accuracy is necessary but clearly not sufficient.”

Often, doctors don’t think they need help, says Roshanov. And medical AIs have also been hindered because they’re hard to use or inaccessible, typically running on dedicated computers that clinicians must seek out.

Adding extra administrative steps to the process of treating a patient can backfire, says Coiera, who specializes in understanding how medical professionals use electronic information systems. Decision-support tools often represent extra steps, he says, and in a “resource-constrained, time-constrained, complex domain,” most clinicians won’t take them. To be both useful and used, medical AIs will need to be integrated seamlessly into existing systems.

Another issue is “alert fatigue.” Current e-prescription and treatment management programs already may overwhelm caregivers with notifications and prompts, some of which will inevitably go unnoticed. They also create distractions, interrupting conversations or disrupting clinicians who are trying to focus on another task. Even as AIs prevent some mistakes, they might also cause others.

Software interfaces can also be difficult to use, and physicians’ experiences with electronic health records, which were supposed to put a wealth of valuable data at their fingertips, have frequently been disappointing. Entering information is often much more cumbersome than making a note on a paper chart, and often involves arcane codes or drop-down menus designed mostly for billing purposes. It’s much too easy to use the wrong code, put numbers in the wrong field, or cut and paste the wrong figures.

“Despite the huge volume of data that is now routinely collected in health care, much of it remains incomplete or inaccurate in critical ways,” writes Peter Szolovits, head of the Massachusetts Institute of Technology’s clinical decision-making group, in the journal Artificial Intelligence in Medicine. “Notes of patient encounters sometimes misrecord even basic facts such as the chief complaint, but often get wrong details such as the patient’s medical history or medications being taken.”

Szolovits made these observations in 2009, but not much has changed—and poor electronic record systems don’t just cause errors, but can affect crucial clinician-patient interactions. Almost everyone knows what it’s like to talk with doctors who are primarily focused on their computers. “I would defy anyone to find a modern electronic medical record that doesn’t interrupt helpful interactions between doctor and patient,” says Coiera.

If the data a physician records are wrong, they won’t accomplish much when fed into an AI. And even when information is accurate and rich, it’s often not easy to share. Doctors and AI designers are regularly frustrated by the difficulty of exchanging data between records designed by different companies or used at different institutions. “The records at a Veterans Administration hospital are great,” says Graber, “but if a patient of mine is seeing a doctor a mile away, I can’t see that doctor’s EMR, and that doctor can’t see mine.”

Such problems cast a cloud on the seemingly bright horizon of medical AIs. It’s within these systems that medical AIs will mature, realizing, or failing to realize, their potential. As the old programming adage goes: garbage in, garbage out. “No amount of analytic advancement can overcome fundamental limitations of the data,” says Roshanov. Artificial intelligences will only be as smart as we allow them to be.