Digital Gold

Illustrations by BARTHOLOMEW COOKE

Using natural language processing and other advanced search tools, bioinformatics experts are mining patient files—and striking paydirt.

Published On July 23, 2009

THE TRIAGE STATION IN A HOSPITAL EMERGENCY ROOM is the last place any ill or injured patient wants to linger. It’s frustrating to be waylaid on the way to urgent treatment so that a nurse can ask about symptoms and biographical information and type the answers into a computer. But in many Massachusetts hospitals, at least, the value of that annoying process goes beyond determining which patients will be seen first. It also establishes an electronic health file that ER physicians will supplement with notes and test results. Sometime that day, each patient’s anonymized complaint, diagnosis and selected demographic information—gender, age, zip code—are transmitted into a statewide database, the Automated Epidemiological Geotemporal Integrated Surveillance system, or AEGIS.
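The field selection this paragraph describes can be sketched in a few lines of Python. This is a minimal illustration, not AEGIS's actual message format: the `ERVisit` record, its field names and the `to_surveillance_record` helper are all hypothetical. Genuine de-identification involves more than dropping names and record numbers.

```python
from dataclasses import dataclass


@dataclass
class ERVisit:
    """A hypothetical triage record as it might sit in a hospital's own system."""
    name: str
    medical_record_number: str
    chief_complaint: str
    diagnosis: str
    gender: str
    age: int
    zip_code: str


# Fields forwarded to the statewide database, following the article's description:
# complaint, diagnosis and selected demographics (gender, age, zip code).
SHARED_FIELDS = ("chief_complaint", "diagnosis", "gender", "age", "zip_code")


def to_surveillance_record(visit: ERVisit) -> dict:
    """Keep only the shared fields; direct identifiers stay behind at the hospital."""
    return {name: getattr(visit, name) for name in SHARED_FIELDS}


visit = ERVisit("Jane Doe", "MRN-001", "cough and fever", "influenza", "F", 34, "02115")
print(to_surveillance_record(visit))
```

In this sketch the patient's name and record number never appear in the outgoing record; only the five listed fields do.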

What happens next is something known as data mining—searching caches of information for hidden patterns. For example, using a technique called natural language processing, AEGIS scans nurses’ triage notes for such key words as cough and fever, and the computer automatically categorizes chief complaints according to several broad medical conditions. Vomiting or abdominal pain is flagged as gastrointestinal illness, while other symptoms may be placed with respiratory or neurological problems. The AEGIS software then conducts a trend analysis of the data, comparing it with information about hundreds of thousands of past ER visits across the state. Against that backdrop, if there’s an unusual pattern, it should quickly become obvious.
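A minimal sketch of the keyword step follows, assuming simple whole-word matching against each note. Only cough, fever, vomiting and abdominal pain come from the article; the other keywords, the syndrome labels and the `classify_complaint` function are hypothetical, since AEGIS's real vocabulary and rules are not described here.

```python
import re

# Hypothetical keyword lists; only cough, fever, vomiting and abdominal pain come from the article.
SYNDROME_KEYWORDS = {
    "respiratory": ["cough", "fever", "shortness of breath", "wheezing"],
    "gastrointestinal": ["vomiting", "abdominal pain", "diarrhea", "nausea"],
    "neurological": ["headache", "seizure", "dizziness"],
}


def classify_complaint(note: str) -> set[str]:
    """Return every broad syndrome whose keywords appear as whole words in a triage note."""
    text = note.lower()
    syndromes = set()
    for syndrome, keywords in SYNDROME_KEYWORDS.items():
        if any(re.search(rf"\b{re.escape(kw)}\b", text) for kw in keywords):
            syndromes.add(syndrome)
    return syndromes


print(classify_complaint("Pt c/o cough and fever x 2 days"))  # {'respiratory'}
```

A single note can land in more than one category. A working system would also have to cope with negation ("denies fever"), abbreviations and misspellings, which plain keyword matching does not.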

“We’ve made computer models of all the historical data so we can predict how many people should be coming into every emergency department on a particular day, give or take 7%,” says Kenneth Mandl, an attending emergency medicine physician at Children’s Hospital Boston and one of the researchers who developed AEGIS. “If there’s a sudden surge of traffic, we see it, and we know something unusual is happening.” If a problem is detected, public health officials and hospital administrators are alerted by e-mail, voice mail or text message. The warning gives hospitals a chance to increase staffing, stock up on appropriate supplies and free up beds.
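To make the idea concrete, here is a deliberately crude sketch that treats the 7% margin Mandl mentions as an alert threshold. The same-weekday average used as a forecast, and the names `expected_visits` and `is_surge`, are assumptions for illustration; the article says only that AEGIS's models are built from historical data.

```python
from statistics import mean


def expected_visits(history: list[int]) -> float:
    """Hypothetical baseline: the average visit count on comparable past days."""
    return mean(history)


def is_surge(observed: int, history: list[int], tolerance: float = 0.07) -> bool:
    """Flag a day whose count exceeds the baseline by more than `tolerance` (7% by default)."""
    return observed > expected_visits(history) * (1 + tolerance)


# Hypothetical counts for one emergency department on the same weekday in past weeks.
past_mondays = [200, 210, 190, 200]
print(is_surge(230, past_mondays))  # True: 230 is above 200 * 1.07 = 214
print(is_surge(212, past_mondays))  # False: within the 7% band
```

A production system would likely also adjust for season, holidays and long-term trends rather than rely on one fixed percentage.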

The state deployed the system six years ago to help officials watch for disease outbreaks and bioterror attacks. “We’re able to mine data that the health system produces and create a whole new use for it,” Mandl says. “Instead of the information just sitting there in a computer, we’ve found ways to make it informative in a much larger way about the population’s health.”

One dividend of this approach came from a study by Mandl and John Brownstein, an epidemiologist at Children’s, that used AEGIS-generated data to analyze emergency department patterns in six hospitals from 2000 through 2004. They discovered that a spike in respiratory illness among preschoolers typically preceded a rise in influenza-related deaths among the elderly by four to five weeks. This suggested that young children helped spread the flu, which kills some 36,000 Americans annually and leads to about 200,000 hospitalizations. Spurred in part by this finding, the Centers for Disease Control and Prevention now recommends that preschoolers receive flu shots.
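The article does not say how the lead time was measured. One common approach, shown here as an assumption, is to shift one weekly series against the other and keep the shift that gives the strongest correlation. The data and the function names `pearson` and `best_lead` are hypothetical.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def best_lead(leading: list[float], lagging: list[float], max_lag: int) -> tuple[int, float]:
    """Find the lag (in weeks) at which `leading` best matches later values of `lagging`."""
    best_lag, best_r = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Pair week t of the leading series with week t + lag of the lagging series.
        r = pearson(leading[: len(leading) - lag], lagging[lag:])
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r


# Hypothetical weekly counts: elderly flu deaths echo preschool visits four weeks later.
preschool_visits = [1, 2, 5, 9, 14, 9, 5, 2, 1, 1, 1, 1, 1, 1]
elderly_deaths = [0, 0, 0, 0] + preschool_visits[:-4]
print(best_lead(preschool_visits, elderly_deaths, max_lag=6)[0])  # 4
```

A lagged correlation establishes timing, not cause, which is why the pattern only suggested that young children help spread the flu.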

Unfortunately, most data that could reveal such patterns remains locked away. Although many emergency rooms around the country record information electronically, it’s rare for those systems to mesh even with other electronic networks within the same hospital, let alone with similar systems in other institutions or state health departments. Even though it has been almost 20 years since the Institute of Medicine identified electronic records as an essential health care technology, few hospitals have aggregated all their patient information electronically. In fact, in a recent survey just 2% of more than 3,000 U.S. hospitals said they’ve completed the switch from paper medical charts to electronic systems.

The Obama Administration, whose $787 billion stimulus package includes more than $19 billion to speed the transition to electronic health records, expects a big return on that investment, estimating that improved information systems could save the health care system as much as $80 billion a year and provide a range of other benefits. Electronic records give a patient’s primary care physician, as well as specialists and other hospital personnel, easy access to information, helping to flag possible drug and allergy interactions, facilitate accurate claims processing and monitor quality-of-care criteria—making sure, for example, that follow-up visits are scheduled and a patient receives all recommended care.

But health care’s belated push to join the information age could have beneficial consequences that go well beyond those normally associated with computerized record-keeping. The results of pilot projects in places where electronic records are the norm hint at what could be learned through widespread mining of medical data. As records become more sophisticated—adding information from genetic testing, advanced imaging technology and other sources—they could aid in population studies measuring predispositions to disease. Scanning records to see which patients are taking which prescription drugs might also help identify medications that, though they’ve passed muster in the relatively small-scale trials required for Food and Drug Administration approval, turn out to have harmful effects when prescribed to millions of patients. But achieving such benefits depends not only on digitizing records across the country but also on getting hospitals and physicians to agree about what should be in a patient’s electronic file and deciding how to use this data without infringing on patient privacy. Only then will a database of information gleaned from doctor-patient encounters present itself as a rich source of medical innovation.

IN TODAY’S INFORMATION-INTENSIVE HEALTH CARE ENVIRONMENT, a doctor-patient visit generates an expanding trove of data. A physician orders blood work before an annual physical, takes notes about family history, writes prescriptions and refers the patient to a specialist. The specialist orders a CT scan. There’s also information about billing and insurance coverage. When and if those terabytes of information begin flowing into a seamless database, many secondary uses become possible.

Consider Vioxx, a prescription medication for arthritis pain that was pulled from the market in 2004 in the largest drug recall in history. The study that led to the Vioxx ban involved mining electronic clinical data, with researchers examining the records of 1.4 million patients in a database of the Kaiser Permanente organization and ultimately linking Vioxx to more than 27,000 heart attacks or sudden cardiac deaths nationwide from the time it came on the market in 1999 through 2003. But the study was done only after anecdotal reports of problems, and the recall took place five years after FDA approval of the drug, which by then had been prescribed to millions of patients. If there had been a system to look for patterns of adverse effects as soon as the drug hit the market, the problems with Vioxx might have been discovered much earlier.

That’s what a research team led by Mandl and also involving Brownstein and Isaac S. Kohane, a professor at Harvard Medical School and director of the Children’s Hospital Boston Informatics Program, determined in a retrospective study. To see whether they could have found a correlation between Vioxx and heart attacks sooner—in time to save many lives—Mandl’s group searched the electronic health records of tens of thousands of patients seen during the previous 10 years at Brigham and Women’s and Massachusetts General hospitals. Both are part of Partners HealthCare, which has an electronic database containing information from more than 3 million patients who have been seen at those hospitals. The researchers first searched the full database, then focused on patients with coronary heart disease.

The data enabled the scientists to identify medications patients were taking and medical conditions that had influenced their health. The researchers looked at demographic information, billing codes, visit dates, medication histories and diagnostic data, and used natural language processing and other search tools to piece together a picture of who was taking Vioxx, whether they had suffered heart attacks, and what factors besides Vioxx might have been the cause. This wasn’t easy, Kohane recalls. “One problem,” he says, “was that patients wouldn’t always get care in one place, so we had to take into account treatments they may have received outside of Partners that could also affect their risk of heart attack.”

But once they had accounted for such factors, Mandl and his team found a nearly 20% jump in heart attacks just eight months after Vioxx came on the market, an effect that vanished within a month of the drug’s being pulled. “The trends were amazingly clear,” says Mandl. “The problem was that no one in the health care system was monitoring that information.”
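The kind of monitoring Mandl says was missing can start with something as simple as comparing event rates before and after a drug reaches the market. The sketch below does only that: it assumes a stable patient population, uses invented monthly counts, and ignores the confounding factors (such as care received outside Partners) that the researchers had to account for. `percent_change` is a hypothetical helper.

```python
from datetime import date


def percent_change(monthly_events: dict[date, int], cutoff: date) -> float:
    """Percent change in the mean monthly event count from before `cutoff` to on/after it."""
    before = [n for month, n in monthly_events.items() if month < cutoff]
    after = [n for month, n in monthly_events.items() if month >= cutoff]
    mean_before = sum(before) / len(before)
    mean_after = sum(after) / len(after)
    return 100 * (mean_after - mean_before) / mean_before


# Hypothetical monthly heart-attack counts in a fixed patient population.
counts = {
    date(1999, 1, 1): 100,
    date(1999, 2, 1): 100,
    date(1999, 3, 1): 120,
    date(1999, 4, 1): 120,
}
print(percent_change(counts, cutoff=date(1999, 3, 1)))  # 20.0
```

Rerun each month as new records arrive, a check like this could raise a flag without waiting for anecdotal reports, though real surveillance would need proper statistical adjustment.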

That’s a persistent issue for the FDA, whose approval process typically requires testing a drug for safety and efficacy in only several thousand patients. And though it would be prohibitively difficult and expensive to try out a medicine on millions of patients, “drugs behave differently in large populations, and it’s often only with widespread exposure that comparatively rare adverse reactions surface,” says Kohane.

Another study, using similar methods, is considering Avandia (rosiglitazone), a drug still FDA-approved for treating type 2 diabetes despite concerns about possible increased heart attack risk and other dangerous side effects. Allison Goldfine, head of clinical research at the Joslin Diabetes Center in Boston, is working with a research team to extract information from Partners patient files to find out whether people who take Avandia are at higher risk for heart attacks than those receiving other classes of diabetes drugs. Cardiovascular outcome trials aren’t required as part of the approval process for diabetes treatments, though that may change, Goldfine says. “To do those trials would be expensive and might take years, and they would have to involve high-risk patients.” But the current system, which calls for trials only after problems arise, could harm large numbers of patients. Developing the capability to search patient databases could provide a better, cheaper approach that might uncover risks soon after a drug is approved.
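The basic comparison Goldfine describes is a ratio of risks between two groups of patients. A crude, unadjusted version is sketched below with invented numbers; a real analysis of Partners records would have to adjust for age, disease severity and other differences between the groups. The function name `relative_risk` is hypothetical.

```python
def relative_risk(exposed_events: int, exposed_total: int,
                  comparison_events: int, comparison_total: int) -> float:
    """Risk of the outcome in the exposed group divided by the risk in the comparison group."""
    # Cross-multiplying keeps the arithmetic in integers until the final division.
    return (exposed_events * comparison_total) / (comparison_events * exposed_total)


# Hypothetical counts: heart attacks among patients on one drug versus another drug class.
print(relative_risk(30, 1000, 20, 1000))  # 1.5
```

A value of 1.5 would mean the outcome occurred 50% more often in the first group; values near 1 suggest no difference.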

A VAST AMOUNT OF MEDICAL RESEARCH FOCUSES on finding genetic factors that may predispose someone to a particular disease or determine how well that patient will respond to a specific treatment. But that’s proving to be an incredibly complex task, in part because there are so many factors that could affect each person. The ability to search a database that has years of detailed clinical information about patients—health problems, lifestyles, family histories, medications and outcomes of therapies—and to correlate that with their genetic profiles could not only provide grist for retrospective studies but also help identify candidates for controlled trials.

Scientists have pinpointed variations in 40 human genes that may increase the risk of asthma, and Scott Weiss, a physician and research scientist at Brigham and Women’s, has designed a study that will look for correlations between those gene variants (and others with possible links to the condition) and why some asthma patients don’t respond well to the usual treatments and suffer repeated attacks. But recruiting suitable subjects for such a project tends to be a long, frustrating process. Weiss’s tactic to find volunteers has traditionally involved sending surveys by mail to randomly selected households. His research team then has to call everyone who responds, conduct personal interviews and review each prospective subject’s medical history. That’s very costly, and it usually provides only a small pool of subjects.

Recently Weiss and his team tried a different approach. They used hospital billing codes, among other criteria, to do a search of the Partners database, which gave them a preliminary list of 90,000 patients with asthma. Then, using such demographic variables as age and race, they narrowed that down to 40,000. Natural language processing also enabled the system to pick up comments in physicians’ notes identifying patients as smokers, a subset of asthma patients Weiss wanted to exclude. He then waited for the first 5,000 of those patients, their identities blinded, to come to the hospital for routine blood tests. His team received samples from those tests for DNA analysis.
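The three-stage screen described here (billing codes, then demographics, then text in physicians’ notes) can be sketched as a simple filter. Everything below is an assumption for illustration: the `Patient` record, the age range and the smoking phrases are invented, race is left out, and a regular expression is a far blunter tool than the natural language processing the researchers used.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Patient:
    """A hypothetical, de-identified view of one patient's record."""
    patient_id: str
    billing_codes: list[str]
    age: int
    notes: list[str] = field(default_factory=list)


# ICD-9-CM category 493 is asthma; the exact codes Weiss's team used are not stated.
ASTHMA_CODE = re.compile(r"^493(\.\d+)?$")
# A crude stand-in for the NLP step: phrases that suggest a patient smokes.
SMOKER_PHRASES = re.compile(r"\b(current smoker|smokes|pack[- ]years?)\b", re.IGNORECASE)


def select_cohort(patients: list[Patient], min_age: int = 18, max_age: int = 65) -> list[str]:
    cohort = []
    for p in patients:
        if not any(ASTHMA_CODE.match(code) for code in p.billing_codes):
            continue  # step 1: billing-code screen
        if not min_age <= p.age <= max_age:
            continue  # step 2: demographic narrowing (this age range is an assumption)
        if any(SMOKER_PHRASES.search(note) for note in p.notes):
            continue  # step 3: drop likely smokers flagged in clinical notes
        cohort.append(p.patient_id)
    return cohort


patients = [
    Patient("A", ["493.90"], 30, ["Mild persistent asthma. Nonsmoker."]),
    Patient("B", ["493.00"], 45, ["Current smoker, 20 pack-years."]),
    Patient("C", ["250.00"], 50),
    Patient("D", ["493.90"], 80),
]
print(select_cohort(patients))  # ['A']
```

The phrase list would misfire on a note reading “denies that she smokes,” one reason real systems need negation handling.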

Though waiting for those patients took a year, finding so many eligible patients for the study would have been impossible if the only recourse had been to search tens of thousands of records by hand. “Being able to screen 90,000 people is something we never would have been able to do without sophisticated informatics,” Weiss says. “This represents a huge jump in efficiency and speed.”

Now that he has his 5,000 blood samples, which Partners stores, he has ordered genetic scans that will enable the researchers to focus on the 40 single nucleotide polymorphisms, or SNPs, that have been linked to asthma, plus other possibly related genes, while also considering each patient’s symptoms, medications and other variables. Then they’ll compare the genetic profiles and clinical findings with those gleaned from a group of nonasthmatics to determine whether someone with a particular genetic makeup may, for example, tend to respond poorly to steroids, a common asthma treatment. “We might be able to develop a predictive genetics test that could tell us who is likely to have repeated hospitalizations and ER visits or is likely not to respond to an inhaled steroid,” Weiss says. “Then we could more carefully target our treatment.”

Launching a trial this way will prove much cheaper once the kinks are worked out. Using traditional recruiting methods for a project such as Weiss’s asthma study could cost as much as $800 a patient—or $4 million for a 5,000-patient study. But Kohane at Children’s Hospital thinks effective data mining, coupled with the declining costs of genetic testing, should be able to reduce that price tag by half. “We’re trying to take advantage of already very expensive health care encounters to generate findings without multimillion-dollar price tags,” he says.

A related project at Vanderbilt University and four other academic health care organizations will use data-mining techniques to analyze genetic information from about 20,000 patients. Scanning a sample from each patient for variations in 1 million base pairs of DNA will generate some 2 billion “data points,” says Daniel Masys, a physician and a professor of medicine and biomedical informatics at Vanderbilt. During the next three years, the researchers will look for correlations between this mass of genetic data and clinical information about those patients. Researchers hope to identify genetic roots for vascular disease, asthma and diabetes, among other conditions. “We already know some gene variants for those conditions, but we will discover new ones,” Masys says. “Until now no one has had a large enough population sample to do that.”

Masys thinks the data in electronic records could become a cornerstone of more personalized care. “In the future your physician will listen to your story and get blood tests and X-rays, but in addition we’ll have your DNA to know whether you have a variation that predicts a higher risk for a particular disease or that you’ll have a bad reaction to a medication because of how you might metabolize it,” he says. “That additional data is uniquely you, and looking at those minute differences to diagnose disease risk and choose treatments and therapies will be part of enhanced personalized care.”

CONSIDERABLE TIME AND MONEY ARE GOING TOWARD exploring the possibilities of mining the data in patient records. The work in the Boston hospitals is a $20 million effort called Informatics for Integrating Biology and the Bedside, one of seven research hubs being funded by the National Institutes of Health’s Center for Bioinformatics and Computational Biology. But these projects face complex financial, technical and ethical challenges. Even with the help of federal stimulus dollars, the transition to electronic records is likely to be slow, and the systems that exist now are far from standardized and don’t always provide the kind of rich data that researchers crave.

For example, researchers on the Cancer Biomedical Informatics Grid, a Web-based platform connecting dozens of nationally designated cancer centers, would like to see an “oncology specific” electronic health record. “Oncology has unique information needs, such as the staging criteria used to gauge how widely the cancer has spread,” says Brenda Duggan, a program manager at the National Cancer Institute Center for Biomedical Informatics and Information Technology. “There are also terms to describe how patients are responding to chemotherapy or radiation.” Today’s electronic records don’t adequately incorporate those special characteristics, Duggan adds. Other specialties have similarly specific elements that ought to be included in records if they’re to reflect the full complexity of a patient’s condition—and support sophisticated mining of that information for research purposes.

Protecting the privacy of patients while gaining access to large pools of information is another crucial issue. One solution may be to move toward personally controlled health records that give patients Web access to their medical information as well as the opportunity to authorize researchers to use information about lab tests, genetic profiles and clinical notes.
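A personally controlled record of this kind amounts to per-patient, per-category permissions that are checked before any data reaches a researcher. The sketch below is a hypothetical model of that idea, not the design of any particular system; the class, the category names and `research_view` are invented for illustration.

```python
from dataclasses import dataclass, field

# Data categories taken from the article's examples; the names themselves are hypothetical.
CATEGORIES = {"lab_tests", "genetic_profile", "clinical_notes"}


@dataclass
class PersonalHealthRecord:
    patient_id: str
    data: dict[str, object]
    authorized: set[str] = field(default_factory=set)  # categories released for research

    def authorize(self, category: str) -> None:
        """Record the patient's permission for researchers to use one category of data."""
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        self.authorized.add(category)


def research_view(records: list[PersonalHealthRecord], category: str) -> dict[str, object]:
    """Return one category of data, only from patients who have authorized it."""
    return {
        r.patient_id: r.data[category]
        for r in records
        if category in r.authorized and category in r.data
    }


alice = PersonalHealthRecord("p1", {"lab_tests": {"hba1c": 6.1}, "clinical_notes": "Follow-up in 3 months."})
alice.authorize("lab_tests")
bob = PersonalHealthRecord("p2", {"lab_tests": {"hba1c": 7.4}})
print(research_view([alice, bob], "lab_tests"))  # {'p1': {'hba1c': 6.1}}
```

Because permission is granted category by category, a patient in this model could share lab results while keeping clinical notes private.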

The American Medical Informatics Association, a professional organization based in Maryland, is wrestling with these concerns, all of which fall under the heading of the secondary use of health data. “Right now everyone is doing this institution by institution,” says Don Detmer, a physician and the group’s president. “We need to address the issues with a public policy that informs people of their opportunities to make basic choices about whether they want their personal health data used for legitimate medical research.” And in fact, the American Recovery and Reinvestment Act of 2009, the stimulus legislation that will fund a push toward electronic records, also calls for new rules to address patient privacy. “With so much at stake in terms of both potential risks to patients and the vast possibilities of research based on the information in patient records,” Detmer says, “there’s going to be a lot of attention focused on what those regulations should say.”

Dossier

“Toward a National Framework for the Secondary Use of Health Data,” by Charles Safran, Meryl Bloomrosen, W. Edward Hammond et al., Journal of the American Medical Informatics Association, January/February 2007. An extensive analysis of the widespread use of personal health data for research and commercial applications and of the need for coherent standards to protect individual data.

“Characterization of Patients Who Suffer Asthma Exacerbations Using Data Extracted From Electronic Medical Records,” by Blanca E. Himes, Isaac S. Kohane, Marco F. Ramoni and Scott T. Weiss, American Medical Informatics Annual Symposium Proceedings, 2008. The authors discuss the computational methods they devised to mine data from electronic medical records, finding that age, race, smoking history and weight were significant predictors of asthma patient hospitalization rates.

“Identifying Pediatric Age Groups for Influenza Vaccination Using a Real-Time Regional Surveillance System,” by John Brownstein, Ken Kleinman and Kenneth Mandl, American Journal of Epidemiology, August 2005. The story of how a real-time population health monitoring system identified three- to four-year-olds as an age group that develops influenza earliest, a discovery that fueled a strategy to vaccinate preschoolers to help prevent flu deaths among older people.
