WHEN MYRA AGREED TO participate in a clinical trial testing an epilepsy drug, she understood that the medication might not help her and that its side effects could be harmful. But she figured that even if the treatment failed, she would be providing crucial information. The investigators running the trial would use the data they had gathered on her to develop a better drug or figure out why she and others didn’t respond to existing epilepsy therapies. What Myra didn’t know is that the results from her experiment—and those of hundreds of thousands of other people who participate in human clinical trials—are frequently buried, sometimes forever. Without those results, patients don’t benefit. Medical research doesn’t progress.

Data from a medical trial are a crucial resource that can potentially help many more people than the original trial envisions, says Harlan Krumholz, professor of medicine at Yale School of Medicine. “But results from half of clinical trials are not published within three years of the trials’ completion, and many are never published,” Krumholz says. A researcher might sit on that data because an experiment’s hypothesis didn’t pan out, or the new drug or medical device may have failed to work as planned. Negative trial results are difficult to publish in medical journals, and the data from those experiments may never see the light of day.

Other reasons to withhold results are tied to the institutional politics of a scientist’s career, especially in academic settings. Researchers can spend a decade working on a single clinical trial, and their scientific currency is the number of publications they can get out of the research. Such investigators might also derive other benefits from the data they collect, including patents and a chance to start medical companies based on their discoveries. Researchers may also shield their data to avoid being scooped by other scientists, who could use someone else’s study results to design a better trial, make a breakthrough and get credit for the discovery. And they may fear that a new analysis by outsiders would expose inadvertent or intentional biases in the original study.

But keeping data private hurts progress, say advocates for “open data”—the idea that information gathered during clinical trials should routinely be disclosed and shared. These voices call for new levels of transparency, especially from researchers and institutions that accept taxpayer-funded grants, who are ostensibly conducting science for the public good. The goal of these advocates would be to make public all of the information collected on each participant in a clinical trial, rather than just the aggregate data showing whether a drug or medical intervention worked, which is typically all that is shared. “In physics, astronomy and other scientific fields, raw data from experiments are available for others to see and analyze,” says Krumholz. “In medicine, researchers can claim a result with data that are virtually unauditable. It’s like having a telescope but only allowing one person to look through it.”


Yet many scientists continue to resist sharing that telescope. “People fear that someone will get their data, reanalyze them and come up with a different conclusion,” says Richard L. Schilsky, chief medical officer of the American Society of Clinical Oncology and former director of the University of Chicago Medicine Comprehensive Cancer Center. “But that’s the nature of science. It self-corrects.”

Research data can often be squeezed for new insights. In clinical trials, for example, investigators compile extensive research records for all participants that include a wealth of information—demographic data, the history of the person’s disease, the results of medical exams and laboratory tests, imaging studies, and that person’s response to the experimental drug or device. Records also show any complications and side effects. The researchers running the trial may need to know only how the trial participants responded to a treatment, but someone else might analyze the same data and find that a drug worked better in particular patients, or that a specific test seemed to predict the likelihood of complications. Looking at data for those in the placebo or control arm of a trial—people who didn’t get the treatment being tested—may help scientists understand how a disease progresses without intervention or with standard care.

Outside researchers can also pool individual patient data from separate experiments. Combining and comparing information from several trials of classes of antidepressants or diabetes drugs, for example, often provides more reliable, representative results than looking at what happened only with the relatively small number of patients in a single trial. And peering inside other scientists’ trial designs and results may help researchers devise better experiments or avoid duplicating previous research.

Beyond the usefulness of open data to scientists, there’s also researchers’ implicit obligation to those who volunteer for clinical trials, sometimes at risk to their health and safety. Sharing trial data can serve those participants by making the information they provide as helpful as possible, says Frank Rockhold, professor of biostatistics and bioinformatics at Duke University School of Medicine. “While those who generate the data should get full credit, just because you ran the trial doesn’t make you uniquely gifted to interpret the results,” he says.

Giving a nod to at least some of these arguments, the U.S. government in 2007 mandated that researchers share detailed results of all clinical trials involving drugs, biologics or devices that had been approved or cleared by the U.S. Food and Drug Administration. (Products that are not FDA-approved do not have the same requirements to report results.) This information was to reside on ClinicalTrials.gov, a public registry established in 1997 to help patients and doctors find clinical trials in which to participate. This access to trial results would also allow other researchers to pool aggregate findings of similar studies.

The government told researchers they could be assessed fines of up to $10,000 a day for not adhering to the rules. But there have been no fines, and many researchers don’t comply. The biggest offenders were those who directed clinical trials that didn’t make it to publication in medical journals. A 2013 analysis found that 78% of those studies never had their data posted on ClinicalTrials.gov.

Other research also comes with a requirement for sharing data, at least when it is government funded. Researchers who receive federal support of more than $500,000 per year from the National Institutes of Health’s 27 institutes and centers must share patient-level data from their studies with other scientists who ask for it. And some findings do get shared. The National Heart, Lung, and Blood Institute, for example, established a data repository in 2000 for its funded studies, and since then, NHLBI has fielded more than 1,200 requests from scientists to use that data, and 680 publications have resulted.

Yet that accounts for only a tiny percentage of the patient data that could be shared, says Eric Peterson, professor of medicine at Duke University School of Medicine and executive director of the Duke Clinical Research Institute, the world’s largest academic clinical research organization. “There are loopholes in NIH’s data-sharing rules,” he says. “All a scientist has to do is have a paragraph in their grant application saying they’ll consider requests from outside investigators.” For industry trials not sponsored by the NIH, researchers are required to share their data only if they publish their trial findings.

The push to require researchers to share their findings goes beyond U.S. government agencies. Last January, the International Committee of Medical Journal Editors, which already requires researchers to share some level of data in order to publish in their member journals, issued a proposal requiring that patient-level data from clinical trials in those papers be made available to other scientists within six months of publishing their initial findings.


In addition, a 2014 data-sharing policy issued by the European Medicines Agency—Europe’s equivalent of the FDA—requires that anonymous patient-level data from clinical trials sponsored by drug companies be submitted and made accessible to others as part of the application to market a new medication. (The FDA doesn’t ask drugmakers to provide anonymous individual patient data from clinical trials.) While it may take a few years before the policy is fully in force—it’s an immense effort with many technical issues to work out—the EMA says that disseminating the detailed clinical data will allow “all medicine developers to learn from past successes and failures” and “develop new knowledge in the interest of public health.”

Meanwhile, in at least one area of basic science, data sharing has already become routine—and necessary. “If you want to look at genetic predictors of disease, you may need a population of many hundreds of thousands of patients,” says Anne Klibanski, chief of neuroendocrinology at Massachusetts General Hospital and chief academic officer of Partners HealthCare, where she oversees research.

Researchers depend on large collaborative databases that provide the masses of people required to validate the effect of particular genes. And that necessity is built into the research process. Before he can publish the results of any of his genomics work, Robert Kingston, chief of molecular biology at MGH, must submit his raw data to GEO, an international public genomics data repository sponsored by the National Center for Biotechnology Information. “Other researchers will log on and download our data to see whether they can learn something from them and we do the same with theirs,” says Kingston. “We use other people’s data as a foundation in our work or to test hypotheses. Data sharing has become part of the fabric of genomic science today, and it’s been hugely powerful in increasing the speed of discoveries.”


That’s the goal of much of the push for open data—to quicken the pace of acquiring medical knowledge and finding new, more effective therapies. It is what Vice President Joseph Biden has called for in his proposed “moonshot” to cure cancer, an effort that Biden has said will require “unprecedented levels of cooperation,” including sharing data and providing incentives for scientists to replicate one another’s work. In June, Biden launched the National Cancer Institute’s Genomic Data Commons housed at the University of Chicago, a first-of-its-kind public data platform that will allow researchers to share and analyze genomic and clinical cancer data; it went live with more than 14,000 cancer patient cases. Later that month, Biden told a group of doctors and cancer patients that cancer researchers who don’t release trial findings publicly should lose their federal funding.

In addition to what is happening in the academic sphere, several global pharmaceutical companies are also becoming unlikely standard-bearers for open data. GlaxoSmithKline was one of the first to make patient-level data from its clinical trials available to other researchers. In 2013, GSK created an open-data platform with the intention of expanding to include information from global studies conducted since 2000, and the following year, a dozen additional drug companies joined the effort. Now operated independently by the Wellcome Trust, the open-access platform, ClinicalStudyDataRequest.com, makes trial data available to other scientists as soon as possible after initial results from trials have been published in a journal. GSK also provides data from unpublished trials. “GSK holds the view that they should allow other qualified researchers to access the data and perhaps discover interesting things that add to the knowledge base,” says Duke’s Rockhold, who helped build the company’s open-data platform when he worked for the drugmaker.

To get access to data, a researcher must submit a request outlining proposed research. That application is then reviewed by an independent five-person panel, which includes a member of the public. If the reviewers determine that a research proposal has scientific merit, the data are shared.

Two other open-access platforms are sponsored by individual drug companies: Johnson & Johnson partners with the Yale University Open Data Access Project (YODA), while Bristol-Myers Squibb in collaboration with the Duke Clinical Research Institute has created Supporting Open Access for Researchers (SOAR). And although many global pharmaceutical companies don’t yet put data from their trials into the public domain, the rules of the new European Medicines Agency initiative will push more to do so.

While much of the push for open data will take time to pay off, there is already a substantial cache of information available. Researchers can now mine patient data from more than 3,000 clinical trials in open-access platforms. And a recent study found that from 2013 through 2015, researchers had requested data from about one in six of those trials. Most of the 154 research projects behind those requests attempted to probe study results more deeply, looking at drug side effects, how a subgroup of trial participants responded, complications of a procedure or how a disease is likely to progress.

So far, few researchers have shown interest in scrutinizing the data to determine whether a drug company’s claims about a drug’s efficacy and safety were true. Journals aren’t likely to publish an analysis that confirms the original results, and “verification studies don’t satisfy the drive of researchers, which is to generate new information and not re-create what someone has already done,” says Ann Marie Navar, the study’s lead author and an assistant professor of medicine at Duke Clinical Research Institute in Durham, N.C. Yet such work can be valuable. The single verification study that did get published, out of those 154 research projects, was a reanalysis of a 2001 antidepressant study looking at adolescents with major depression. In studying the data, the researchers discovered that not only were certain antidepressants ineffective in teens, but they also caused significant harm. While 154 studies and one publication may not seem like a burgeoning research frontier, the idea of using previously private trial data is in its early stages, says Navar. And managing and combining these large data sets remain quite challenging, requiring biostatistical expertise. Navar hopes that more publications of studies crediting open-source data will spur researchers to appreciate the data’s availability and value. “Still, we were impressed with the breadth of research being done with the data,” she says.

It will take time to change a deeply entrenched culture of scientific enterprise that rewards secrecy and competition. Academics, who conduct most medical research, still get hired and promoted based on the number of publications they author, and that reality provides a strong incentive for being proprietary about the results of their own studies. One model for changing that equation would instead reward academic researchers for the number of times their data were used by others. “Francis Collins, who led the Human Genome Project, is revered not only because he authored a lot of papers on the human genome, but because he created a data resource that allowed so many thousands more papers to be written,” says Krumholz. “His contribution was to create a resource for other researchers to make important discoveries. This creates a virtuous culture within science instead of promoting attitudes of ‘these data are mine and if it takes me 10 years to do everything I want to do with them, so be it.’ ”

“We don’t need to be afraid or defensive about sharing data,” he adds. “I’m hopeful that we’re embarking on a very different era of research where people realize there is so much good that can be generated from an open-science outlook.”