THE PEDIGREE CHART OF FAMILY 709 tells a grim story, with black diamonds indicating the death by suicide of 27 distant cousins across the branches of an eight-generation family tree. In another chart, for Family 553615, black diamonds mark the suicides of 81 descendants of a single couple who lived in the early 1800s.

Of all the approaches to solving Utah’s suicide problem—the state’s suicide rate is the nation’s fifth highest, and suicide is the leading cause of death there for young men—genealogy might not be the first to spring to mind. Yet researchers have suspected for some time that genes play an outsize role. Adopted relatives of those who have killed themselves have no more risk of suicide than the population at large, but for biological kin, the risk is four times greater; the identical twin of a suicide victim may carry up to 11 times the normal risk.

Family 709, Family 553615 and others like them fall far outside the norm, with a rate of suicide 4 to 10 times higher than that of the overall population. In trying to understand what might set them apart genetically, Hilary Coon, professor of psychiatry at the University of Utah, can draw on two resources unavailable almost anywhere else. One is DNA samples that the Utah state medical examiner has collected from nearly 5,000 suicide victims. The other is the Utah Population Database, or UPDB, one of the most comprehensive human databases, with extensive family pedigrees—genealogical records for Utah families that trace many to forebears in the 1700s and 1800s.

Exploring the medical histories of extended families can help refine and focus genetic investigations, and for Coon, it provided valuable leads. By cross-referencing public records—Utah death certificates started noting suicide as a cause of death in 1904—with Utah’s genealogical database, Coon was able to get de-identified family structures of 200 large families with evidence of a high risk of suicide. She chose to study 10 of those families—the ones with the highest risk and the most DNA available.
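The article does not spell out how Coon ranked families, so what follows is only a minimal sketch of the general idea. It assumes a crude excess-risk ratio: observed suicides in a pedigree divided by the number expected if members died by suicide at the population rate. The `Family` fields, the 4x cutoff, the minimum sample count and the toy numbers are all hypothetical. Real analyses would adjust expected counts for age, sex and birth cohort.

```python
from dataclasses import dataclass


@dataclass
class Family:
    """One pedigree, summarized (hypothetical fields for illustration)."""
    family_id: str
    members: int      # descendants tracked in the pedigree
    suicides: int     # deaths recorded as suicide among those members
    dna_samples: int  # stored samples available for sequencing


def excess_risk(family: Family, population_rate: float) -> float:
    """Observed suicides divided by the count expected at the population rate."""
    expected = family.members * population_rate
    return family.suicides / expected


def pick_families(families, population_rate, min_ratio=4.0, min_samples=3, top_n=10):
    """Keep high-risk families with enough DNA, ranked by excess risk, then by sample count."""
    eligible = [
        f for f in families
        if excess_risk(f, population_rate) >= min_ratio and f.dna_samples >= min_samples
    ]
    eligible.sort(key=lambda f: (excess_risk(f, population_rate), f.dna_samples), reverse=True)
    return eligible[:top_n]


# Toy data (hypothetical): family C has the highest ratio but too little DNA to study.
families = [
    Family("A", members=1000, suicides=30, dna_samples=8),
    Family("B", members=2000, suicides=15, dna_samples=12),
    Family("C", members=500, suicides=25, dna_samples=1),
    Family("D", members=800, suicides=20, dna_samples=5),
]
chosen = pick_families(families, population_rate=0.005)
print([f.family_id for f in chosen])  # prints ['A', 'D']
```

Filtering on both risk and available DNA mirrors the two criteria the article names: the highest risk and the most DNA.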

A genetic analysis of those samples pointed Coon toward 10 chromosomal locations where the sets of distant cousins who died by suicide shared unusual genetic variants. She and her colleagues are studying those sections in detail, using whole genome sequences to home in on particular variations in the DNA. If they find one or more variants that have an evident role in increasing suicide risk, it might be possible to design a drug or other therapy that could act on that gene. Meanwhile, identifying individuals at greater risk through a genetic test could help with early intervention.

Coon’s study is one of more than 50 that are part of the Utah Genome Project at the University of Utah. That work combines the latest tools of genetic research—next-generation DNA sequencing and advanced data processing—with one of the oldest: the family tree. By making use of Utah’s unique wealth of genealogical records and meshing them with a trove of sequenced DNA, researchers in the state hope to sift out new heirlooms from the past: concrete genetic discoveries that lead to new treatments and a better understanding of disease.


UTAH’S NON-NATIVE POPULATION descends overwhelmingly from settlers belonging to the Church of Jesus Christ of Latter-day Saints, commonly known as Mormons. Mormons today account for more than 60% of the state’s approximately 3 million residents. The UPDB is in part a product of their church—knowing the names of dead ancestors allows relatives to perform “proxy baptisms” for those who may have died unbaptized, giving them a chance at salvation in the afterlife. As a result, Mormons have become some of the world’s foremost genealogical record-keepers. The names of the Mormon dead and their living offspring are carefully archived by the Genealogical Society of Utah, which houses hard copies of ancestry records in a secure, underground facility.

During the 1970s, these genealogical records led to the establishment of the UPDB. It can cross-reference the family trees of more than 11 million people from millions of linked pieces of information, including death certificates, data from the Utah Cancer Registry, and electronic health records from the University of Utah and Intermountain Healthcare, which provide care for roughly 85% of the state’s population. About 3 million living people listed in the UPDB have ancestry records going back at least three generations. In 2017, the Genealogical Society of Utah supplied records for an additional 90 million, representing deceased relatives of families in the UPDB, many of whom lived outside the state and country.

The only comparable resource is the deCODE database of Icelandic families, which biotech company Amgen acquired in 2012 for $415 million. DeCODE has genealogy records tracking Iceland’s population of about 350,000 all the way back to a handful of ninth-century ancestors, and the company has collected more than 10,000 whole genome sequences from current Icelanders.

The UPDB is managed for public benefit by the Huntsman Cancer Institute at the University of Utah. It is used primarily for medical research and fields hundreds of requests each year. The wealth of ancestral records allows for a genealogical approach to understanding disease that has been largely pushed aside for the past 20 years—not because it wasn’t effective but because it was hard.

“Historically, we recognized the importance of families in understanding human disease before we knew anything about DNA,” says Scott Hebbring, an investigator at the Marshfield Clinic Research Institute’s Center for Human Genetics in Wisconsin. But the challenge of finding large families to study, and the declining costs of sequencing, prompted a shift in the mid-2000s toward very large studies of unrelated people, called genome-wide association studies (GWAS).

With a GWAS, researchers can survey the DNA of hundreds or thousands of unrelated people, connect that genetic data to health records and let software flag variants that turn up more often in people with a particular disease than in people without it. Such studies have revealed nearly 40,000 potential connections between areas of the human genome and complex but common conditions such as type 2 diabetes and Parkinson’s and Crohn’s diseases. But translating those tantalizing findings into diagnostic tests or treatments has often proven difficult.
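At its simplest, the statistical core of a GWAS is a case-control comparison repeated for every variant. The sketch below is a deliberate simplification, assuming carrier status only and a Pearson chi-square test on a 2x2 table; real studies typically use regression models with covariates, test millions of variants, and correct for multiple testing (a p-value below 5 x 10^-8 is the conventional genome-wide threshold). The function names and toy data are hypothetical.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic for the table [[a, b], [c, d]] (no continuity correction)."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0  # a variant carried by everyone or by no one carries no signal
    return n * (a * d - b * c) ** 2 / denominator


def p_value_1df(chi2: float) -> float:
    """Upper-tail probability of a chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


def scan_variants(cases, controls):
    """cases, controls: lists of sets, each set holding the variant IDs one person carries.

    Returns (variant, chi2, p) tuples sorted from strongest to weakest association.
    """
    variants = set().union(*cases, *controls)
    results = []
    for v in variants:
        a = sum(v in person for person in cases)     # carriers with the disease
        b = len(cases) - a                           # non-carriers with the disease
        c = sum(v in person for person in controls)  # carriers without the disease
        d = len(controls) - c                        # non-carriers without the disease
        chi2 = chi_square_2x2(a, b, c, d)
        results.append((v, chi2, p_value_1df(chi2)))
    return sorted(results, key=lambda r: (r[2], r[0]))


# Toy data: "v1" is enriched in cases; "v2" is equally common in both groups.
cases = [{"v1"}, {"v1"}, {"v1", "v2"}, {"v2"}]
controls = [{"v2"}, set(), {"v2"}, set()]
results = scan_variants(cases, controls)
for variant, chi2, p in results:
    print(variant, round(chi2, 2), round(p, 4))
```

Each variant gets its own small test, which is why a GWAS can flag many weak associations without saying how much any single one matters.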

In part that’s because GWAS can identify many genes associated with a disease, but the roles of individual genes may be quite minor. For example, many genes discovered through GWAS raise or lower cholesterol levels by 5% or so—an increment too small to target with a drug. By focusing on families with a high incidence of a disease, however, researchers may be better able to spot rarer variants—the smoking-gun genes—that have a larger impact, says Will Dere, a biopharmaceutical-industry veteran who now heads the University of Utah’s Program in Personalized Health. “From my drug-discovery perspective, that’s appealing. Histories and generations help separate the wheat—the clinically meaningful gene variant—from the chaff.”


MANY STUDIES OF DISEASE genetics have focused on parent-child or sibling pairs, because those relationships are easy to find. But the UPDB makes distant relatives—and their medical histories—easier to find as well, which confers two distinct advantages. First, when a study focuses only on close relatives, any illnesses they share may be the result of confounding factors from a shared environment rather than just the genes they have in common. That’s less likely to be a problem if relatives are further apart on the same family tree.

Second, studying two distant relatives makes it much easier to find troublemaking genes. That’s a matter of math. Parents share roughly half of their DNA with their children, and siblings share roughly the same amount with each other. For more distant relatives, the number of shared inherited variants drops by half with each degree of separation. First cousins have only about 12.5% of their DNA in common, and with each branching, that number goes down further. If two far-flung family members share a rare condition, the culprit genes will be lurking within a relatively small pool of their shared DNA—only a few dozen rare coding variants, versus hundreds or thousands shared by a closer relative.
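The halving arithmetic can be made concrete with Wright's coefficient of relationship, a standard textbook formula (not something specific to the Utah studies). It sums (1/2)^(n1 + n2) over every common ancestor, where n1 and n2 count the generations from each relative up to that ancestor. This minimal sketch assumes no inbreeding. The values are averages; actual sharing between any two relatives varies because of recombination.

```python
from fractions import Fraction


def coefficient_of_relationship(paths):
    """Wright's coefficient of relationship for non-inbred relatives.

    paths: one (n1, n2) pair per common ancestor, where n1 and n2 are the numbers of
    generations from each relative up to that ancestor.
    """
    return sum((Fraction(1, 2) ** (n1 + n2) for n1, n2 in paths), Fraction(0))


RELATIONSHIPS = {
    "parent-child": [(0, 1)],            # the parent is the shared ancestor
    "full siblings": [(1, 1), (1, 1)],   # two shared parents
    "first cousins": [(2, 2), (2, 2)],   # two shared grandparents
    "second cousins": [(3, 3), (3, 3)],  # two shared great-grandparents
    "third cousins": [(4, 4), (4, 4)],   # two shared great-great-grandparents
}

for name, paths in RELATIONSHIPS.items():
    r = coefficient_of_relationship(paths)
    print(f"{name:15s} shares {r} of DNA on average ({float(r) * 100:g}%)")
```

Because each extra generation on either side halves a path's contribution, a pair of third cousins is expected to share only 1/128 of their DNA, which is why a variant found in both is a far stronger lead than one found in two siblings.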

Not every variant identified through this process is harmful; some may be protective. Lisa Cannon-Albright, a professor and division chief of genetic epidemiology at the University of Utah School of Medicine, and geneticist John Kauwe at Brigham Young University in Utah, used the UPDB to find a rare variant of the gene RAB10 that may provide resilience against Alzheimer’s disease. Major risk factors for Alzheimer’s include age and a particular variant of the APOE gene called APOEe4, which can increase the likelihood of developing late-onset Alzheimer’s by as much as twelvefold. But a small percentage of people who have the APOEe4 variant appear untouched by its effects, living well beyond 75 years without symptoms of cognitive decline. A beneficial genetic mutation may exist that counteracts the “bad” one.

For their study, published in Genome Medicine in 2017, Cannon-Albright and her colleagues began with some 5,000 residents of Cache County, Utah, who have been followed for more than 15 years in a study on aging and dementia. Because nearly all of these subjects were in the UPDB, the researchers were able to find those with a strong family history of Alzheimer’s and to break them into two groups—one consisting of 232 people, living and dead, who had never shown symptoms of cognitive decline, even though they had the normally damaging APOEe4 variant, and another of 581 people diagnosed with dementia.

With the Kauwe Lab, Cannon-Albright’s team was able to pinpoint variants in the RAB10 and SAR1A genes that hadn’t been seen before and that were shared by members of the pedigrees. Then they were able to validate their findings by checking two independent DNA databases of Alzheimer’s patients and elderly controls, finding that the variant in RAB10 appeared to confer protection against Alzheimer’s in those groups, too.

Experiments in mouse brain cells pointed to a likely biological mechanism: Changes in RAB10 affected another gene, APP, involved in the production of amyloid proteins—an excess buildup of which is a hallmark of Alzheimer’s disease. This suggests that RAB10 could be a particularly promising target for prevention and treatment.

Amgen has taken a similar tack in developing a cardiovascular drug that mimics the effect of a rare gene-inactivating variant discovered in Icelanders. People who carry the variant have a 35% lower risk of having a heart attack. A drug that “silences” the gene might confer similar protection in people whose copies of it work normally. “Looking at pedigrees allows us to focus our attention in promising places,” says Cannon-Albright.


SOME OF THE GREATEST leaps have combined genealogical tools with an ever-increasing trove of genomic data. In the past three years, the USTAR Center for Genetic Discovery at the University of Utah, which processes genomic data for the Utah Genome Project and external collaborators, has analyzed tens of thousands of genomes. Yet with so much information at their fingertips, the challenge for researchers becomes “how to get the data to tell us what it knows,” says Nicola Camp, a statistical geneticist at the Huntsman Cancer Institute.

In a recent study of breast cancer, Camp and her colleagues tried to ascertain whether there were inherited genetic variants that predisposed women to develop particular types of tumors. Breast cancer tumors can be classified into four main subtypes determined by looking at patterns of expression across a panel of dozens of genes. Patients with some subtypes are more likely to die of the disease than those with others.

Using the UPDB and Utah’s cancer databases, Camp identified 11 extended families containing an unusually large number of people with breast cancer. She expected to find a preponderance of certain subtypes within distinct pedigrees, but the samples didn’t fit the expected pattern. That made her question the standard model. “We weren’t convinced that the existing four categories really told us what we wanted to know,” Camp says.

So she took a different tack and looked at about 1,000 cancer patients in a Kaiser Permanente database. Rather than sorting them into the usual subtypes, her team used a method called principal component analysis to derive biomarkers that explained the most common patterns of gene expression across the panel. The analysis revealed five multi-gene tumor characteristics—an alternative representation of expression diversity, distinct from the four standard categorical subtypes—and the team found that these were consistent across other cancer databases, too. Then, back with the original high-risk pedigrees, the researchers found that two of the new multi-gene tumor characteristics did a much better job of explaining the excess of cancer in those extended families than the four subtypes had done. That discovery might eventually lead to better diagnosis and treatment.
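Principal component analysis finds the directions along which a set of measurements varies most. The article does not describe Camp's exact pipeline, so this is only a textbook sketch of PCA itself, assuming a small matrix with one row per tumor and one column per gene. It uses power iteration with deflation in plain Python so the steps are visible; a real gene panel would more likely go through a linear-algebra routine such as NumPy's `numpy.linalg.svd`. The toy data are hypothetical.

```python
import math


def _centered_and_covariance(data):
    """Center each gene (column) and return the gene-by-gene sample covariance matrix."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)] for i in range(p)]
    return centered, cov


def principal_components(data, k, iters=500, tol=1e-12):
    """Top-k principal components by power iteration with deflation.

    data: one row per tumor, one column per gene's expression level.
    Returns (loadings, variances, scores); scores are each tumor's value on each component.
    """
    centered, cov = _centered_and_covariance(data)
    p = len(cov)
    loadings, variances = [], []
    for _ in range(k):
        v = [1.0 / (i + 1) for i in range(p)]  # arbitrary starting vector
        eigenvalue = 0.0
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < tol:  # no variance left to explain
                eigenvalue = 0.0
                break
            v = [x / norm for x in w]
            eigenvalue = norm
        if eigenvalue == 0.0:
            break
        if max(v, key=abs) < 0:  # fix the arbitrary sign for reproducibility
            v = [-x for x in v]
        loadings.append(v)
        variances.append(eigenvalue)
        # Deflate: subtract this component so the next pass finds the next one.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(p)] for i in range(p)]
    scores = [[sum(x * y for x, y in zip(row, comp)) for comp in loadings] for row in centered]
    return loadings, variances, scores


# Toy data: two genes that rise and fall together, so one component captures everything.
data = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
loadings, variances, scores = principal_components(data, k=2)
print(loadings, variances)
```

The scores, not fixed category labels, are what a downstream analysis would compare across families: each tumor gets a continuous value on each component rather than a single subtype assignment.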

“There was information in the genes that was important, and the pedigrees themselves told us we weren’t looking at it in the right way,” Camp says.

The potential breakthroughs made by Coon, Camp, Cannon-Albright and other researchers in Utah and Iceland may have much to do with the special resources in those places. Still, it’s possible to build other useful genealogies—and to do so rather quickly. Cannon-Albright is working with the U.S. Department of Veterans Affairs to create a genealogy database that will eventually link all 24 million VA patients to ancestry and health records. “It’s a massive amount of data, but it’s not that hard to do with publicly available records,” says Cannon-Albright. Starting with a birth certificate, researchers can usually find who someone’s mother and father were, and death and marriage records can also help.

Using health records is another way to construct genealogies, says Hebbring of the Marshfield Clinic Research Institute. He was coauthor of a 2017 paper in Bioinformatics that outlined a strategy for predicting which people are related, using basic demographic data—last name, date of birth, home address and gender—available in most electronic health records. (All demographic data was de-identified to protect patient privacy, as it is in all of these genealogical databases.) Two people sharing an address are likely to be related, especially if they share a last name. Factor in ages and you can make a good guess about familial relationships—parent-child or siblings. Using an algorithm to analyze records of 2.6 million people in Marshfield’s electronic health records, Hebbring and his coauthors predicted the composition of 173,368 family units of two to five generations with remarkable accuracy. The work showed that other medical systems with decent electronic records might be able to build genealogies in a similar way.
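The reasoning in that paragraph (shared address, shared surname, then age difference) can be sketched as a few rules. This is not the published Marshfield algorithm. It is a hypothetical illustration that assumes a `Patient` record with a surname, birth year and address, and a 15-year age gap as the cutoff between generations. It ignores gender, so a small age gap is labeled only as "same-generation", since that could mean siblings or spouses.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Patient:
    """De-identified demographic fields (hypothetical schema for illustration)."""
    pid: str
    last_name: str
    birth_year: int
    address: str


def guess_relationships(patients, generation_gap=15):
    """Propose (older_pid, younger_pid, label) pairs for same-surname people sharing an address."""
    households = defaultdict(list)
    for p in patients:
        households[p.address.strip().lower()].append(p)  # crude address normalization

    guesses = []
    for members in households.values():
        for a, b in combinations(members, 2):
            if a.last_name.lower() != b.last_name.lower():
                continue  # this sketch only scores pairs that share a surname
            older, younger = sorted((a, b), key=lambda person: person.birth_year)
            gap = younger.birth_year - older.birth_year
            # A large age gap suggests parent-child; a small one could be siblings or spouses.
            label = "parent-child" if gap >= generation_gap else "same-generation"
            guesses.append((older.pid, younger.pid, label))
    return guesses


# Toy records (hypothetical): patient 4 shares the address but not the surname,
# and patient 5 shares the surname but lives elsewhere.
patients = [
    Patient("1", "Smith", 1960, "12 Elm St"),
    Patient("2", "Smith", 1990, "12 elm st"),
    Patient("3", "Smith", 1993, "12 Elm St "),
    Patient("4", "Jones", 1991, "12 Elm St"),
    Patient("5", "Smith", 1990, "9 Oak Ave"),
]
for guess in guess_relationships(patients):
    print(guess)
```

Chaining such pairwise guesses across households is what would let a system assemble multigenerational family units; a production method would also need fuzzier address matching and a way to resolve conflicting guesses.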

MEANWHILE, A GROWING NUMBER of national genome projects also have the potential to generate genealogical data. At least 50 such projects are under way around the world, in the United Kingdom, Saudi Arabia, Singapore, China and other countries. “Most national studies are often treated as large sets of unrelated individuals, but in reality, everyone is related somehow,” says Hebbring. “If you have hundreds of thousands of people in a study, there will be a few brothers, sisters, children, cousins—but there will also be many more distant family relationships.”

The race to mine such data for important genes is heating up. In 2015, for example, both Amgen and Regeneron launched PCSK9 inhibitors, potent new cholesterol-lowering drugs that target a gene first linked to inherited high cholesterol in French families; researchers at the University of Texas Southwestern Medical Center in Dallas later found that variants disabling the gene were associated with very low cholesterol levels. This heralds a bright future for genealogy studies and even promises a kind of poetic justice. Disease-causing genes have always stalked families across generations, bringing tragedy in their wake. Now, by looking across generations, researchers will be able to follow the trail to new cures.