AT THE SAN FRANCISCO CLINIC WHERE INTERNIST ELAINE KHOONG works, about half of the patients aren’t fluent in English. To communicate she uses an interpreter—in person, ideally, but sometimes via video or telephone. At the end of a visit, she hands over written instructions and a summary of what she and the patient discussed. “We know that patients have better comprehension if they get verbal and written instructions,” she says.

That is where the situation gets thorny. The written instructions are in English, but she sometimes uses Google Translate, a free online tool, to give patients a version in their native language. The tool has become an increasingly common go-to for hospitals and health care practitioners who don’t have access to expert translators. But some are asking: Does it do more good than harm?

Khoong is one of many researchers trying to get to the bottom of that question. In an experiment published earlier this year in JAMA Internal Medicine, her team at Zuckerberg San Francisco General Hospital used Google Translate to create Chinese and Spanish versions of the 100 most common sets of discharge instructions—pulled from patients’ charts in their emergency department. They then asked human translators to translate the instructions back to English and compared the final documents to the originals.
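The machine-translation half of that setup is easy to reproduce in code. Below is a minimal sketch, assuming the google-cloud-translate Python client (the programmatic counterpart to the free web tool) and credentials supplied through the GOOGLE_APPLICATION_CREDENTIALS environment variable; the instruction strings here are invented for illustration, and in the actual study the back-translation and error grading were done by human translators, not software.

```python
# Minimal sketch: batch-translating discharge instructions with the
# Google Cloud Translation API (basic v2 client).
# Requires: pip install google-cloud-translate, plus credentials set via
# the GOOGLE_APPLICATION_CREDENTIALS environment variable.
from google.cloud import translate_v2 as translate

# Invented example instructions; the study drew its 100 instruction sets
# from real emergency-department charts.
instructions = [
    "Take one tablet by mouth twice daily with food.",
    "Return to the emergency department if your fever goes above 102 F.",
]

client = translate.Client()

for text in instructions:
    for lang in ("es", "zh-CN"):  # Spanish and simplified Chinese
        result = client.translate(text, source_language="en", target_language=lang)
        print(lang, "->", result["translatedText"])
```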

They found, perhaps surprisingly, that Google Translate did a mostly accurate job. Eight percent of the Spanish instructions and 19 percent of the Chinese ones got the meaning wrong, but most errors were minor. Only 2 percent of the Spanish instructions and 8 percent of the Chinese ones had the potential to cause harm.

This investigation comes amid growing concern about Google Translate’s role in the daily practice of medicine, especially in multicultural communities. Given the current limited availability of translators, Khoong says she is hopeful about its value for real-time, patient-specific written instructions, but she feels the jury is out on whether it really helps in face-to-face interactions.

“It’s a problem if I’m trying to ask someone who only speaks Hmong culturally sensitive or personally sensitive questions,” says Jeffrey Jackson, an internist and epidemiologist at the Medical College of Wisconsin in Milwaukee. “I’d be worried that Google Translate wouldn’t carry nuance correctly.”

The tool does seem to be improving, however. A study published just five years ago looked at 10 common medical phrases translated into 26 languages. Only about 58% of the translations were accurate, and some errors were severe. In Swahili, “your child is fitting [seizing]” became “your child is dead,” and in Marathi “your husband had a cardiac arrest” became “your husband had an imprisonment of heart.”

Improvements in Google Translate are probably attributable to a change in its algorithms. Until fall 2016 the tool translated text phrase by phrase and often got tripped up by being too literal. Then Google engineers switched to neural networks that, trained on large bodies of text in multiple languages, translate whole sentences in context and learn how languages are actually used by native speakers.
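Open-source neural translators work on the same principle and give a feel for the shift. Here is a minimal sketch, assuming the Hugging Face transformers library (with PyTorch and sentencepiece installed) and the publicly available Helsinki-NLP/opus-mt-en-es English-to-Spanish model, which is an open-source stand-in, not Google’s own system: the network encodes an entire sentence and generates the translation from that context rather than substituting one phrase at a time.

```python
# Minimal sketch of neural machine translation with an open-source
# English-to-Spanish model (Helsinki-NLP/opus-mt-en-es), not Google's system.
# Requires: pip install transformers torch sentencepiece
from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-en-es"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# The whole sentence is encoded at once, so context (a child "seizing,"
# not being seized) can shape the output; phrase-by-phrase systems lacked this.
sentence = ["Your child is having a seizure."]
batch = tokenizer(sentence, return_tensors="pt", padding=True)
translated = model.generate(**batch)
print(tokenizer.batch_decode(translated, skip_special_tokens=True))
```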

Now Jackson thinks Google Translate may be good enough to help with a common problem in medical research: more than three-quarters of published reviews of research exclude some studies because they were published in other languages. As a result, more than 90 percent leave at least one randomized trial out of their analysis. If a review includes 50 studies, such an omission may not make a difference, Jackson says, but in a review of only five or six studies that don’t all reach the same conclusion, a single omitted trial could be important.

Jackson, who regularly edits systematic reviews, tested Google Translate on excerpts from 45 studies published in nine languages. In July 2019, he reported in the Annals of Internal Medicine that the translations were accurate enough to be included in reviews. “It worked better than we thought,” he says.

Patrick Davies, a physician at Nottingham Children’s Hospital in England who led the 2014 study on Google Translate, predicts that the tool, or other machine-learning translation systems, may become even more useful. “For serious conversations, we still need the human aspect because there’s so much intuition involved,” Davies says. “But these systems are getting smarter and smarter. We can’t say machines will never be able to do this.”

But for now, Khoong remains cautious, and recommends that human translators be used wherever possible. Even when translation mistakes are minor, they can still create a dramatic misunderstanding. “We don’t know how this tool impacts patient outcomes,” she says. “Does it help more than it harms? We don’t know.”