A New AI Tool Could Transform How We Diagnose Genetic Diseases


Researchers at the Mayo Clinic and Goodfire, a San Francisco research startup, say they have used an AI model to predict which genetic mutations cause disease—and, crucially, to explain why—offering a new approach to diagnosing and studying genetic disorders at scale.

The research uses techniques from AI interpretability—the new science dedicated to understanding the opaque brains of AI systems—to predict and understand which gene mutations might be “pathogenic.”

Early diagnosis and treatment of certain cancers can be the difference between life and death, says Matthew Callstrom, professor of radiology and head of the generative AI program at the Mayo Clinic. However, the human genome consists of over 3 billion base pairs, making the search for a single disease-causing mutation an enormous needle-in-a-haystack problem.

The researchers worked with Evo 2—an open-source “genomic foundation model” trained by the Arc Institute—to predict which DNA mutations cause disease, and to understand which biological features might be responsible. Evo 2 is trained to predict the next “letter” in a DNA sequence, in the same way that large language models (LLMs) such as ChatGPT are trained to predict the next word in a passage of text. For ChatGPT, training on most of the text on the internet teaches it the structure of language and facts about the world. Trained on 128,000 genomes spanning all domains of life—each written in an alphabet of just four letters (A, C, G, and T), standing for the chemical bases that make up DNA—Evo 2 learns which genetic sequences are “conducive to life,” says Nicholas Wang, one of the paper’s authors.
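To make the next-letter idea concrete, here is a deliberately tiny toy sketch—not Evo 2, which is a deep neural network trained on 128,000 genomes—using simple bigram counts over the four-letter DNA alphabet. The training sequences and function names are invented for illustration only.

```python
# Toy illustration of next-letter prediction over DNA (NOT Evo 2's method):
# a bigram model counts how often each letter follows each letter, then
# turns those counts into a probability distribution over the next letter.
from collections import Counter, defaultdict

ALPHABET = "ACGT"

def train_bigram(sequences):
    """Count how often each letter follows each preceding letter."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts

def next_letter_probs(counts, prev):
    """Probability of each possible next letter, given the previous one."""
    total = sum(counts[prev].values()) or 1
    return {c: counts[prev][c] / total for c in ALPHABET}

# Invented training data: in these toy sequences, "C" is always followed
# by "G", so the model assigns that transition probability 1.0.
training = ["ACGTACGTACGT", "ACGAACGTACGA"]
model = train_bigram(training)
print(next_letter_probs(model, "C"))  # → {'A': 0.0, 'C': 0.0, 'G': 1.0, 'T': 0.0}
```

A real genomic foundation model plays the same guessing game, but with a context of thousands of letters and billions of learned parameters rather than a lookup table of counts.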

However, this knowledge is locked in the seven billion numbers that encode the model’s artificial brain: researchers can see the numbers, but their meaning is opaque. Just as an EEG measuring electrical activity in a human brain doesn’t tell the neuroscientist what the patient is thinking about, AI researchers can see what’s happening inside the AI’s brain but struggle to interpret it.

The Goodfire researchers showed Evo 2 examples of pathogenic and benign gene mutations, and measured which parts of its brain lit up in response—allowing them to isolate the AI’s response to pathogenic mutations. They found that they could use this to predict which mutations caused disease better than every existing computational tool they tested against—despite Evo 2 never having been explicitly trained on the task of predicting which mutations caused disease. As with LLMs, the scale of the data that Evo 2 had been trained on—roughly ten times more than the previous largest genomic foundation model—had allowed it to infer the patterns of what healthy DNA had in common.

In the clinic, however, prediction is insufficient. “It’s extremely important that we understand why a model is making a decision,” says Matt Redlon, Chair of the Mayo Clinic’s AI program and a co-author on the paper. 

Further probing revealed that Evo 2 had inferred meaningful biological features of a DNA sequence. For example, Evo 2 had learned to identify the boundaries between different sections of DNA, despite the fact that the genomes it was trained on don’t have explicit labels for these boundaries. 

These biological features help to explain why certain mutations cause disease and others don’t. A mutation right at the boundary of two sections of DNA is more likely to produce a broken protein, leading to a genetic disorder. A mutation inside a section that is discarded before the protein is built is usually harmless.

The method’s ability to identify biological features of mutations, instead of just providing an opaque pathogenicity score, is a “significant advance,” says Bo Wang, chief AI scientist at Canada’s University Health Network.

As the cost of genome sequencing falls—with recent systems claiming to sequence an entire genome for $100—methods of interpreting the genetic data, such as this one, could help scientists “go back to the biology” and create “personalized therapies” for individuals, says Redlon. 

However, before Goodfire’s method is ready for the clinic, it would need to be validated in larger trials to establish its performance across wider populations, and then go through FDA approval. Moreover, while the researchers found biological concepts stored inside Evo 2, there is “no guarantee” that the model was actually using those concepts to determine which mutations were pathogenic, says James Zou, professor of biomedical data science at Stanford.

Interpretability has been gaining traction as AI is applied to the life sciences and beyond. Goodfire, which was founded in 2023 to advance the interpretability of AI models—a challenge its co-founder and CTO Dan Balsam calls “the most important problem in the world”—was valued at $1.25 billion in February. In January, Goodfire published research which identified novel biomarkers for Alzheimer’s stored in the brain of an AI model, raising the promise of finding new concepts inside the brains of AI models that have eluded human scientists. 

“In my view, the most interesting part of [interpretability] is to be able to open the black box and see, ‘Did the model actually learn something about science beyond what we have known?’” says Zou. Goodfire’s newly published research doesn’t do this, since it only probes Evo 2 for known concepts, Zou added.

Interpretability has also been applied to large language models, such as ChatGPT and Claude. Recently, researchers at Anthropic found that Claude Mythos, the latest generation of the company’s flagship AI model, showed internal signs of awareness of being tested and then cheated on the tests—despite never explicitly stating that it was aware of being tested. The possibility that AI models could cheat on safety-relevant tests increases the importance of techniques that allow researchers to scan AI brains for signs of misbehavior.

“If there’s some barrier like, ‘Is interpretability useful?’ I think we’ve been cracking it, and I think we’ve smashed through it,” says Balsam.
