Ragon Institute

Mere Semantics: Natural Language Processing in Viral Evolution

by: Rachel Leeson

 

One of the most common metaphors in science describes DNA as an alphabet, with its bases as the letters. Ragon Member Bryan Bryson, PhD, along with MIT professor Bonnie Berger, took this metaphor to a whole new level, using natural language processing to predict mutations that allow viruses to evade the immune response. Their work was recently published in the journal Science.

 

“When thinking about the complexity of biology,” Bryson says, “a good analogy can go a long way, allowing you to explore other disciplines for similar paradigms and draw from their innovation and approaches. Here, we were able to find an analogy in natural language processing algorithms and explore their utility in the setting of modeling viral evolution.” 

 

If DNA is like an alphabet, thought MIT graduate student Brian Hie, the first author on this study, then perhaps evolution, which comes from changes in DNA, is like a language.

 

Hie, who studies computer science, thought of the rules guiding virus evolution as akin to the rules of grammar and semantics that guide language. In a virus, the rules of grammar govern the virus’s ability to replicate or infect cells. Semantics, which in language studies is the meaning of words, would represent how the virus is “read” by the immune system: just as switching one word for another can completely change the meaning of a sentence, a mutation can change how the immune system reads the virus.

 

You can think of this, Bryson explains, by comparing “I eat bread” to “I fear bread.” Both sentences are grammatically correct, but changing one word significantly alters the way the sentence is read. If such a semantic change occurs in a virus, the immune system’s antibodies may no longer be able to recognize, or read, the virus they were developed against, allowing the virus to evade the immune response in a previously immune person. 

 

Working with MIT graduate student Ellen Zhong, Hie developed a machine learning algorithm that could test thousands of potential mutations and identify those that changed the semantics of a virus without violating its rules of grammar. These particular mutations, the team found, were the most likely to escape from the immune system. Being able to identify them through a computer model may help scientists prepare for potentially dangerous new strains before they appear in nature. And because it is a model, it does not require scientists to generate and test new strains in the lab, a time-consuming and expensive process, in order to identify the most concerning ones.
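
The idea of favoring mutations that change "semantics" while keeping "grammar" intact can be illustrated with a short Python sketch. Everything in it is an assumption made for illustration: the function names, the toy scores, the weighting parameter beta, and the choice to combine the two scores by adding their ranks. It is not the team's published code, and in practice both scores would come from a trained language model rather than being typed in by hand.

```python
from typing import Dict, List, Tuple


def rank_scores(values: List[float]) -> List[int]:
    """Return the rank of each value (1 = smallest); ties broken by position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def prioritize_mutations(
    candidates: Dict[str, Tuple[float, float]], beta: float = 1.0
) -> List[str]:
    """Order candidate mutations from most to least concerning.

    Each candidate maps to (semantic_change, grammaticality). Both scores
    are hypothetical inputs here. Candidates that rank high on BOTH scores
    come first; beta (an assumed parameter) weights grammaticality.
    """
    names = list(candidates)
    semantic_ranks = rank_scores([candidates[n][0] for n in names])
    grammar_ranks = rank_scores([candidates[n][1] for n in names])
    combined = {
        name: sem + beta * gram
        for name, sem, gram in zip(names, semantic_ranks, grammar_ranks)
    }
    return sorted(names, key=lambda n: combined[n], reverse=True)


# Toy, made-up scores: (semantic_change, grammaticality)
toy_candidates = {
    "mutation_A": (0.90, 0.80),  # reads differently AND still viable
    "mutation_B": (0.95, 0.05),  # reads differently, but breaks "grammar"
    "mutation_C": (0.10, 0.90),  # viable, but reads the same to immunity
    "mutation_D": (0.20, 0.10),  # neither
}

ranking = prioritize_mutations(toy_candidates)
print(ranking)
```

In this toy example, mutation_A comes out on top because it is the only candidate that scores well on both measures, which mirrors the "I eat bread" versus "I fear bread" comparison: the sentence is still valid, but it now means something different.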

 

The team developed the model using influenza and HIV, drawing on the large number of sequenced strains of both to test the model’s accuracy and to identify potentially concerning new mutations. Then, they applied the algorithm to SARS-CoV-2, the virus that causes COVID-19, where they identified five mutations with the potential to create a new strain capable of escaping the immune response. Researchers can then study such strains in the lab before they emerge in the wild.

 

But these natural language algorithms are not limited to viruses; they could also allow researchers to study things like pathogen evolution or drug resistance in diseases such as tuberculosis or cancer.

 

“If you think about evolution as the language of biology,” says Hie, “then maybe language models can help us better understand biology as a whole.” 

 

About the Ragon Institute of Mass General, MIT, and Harvard
The Ragon Institute of Mass General, MIT, and Harvard was established in 2009 with a gift from the Phillip T. and Susan M. Ragon Foundation, creating a collaborative scientific mission among these institutions to harness the immune system to combat and cure human diseases. With a focus on HIV and infectious diseases, the Ragon Institute draws scientists, clinicians and engineers from diverse backgrounds and areas of expertise to study and understand the immune system with the goal of benefiting patients. 
For more information, visit www.ragoninstitute.org