It’s clear that 2024 was the year of AI. Not only has the technology become a topic of dinner-party conversation, from what it means for intellectual property rights to whether it will destroy us all, but its impact on the sciences was acknowledged with the award of both the physics and chemistry Nobel prizes to AI-related work. Part of the latter was given to the key developers of AlphaFold, an algorithm for predicting the shapes of proteins, the biological molecules that do most of the chemical work in living cells.
That, however, is the tip of the iceberg. AI techniques are transforming the life sciences at all levels. There is currently “a pace of AI advances in life science that I’ve not seen before, or frankly even imagined would be possible,” says US physician Eric Topol.
For example, an AI system called Evo has been trained to learn the “language of DNA”, at least for microbes. Give it a gene sequence – the string of chemical building blocks that make up a piece of DNA – and it can predict what role the gene has in the organism.
Evo can also predict how mutations to the sequence will affect the fitness of the organism and, like large language models such as ChatGPT, it can work in “generative” mode: designing new sequences, not found in the data used to train the algorithm, that perform specific tasks. In essence, Evo represents a substantial step towards being able to “speak DNAese”.
AlphaFold is no longer the only game in town for predicting the structure of proteins. A rival algorithm called Boltz-1, developed at Massachusetts Institute of Technology (MIT), can do a comparable job and, because it was produced in an academic rather than commercial setting, has been released on an open licence so that all the code and training data are publicly available.
Meanwhile, other programs are seeking to plug important gaps. One called RhoFold, co-developed in China and at Harvard, predicts the structures of another class of biomolecules whose importance has become ever more apparent in recent years: RNA, which is not only crucial for transforming gene sequences into proteins, but also performs many vital roles on its own. A related algorithm from the same team is geared to designing RNA molecules that might act as drugs by sticking to specific target molecules in cells.
Other algorithms are seeking to crack the language of the “epigenome”: the system of chemical modifications made to chromosomes (where DNA is packaged up with proteins) to “regulate” specific genes, turning them on and off. It’s this set of epigenetic changes that, by annotating the “DNA text”, largely distinguishes the cells in our different tissues, for example making a heart muscle cell different from a liver cell.
Many epigenetic modifications to DNA are associated with particular diseases. An algorithm called MethylGPT, developed at the Brigham and Women’s hospital in Boston, Massachusetts, and at Edinburgh University, has begun to decode the complex patterns across the genome of one type of epigenetic modification, called methylation, and to work out what they mean for the human body.
In particular, MethylGPT can evaluate the disease risk associated with a particular methylation pattern (which can be deduced from a sample of a patient’s DNA), and also how these patterns change with age.
AlphaFold has not stood still. Understanding what proteins actually do is a matter of figuring out how they interact with other molecules, such as other proteins, DNA, RNA, hormones and so forth, as well as synthetic drugs. The AlphaFold team have now expanded their algorithm to make predictions about such interactions, thereby speaking directly to the fundamental chemistry of life.
That’s not all. Increasingly, these AI tools can be integrated, almost like a team of collaborating scientists, to work on a project with a specific goal. A group at Stanford University has demonstrated this approach, using five AI systems that convene for virtual “lab meetings”, to design drugs against the Covid virus SARS-CoV-2.
All this is cause for celebration, and for recognising that, whatever the dangers of AI in the wider world, its potential for accelerating science is tremendous.
A word of caution, though. One sometimes sees the suggestion now that we don’t need theories able to describe the dizzying complexity of living systems, because we can just feed raw experimental data into AI and get out the answers we need – a new drug, a diagnosis, a prediction of which gene to edit. Don’t fall for it. Science has never worked that way, and the black box of AI, however handy, won’t and shouldn’t ever be a substitute for true understanding.