
Artificial Intelligence Is Coming for Our Proteins

After beating human champions at the game of Go, self-learning computer programs are figuring out how the workhorses of life twist themselves into pretzels.

At first glance, these two things look nothing alike. On the one hand, we see the mesmerizing contortions that proteins display to grant life forms their functionalities. On the other hand, we have the oldest continuously played board game in the history of our species, consisting of a gridded board and simple stones. 

In both cases, however, a sobering truth has emerged in recent years: computers beat humans. 

As chatter intensifies over just how reliable the artificial intelligence behind ChatGPT is, there is no denying that AI has proven its superiority in select applications. When a machine defeated a human champion at the game of Go in 2016, the world was watching. This demonstration of the power of AI, however, was just a prelude to its use in the biomedical sciences. The phrase “game changer” is played out in the media but it clearly applies here, in more ways than one. 

There were fewer cameras present when AI raced ahead of humans at solving the protein folding problem, but this feat of supremacy could have much more lasting and important implications.  

To Go from zero to hero 

Lee Sedol lost to a computer. 

Lee was one of the best players at the game of Go. For those unfamiliar with Go, it looks like checkers played on a much larger board, typically a simple wooden square divided by 19 vertical and 19 horizontal lines. The game uses round pieces called stones, with one player using black stones and their opponent, white stones. These pieces are placed one by one where lines intersect on the board, and the choice of where to place your stones is crucial to the strategy behind Go. If one of your stones gets encircled by your opponent’s, you lose it. As time elapses, black and white serpents of stones grow on the board, creating borders that protect territory. The game ends when neither player wishes to continue. The amount of territory owned by each player is used to declare a winner. 
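
For readers who like to see rules as code, the capture rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the board encoding ('B', 'W' and '.' for an empty point) and the function names are made up for the example. In the full rules, a connected group of same-colored stones is captured once it has no empty neighboring points (called liberties) left.

```python
def group_and_liberties(board, row, col):
    """Return (group, liberties) for the stone at (row, col).

    board is a square list of strings using 'B', 'W' and '.' (empty).
    This encoding is an assumption made for this sketch.
    """
    color = board[row][col]
    size = len(board)
    group, liberties = set(), set()
    stack = [(row, col)]
    while stack:
        r, c = stack.pop()
        if (r, c) in group:
            continue
        group.add((r, c))
        # Only the four orthogonal neighbors count in Go, not diagonals.
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < size and 0 <= nc < size:
                if board[nr][nc] == '.':
                    liberties.add((nr, nc))
                elif board[nr][nc] == color:
                    stack.append((nr, nc))
    return group, liberties


def is_captured(board, row, col):
    """A group with no liberties left is removed from the board."""
    _, liberties = group_and_liberties(board, row, col)
    return not liberties


# A tiny 5x5 board for illustration (real boards are 19x19).
board = [
    ".B...",
    "BWB..",
    ".B...",
    "..W..",
    ".....",
]
print(is_captured(board, 1, 1))  # white stone with black on all four sides
print(is_captured(board, 3, 2))  # white stone with four empty neighbors
```

The first white stone is fully encircled and would be lifted off the board; the second still has breathing room.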

Go is a popular game in East Asia and its origin can be traced back to China thousands of years ago. Its complexity resides mainly in the freedom a player has. The number of possible configurations of stones on the Go board is calculable but, I would argue, unimaginable, which means that beyond deliberate strategy, players have to rely on intuition to decide if a move is sound or not. A computer, however, is not so limited. 
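
To get a feel for "calculable but unimaginable," here is a rough back-of-the-envelope sketch in Python. It assumes only that each of the 361 intersections can be empty, black, or white. That gives an upper bound rather than the exact count, since some of those arrangements are not legal positions.

```python
# Each intersection of the 19 x 19 grid is empty, black, or white.
POINTS = 19 * 19
upper_bound = 3 ** POINTS  # Python integers have arbitrary precision

print(POINTS)                 # number of intersections on the board
print(len(str(upper_bound)))  # number of decimal digits in the upper bound
```

The result is a number 173 digits long.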

From March 9 to 15, 2016, world champion Lee Sedol sat in front of a man who was representing an artificial intelligence named AlphaGo. Five games were played under the scrutiny of judges, commentators, and cameras live-streaming the event. Lee would place a stone on the board; this play would be entered in the laptop running a virtual version of the game; AlphaGo would consider its next move before unveiling it; and the man sitting opposite Lee would duplicate the AI’s play on the wooden board. 

Lee lost games 1, 2, 3 and 5. He won the fourth, seemingly by creating such a complicated position that AlphaGo failed to appraise the situation well enough and eventually resigned. The competition, held in Seoul, was a confrontation of two thinking styles. Human moves we would consider creative were perceived by the AI as very conventional, whereas AlphaGo occasionally made decisions described by a witness as “not human moves.” 

AlphaGo was born of DeepMind, a London-based, Google-owned facility that researches and develops artificial intelligence. It attempts to loosely mimic in silicon how the human brain behaves, using the computer binary code of ones and zeroes. AlphaGo can be thought of as having three main parts. Its first component is a network that has been trained on games of Go played by humans. Its second module can look at a new game and evaluate positions, calculating the probability of winning the game by placing a stone at a particular intersection. AlphaGo’s final unit tries to predict the future. While playing Lee Sedol, the AI was able to look 50 to 60 moves ahead, which a human simply cannot do. 
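
The idea of pairing a position evaluator with a look-ahead can be sketched on a far simpler game. The Python below is not AlphaGo's method: AlphaGo used neural networks to guide a Monte Carlo tree search, whereas this sketch swaps in a plain depth-limited search on a toy game (a pile of stones, each player removes 1 to 3, and whoever takes the last stone wins). The names `value_estimate`, `lookahead` and `best_move` are hypothetical, and the "evaluator" is a placeholder that knows nothing.

```python
MOVES = (1, 2, 3)  # a player may remove 1, 2 or 3 stones per turn


def value_estimate(stones):
    """Placeholder for a trained evaluator: returns a guess in [-1, 1]
    for the player about to move. Here it always answers "no idea"."""
    return 0.0


def lookahead(stones, depth):
    """Score the position for the player to move, searching `depth` moves ahead."""
    if stones == 0:
        return -1.0  # the opponent just took the last stone: we lost
    if depth == 0:
        return value_estimate(stones)  # too deep to search: fall back on the guess
    # Our best outcome is the opponent's worst outcome after each legal move.
    return max(-lookahead(stones - m, depth - 1) for m in MOVES if m <= stones)


def best_move(stones, depth):
    """Pick the move whose resulting position is worst for the opponent (depth >= 1)."""
    legal = [m for m in MOVES if m <= stones]
    return max(legal, key=lambda m: -lookahead(stones - m, depth - 1))


print(best_move(5, 4))   # with enough look-ahead, the winning move is found
print(lookahead(8, 1))   # a shallow search cannot tell who is winning
print(lookahead(8, 8))   # a deep search sees that 8 stones is a lost position
```

The deeper the search can reach, the less it depends on the quality of its guesses, which is why looking dozens of moves ahead matters so much.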

Lee’s defeat rippled into questions of identity, which are bound to become more common as artificial intelligence trounces the human brain in more and more applications. What does it mean to have devoted your life to becoming a world champion when a computer can leave you behind in the dust? And how valuable is human creativity when a self-trained device can make moves you never thought to consider or had previously dismissed as nonsensical?  

Three and a half years after his public defeat at the virtual hands of AlphaGo, Lee Sedol retired from professional play. “Even if I become the number one,” he said in an interview, “there is an entity that cannot be defeated.” This entity, AlphaGo, was replaced by a less specialized one: AlphaZero. Whereas AlphaGo was trained exclusively on the game of Go, AlphaZero is an artificial intelligence able to learn several two-player board games, including chess, shogi, and Go, starting with a blank slate (i.e. not trained on matches played by humans) and playing against itself millions of times to learn from its mistakes and improve its strategy. AlphaZero became superhuman in days, and it did not need the knowledge that we as a species gained from playing millions of games over thousands of years. 

The world of competitive board games has been changed forever, but the makers of AlphaGo and AlphaZero were not done. Putting stones down on a grid was only the opening act. The pièce de résistance would turn out to be predicting how precious building blocks fold in space. 

The Olympics of protein folding 

Proteins are the workhorses of life as we know it. DNA makes RNA and RNA makes proteins, long strings of amino acids that do so much more than help us build muscles. Proteins transport molecules into and out of our cells. They act as antibodies. They catalyze chemical reactions. They copy our DNA. Importantly, their function is derived from their structure, and that’s where folding comes into play. 

If you’ve laundered towels, you know you can fold them in different ways. Fold each of them in random ways and you will end up with a dysfunctional pile. Fold them each cleanly and in the same way and you will have a strong tower of fresh towels. The towels themselves never change, but their conformation in 3D space will decide if adding another one to the pile will result in a stable structure or a failed game of Jenga. 

Proteins are the same. Each amino acid they are made of has distinct side chains and electrical charges, which means that the full molecule twists itself into helices and flat sheets and eventually reaches its optimal shape in 3D space. That shape, like that of a key, determines what it can do, but it is not an easy one to guess. 

Scientists spend months, often years, figuring out the 3D shape of a single protein, in part because understanding these conformations can help us treat diseases by designing better drugs. They can’t look at a protein under a microscope, however: visible light is simply too “big.” They need a type of light with a wavelength that is equivalent to the size of the bonds between the atoms that make up a protein, i.e. X-rays. 
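
The mismatch in scale is easy to check with a few approximate numbers. The Python sketch below uses commonly quoted, rounded values that are assumptions for illustration: green light at about 550 nanometres, an X-ray wavelength of about 0.1 nanometre, and a carbon-carbon single bond of about 0.154 nanometre.

```python
# Approximate, commonly quoted lengths in nanometres (illustrative values only).
GREEN_LIGHT_NM = 550.0  # roughly the middle of the visible spectrum
XRAY_NM = 0.1           # on the order of the X-rays used in crystallography
CC_BOND_NM = 0.154      # typical carbon-carbon single bond


def bond_lengths_per_wavelength(wavelength_nm, bond_nm=CC_BOND_NM):
    """How many bond lengths fit inside a single wavelength."""
    return wavelength_nm / bond_nm


print(round(bond_lengths_per_wavelength(GREEN_LIGHT_NM)))  # thousands of bonds per wave
print(round(bond_lengths_per_wavelength(XRAY_NM), 2))      # less than one bond per wave
```

A single wave of green light spans thousands of bonds, so it washes right over the detail, while an X-ray wave is about the same size as the thing being measured.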

In X-ray protein crystallography, a technique pioneered in the 1950s, a bacterium is genetically modified to produce the protein of interest, which is purified and then turned into a supersaturated solution. From this solution grows a crystal in which millions of copies of the protein sit in the same orientation, like a field of sunflowers all facing the sun. This crystal is then hit by a beam of X-rays, often coming from a massive particle accelerator that uses magnets to wiggle electrons racing through it, forcing them to emit high-energy X-rays. The collision with the crystallized proteins creates a strange Rorschach-like image captured by a detector. Using this image and complex mathematics, scientists are able to eventually recreate the shape of the protein as if they were studying ripples in a pond to reproduce the exact stone that caused them. 

This was one of the main methods used in the pre-artificial intelligence days, often requiring a Ph.D. student’s entire doctorate to divine one protein’s conformation. Now, the 3D structures of nearly all human proteins have been predicted and are available for free to anyone who is curious enough. Who made these predictions? The successor to AlphaZero, an artificial intelligence named AlphaFold. 

Well over 100,000 protein structures had been elucidated by scientists prior to the development of AlphaFold, and this data was fed into the artificial intelligence, which learned to associate a particular string of amino acids with a certain three-dimensional shape. But this training set was too small, so the team fed AlphaFold’s most confident guesses back into itself for one final training. AlphaFold was then ready for the big leagues. 

CASP (the Critical Assessment of protein Structure Prediction) is sometimes referred to as the Olympics of protein folding. Every two years, teams compete to see how good they are at predicting the shape of proteins whose conformations have been resolved the hard way but have not yet been published. It is thus a blind test, and DeepMind decided to enter the competition with AlphaFold. In 2018, AlphaFold won the competition by a wide margin. In 2020, in the middle of the COVID-19 pandemic, an entirely different iteration of AlphaFold entered CASP again. The new AI was also fast, often taking much less than a day to work out a prediction for a protein’s structure. One of the proteins that was fed to the participants was a protein made by the new coronavirus. Overall, AlphaFold earned a summed z-score of 244.0, a measure of how far its predictions stood above the field's average. The next best group earned 90.8. 

AI has now predicted the 3D shapes of over 95% of human proteins, as well as tens of thousands of proteins found in nature, like those of the nematode worm, the mouse, and the parasite that causes malaria. The fidelity of these predictions is often very close to what can be painstakingly demonstrated using techniques like X-ray crystallography; however, there are still major challenges. AI still struggles with very long proteins, with how proteins change shape when they bind to other structures, and with how proteins are often modified by having other molecules attached to them. Some proteins are also notoriously “floppy,” like a wet noodle, making AI predictions harder. But AlphaFold now has cousins, like RoseTTAFold2, which beat AlphaFold at figuring out how proteins interact with each other to form three-dimensional complexes. 

These neural networks can, in theory, speed up drug development. Scientists can think of a protein that might act as a wrench in the works of a disease and run that sequence through an artificial intelligence to see if it will fold the correct way to carry out its mission. Of course, knowing a protein’s 3D structure is not the Holy Grail to ensure that a new drug will do what we want it to do, but accurate predictions of this structure can accelerate the overall process. 

As for the game of Go, many professional players, like Lee Sedol, have left the competition, and teachers are reported as seeing diminishing demand for their coaching. Human players who want to win big now avoid developing a personal style; instead, they try to imitate the AI. The student has become the master. No longer naively sifting through previous games to learn the moves, the computer now acts as the ultimate teacher for many aspiring Go competitors. 

And it has changed the game by making human players more creative and, ultimately, better. Just as Deep Blue’s victory over Garry Kasparov did not end humanity’s interest in playing chess, the crowning of artificial intelligence at Go simply pushed human players to up their game. 

It also acted as a cheat code to help us understand how proteins fold and gave us a leg up in producing better medicines. 

Not bad for a bunch of ones and zeroes. 

Take-home message:
- Artificial intelligence programs have been created that beat humans at the game of Go and at other two-player board games
- Similar artificial intelligence programs can now quickly and often accurately predict the main 3D shape that a protein adopts, something that otherwise takes months or years of experimental work
- While this has the potential to speed up drug discovery, artificial intelligence still struggles with some aspects of protein folding, like what happens when multiple proteins assemble in a complex

