"Our generative AI speaks the language of proteins"
Birte Höcker
Profession: Biochemist
Position: Professor at University of Bayreuth
Biochemist Birte Höcker from the University of Bayreuth uses artificial intelligence to create customized proteins. In this interview, she talks about the potential of language models, which work much like ChatGPT, for the development of completely new biomolecules.
Birte Höcker conducts research at the Institute of Biochemistry at the University of Bayreuth. The professor and her research group are developing digital tools for protein design. For Höcker, artificial intelligence (AI) opens up fascinating and promising avenues for using language processing methods to produce customized proteins. Generative AI technologies can create proteins that do not occur in nature, or that have never existed in the course of evolution.
How are AI-based methods transforming the development of designer proteins for research and industry?
In recent years, AI technologies have transformed protein research and the way customized proteins are produced. AI algorithms such as the structure-prediction tool AlphaFold2 have quickly become an integral part of structural biology worldwide. There are now also a number of AI-based tools for designing new proteins. Language models such as ProGen and ProtGPT2 are often used for this purpose, alongside other generative approaches such as the diffusion model Chroma.
You and your team have developed an AI-based language model called ProtGPT2. How does it work?
Like ChatGPT, ProtGPT2 is a language model, but instead of human language it works with the language of proteins: their amino acid sequences. We trained our model on 50 million sequences of natural proteins. As a result, it not only understands the language of proteins but can also use it creatively. It can be used to design proteins that fold into stable structures and remain functional in this state. These insights into the vast world of possible proteins open the door to innovative research that creates previously unknown proteins in novel ways.
What are other special features of ProtGPT2?
Most proteins designed from scratch so far have idealized structures. Before they can be used, they usually require complex functionalization steps, for example the insertion of extensions and cavities. ProtGPT2, by contrast, produces proteins whose structures are intrinsically so differentiated that they are ready for use in their respective environment. We also have evidence that the model can create proteins that do not occur in nature and may never have existed in the course of evolution.
Which proteins are you focusing on in particular?
We are very interested in so-called TIM-barrel proteins, which are often enzymatically active, but we also work a lot with binding proteins that can be used to build sensors for small molecules. On the one hand, we want to know which protein structures are possible in the first place; on the other, we are trying to develop proteins for specific applications. We find enzymes for plastic degradation, alternatives to reagent antibodies and the construction of motor proteins especially intriguing.
What do you currently see as the most exciting developments in protein design?
The pace of development is rapid. New tools are constantly being developed and are becoming more widely accessible. In addition, success rates have increased enormously. In the past, many designs failed at the production stage (expression in bacterial cells). Today, we are seeing a significant improvement in the properties and handling of many designer proteins, allowing us to ask completely new questions and tackle new challenges.
Interview: Philipp Graf