Discovering the Design Rules Linking Protein Sequence to Function
Over the past decade, our ability to read, write, and edit vast amounts of DNA, including complete genomes, has flourished, producing a wealth of biological data available for analysis. However, a significant bottleneck remains: our capacity to design genes that encode proteins with desirable properties, enhanced functions, or entirely novel functions. Building on recent breakthroughs in multi-modal models, which have achieved remarkable success in high-resolution text-to-image synthesis, we are extending this technology to biophysics and protein engineering, specifically to text-to-protein sequence generation. This approach enables the design of proteins with specified functions from natural language prompts, followed by experimental verification in our laboratory to confirm their practical utility and effectiveness.
PhD student Nikša Praljak, together with Associate Professor Andrew Ferguson and Professor Rama Ranganathan of the University of Chicago, is tackling this challenge with data-driven models trained on protein sequences. They aim to decode the principles linking protein sequence to function, paving the way for the development of novel synthetic proteins in the lab. Their project is poised to introduce a platform that combines computational analysis with experimental procedures for the rapid design of synthetic proteins tailored to specific functions. The platform circumvents the need for traditional sequence alignments and introduces new sequence diversification strategies. It also integrates active learning loops and natural language prompts to guide text-to-protein generation, improving the precision and scope of sequence-based protein design.
Principal Investigators: Nikša Praljak (UChicago), Andrew Ferguson (UChicago), Rama Ranganathan (UChicago)