In complex biological systems, a multitude of proteins carry out specific physiological functions. The diversity and specificity of these functions ensure the normal operation of every organism. However, our understanding of how the vast majority of proteins work remains very limited. Humanity's engagement with the puzzle of protein structure goes back to the last century. In 1958, John Kendrew solved the first protein structure by X-ray crystallography1, work for which he was awarded the 1962 Nobel Prize in Chemistry.
Figure: Photograph of the myoglobin model1; John Kendrew (1917-1997)2
Ten years later, in 1972, another Nobel laureate, Christian Anfinsen, postulated that the native structure of a protein in its standard physiological environment is determined solely by its amino acid sequence, and thereby raised the historic possibility of predicting protein conformation a priori from the primary sequence alone3.
“A protein structure is determined by its primary sequence.”3
Figure: Christian Anfinsen (1916-1995)4
For decades, scientists have tried to predict protein structures computationally. Recently, AI tools have made tremendous progress in solving protein structures in silico. AlphaFold2 showed outstanding performance in the Critical Assessment of protein Structure Prediction (CASP) competition5. It brought prediction accuracy for general proteins to the experimental level, is widely viewed as the biggest success of AI for science so far, and was named Science’s 2021 Breakthrough of the Year6. Its database of predicted structures has since expanded to 214 million proteins from about 1 million species, covering almost every known protein on the planet7.
Nevertheless, AlphaFold2 relies on coevolutionary information, obtained by searching databases for homologous proteins, to predict structures, which significantly limits its scope and use in the modern pharmaceutical industry. AlphaFold2 has produced unsatisfactory predictions for proteins without clear evolutionary information, such as antibodies, T-cell receptors and orphan proteins. Only around 35% of the 214 million predictions are reported to be highly accurate8, which leaves considerable room for improvement. More importantly, evolution-based methods are inherently inapplicable to artificially designed proteins, such as industrial enzymes and other macromolecular drugs.
To address this challenge, Helixon has been exploring new perspectives for building the next generation of protein structure AI models. We recently developed OmegaFold, a protein structure prediction tool that achieves high-resolution predictions from the primary sequence alone. We evaluated OmegaFold on recent CAMEO (Continuous Automated Model EvaluatiOn) and CASP proteins, where its performance is highly competitive: OmegaFold outperforms AlphaFold25 and RoseTTAFold9 when only a single primary sequence is provided, and performs comparably to both even when they are given evolutionary information as input.
More importantly, OmegaFold performs well in modeling antibodies and orphan proteins, which opens up immense potential across the protein industry. OmegaFold also runs significantly faster than AlphaFold2. Structural biologists can now run OmegaFold on a laptop and make accurate predictions from a single protein sequence. By greatly expanding the range of protein types that can be modeled, OmegaFold will help scientists better understand the specific functions of many proteins. It also offers hope for identifying potential drug targets and helping researchers develop new drugs. We believe this cutting-edge technology will carry the entire field into a new era.
The OmegaFold code and software have been released on GitHub: https://github.com/HeliXonProtein/OmegaFold. The paper is available at: https://www.biorxiv.org/content/10.1101/2022.07.21.500999v1.full.pdf
1 Kendrew, J. C. et al. A three-dimensional model of the myoglobin molecule obtained by x-ray analysis. Nature 181, 662-666, doi:10.1038/181662a0 (1958).
2 NobelPrize.org. John C. Kendrew – Facts, <https://www.nobelprize.org/prizes/chemistry/1962/kendrew/facts/>.
3 Anfinsen, C. B. Studies on the principles that govern the folding of protein chains. Nobel Lecture, <https://www.nobelprize.org/uploads/2018/06/anfinsen-lecture.pdf> (1972).
4 NobelPrize.org. Christian Anfinsen – Facts, <https://www.nobelprize.org/prizes/chemistry/1972/anfinsen/facts/>.
5 Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583-589, doi:10.1038/s41586-021-03819-2 (2021).
6 Science’s 2021 Breakthrough of the year, <https://www.science.org/content/article/breakthrough-2021> (2021).
7 AlphaFold Protein Structure Database, <https://alphafold.ebi.ac.uk/>.
8 Callaway, E. ‘The entire protein universe’: AI predicts shape of nearly every known protein. Nature 608, 15-16, doi:10.1038/d41586-022-02083-2 (2022).
9 Baek, M. et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science 373, 871-876, doi:10.1126/science.abj8754 (2021).