Affiliations 

  • 1 Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
  • 2 TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology, i12, Boltzmannstr 3, 85748, Garching/Munich, Germany
  • 3 School of Biological Sciences, Seoul National University, Seoul, South Korea
  • 4 European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
  • 5 Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK.
Commun Biol, 2023 Feb 08;6(1):160.
PMID: 36755055 DOI: 10.1038/s42003-023-04488-9

Abstract

Deep-learning (DL) methods like DeepMind's AlphaFold2 (AF2) have led to substantial improvements in protein structure prediction. We analyse confident AF2 models from 21 model organisms using a new classification protocol (CATH-Assign) which exploits novel DL methods for structural comparison and classification. Of ~370,000 confident models, 92% can be assigned to 3253 superfamilies in our CATH domain superfamily classification. The remaining models cluster into 2367 putative novel superfamilies. Detailed manual analysis of 618 of these, each having at least one human relative, reveals extremely remote homologies and further unusual features. Only 25 novel superfamilies could be confirmed. Although most models map to existing superfamilies, AF2 domains expand CATH by 67% and increase the number of unique 'global' folds by 36%, and will provide valuable insights into structure-function relationships. CATH-Assign will harness the huge expansion in structural data provided by DeepMind to rationalise evolutionary changes driving functional divergence.
