PKU Summer School 2025
Artificial Intelligence for Protein Design

28 July - 1 August, 9:00 ~ 12:00
Jian Tang

MILA & BioGeometry

Chunfu Xu

NIBS & Tsinghua

Zhen Liu

NIBS & Tsinghua

Instructors

  • Jian Tang, Associate Professor, Mila-Quebec AI Institute (Mila) & Founder, BioGeometry
    • Jian Tang is an associate professor at Mila, the leading AI institute founded by AI pioneer and Turing Award Laureate Yoshua Bengio, and a CIFAR AI Research Chair. His research focuses on deep generative models, geometric deep learning, and their applications to protein design. He is also the founder of BioGeometry, a leading AI startup focusing on developing generative AI for protein design.
  • Chunfu Xu, Assistant Professor, National Institute of Biological Sciences (NIBS) & Tsinghua University
    • Chunfu Xu is an assistant investigator at the National Institute of Biological Sciences (NIBS) and an assistant professor at Tsinghua University. He received his Ph.D. from Emory University, and later joined Prof. David Baker’s lab for postdoctoral research. He established his own lab at NIBS in 2022, focusing on developing deep learning-based methods for protein design and exploring their applications in creating functional proteins to address challenges in basic science, biotechnology, and medicine.
  • Zhen Liu, Assistant Professor, National Institute of Biological Sciences (NIBS) & Tsinghua University
    • Dr. Liu is a Principal Investigator at the National Institute of Biological Sciences (NIBS). He earned his Ph.D. in Chemistry from The Scripps Research Institute (TSRI). In 2019, he served as a Postdoctoral Fellow at Caltech in the laboratory of Prof. Frances Arnold, who received the 2018 Nobel Prize in Chemistry. Dr. Liu started his independent career in 2022, focusing on biocatalysis and the development of novel enzyme-engineering strategies.

Course Introduction

Proteins are the fundamental workhorses of living cells, carrying out nearly all cellular functions. Understanding protein function and designing de novo proteins are crucial for applications across many industries, including biomedicine, environmental sustainability, agriculture, cosmetics, materials science, and food production. In recent years, advances in artificial intelligence (e.g., AlphaFold2, AlphaFold3, RFdiffusion) have opened new frontiers in protein design, enabling the development of novel proteins with tailored functionalities. This course introduces AI for protein design, covers the latest progress in the field, and examines applications to real-world problems. It equips students with the tools to apply cutting-edge machine learning models to protein structure prediction, function annotation, and the design of synthetic proteins. Through hands-on learning and real-world case studies, students will gain experience applying AI techniques to complex challenges in protein design.

Evaluation: 10% participation + 90% course project

Outline

  • Day 1: Introduction to AI, Proteins, and Computational Protein Design [Jian, Chunfu]
    • Introduction to AI, Deep Learning, Transformers, Graph Neural Networks [Jian]
    • Introduction to Proteins and Computational Protein Design [Chunfu]
  • Day 2: Protein Representation Learning and Protein Structure Prediction [Jian]
    • Protein Representation Learning, Function Prediction
    • Protein Structure and Dynamics Prediction
  • Day 3: Generative Models and Protein Design [Jian]
    • Autoregressive Models, Diffusion Models
    • Protein Generative Models
  • Day 4: Basic Strategies and Applications of Enzyme Catalysis [Zhen]
    • Basic Strategies and Applications of Biocatalysis
    • New-to-Nature Biocatalysis and Novel Protein Engineering Strategies
  • Day 5: Applications in Antibody Design and Other Proteins [Chunfu, Jian]

References

Introduction to Deep Learning

  1. Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. Deep Learning. Cambridge: MIT Press, 2016.
  2. Vaswani, Ashish, et al. "Attention is all you need." Advances in neural information processing systems 30 (2017).
  3. Hamilton, William L. Graph representation learning. Morgan & Claypool Publishers, 2020.

Representation Learning and Protein Structure Prediction

  1. Madani, Ali, et al. "Large language models generate functional protein sequences across diverse families." Nature Biotechnology 2023.
  2. Rives, Alexander, et al. "Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences." PNAS 2021.
  3. Lin, Zeming, et al. "Evolutionary-scale prediction of atomic-level protein structure with a language model." Science 2023.
  4. Elnaggar, Ahmed, et al. "ProtTrans: Toward understanding the language of life through self-supervised learning." TPAMI 2021.
  5. Zhang, Zuobai, et al. "A systematic study of joint representation learning on protein sequences and structures." arXiv preprint arXiv:2303.06275.
  6. Hayes, Thomas, et al. "Simulating 500 million years of evolution with a language model." Science 2025.
  7. Wang, Xinyou, et al. "Diffusion Language Models Are Versatile Protein Learners." ICML 2024.
  8. Satorras, Victor Garcia, Emiel Hoogeboom, and Max Welling. "E(n) equivariant graph neural networks." ICML 2021.
  9. Jing, Bowen, et al. "Learning from protein structure with geometric vector perceptrons." ICLR 2021.
  10. Zhang, Zuobai, et al. "Protein representation learning by geometric structure pretraining." ICLR 2023.
  11. Fan, Hehe, et al. "Continuous-discrete convolution for geometry-sequence modeling in proteins." ICLR 2023.
  12. Su, Jin, et al. "SaProt: Protein language modeling with structure-aware vocabulary." ICLR 2024.
  13. Wang, Xinyou, et al. "DPLM-2: A multimodal diffusion protein language model." ICLR 2025.
  14. Notin, Pascal, et al. "ProteinGym: Large-Scale Benchmarks for Protein Design and Fitness Prediction." NeurIPS 2023.
  15. Cai, Huiyu, et al. "Pretrainable Geometric Graph Neural Network for Antibody Affinity Maturation." Nature Communications 2024.
  16. Shan, Sisi, et al. "Deep Learning-Guided Optimization of Human Antibody Against SARS-CoV-2 Variants with Broad Neutralization." PNAS 2022.

Generative Models, Protein Design

  1. Hsu, Chloe, et al. "Learning inverse folding from millions of predicted structures." ICML 2022.
  2. Dauparas, Justas, et al. "Robust deep learning-based protein sequence design using ProteinMPNN." Science 2022.
  3. Yim, Jason, et al. "SE(3) diffusion model with application to protein backbone generation." ICML 2023.
  4. Yim, Jason, et al. "Fast protein backbone generation with SE(3) flow matching." arXiv preprint arXiv:2310.05297.
  5. Lin, Yeqing, and Mohammed AlQuraishi. "Generating Novel, Designable, and Diverse Protein Structures by Equivariantly Diffusing Oriented Residue Clouds." ICML 2023.
  6. Lin, Yeqing, et al. "Out of Many, One: Designing and Scaffolding Proteins at the Scale of the Structural Universe with Genie 2." arXiv preprint arXiv:2405.15489.
  7. Ingraham, John B., et al. "Illuminating protein space with a programmable generative model." Nature 2023.
  8. Watson, Joseph L., et al. "De novo design of protein structure and function with RFdiffusion." Nature 2023.
  9. Bose, Joey, et al. "SE(3)-Stochastic Flow Matching for Protein Backbone Generation." ICLR 2024.
  10. Huguet, Guillaume, et al. "Sequence-Augmented SE(3)-Flow Matching For Conditional Protein Backbone Generation." NeurIPS 2024.
  11. Shi, Chence, et al. "Protein Sequence and Structure Co-Design with Equivariant Translation." ICLR 2023.
  12. Lisanza, Sidney Lyayuga, et al. "Joint generation of protein sequence and structure with RoseTTAFold sequence space diffusion." Nature Biotechnology 2024.
  13. Campbell, Andrew, et al. "Generative Flows on Discrete State-Spaces: Enabling Multimodal Flows with Applications to Protein Co-Design." ICML 2024.
  14. Chu, Alexander E., et al. "An all-atom protein generative model." PNAS 2024.

Applications of Enzyme Catalysis

  1. Arnold, Frances H. "Directed evolution: bringing new chemistry to life." Angewandte Chemie International Edition 57.16 (2018): 4143.
  2. Bell, Elizabeth L., et al. "Biocatalysis." Nature Reviews Methods Primers 1.1 (2021): 46.
  3. Savile, Christopher K., et al. "Biocatalytic asymmetric synthesis of chiral amines from ketones applied to sitagliptin manufacture." Science 329.5989 (2010): 305-309.
  4. Lu, Hongyuan, et al. "Machine learning-aided engineering of hydrolases for PET depolymerization." Nature 604.7907 (2022): 662-667.
  5. Coelho, Pedro S., et al. "Olefin cyclopropanation via carbene transfer catalyzed by engineered cytochrome P450 enzymes." Science 339.6117 (2013): 307-310.
  6. Zeng, Qing-Qing, et al. "Biocatalytic desymmetrization for synthesis of chiral enones using flavoenzymes." Nature Synthesis 3.11 (2024): 1340-1348.

Applications in Antibody and Other Protein Design

  1. Jin, Wengong, et al. "Iterative Refinement Graph Neural Network for Antibody Sequence-Structure Co-design." ICLR 2022.
  2. Zhu, Tian, Milong Ren, and Haicang Zhang. "Antibody Design Using a Score-based Diffusion Model Guided by Evolutionary, Physical and Geometric Constraints." ICML 2024.
  3. Jin, Wengong, et al. "DSMBind: SE(3) denoising score matching for unsupervised binding energy prediction and nanobody design." NeurIPS 2023.
  4. Jin, Wengong, Regina Barzilay, and Tommi Jaakkola. "Antibody-antigen docking and design via hierarchical equivariant refinement." ICML 2022.
  5. Nori, Divya, and Wengong Jin. "RNAFlow: RNA structure & sequence design via inverse folding-based flow matching." ICML 2024.
  6. Huang, Tinglin, et al. "Protein-nucleic acid complex modeling with frame averaging transformer." NeurIPS 2024.