Diffusion Language Models Are Versatile Protein Learners
Published in ICML, 2024
DPLM is a protein language model that unifies protein sequence generation and understanding. It uses discrete diffusion to provide a global receptive field, making it well suited for modeling 3D spatial dependencies among amino acids. DPLM achieves state-of-the-art protein sequence generation performance and outperforms Meta’s ESM2 on protein understanding benchmarks. Results for models with 150M, 650M, and 3B parameters show that performance improves with scale.
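To give a rough sense of what discrete diffusion over a protein sequence can look like, here is a minimal sketch of a forward corruption step. It assumes an absorbing-state (masking) process, which this summary does not specify, and it is not taken from the DPLM codebase. The names `corrupt` and `MASK` and the example sequence are hypothetical and chosen for illustration only.

```python
import random

# Hypothetical mask symbol; any character outside the amino acid alphabet works.
MASK = "#"


def corrupt(sequence, t, rng):
    """Mask each residue independently with probability t, where 0 <= t <= 1.

    t = 0 leaves the sequence unchanged; t = 1 masks every position.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return "".join(MASK if rng.random() < t else residue for residue in sequence)


if __name__ == "__main__":
    rng = random.Random(0)   # seeded for repeatability
    seq = "MKTAYIAKQR"       # illustrative sequence, not real data
    for t in (0.0, 0.5, 1.0):
        print(t, corrupt(seq, t, rng))
```

In this kind of setup, a denoising model would be trained to recover the masked residues from the surviving ones. Because every position can condition on every other position, the prediction is not restricted to a left-to-right context.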
Code: https://github.com/bytedance/dplm
Keywords: diffusion protein language models, protein sequence generation, protein representation learning, controllable protein design
Recommended citation: Xinyou Wang*, Zaixiang Zheng*, Fei Ye, Dongyu Xue, Shujian Huang, and Quanquan Gu. (2024). "Diffusion Language Models Are Versatile Protein Learners." Proceedings of the 41st International Conference on Machine Learning, 52309-52333.
