DPLM-2: A Multimodal Diffusion Protein Language Model
Published in ICLR, 2025
DPLM-2 is a multimodal protein language model for joint sequence and structure modeling. It introduces a protein structure tokenizer that discretizes 3D atomic coordinates into structure tokens, so that amino acid and structure tokens can be modeled jointly with discrete diffusion. This enables cross-modal generation, such as sequence-structure co-generation, for diverse protein design tasks and improves multimodal protein understanding.
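As a rough illustration of what jointly corrupting two token streams can look like, the sketch below applies an absorbing-state (mask-based) noising step, one common form of discrete diffusion, to paired structure and amino acid tokens. It is not taken from the DPLM-2 code: the `<mask>` token, the function names, and the toy structure-token labels are all hypothetical, and the actual model's noise schedule and tokenization may differ.

```python
import random

MASK = "<mask>"  # hypothetical absorbing token, not the DPLM-2 vocabulary


def corrupt(tokens, t, rng):
    """Replace each token with MASK independently with probability t."""
    return [MASK if rng.random() < t else tok for tok in tokens]


def corrupt_pair(struct_tokens, aa_tokens, t, rng):
    """Noise the structure and sequence tracks of one protein at level t.

    Both tracks have one token per residue, so their lengths must match.
    """
    if len(struct_tokens) != len(aa_tokens):
        raise ValueError("structure and sequence tracks must be the same length")
    return corrupt(struct_tokens, t, rng), corrupt(aa_tokens, t, rng)


# Toy example: four residues, each with a made-up structure token label.
rng = random.Random(0)
struct_tokens = ["s17", "s4", "s4", "s92"]
aa_tokens = list("MKTA")
noisy_struct, noisy_aa = corrupt_pair(struct_tokens, aa_tokens, 0.5, rng)
```

In this kind of setup a denoising model would be trained to recover the masked positions in both tracks. Masking one track entirely (t = 1) while leaving the other intact (t = 0) corresponds to conditioning on a single modality.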
Code: https://github.com/bytedance/dplm
Keywords: multimodal protein language models, discrete diffusion, structure tokenization, sequence-structure co-generation
Recommended citation: Xinyou Wang*, Zaixiang Zheng, Fei Ye, Dongyu Xue, Shujian Huang, and Quanquan Gu. (2025). "DPLM-2: A Multimodal Diffusion Protein Language Model." International Conference on Learning Representations.
