Deli Chen

陈德里

Senior Researcher at DeepSeek AI. Building next-generation large language models.
Core contributor to DeepSeek-V1, V2, V3, V4, DeepSeek-R1 (Nature Cover), DeepSeek-Coder, DeepSeek-MoE architecture, etc.
B.S. & M.S. from Peking University · Previously at Tencent WeChat AI.

All opinions are my own. | INTP-T | 人心惟危,道心惟微 ("The human mind is precarious; the mind of the Way is subtle") | #AGIforEveryone

Citations: 23,500+ · h-index: 21 · Papers: 29+

Experience

2023 - Present
Senior Researcher, DeepSeek AI
Core contributor to DeepSeek-V1, V2, V3, V4, DeepSeek-R1 (Nature Cover), DeepSeek-Coder, DeepSeek-MoE architecture, etc. Public spokesperson at NVIDIA GTC 2024 and World Internet Conference 2025.
2021 - 2023
Researcher, Tencent WeChat AI
NLP and language model research.
2019 - 2021
M.S. in Computer Science, Peking University
MOE Key Lab of Computational Linguistics (LancoPKU). Advisor: Prof. Xu Sun. Research on GNN, NLP, and financial AI.
2015 - 2019
B.S. in Information Management, Peking University
School of Information Management.

Research

🧠
Large Language Models
DeepSeek series: scaling, MoE architecture, efficient training.
💡
Reasoning & RL
RL for LLM reasoning (DeepSeek-R1), step-by-step verification.
🕸️
Graph Neural Networks
Over-smoothing, topology-imbalance, contrastive learning.
🔒
LLM Safety & Alignment
Backdoor detection, watermarking, diffusion purification.
📈
Financial NLP
Stock prediction with event graphs, forex news aggregation.
🔍
Interpretability
In-context learning: label words as anchors.

Selected Publications

DeepSeek-V4 Technical Report
2025 (New)
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Nature 2025 (Cover Article)
9,541
[Images: Nature Vol. 645 cover, "SELF-HELP"; DeepSeek-R1 Figure 1]
DeepSeek-V3 Technical Report
arXiv 2024
4,600
Measuring and Relieving the Over-smoothing Problem for Graph Neural Networks from the Topological View
AAAI 2020
1,748
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
ACL 2024
1,017
Math-Shepherd: Verify and Reinforce LLMs Step-by-Step without Human Annotations
ACL 2024
993
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
arXiv 2024
952
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
arXiv 2024
926
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence
arXiv 2024
467
Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning
EMNLP 2023 (Best Long Paper)
284
[Image: EMNLP 2023 Best Long Paper Award, "Label Words are Anchors"]
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models
arXiv 2025
281
Modeling the Stock Relation with Graph Network for Overnight Stock Movement Prediction
IJCAI 2021
281
Topology-Imbalance Learning for Semi-Supervised Node Classification
NeurIPS 2021
173
Towards Codable Text Watermarking for Large Language Models
ICLR 2023
139
Fed-FA: Theoretically Modeling Client Data Divergence for Federated Language Backdoor Defense
NeurIPS 2023
CascadeBERT: Accelerating Inference of Pre-trained Language Models via Calibrated Complete Models Cascade
Findings of EMNLP 2021
72
Diffusion Theory as a Scalpel: Detecting and Purifying Poisonous Dimensions in Pre-trained Language Models
Findings of ACL 2023
17

News & Highlights

"Label Words are Anchors" wins EMNLP 2023 Best Long Paper
Dec 2023
DeepSeek-R1 crosses 9,500 citations in 4 months
May 2025

Chill Projects

Side projects for fun: small things built while slacking off.