
Kaifeng Lyu 吕凯风

I am a postdoctoral research fellow at the Simons Institute for the Theory of Computing at UC Berkeley, in the programs Modern Paradigms in Generalization and Special Year on Large Language Models and Transformers. I received my Ph.D. in Computer Science from Princeton University in 2024, where I was very fortunate to be advised by Prof. Sanjeev Arora. In Fall 2025, I will join the Institute for Interdisciplinary Information Sciences (IIIS) at Tsinghua University as a Tenure-Track Assistant Professor.

I completed my undergraduate studies at Tsinghua University, receiving a B.Eng. in Computer Science and Technology in 2019. At Tsinghua, I was a student in the Yao Class, headed by Prof. Andrew Chi-Chih Yao, and was very fortunate to be advised by Prof. Jian Li.

Preprints

RNNs are not Transformers (Yet): The Key Bottleneck on In-context Retrieval
  Kaiyue Wen*, Xingyu Dang*, Kaifeng Lyu
Efficient Stagewise Pretraining via Progressive Subnetworks
  Abhishek Panigrahi*, Nikunj Saunshi*, Kaifeng Lyu, Sobhan Miryoosefi, Sashank Reddi, Satyen Kale, Sanjiv Kumar
Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates
  Kaifeng Lyu*, Haoyu Zhao*, Xinran Gu*, Dingli Yu, Anirudh Goyal, Sanjeev Arora

Conference Papers

A Quadratic Synchronization Rule for Distributed Deep Learning
  Xinran Gu*, Kaifeng Lyu*, Sanjeev Arora, Jingzhao Zhang, Longbo Huang
Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking
  Kaifeng Lyu*, Jikai Jin*, Zhiyuan Li, Simon S. Du, Jason D. Lee, Wei Hu
DistillSpec: Improving Speculative Decoding via Knowledge Distillation
  Yongchao Zhou, Kaifeng Lyu, Ankit Singh Rawat, Aditya Krishna Menon, Afshin Rostamizadeh, Sanjiv Kumar, Jean-François Kagy, Rishabh Agarwal
The Marginal Value of Momentum for Small Learning Rate SGD
  Runzhe Wang, Sadhika Malladi, Tianhao Wang, Kaifeng Lyu, Zhiyuan Li
Understanding Incremental Learning of Gradient Descent: A Fine-grained Analysis of Matrix Sensing
  Jikai Jin, Zhiyuan Li, Kaifeng Lyu, Simon S. Du, Jason D. Lee
Why (and When) does Local SGD Generalize Better than SGD?
  Xinran Gu*, Kaifeng Lyu*, Longbo Huang, Sanjeev Arora
Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction
  Kaifeng Lyu, Zhiyuan Li, Sanjeev Arora
On the SDEs and Scaling Rules for Adaptive Gradient Algorithms
  Sadhika Malladi*, Kaifeng Lyu*, Abhishek Panigrahi, Sanjeev Arora
New Definitions and Evaluations for Saliency Methods: Staying Intrinsic, Complete and Sound
  Arushi Gupta*, Nikunj Saunshi*, Dingli Yu*, Kaifeng Lyu, Sanjeev Arora
Gradient Descent on Two-layer Nets: Margin Maximization and Simplicity Bias
  Kaifeng Lyu*, Zhiyuan Li*, Runzhe Wang*, Sanjeev Arora
Towards Resolving the Implicit Bias of Gradient Descent for Matrix Factorization: Greedy Low-Rank Learning
  Zhiyuan Li, Yuping Luo, Kaifeng Lyu (alphabetical order)
Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate
  Zhiyuan Li*, Kaifeng Lyu*, Sanjeev Arora
Gradient Descent Maximizes the Margin of Homogeneous Neural Networks
  Kaifeng Lyu, Jian Li
Theoretical Analysis of Auto Rate-Tuning by Batch Normalization
  Sanjeev Arora, Zhiyuan Li, Kaifeng Lyu (alphabetical order)
Fine-grained Complexity Meets IP = PSPACE
  Lijie Chen, Shafi Goldwasser, Kaifeng Lyu, Guy N. Rothblum, Aviad Rubinstein (alphabetical order)
Single-Source Bottleneck Path Algorithm Faster than Sorting for Sparse Graphs
  Ran Duan, Kaifeng Lyu, Hongxun Wu, Yuanhang Xie (alphabetical order)
Learning Gradient Descent: Better Generalization and Longer Horizons
  Kaifeng Lv*, Shunhua Jiang*, Jian Li
(Authors are listed in contribution order unless marked otherwise; * denotes equal contribution.)

Professional Services

  • Organizer, NeurIPS 2024 Workshop on Mathematics of Modern Machine Learning (M3L 2024).
  • Organizer, NeurIPS 2023 Workshop on Mathematics of Modern Machine Learning (M3L 2023).
  • Conference Reviewer: ICML (2020-2023), NeurIPS (2020-2023), ICLR (2022-2024), COLT (2020), AAAI (2020), KDD (2022).
  • Journal Reviewer: TMLR, JMLR, TPAMI, AIJ.
  • Organizer, Yao Class Seminar, Tsinghua University (Fall 2019, Fall 2020, Spring 2021).

Universal Online Judge

  • In 2014, I founded the Universal Online Judge (UOJ), a popular online judge system in China.
  • UOJ can judge both traditional and non-traditional programming problems from OI (Olympiad in Informatics) contests (a minimal sketch of the traditional judging model appears after this list). A team of top OI players regularly hosts programming contests on UOJ.
  • [Link] [GitHub] [Docs]
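
In rough terms, a traditional problem runs the contestant's program on fixed test cases and compares its output against a reference answer, while a non-traditional problem replaces that comparison with a problem-specific checker or an interactive grader. The Python sketch below illustrates only the traditional model; it is an illustrative toy, not UOJ's actual judging code, and the function and file names are hypothetical.

    import subprocess

    def judge_traditional(cmd, input_path, answer_path, time_limit=1.0):
        """Toy 'traditional' judge: run a contestant program on one test
        case and compare its stdout against the reference answer.
        Hypothetical helper, not UOJ's real implementation."""
        try:
            with open(input_path) as f:
                result = subprocess.run(cmd, stdin=f, capture_output=True,
                                        text=True, timeout=time_limit)
        except subprocess.TimeoutExpired:
            return "Time Limit Exceeded"
        if result.returncode != 0:
            return "Runtime Error"
        # Ignore trailing whitespace on each line, as judges commonly do.
        def norm(s):
            return [line.rstrip() for line in s.rstrip().splitlines()]
        with open(answer_path) as f:
            expected = f.read()
        return "Accepted" if norm(result.stdout) == norm(expected) else "Wrong Answer"

    # Example: judge a compiled solution on one test case.
    # print(judge_traditional(["./solution"], "1.in", "1.ans"))

A real judge adds sandboxing and memory limits on top of this loop; a non-traditional problem would swap the line-by-line comparison for a custom checker process or an interactor that talks to the contestant's program.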