
吕凯风 Kaifeng Lyu

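I will join the Institute for Interdisciplinary Information Sciences at Tsinghua University as an assistant professor in Fall 2025.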

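I am currently a postdoctoral researcher at the Simons Institute at the University of California, Berkeley, participating in the programs Modern Paradigms in Generalization and Special Year on Large Language Models and Transformers. I received my Ph.D. in Computer Science from Princeton University in 2024, advised by Prof. Sanjeev Arora. I did my undergraduate studies in the Yao Class at Tsinghua University, graduating in 2019 with a Bachelor of Engineering in Computer Science and Technology. My undergraduate research was advised by Prof. Jian Li.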

Preprints

RNNs are not Transformers (Yet): The Key Bottleneck on In-context Retrieval
  • Kaiyue Wen*
  • Xingyu Dang*
  • Kaifeng Lyu
Efficient Stagewise Pretraining via Progressive Subnetworks
  • Abhishek Panigrahi*
  • Nikunj Saunshi*
  • Kaifeng Lyu
  • Sobhan Miryoosefi
  • Sashank Reddi
  • Satyen Kale
  • Sanjiv Kumar
Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates
  • Kaifeng Lyu*
  • Haoyu Zhao*
  • Xinran Gu*
  • Dingli Yu
  • Anirudh Goyal
  • Sanjeev Arora

Conference Papers

A Quadratic Synchronization Rule for Distributed Deep Learning
  • Xinran Gu*
  • Kaifeng Lyu*
  • Sanjeev Arora
  • Jingzhao Zhang
  • Longbo Huang
Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking
  • Kaifeng Lyu*
  • Jikai Jin*
  • Zhiyuan Li
  • Simon S. Du
  • Jason D. Lee
  • Wei Hu
DistillSpec: Improving Speculative Decoding via Knowledge Distillation
  • Yongchao Zhou
  • Kaifeng Lyu
  • Ankit Singh Rawat
  • Aditya Krishna Menon
  • Afshin Rostamizadeh
  • Sanjiv Kumar
  • Jean-François Kagy
  • Rishabh Agarwal
The marginal value of momentum for small learning rate SGD
  • Runzhe Wang
  • Sadhika Malladi
  • Tianhao Wang
  • Kaifeng Lyu
  • Zhiyuan Li
Understanding incremental learning of gradient descent: A fine-grained analysis of matrix sensing
  • Jikai Jin
  • Zhiyuan Li
  • Kaifeng Lyu
  • Simon S. Du
  • Jason D. Lee
Why (and When) does Local SGD Generalize Better than SGD?
  • Xinran Gu*
  • Kaifeng Lyu*
  • Longbo Huang
  • Sanjeev Arora
Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction
  • Kaifeng Lyu
  • Zhiyuan Li
  • Sanjeev Arora
On the SDEs and Scaling Rules for Adaptive Gradient Algorithms
  • Sadhika Malladi*
  • Kaifeng Lyu*
  • Abhishek Panigrahi
  • Sanjeev Arora
New Definitions and Evaluations for Saliency Methods: Staying Intrinsic, Complete and Sound
  • Arushi Gupta*
  • Nikunj Saunshi*
  • Dingli Yu*
  • Kaifeng Lyu
  • Sanjeev Arora
Gradient Descent on Two-layer Nets: Margin Maximization and Simplicity Bias
  • Kaifeng Lyu*
  • Zhiyuan Li*
  • Runzhe Wang*
  • Sanjeev Arora
Towards Resolving the Implicit Bias of Gradient Descent for Matrix Factorization: Greedy Low-Rank Learning
  • Zhiyuan Li
  • Yuping Luo
  • Kaifeng Lyu
(alphabetical order)
Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate
  • Zhiyuan Li*
  • Kaifeng Lyu*
  • Sanjeev Arora
Gradient Descent Maximizes the Margin of Homogeneous Neural Networks
  • Kaifeng Lyu
  • Jian Li
Theoretical Analysis of Auto Rate-Tuning by Batch Normalization
  • Sanjeev Arora
  • Zhiyuan Li
  • Kaifeng Lyu
(alphabetical order)
Fine-grained complexity meets IP = PSPACE
  • Lijie Chen
  • Shafi Goldwasser
  • Kaifeng Lyu
  • Guy N Rothblum
  • Aviad Rubinstein
(alphabetical order)
Single-Source Bottleneck Path Algorithm Faster than Sorting for Sparse Graphs
  • Ran Duan
  • Kaifeng Lyu
  • Hongxun Wu
  • Yuanhang Xie
(alphabetical order)
Learning gradient descent: Better generalization and longer horizons
  • Kaifeng Lv*
  • Shunhua Jiang*
  • Jian Li
(Authors are ordered by contribution unless noted otherwise; an asterisk * indicates equal contribution.)

Professional Services

  • Organizer, NeurIPS 2024 Workshop on Mathematics of Modern Machine Learning (M3L 2024).
  • Organizer, NeurIPS 2023 Workshop on Mathematics of Modern Machine Learning (M3L).
  • Conference Reviewer: ICML (2020-2023), NeurIPS (2020-2023), ICLR (2022-2024), TPAMI, COLT (2020), AAAI (2020), KDD (2022).
  • Journal Reviewer: TMLR, JMLR, TPAMI, AIJ.
  • Organizer, Yao Class Seminar, Tsinghua University (Fall 2019, Fall 2020, Spring 2021).

Universal Online Judge

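  • To foster communication among students in informatics competitions, I founded Universal Online Judge (UOJ) in 2014.
  • UOJ is an online judge (OJ) that can flexibly judge both traditional and non-traditional OI problems. Since its founding, UOJ has held contests regularly, organized mainly by members of each year's national training team.
  • [Link] [GitHub] [Docs]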