Research Talk: Tao Lin (Harvard)
Thursday, June 11
11:00am – 12:00pm
Data Science Institute, Room 353
5460 S University Ave
Talk Title: Reward Shaping for (Inference-Time) Alignment
Abstract: Existing alignment methods directly use the reward model learned from user preference data to optimize an LLM policy, subject to KL regularization with respect to the base policy.
This practice is suboptimal for maximizing the user’s utility because the KL regularization may cause the LLM to inherit biases in the base policy that conflict with user preferences. While amplifying rewards for preferred outputs can mitigate this bias, it also increases the risk of reward hacking.
This tradeoff motivates the problem of optimally designing reward models under KL regularization. We formalize this reward model optimization problem as a Stackelberg game, and show that a simple reward shaping scheme can effectively approximate the optimal reward model. We empirically evaluate our method in inference-time alignment settings and demonstrate that it integrates seamlessly into existing alignment methods with minimal overhead. Our method consistently improves average reward and achieves win–tie rates exceeding 66% against all baselines, averaged across evaluation settings.
Bio: Tao Lin is an incoming assistant professor in the School of Data Science at the Chinese University of Hong Kong, Shenzhen, and is currently a postdoctoral researcher with the EconCS group at Microsoft Research (New England), working mainly with Alex Slivkins. Tao obtained his PhD in Computer Science from Harvard University in 2025, advised by Yiling Chen. Tao’s research covers algorithmic game theory, mechanism design, information design, and their intersections with machine learning.