
Bio: Laixi Shi is a Ph.D. candidate in the Department of Electrical and Computer Engineering at Carnegie Mellon University, advised by Prof. Yuejie Chi. She has also interned at the Google Research Brain Team and Mitsubishi Electric Research Laboratories. Her research interests span reinforcement learning (RL), non-convex optimization, high-dimensional statistical estimation, and signal processing, from theory to applications. Her current research focuses on (1) theory: designing provably sample-efficient algorithms for value-based RL, offline RL, and robust RL, drawing on tools from optimization and statistics; and (2) practice: deep reinforcement learning (DRL) algorithms for large-scale problems such as Atari games and web navigation.

Talk Title: Provable Algorithms for Reinforcement Learning: Efficiency and Robustness

Talk Abstract: Reinforcement learning (RL) has recently achieved remarkable success in learning to act in unknown environments so as to maximize a long-term cumulative reward. Contemporary RL must contend with environments of unprecedentedly large dimensionality, and to fundamentally understand and improve the effectiveness of RL algorithms in high dimensions, a recent body of work has sought to develop finite-sample theoretical frameworks for analyzing the algorithms of interest. My work belongs to this line of research, delineating how algorithm performance depends on salient problem parameters in a non-asymptotic fashion across different RL settings. Beyond maximizing the expected total reward, an equally important goal for an RL agent is safety and robustness, especially in high-stakes applications such as robotics, autonomous driving, clinical trials, and financial investment. Motivated by this robustness requirement, this talk introduces our work on distributionally robust RL with the Kullback-Leibler (KL) divergence as the distance measure: the problem of learning an optimal policy that remains robust to perturbations of the environment. I will present our proposed robust RL algorithm, which, to the best of our knowledge, is the first near-optimal algorithm for this problem, matching the minimax lower bound up to a polynomial factor of the effective horizon length.
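For context, here is a minimal sketch of the standard formulation this line of work builds on (the notation below is illustrative, not taken from the talk): given a nominal transition kernel P^0, the learner maximizes the worst-case discounted return over a KL ball around it,

    V^{\pi}_{\mathrm{rob}}(s) \;=\; \min_{P \in \mathcal{U}^{\sigma}(P^0)} \mathbb{E}_{\pi, P}\Big[ \sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t) \;\Big|\; s_0 = s \Big],

    \mathcal{U}^{\sigma}(P^0) \;=\; \Big\{ P \;:\; \mathrm{KL}\big( P(\cdot \mid s, a) \,\|\, P^0(\cdot \mid s, a) \big) \le \sigma \ \text{ for all } (s, a) \Big\},

where sigma > 0 is the radius of the uncertainty set and the robust optimal policy is pi* = argmax_pi V^{pi}_{rob}. The "effective horizon" in the sample-complexity claim refers to 1/(1 - gamma) in this discounted setting.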
