Noah Golowich
Bio: Noah Golowich is a PhD Student at the Massachusetts Institute of Technology, advised by Constantinos Daskalakis and Ankur Moitra. He completed his A.B. and S.M. at Harvard University. His research interests lie in theoretical machine learning, with a particular focus on the connections between multi-agent learning, game theory, and online learning, and in theoretical reinforcement learning. He is supported by a Fannie & John Hertz Foundation Fellowship and an NSF Graduate Fellowship.
Talk Title: Exploring and Learning in Sparse Linear MDPs without Computationally Intractable Oracles
Abstract: The key assumption underlying linear Markov Decision Processes (MDPs) is that the learner has access to a known feature map from state-action pairs to d-dimensional vectors, and that the rewards and transitions are linear functions in this representation. But where do these features come from? In the absence of expert domain knowledge, a tempting strategy is to use the “kitchen sink” approach and hope that the true features are included in a much larger set of potential features. In this work we revisit linear MDPs from the perspective of feature selection. In a k-sparse linear MDP, there is an unknown subset of size k containing all the relevant features, and the goal is to learn a near-optimal policy in only poly(k, log d) interactions with the environment. Our main result is the first polynomial-time algorithm for this problem. In contrast, earlier works either made prohibitively strong assumptions that obviated the need for exploration, or required solving computationally intractable optimization problems.
As a corollary of our main result, we give an algorithm for learning a near-optimal policy in block MDPs whose decoding function is a low-depth decision tree; the algorithm runs in quasi-polynomial time and takes a polynomial number of samples. This can be seen as a reinforcement learning analogue of classic results in computational learning theory. Furthermore, it gives a natural model where improving the sample complexity via representation learning is computationally feasible.