Part of the Autumn 2023 Distinguished Speaker Series.
Many behaviors empirically observed in deep neural networks still lack satisfactory explanation; e.g., how does an overparameterized neural network avoid overfitting and generalize to unseen data? Empirical evidence suggests that generalization depends on which zero-loss local minimum is attained during training. The shape of the training loss around a local minimum seems to strongly impact the model’s performance: “Flat” local minima—around which the loss grows slowly—appear to generalize well. Clarifying this phenomenon can help explain generalization properties, which still largely remain a mystery.
Towards this goal, in this talk we focus on the simplest class of overparameterized nonlinear models, those arising in low-rank matrix recovery. We study the following key models: overparameterized matrix sensing, bilinear sensing and phase retrieval, robust Principal Component Analysis, covariance matrix estimation, and single-hidden-layer neural networks with quadratic activation. We prove that in all these models, flat minima (measured by the trace of the Hessian, a notion of average curvature) exactly recover the ground truth, under standard statistical assumptions. These results suggest (i) a theoretical basis for favoring methods that bias iterates towards flat solutions, and (ii) that the Hessian trace can serve as an effective regularizer for some learning tasks. Since the landscape properties we proved are algorithm-agnostic, a future direction is to pair these findings with the analysis of common training algorithms to understand the interplay between the loss landscape and algorithmic implicit bias.
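To make the "flatness as Hessian trace" notion concrete, here is a minimal numerical sketch (not from the talk; all problem sizes and names are illustrative) for the overparameterized matrix-sensing model: the loss is L(X) = ½ Σᵢ (⟨Aᵢ, XXᵀ⟩ − yᵢ)², and the trace of its Hessian at a point is estimated with Hutchinson's estimator, using a central finite difference to approximate vᵀHv for random sign vectors v.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative problem sizes: ground truth is n x n of rank r, with m measurements.
n, r, m = 8, 2, 60

# Ground-truth low-rank matrix M* = U U^T and random Gaussian sensing matrices A_i.
U = rng.standard_normal((n, r))
M_star = U @ U.T
A = rng.standard_normal((m, n, n))
y = np.einsum('mij,ij->m', A, M_star)   # y_i = <A_i, M*>

def loss(x, k):
    """Overparameterized matrix-sensing loss L(X) = 0.5 * sum_i (<A_i, X X^T> - y_i)^2,
    where X is the n x k factor packed into the flat vector x."""
    X = x.reshape(n, k)
    resid = np.einsum('mij,ij->m', A, X @ X.T) - y
    return 0.5 * resid @ resid

def hessian_trace(x, k, n_probes=50, eps=1e-4):
    """Hutchinson estimate of tr(Hessian of L) at x:
    tr(H) = E[v^T H v] for Rademacher v, with
    v^T H v approximated by (L(x+eps*v) - 2 L(x) + L(x-eps*v)) / eps^2."""
    L0 = loss(x, k)
    total = 0.0
    for _ in range(n_probes):
        v = rng.choice([-1.0, 1.0], size=x.shape)
        total += (loss(x + eps * v, k) - 2.0 * L0 + loss(x - eps * v, k)) / eps**2
    return total / n_probes

# Overparameterize: factor X is n x k with k > r. Padding U with zero columns
# gives a zero-loss point whose X X^T equals the ground truth M*.
k = n
x_truth = np.hstack([U, np.zeros((n, k - r))]).ravel()
print("loss at ground truth:", loss(x_truth, k))
print("estimated tr(Hessian):", hessian_trace(x_truth, k))
```

In this setup one can compare the estimated Hessian trace at different zero-loss factorizations; the result in the talk says that, under standard statistical assumptions, the flattest ones (smallest trace) correspond exactly to the ground truth.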
Bio: Maryam Fazel is the Moorthy Family Professor of Electrical and Computer Engineering at the University of Washington, with adjunct appointments in Computer Science and Engineering, Mathematics, and Statistics. Maryam received her MS and PhD from Stanford University and her BS from Sharif University of Technology in Iran, and was a postdoctoral scholar at Caltech before joining UW. She is a recipient of the NSF CAREER Award, the UWEE Outstanding Teaching Award, and a UAI conference Best Student Paper Award (with her student). She directs the Institute for Foundations of Data Science (IFDS), a multi-site NSF TRIPODS Institute. She serves on the editorial board of the MOS-SIAM Book Series on Optimization, and is an Associate Editor of the SIAM Journal on Mathematics of Data Science. Her current research interests are in the area of optimization in machine learning and control.
Friday, October 6, 2023
Lunch will be provided on a first-come, first-served basis.