Part of the Autumn 2023 Distinguished Speaker Series.
Many behaviors empirically observed in deep neural networks still lack satisfactory explanation; e.g., how does an overparameterized neural network avoid overfitting and generalize to unseen data? Empirical evidence suggests that generalization depends on which zero-loss local minimum is attained during training. The shape of the training loss around a local minimum seems to strongly impact the model’s performance: “Flat” local minima—around which the loss grows slowly—appear to generalize well. Clarifying this phenomenon can help explain generalization properties, which still largely remain a mystery.
Towards this goal, in this talk we focus on the simplest class of overparameterized nonlinear models, those arising in low-rank matrix recovery. We study the following key models: overparameterized matrix sensing, bilinear sensing and phase retrieval, robust Principal Component Analysis, covariance matrix estimation, and single-hidden-layer neural networks with quadratic activation. We prove that in all these models, flat minima (measured by the trace of the Hessian, a notion of average curvature) exactly recover the ground truth, under standard statistical assumptions. These results suggest (i) a theoretical basis for favoring methods that bias iterates towards flat solutions, and (ii) that the Hessian trace can serve as an effective regularizer for some learning tasks. Since the landscape properties we proved are algorithm-agnostic, a future direction is to pair these findings with the analysis of common training algorithms to understand the interplay between the loss landscape and algorithmic implicit bias.
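To make the "flatness as Hessian trace" notion concrete, here is a minimal numerical sketch (not from the talk; all problem sizes and names are illustrative) for the overparameterized matrix-sensing model: the loss is L(X) = ½ Σᵢ (⟨Aᵢ, XXᵀ⟩ − yᵢ)², and the trace of its Hessian at a point is estimated with Hutchinson's estimator, using a central finite difference to approximate vᵀHv for random sign vectors v.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative problem sizes: ground truth is n x n of rank r, with m measurements.
n, r, m = 8, 2, 60

# Ground-truth low-rank matrix M* = U U^T and random Gaussian sensing matrices A_i.
U = rng.standard_normal((n, r))
M_star = U @ U.T
A = rng.standard_normal((m, n, n))
y = np.einsum('mij,ij->m', A, M_star)   # y_i = <A_i, M*>

def loss(x, k):
    """Overparameterized matrix-sensing loss L(X) = 0.5 * sum_i (<A_i, X X^T> - y_i)^2,
    where X is the n x k factor packed into the flat vector x."""
    X = x.reshape(n, k)
    resid = np.einsum('mij,ij->m', A, X @ X.T) - y
    return 0.5 * resid @ resid

def hessian_trace(x, k, n_probes=50, eps=1e-4):
    """Hutchinson estimate of tr(Hessian of L) at x:
    tr(H) = E[v^T H v] for Rademacher v, with
    v^T H v approximated by (L(x+eps*v) - 2 L(x) + L(x-eps*v)) / eps^2."""
    L0 = loss(x, k)
    total = 0.0
    for _ in range(n_probes):
        v = rng.choice([-1.0, 1.0], size=x.shape)
        total += (loss(x + eps * v, k) - 2.0 * L0 + loss(x - eps * v, k)) / eps**2
    return total / n_probes

# Overparameterize: factor X is n x k with k > r. Padding U with zero columns
# gives a zero-loss point whose X X^T equals the ground truth M*.
k = n
x_truth = np.hstack([U, np.zeros((n, k - r))]).ravel()
print("loss at ground truth:", loss(x_truth, k))
print("estimated tr(Hessian):", hessian_trace(x_truth, k))
```

In this setup one can compare the estimated Hessian trace at different zero-loss factorizations; the result in the talk says that, under standard statistical assumptions, the flattest ones (smallest trace) correspond exactly to the ground truth.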
Bio: Maryam Fazel is the Moorthy Family Professor of Electrical and Computer Engineering at the University of Washington, with adjunct appointments in Computer Science and Engineering, Mathematics, and Statistics. Maryam received her MS and PhD from Stanford University and her BS from Sharif University of Technology in Iran, and was a postdoctoral scholar at Caltech before joining UW. She is a recipient of the NSF CAREER Award, the UWEE Outstanding Teaching Award, and a UAI conference Best Student Paper Award (with her student). She directs the Institute for Foundations of Data Science (IFDS), a multi-site NSF TRIPODS Institute. She serves on the editorial board of the MOS-SIAM Book Series on Optimization, and is an Associate Editor of the SIAM Journal on Mathematics of Data Science. Her current research interests are in the area of optimization in machine learning and control.
Friday, October 6, 2023
Lunch will be provided on a first-come, first-served basis.