
Bio: Jingfeng Wu is a postdoc at the Simons Institute at UC Berkeley, hosted by Prof. Peter Bartlett and Prof. Bin Yu. He earned his Ph.D. in Computer Science at Johns Hopkins University, advised by Prof. Vladimir Braverman. Before that, he obtained his B.S. in Mathematics and M.S. in Applied Math from Peking University.

His research interests are in the theory and algorithms of deep learning and related topics in machine learning, optimization, and statistics.

Talk Title: Theoretical Insights into Gradient Descent and Stochastic Gradient Descent in Deep Learning

Abstract: Gradient Descent (GD) and Stochastic Gradient Descent (SGD) are fundamental optimization algorithms in machine learning, but their behaviors sometimes defy intuitions from classic optimization and statistical learning theories. In deep learning, GD often exhibits local oscillations while still converging over time. Moreover, SGD-trained models generalize effectively even when overparameterized. In this talk, I will revisit the theories of GD and SGD for classic problems but in new scenarios motivated by deep learning, presenting two novel insights:

(1) For logistic regression with separable data, GD with an arbitrarily large stepsize minimizes the empirical risk, potentially in a non-monotonic fashion (see the first sketch after this list).

(2) For linear regression and ReLU regression, one-pass SGD and its variants can achieve low excess risk, even in the overparameterized regime (see the second sketch below).
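
The following is a minimal numpy sketch, not part of the talk materials, illustrating point (1): gradient descent on the logistic loss over linearly separable data with a deliberately large constant stepsize. The data, dimensions, and stepsize value are arbitrary choices for illustration; the empirical risk may oscillate in early iterations while still trending toward zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 5
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = np.sign(X @ w_true)                      # labels from a linear rule, so the data are separable

def empirical_risk(w):
    # logistic loss: mean of log(1 + exp(-y * <x, w>)), computed stably
    return np.mean(np.logaddexp(0.0, -y * (X @ w)))

def gradient(w):
    m = y * (X @ w)                          # margins
    s = -y * 0.5 * (1.0 - np.tanh(0.5 * m))  # equals -y * sigmoid(-margin), numerically stable
    return (X * s[:, None]).mean(axis=0)

w = np.zeros(d)
eta = 50.0                                   # deliberately large stepsize
for t in range(201):
    if t % 20 == 0:
        print(f"iter {t:3d}  empirical risk {empirical_risk(w):.4f}")
    w -= eta * gradient(w)
```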
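
The second sketch is a toy illustration of point (2), again not taken from the talk: one-pass SGD with tail averaging for linear regression in a setting where the dimension exceeds the number of samples. The decaying covariance spectrum, stepsize, and noise level are assumptions made for illustration; the null predictor's risk is printed for comparison.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 500, 2000                           # fewer samples than dimensions
lam = 1.0 / np.arange(1, d + 1) ** 2       # decaying covariance spectrum (assumption)
w_star = np.ones(d)

w = np.zeros(d)
avg = np.zeros(d)
eta = 0.25
for t in range(n):                         # one pass: each sample is used exactly once
    x = np.sqrt(lam) * rng.normal(size=d)  # covariate with covariance diag(lam)
    y_t = x @ w_star + 0.1 * rng.normal()  # noisy linear response
    w -= eta * (x @ w - y_t) * x           # stochastic gradient of 0.5 * (x.w - y)^2
    if t >= n // 2:                        # tail-average the later iterates
        avg += w / (n - n // 2)

excess = np.sum(lam * (avg - w_star) ** 2)  # E[(x.(w - w*))^2] for this covariance
null = np.sum(lam * w_star ** 2)            # excess risk of always predicting zero
print(f"tail-averaged SGD excess risk: {excess:.4f}  (null predictor: {null:.4f})")
```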
