Please join us for a Statistics and DSI joint colloquium.

Monday, February 24
11:30am – 12:30pm
Jones Hall 303
5747 S Ellis Avenue
Chicago, IL 60637

Abstract: We build neural networks in a modular and programmatic way using software libraries like PyTorch and JAX. But optimization theory has not caught up to the flexibility of this paradigm, and practical advances in neural net optimization are largely driven by heuristics. In this talk we argue that to treat deep learning rigorously, we must build our optimization theory programmatically and in lockstep with the neural network itself. To instantiate this idea we propose the "modular norm", a norm on the weight space of general neural architectures. The modular norm is constructed by stitching together norms on individual tensor spaces as the architecture is built. It has several applications: automatic Lipschitz certificates for general architectures in both weights and inputs; automatic learning rate transfer across scale; and, most recently, a duality theory for the modular norm that leads to fast optimizers like "Muon", which has set speed records for training transformers. We are building the theory of the modular norm into a software library called Modula to ease the development and deployment of metrized deep learning algorithms. You can find out more at https://modula.systems/.
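
As a rough illustration of the "stitching together norms" idea, here is a minimal, hypothetical PyTorch sketch (not the Modula API): each weight tensor is measured in its own norm, and a composite norm on the full weight space is formed by combining the per-tensor norms. The choice of the spectral norm for each layer and the scaled-maximum combination rule are assumptions made purely for illustration.

    # Toy sketch (hypothetical; not the Modula API): compose per-tensor norms
    # into a single norm on the weight space of a small network.
    import torch

    def spectral_norm(w: torch.Tensor) -> torch.Tensor:
        # Largest singular value of a 2D weight matrix.
        return torch.linalg.matrix_norm(w, ord=2)

    def composite_norm(weights, scales):
        # Combine per-layer norms with a scaled maximum, mirroring how a norm
        # could be stitched together as layers are stacked.
        return max(s * spectral_norm(w) for w, s in zip(weights, scales))

    # Two-layer example: measure each layer's weight in its own norm,
    # then combine with per-layer scale factors.
    w1 = torch.randn(64, 32)
    w2 = torch.randn(10, 64)
    print(composite_norm([w1, w2], scales=[1.0, 2.0]))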

Bio: Jeremy Bernstein received the B.S. and M.S. degrees in experimental and theoretical physics from Trinity College, Cambridge, U.K., in 2016, and the Ph.D. degree in computation and neural systems from Caltech in 2022. He is currently a Postdoctoral Researcher at the Massachusetts Institute of Technology (MIT), focusing on the mathematical foundations of natural and artificial intelligence. His research interests include uncovering the computational and statistical laws of natural and artificial intelligence, designing learning systems that are more efficient, automatic, and useful in practice, and the societal implications of artificial intelligence. He has been awarded several fellowships, including the NVIDIA Graduate Fellowship.
