Margo Seltzer (University of British Columbia): Distinguished Speaker Series
Part of the 2025 Distinguished Speaker Series.
The University of Chicago Data Science Institute, Department of Statistics, Department of Computer Science, and Committee on Computational and Applied Mathematics are proud to announce our 2025 Distinguished Speaker Series. Join us for stimulating talks from leading data science researchers exploring and expanding the fundamental methods and approaches that transform large and complex datasets into knowledge and action.
Events will take place in the John Crerar Library Building, Room 390, with a lunch social starting at noon before the lecture at 12:30 p.m. (unless otherwise noted).
Abstract: Data Science has emerged as a new discipline that unites data, statistics, and the mathematical and computational methods that allow us to derive insights from data. In a previous era, we might have described this field as the new, computational methodological pillar of research, complementing the theoretical and empirical pillars. Although computing had been fundamental to many fields of scientific research since the advent of computers, this new pillar recognized that computation offered unprecedented scalability in both data and compute. The emergence of data science recognizes that the methodology that underlies this computational pillar is its own field of inquiry.
However, the data scientist is faced with scientific challenges on two levels. First, there are the fundamental domain-specific questions one is trying to answer using this data-centric methodology. The research practices for addressing such questions are well understood but do not easily translate into practice. Second, the practice of data science introduces new methodological questions concerning the authenticity and veracity of data and the reproducibility of computational experiments. I call these the challenges of meta-science. These meta-scientific challenges appear similar to those of software engineering. However, tools developed for software engineering treat computational artifacts as primary output; in data science, computational artifacts are not primary output but the tools used to produce it. This mismatch in purpose leads to a dearth of solutions to the meta-scientific questions.
In this talk, I’ll discuss the myriad ways that we have failed to produce the right tools to support data scientists. I’ll contrast the process and goals of software engineering and data science, how they differ, and why these differences have led to the reproducibility crisis. Finally, I’ll present a few examples of a way forward, including tools that meet data scientists where they work, make reproducibility easier, and provide a vision of how we improve upon the meta-scientific practices in data science.
Bio: Margo Seltzer is Canada 150 Research Chair in Computer Systems and the Cheriton Family Chair in Computer Science at the University of British Columbia. Her research interests are in systems, construed quite broadly: systems for capturing and accessing data provenance, file systems, databases, transaction processing systems, storage and analysis of graph-structured data, and systems for constructing optimal and interpretable machine learning models.
She is the author of several widely used software packages including database and transaction libraries and the 4.4BSD log-structured file system. Dr. Seltzer was a co-founder and CTO of Sleepycat Software, the makers of Berkeley DB, the recipient of the 2021 ACM Software Systems Award and the 2020 ACM SIGMOD Systems Award.
She serves on the Computer Science and Telecommunications Board (CSTB) of the (US) National Academies. She is a past chair and vice-chair of the Computer Science Committee of the National Academy of Engineering and a past President of the USENIX Association. She served as the USENIX representative to the Computing Research Association Board of Directors and on the Computing Community Consortium.
She is a member of the National Academy of Engineering and the American Academy of Arts and Sciences, a Sloan Foundation Fellow in Computer Science, an ACM Fellow, a Bunting Fellow, and was the recipient of the 1996 Radcliffe Junior Faculty Fellowship. She is also recognized as an outstanding teacher and mentor, having received the Phi Beta Kappa teaching award in 1996, the Abrahmson Teaching Award in 1999, the Capers and Marion McDonald Award for Excellence in Mentoring and Advising in 2010, the CRA-E Undergraduate Research Mentoring Award in 2017, and a UBC Killam Teaching prize in 2023.
Professor Seltzer received an A.B. degree in Applied Mathematics from Harvard/Radcliffe College and a Ph.D. in Computer Science from the University of California, Berkeley.
Agenda
Friday, April 11, 2025
Lunch
Lunch will be provided on a first-come, first-served basis.