Autumn 2023 Cohort
Emily Aiken, PhD Candidate, University of California, Berkeley
Shicong Cen, PhD Candidate, Carnegie Mellon University
Serina Chang, PhD Candidate, Stanford University
Marie-Laure Charpignon, PhD Candidate, Massachusetts Institute of Technology
Xinyi Chen, PhD Candidate, Princeton University
Lijun Ding, Postdoctoral Scholar, Institute for Foundations of Data Science, University of Wisconsin-Madison and University of Washington
Hao-Wen Dong, PhD Candidate, University of California San Diego
Dongqi Fu, PhD Candidate, University of Illinois Urbana-Champaign
Johann Gaebler, PhD Student, Harvard University
Kristina Gligorić, Postdoctoral Scholar, Stanford University
Noah Golowich, PhD Student, Massachusetts Institute of Technology
Ying Jin, PhD Candidate, Stanford University
Zhijing Jin, PhD Candidate, Max Planck Institute & ETH Zürich
Mikhail Khodak, PhD Student, Carnegie Mellon University
Yuanyuan Lei, PhD Student, Texas A&M University
Ben Lengerich, Postdoctoral Associate, Massachusetts Institute of Technology
Lucy Li, PhD Candidate, University of California, Berkeley
Paul Liang, PhD Student, Carnegie Mellon University
Zhijian Liu, PhD Candidate, Massachusetts Institute of Technology
Siddharth Mishra-Sharma, Postdoctoral Fellow, Massachusetts Institute of Technology
Roshni Sahoo, PhD Student, Stanford University
Hua Shen, Postdoctoral Research Fellow, University of Michigan
Zhiqing Sun, PhD Student, Carnegie Mellon University
Bahareh Tolooshams, Postdoctoral Scholar, California Institute of Technology
Rose Wang, PhD Candidate, Stanford University
Yuqing Wang, PhD Candidate, Georgia Institute of Technology
Jingfeng Wu, Postdoctoral Fellow, University of California, Berkeley
Lily Xu, PhD Candidate, Harvard University
Yuzhe Yang, PhD Student, Massachusetts Institute of Technology
Sam Zhang, PhD Candidate, University of Colorado Boulder
Shufan Zhang, PhD Student, University of Waterloo
Chudi Zhong, PhD Candidate, Duke University
Bio: Emily is a PhD candidate at the UC Berkeley School of Information, where she studies the application of novel algorithms and digital data sources for social protection programs in low-income countries. Her work has been published in venues including Nature and Science Advances, and she is a recipient of a Microsoft Research PhD fellowship. Prior to Berkeley, Emily received her bachelor’s degree in computer science from Harvard.
Talk Title: Targeting humanitarian aid with machine learning and digital data
Abstract: The vast majority of humanitarian aid and social protection programs globally are targeted, providing assistance to individuals or communities identified to be poorest or most in need. In low- and middle-income countries, the targeting of aid programs is often limited by low-quality, out-of-date, or missing data on poverty and vulnerability. Novel “big” digital data sources, such as those captured by satellites, mobile phones, and financial services providers, can improve the accuracy of aid program targeting when combined with advances in machine learning. In this talk, I will cover empirical results on the accuracy of these new data-driven and algorithmic approaches to aid allocation, and will discuss emergent implications for fairness, privacy, transparency, and community dynamics.
Bio: Shicong Cen is a final-year Ph.D. candidate in the Department of Electrical and Computer Engineering of Carnegie Mellon University, advised by Prof. Yuejie Chi.
Shicong’s research focuses on efficient learning for reinforcement learning and game theory, by developing in-depth theoretical understanding of policy optimization methods, so as to turn heuristics into rigorous principles and inspire the design of new algorithms that achieve provably fast convergence.
Shicong obtained his bachelor’s degree in Information and Computing Science from the School of Mathematical Sciences, Peking University in 2019.
Talk Title: Algorithmic Foundations of Policy Optimization in Reinforcement Learning and Multi-agent Systems
Abstract: Policy optimization methods have been integral to the empirical success of contemporary reinforcement learning (RL). A recent flurry of works has endeavored to develop policy optimization methods with provable convergence guarantees. Our works contribute to this recent line of research by investigating non-asymptotic convergence guarantees and improving the dependencies on salient problem parameters.
We start with policy optimization for tabular single-agent RL, where the intrinsic non-concavity poses a significant challenge to obtaining convergence guarantees. By combining natural policy gradient (NPG) methods with entropy regularization, we deliver the first policy optimization method that provably converges linearly to the global optimal policy at a dimension-free rate, assuming access to exact policy evaluation. Going beyond entropy regularization, we design a novel policy optimization method that accommodates various choices of regularizers by leveraging Bregman divergence and mirror descent, and that provably works even when the regularizer lacks strong convexity and smoothness.
In the presence of multiple agents, the goal of policy optimization is to find an approximate Nash equilibrium (NE), preferably through an independent learning process. Notably, existing analyses of learning NE in competitive games fall short of characterizing the last iterate of the learning algorithm in a dimension-free manner. We make progress by designing novel extragradient policy optimization methods that converge to the quantal response equilibrium (QRE) of competitive games at a dimension-free linear rate, enabling learning approximate NE without introducing additional assumptions typically needed in prior results. We further extend our algorithms and the accompanying analysis to competitive Markov games and multi-player zero-sum polymatrix games.
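For readers unfamiliar with the single-agent setting in this abstract, the following is a minimal sketch of entropy-regularized NPG on a tabular MDP with exact policy evaluation. The multiplicative update form, the step size eta = (1 - gamma) / tau, and all function names (soft_q_values, npg_step, run_npg) are illustrative assumptions drawn from the standard softmax-parameterized formulation; the abstract itself does not spell them out, and this is not presented as the speaker's implementation.

```python
import numpy as np

def soft_q_values(P, r, pi, gamma, tau):
    """Exact entropy-regularized Q-values of policy pi.

    P: transitions, shape (S, A, S); r: rewards, shape (S, A);
    pi: policy with strictly positive entries, shape (S, A).
    """
    n_states = r.shape[0]
    # Expected one-step reward under pi, including the entropy bonus -tau * log pi.
    r_pi = np.sum(pi * (r - tau * np.log(pi)), axis=1)
    # State-to-state transition matrix induced by pi.
    P_pi = np.einsum("sa,sat->st", pi, P)
    # Solve the linear Bellman equation V = r_pi + gamma * P_pi V.
    V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
    return r + gamma * P @ V

def npg_step(pi, Q, eta, gamma, tau):
    """One update: pi_new proportional to pi^(1 - eta*tau/(1-gamma)) * exp(eta*Q/(1-gamma))."""
    logits = (1.0 - eta * tau / (1.0 - gamma)) * np.log(pi) + eta * Q / (1.0 - gamma)
    logits -= logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    new_pi = np.exp(logits)
    return new_pi / new_pi.sum(axis=1, keepdims=True)

def run_npg(P, r, gamma=0.9, tau=0.5, eta=None, iters=50):
    """Run NPG from the uniform policy; eta defaults to (1 - gamma) / tau (an assumption)."""
    n_states, n_actions = r.shape
    if eta is None:
        eta = (1.0 - gamma) / tau
    pi = np.full((n_states, n_actions), 1.0 / n_actions)
    for _ in range(iters):
        Q = soft_q_values(P, r, pi, gamma, tau)
        pi = npg_step(pi, Q, eta, gamma, tau)
    return pi
```

With the default step size the exponent on the previous policy vanishes, so each step reduces to a softmax of the regularized Q-values scaled by 1 / tau. Smaller step sizes interpolate between the old policy and that softmax.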
Bio: Serina Chang is a 5th year PhD student in Computer Science at Stanford University. She develops methods in machine learning and data science to tackle complex societal challenges, from pandemics to polarization to supply chains. Her research focuses on large-scale human networks and novel data sensors, such as mobility networks from location data and query-click graphs from search engines. Her work has been published in venues including Nature, PNAS, KDD, AAAI, EMNLP, and ICWSM, and featured in over 650 news outlets, including The New York Times and The Washington Post. Her work is also recognized by the KDD 2021 Best Paper Award, NSF Graduate Research Fellowship, Meta PhD Fellowship, EECS Rising Stars, Rising Stars in Data Science, Cornell Future Faculty Symposium, and CRA Outstanding Undergraduate Researcher Award. Beyond research, Serina has also served as head course assistant for Stanford’s Machine Learning with Graphs, Program Chair of Machine Learning for Health 2023, and a co-Chair of the Data Science for Social Good Workshop at KDD 2023.
Talk Title: Inferring networks and behaviors from novel data for critical decision-making
Abstract: In an interconnected and fast-moving world, effective policymaking increasingly relies on understanding complex human networks and dynamic behaviors. Novel, high-frequency data sources, such as cell phones and search engines, introduce new opportunities to capture these networks and behaviors at scale. However, there remain substantial gaps between real-world systems and what is recorded in data, such as incomplete networks, unlabeled data, and unknown causal mechanisms. My research seeks to close these gaps by developing methods in data science and machine learning to infer robust signals about real-world networks and behaviors from novel data sources. Using these methods, I derive behavioral insights and build policy tools for high-stakes societal challenges, from pandemic response to political polarization to supply chain disruptions.
In this talk, I will focus on my research related to pandemic response. Specifically, I will cover two lines of work: (1) inferring fine-grained mobility networks from aggregated location data and modeling the spread of COVID-19, and (2) developing machine learning systems to accurately detect vaccine seeking and discover the concerns of vaccine holdouts from anonymized search logs. The findings from these works informed public health policies for reopening, reducing disparities, and distributing vaccines. More broadly, these works demonstrate the potential for large-scale data and computation to aid policymakers, paving a new way to support critical decision-making across domains.
Bio: Marie is a PhD candidate at MIT Institute for Data, Systems, and Society, conducting research at the Laboratory for Information and Decision Systems and the Broad Institute. Her research combines statistical machine learning, causal inference, and network science, with applications in medicine and public health.
In particular, she builds disease-specific hypergraphs to formulate drug repurposing hypotheses and utilizes the target trial emulation framework to evaluate repurposing candidates for Alzheimer’s disease and related dementias using structured (e.g., diagnosis codes) and unstructured data (e.g., clinical notes) from electronic health records, across multiple medical centers. Methodologically, her work involves the use of doubly-robust machine learning and causal matrix estimation and the development of synthetic control and instrumental variable approaches adapted to survival analysis. With a focus on aging populations, she also designs comprehensive agent-based models that integrate granular sociodemographic and mobility data to optimize the distribution of vaccines against infectious diseases at the state and national level and inform public policy decision-making.
Previously, Marie obtained a BSc in Engineering Sciences from Ecole Centrale Paris in France and an MSc in Computational and Mathematical Engineering from Stanford University. As a data scientist at Microsoft, she analyzed the effects of technology usage and digital collaboration on student academic outcomes and socio-emotional learning in school networks with treatment spillovers. She interned at Goldman Sachs, Microsoft Research, the European Commission, INSERM-INRIA in France, and Clalit Innovation in Israel.
Talk Title: Antihypertensive drug repurposing for dementia prevention: target trial emulations in two large-scale electronic health record systems
Abstract: Alzheimer’s disease (AD)—the most common type of dementia—affects 5.7 million people in the US and costs about $250 billion annually. Since there are no disease-modifying therapies to date, repurposing FDA-approved drugs preventing dementia onset offers an expedited path to reduce the impact of this epidemic. Beyond age, type 2 diabetes and hypertension are two major risk factors for dementia onset. However, mixed results are emerging from prior observational studies contrasting various antihypertensive drug classes, including Angiotensin Converting Enzyme (ACE) inhibitors, Angiotensin Receptor Blockers (ARB), and Calcium Channel Blockers (CCB).
To address this complexity and evaluate the repurposing potential of antihypertensives with different mechanisms of action, we deployed a causal inference approach accounting for the competing risk of death in emulated clinical trials using two distinct electronic health record (EHR) systems, one from the UK Clinical Practice Research Datalink (CPRD) and the other from the US Research Patient Data Registry (RPDR). We performed intention-to-treat analyses among patients aged 50 or older at baseline, applying Inverse Probability of Treatment Weighting (IPTW) to balance the two treatment arms with respect to a set of confounders curated by cardiologists and neurologists. Specifically, we compared antihypertensive treatment initiation with any of seven ARBs vs any of eleven ACE inhibitors.
In the US RPDR database, a total of 45,206 patients were eligible to participate in the emulated target trial. Treatment initiation with any of the seven ARBs was associated with lower hazard of all-cause mortality (HR=1.16 [95% CI: 1.10-1.23]) and lower cause-specific hazard of dementia onset (HR=1.34 [95% CI: 1.27-1.42]) in cause-specific Cox Proportional Hazards (PH) models, after accounting for prolonged survival, relative to treatment initiation with any of the eleven ACE inhibitors. In addition, within the ARB drug class, the gap in the dementia risk difference over time was more pronounced among patients initiating on compounds that cross the blood-brain barrier (BBB). Results of the competing risks analysis were robust to the structure of the outcome models (i.e., Cox PH vs nonparametric). The directionality and strength of our findings were similar in the UK CPRD database.
Target trial emulations in two large-scale EHR systems suggest that treatment initiation with BBB-crossing ARBs might reduce the risk of dementia among hypertensive patients. Ongoing work includes the conduct of a mediation analysis in the two considered cohorts to assess the role played by enhanced blood pressure control towards dementia prevention.
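The weighting step mentioned in this abstract can be illustrated with a minimal sketch. It assumes the propensity scores (the probability of starting the treatment arm given confounders) are already estimated; the function names and the toy data are hypothetical, and the sketch covers only the balancing idea, not the competing-risks or Cox models used in the study.

```python
import numpy as np

def iptw_weights(treated, propensity):
    """Inverse probability of treatment weights.

    Treated units get 1 / e(x); control units get 1 / (1 - e(x)),
    where e(x) is the (assumed known) propensity score.
    """
    t = np.asarray(treated, dtype=float)
    e = np.asarray(propensity, dtype=float)
    return t / e + (1.0 - t) / (1.0 - e)

def weighted_mean_difference(x, treated, weights):
    """Weighted difference in a covariate's mean between the two arms."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(treated, dtype=bool)
    w = np.asarray(weights, dtype=float)
    return np.average(x[t], weights=w[t]) - np.average(x[~t], weights=w[~t])

# Hypothetical toy cohort: a binary confounder x that also drives treatment choice.
x = np.array([0, 0, 0, 0, 1, 1, 1, 1])
treated = np.array([1, 0, 0, 0, 1, 1, 1, 0])
propensity = np.array([0.25, 0.25, 0.25, 0.25, 0.75, 0.75, 0.75, 0.75])

weights = iptw_weights(treated, propensity)
raw_gap = weighted_mean_difference(x, treated, np.ones_like(weights))
balanced_gap = weighted_mean_difference(x, treated, weights)
```

In this toy cohort the unweighted arms differ in the confounder, while the weighted arms have the same confounder mean, which is the sense in which the weights "balance" the treatment arms.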
Bio: Xinyi Chen is a PhD candidate in the Department of Computer Science at Princeton University advised by Prof. Elad Hazan. Her research is at the intersection of machine learning, optimization, and dynamical systems, with a focus on developing provably robust and efficient methods for sequential decision-making and control. Previously, she obtained her undergraduate degree from Princeton in mathematics, where she received the Middleton Miller Prize. She is a recipient of the Siebel Scholarship and the NSF Graduate Research Fellowship, as well as a participant of EECS Rising Stars at UC Berkeley.
Talk Title: A Nonstochastic Control Approach to Optimization
Abstract: In the modern deep learning pipeline, selecting the best optimization algorithm and the associated set of hyperparameters for a particular problem instance is crucial. The choice of the optimizer can significantly influence the performance of the trained model. However, this is a nonconvex task, and as a result, iterative optimization methods such as hypergradient descent lack global optimality guarantees in general.
We propose an online nonstochastic control methodology for mathematical optimization. First, we formalize the setting of meta-optimization, an online learning formulation of learning the best optimization algorithm from a class of methods. The meta-optimization problem over gradient-based methods can be framed as a feedback control problem over the choice of hyperparameters, including the learning rate, momentum, and the preconditioner.
Although the original optimal control problem is nonconvex, we show how recent methods from online nonstochastic control using convex relaxations can be used to overcome the challenge of nonconvexity, and obtain regret guarantees against the best offline solution. This guarantees that in meta-optimization, given a sequence of optimization problems, we can learn a method that attains convergence comparable to that of the best optimization method in hindsight from a class of methods. Finally, we demonstrate empirically that our meta-optimization algorithm can continuously improve given problem instances, and become competitive with the best tuned method among a class of methods.
Bio: Lijun Ding is a post-doctoral scholar at the Institute for Foundations of Data Science (IFDS) at the University of Wisconsin and the University of Washington, supervised by Stephen J. Wright, Dmitry Drusvyatskiy, and Maryam Fazel. Before joining IFDS, he obtained his Ph.D. in Operations Research at Cornell University, advised by Yudong Chen and Madeleine Udell. He graduated with an M.S. in Statistics from the University of Chicago, advised by Lek-Heng Lim. He received a B.S. in Mathematics and Economics from the Hong Kong University of Science and Technology.
Talk Title: Flat minima generalize for low-rank matrix recovery
Abstract: Empirical evidence suggests that for a variety of overparameterized nonlinear models, most notably in neural network training, the growth of the loss around a minimizer strongly impacts its performance. Flat minima — those around which the loss grows slowly — appear to generalize well. This work takes a step towards understanding this phenomenon by focusing on the simplest class of overparameterized nonlinear models: those arising in low-rank matrix recovery. We analyze overparameterized matrix and bilinear sensing, robust PCA, covariance matrix estimation, and single hidden layer neural networks with quadratic activation functions. In all cases, we show that flat minima, measured by the trace of the Hessian, exactly recover the ground truth under standard statistical assumptions.
Bio: Hao-Wen (Herman) Dong is a PhD Candidate in Computer Science at University of California San Diego working with Prof. Julian McAuley and Prof. Taylor Berg-Kirkpatrick. Herman’s research aims to empower music and audio creation with machine learning. His long-term goal is to lower the barrier of entry for music composition and democratize audio content creation. He is broadly interested in music generation, audio synthesis, multimodal machine learning and music information retrieval. He has collaborated with researchers at NVIDIA, Adobe, Dolby, Amazon, Sony and Yamaha through internships. Prior to his PhD, he was a research assistant at Academia Sinica working with Prof. Yi-Hsuan Yang. Herman’s research has been recognized by the ICASSP Rising Stars in Signal Processing and UCSD GPSA Interdisciplinary Research Award. His PhD study has been supported by IEEE SPS Scholarship, Taiwan Government Scholarship to Study Abroad, J. Yang Scholarship and UCSD ECE Department Fellowship.
Talk Title: Generative AI for Music and Audio
Abstract: Generative AI has been transforming the way we interact with technology and consume content. In this talk, I will briefly introduce the three main directions of my research centered around generative AI for music and audio: 1) multitrack music generation, 2) assistive music creation tools, and 3) multimodal learning for audio and music. I will then zoom into my recent work on learning text-to-audio synthesis from videos using pretrained language-vision models and diffusion models. Finally, I will close this talk by discussing the challenges and future directions of generative AI for music and audio.
Bio: Dongqi Fu is a final-year Ph.D. candidate in Computer Science at the University of Illinois at Urbana-Champaign. He is interested in developing data mining and machine learning algorithms on graph data (i.e., non-IID, relational, non-grid, non-Euclidean data). Moreover, real-world graph data can be (1) related to temporal information (i.e., time-evolving topological structures, time-evolving node/graph features/labels, etc.) and (2) imperfect (i.e., missing features, scarce labels, hard-to-interpret, redundant, privacy-leaking, robustness-lacking, etc.). Hence, his research focuses on investigating (1) Natural Dynamics (e.g., leveraging spatial-temporal properties of graphs) and (2) Artificial Dynamics (e.g., augmenting and pruning graph components) in Graph Mining, Graph Representations, and Graph Neural Networks to achieve task performance upgrades in accuracy, efficiency, explanation, privacy, fairness, etc., and he is also keen on Graph Data Management and Graph Theory. He was previously a research scientist intern at IBM T.J. Watson Research Center and Meta AI for graph research and applications.
Talk Title: Towards Powerful Graph Learning via Natural and Artificial Dynamics
Abstract: In the era of big data, the relationship between entities has become much more complex than ever before. As a kind of relational data structure, graphs (or networks) attract much research attention for dealing with this
In the long run, graph learning methods face two general challenges when adapting to the complexities of the real world. Firstly, the graph structure and features may change over time (i.e., time-evolving topological structures, time-evolving node/graph features/labels, etc.). The resulting problems include but are not limited to ignoring entity temporal correlation, overlooking causality discovery, computation inefficiency, non-generalization, etc. Secondly, the initial topological structure and node or graph features may be imperfect (e.g., having construction errors, sampling noises, missing features, scarce labels, hard-to-interpret, redundant, privacy-leaking, robustness-lacking, etc.). The corresponding problems include but are not limited to non-robustness, indiscriminative representations, non-generalization, etc.
Hence, this research talk will concentrate on investigating how to study (1) Natural Dynamics (e.g., leveraging spatial-temporal properties of graphs) and (2) Artificial Dynamics (e.g., augmenting and pruning graph components) for Graph Mining, Graph Representations, and Graph Neural Networks to achieve task performance upgrades in accuracy, efficiency, trustworthiness, etc.
Bio: Johann D. Gaebler is a Ph.D. student in Statistics at Harvard University. His research develops and applies data science and statistical tools to complex social problems, such as mass incarceration, hiring discrimination, policing, election administration, and other issues at the intersection of statistics, computer science, and policy. His research has appeared in JMLR; PAMS; AAAI; PNAS: Nexus; ICML, where it won an outstanding paper award; as well as general-audience publications like the Washington Post. Before transferring to Harvard, Johann was a Ph.D. student in Computational and Mathematical Engineering at Stanford, where he was also a Knight-Hennessy Scholar.
Talk Title: The Measure and Mismeasure of Fairness
Abstract: The field of fair machine learning aims to ensure that decisions guided by algorithms are equitable. Over the last decade, several formal, mathematical definitions of fairness have gained prominence. Here we first assemble and categorize these definitions into two broad families: (1) those that constrain the effects of decisions on disparities; and (2) those that constrain the effects of legally protected characteristics, like race and gender, on decisions. We then show, analytically and empirically, that both families of definitions typically result in strongly Pareto dominated decision policies. For example, in the case of college admissions, adhering to popular formal conceptions of fairness would simultaneously result in lower student-body diversity and a less academically prepared class, relative to what one could achieve by explicitly tailoring admissions policies to achieve desired outcomes. In this sense, requiring that these fairness definitions hold can, perversely, harm the very groups they were designed to protect. In contrast to axiomatic notions of fairness, we argue that the equitable design of algorithms requires grappling with their context-specific consequences, akin to the equitable design of policy. We conclude by listing several open challenges in fair machine learning and offering strategies to ensure algorithms are better aligned with policy goals.
Bio: Kristina Gligorić is a Postdoctoral Scholar at Stanford University Computer Science Department, advised by Dan Jurafsky at the NLP group. Previously she obtained her Ph.D. in Computer Science at EPFL, where she was advised by Robert West. Her research focuses on developing computational approaches that help solve burning societal issues, understand and improve human well-being, and promote social good. She leverages large-scale textual data and digital behavioral traces and tailors computational methods drawn from AI, NLP, and causal inference. She puts a strong emphasis on understanding and addressing the biases, limitations, and social implications of computational approaches deployed in high-stakes scenarios. Her work has been published in top computer science conferences (such as ACM CSCW, AAAI ICWSM, and TheWebConf) and broad audience journals (Nature Communications and Nature Medicine). She is a Swiss National Science Foundation Fellow. She received awards for her work, including CSCW 2021 Best Paper Honorable Mention Award, ICWSM 2021 and 2023 Best Reviewer Award, and EPFL Best Teaching Assistant Award.
Talk Title: Computational Approaches for Studying Dietary Behaviors with Digital Traces
Abstract: Human dietary habits shape our health, daily life, societies, the environment, and life on earth. However, it remains challenging to understand and attempt to change dietary behaviors using traditional methods due to measurement and causal identification challenges.
In this talk, I will describe computational and causal approaches leveraging large-scale passively sensed digital traces to shed new light on our dietary behaviors and derive novel scientific insights. I study dietary behaviors in two contexts: campus-wide and worldwide, based on digital traces capturing behaviors of tens of thousands of people on campus and millions of internet users.
First, I will present studies based on situated on-campus food purchase logs. The first study reveals how, when a person acquires a new eating partner on campus or has a meal together, the healthiness of their food choice shifts significantly in the direction of their new eating partner’s dietary patterns. Second, I will describe a study leveraging online information-seeking traces (Google search query logs). Studying worldwide dietary behaviors, I identify and describe global shifts in dietary interests during the first wave of the COVID-19 pandemic. Third, I critically investigate the limits to how much computational approaches can reveal about dietary behaviors in the general population. I outline a framework for reasoning about biases of digital traces and present a case study of food consumption in Switzerland.
Overall, these novel scientific findings and methodological advances contribute to the existing knowledge about human dietary behaviors and inform the design of food systems, policies, and interventions.
Bio: Noah Golowich is a PhD Student at the Massachusetts Institute of Technology, advised by Constantinos Daskalakis and Ankur Moitra. He completed his A.B. and S.M. at Harvard University. His research interests lie in theoretical machine learning, with a particular focus on the connections between multi-agent learning, game theory, and online learning, and in theoretical reinforcement learning. He is supported by a Fannie & John Hertz Foundation Fellowship and an NSF Graduate Fellowship.
Talk Title: Exploring and Learning in Sparse Linear MDPs without Computationally Intractable Oracles
Abstract: The key assumption underlying linear Markov Decision Processes (MDPs) is that the learner has access to a known feature map from state-action pairs to d-dimensional vectors, and that the rewards and transitions are linear functions in this representation. But where do these features come from? In the absence of expert domain knowledge, a tempting strategy is to use the “kitchen sink” approach and hope that the true features are included in a much larger set of potential features. In this work we revisit linear MDPs from the perspective of feature selection. In a k-sparse linear MDP, there is an unknown subset of size k containing all the relevant features, and the goal is to learn a near-optimal policy in only poly(k, log d) interactions with the environment. Our main result is the first polynomial-time algorithm for this problem. In contrast, earlier works either made prohibitively strong assumptions that obviated the need for exploration, or required solving computationally intractable optimization problems.
As a corollary of our main result, we give an algorithm for learning a near-optimal policy in block MDPs whose decoding function is a low-depth decision tree; the algorithm runs in quasi-polynomial time and takes a polynomial number of samples. This can be seen as a reinforcement learning analogue of classic results in computational learning theory. Furthermore, it gives a natural model where improving the sample complexity via representation learning is computationally feasible.
Bio: I am a fifth-year PhD student in the Department of Statistics at Stanford University, advised by Professors Emmanuel Candès and Dominik Rothenhäusler. Prior to Stanford, I obtained my bachelor’s degree in Mathematics from Tsinghua University. My research aims at developing modern statistical methods for trusted inference and decision-making that are simple, robust, generalizable, and minimal in assumptions. A large proportion of my work develops model-free methods that provide solid guarantees for the use of black-box prediction models in complex scientific discovery and decision-making processes, especially with applications to drug discovery. In addition, I study how to generalize causal inference to new contexts and new decision rules; my work under this theme concerns methodological and empirical foundations for the replicability, robustness, and transportability of causal effects, as well as algorithms and theories for policy learning with offline data from sequential experiments.
Talk Title: Model-free selective inference with conformal p-values and its application in drug discovery
Abstract: In decision-making or scientific discovery pipelines such as job hiring and drug discovery, before any resource-intensive step, there is often an initial screening step that uses predictions from a machine learning model to shortlist a few candidates from a large pool. We introduce a framework that allows using any prediction model to select candidates whose unobserved outcomes exceed user-specified values, while rigorously controlling the false positives. Given a set of calibration data that are exchangeable with the test sample, we leverage conformal inference ideas to construct p-values that allow us to shortlist candidates with exact false discovery rate (FDR) control. In addition, I will discuss new ideas to further deal with covariate shifts between calibration and test samples, a scenario that occurs in almost all such problems including drug discovery, hiring, causal inference, and healthcare. Our methods are flexible wrappers around any complex model. In practical drug discovery tasks, our methods greatly narrow the set of promising drug candidates to manageable sizes while maintaining rigorous FDR control.
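The screening idea in this abstract can be sketched in a few lines. The sketch below assumes a particular score, V(x, y) = y - mu(x) for a fitted predictor mu, evaluated at the true outcome on calibration data and at the user-specified threshold on test data, followed by the Benjamini-Hochberg procedure. The score choice and the function names are illustrative assumptions; the abstract does not specify them.

```python
import numpy as np

def conformal_pvalues(calib_scores, test_scores):
    """p_j = (1 + #{i : calib_scores[i] < test_scores[j]}) / (n + 1).

    calib_scores: V(X_i, Y_i) on calibration points with observed outcomes.
    test_scores:  V(X_j, c_j) on test points, using the threshold c_j
                  in place of the unobserved outcome.
    """
    calib = np.sort(np.asarray(calib_scores, dtype=float))
    test = np.asarray(test_scores, dtype=float)
    # side="left" counts calibration scores strictly below each test score.
    below = np.searchsorted(calib, test, side="left")
    return (1.0 + below) / (len(calib) + 1.0)

def benjamini_hochberg(pvals, q):
    """Indices selected by the Benjamini-Hochberg procedure at nominal level q."""
    pvals = np.asarray(pvals, dtype=float)
    m = len(pvals)
    order = np.argsort(pvals)
    # Compare the k-th smallest p-value with q * k / m.
    passed = pvals[order] <= q * np.arange(1, m + 1) / m
    if not passed.any():
        return np.array([], dtype=int)
    k = np.max(np.nonzero(passed)[0])
    return np.sort(order[: k + 1])

def select_candidates(calib_y, calib_pred, test_pred, thresholds, q=0.1):
    """Shortlist test points whose outcomes plausibly exceed their thresholds."""
    calib_scores = np.asarray(calib_y, dtype=float) - np.asarray(calib_pred, dtype=float)
    test_scores = np.asarray(thresholds, dtype=float) - np.asarray(test_pred, dtype=float)
    return benjamini_hochberg(conformal_pvalues(calib_scores, test_scores), q)
```

A test point receives a small p-value when its prediction exceeds the threshold by more than the model's typical calibration error, so the shortlist favors candidates the model is confidently optimistic about.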
Bio: Zhijing Jin (she/her) is a Ph.D. candidate at Max Planck Institute & ETH Zürich. Her research focuses on socially responsible NLP via causal and moral principles. Specifically, she works on expanding the impact of NLP by promoting NLP for social good, and developing CausalNLP to improve robustness, fairness, and interpretability of NLP models, as well as analyze the causes of social problems. She has published at many NLP and AI venues (e.g., ACL, EMNLP, NAACL, NeurIPS, AAAI, AISTATS). Her work has been featured in MIT News, ACM TechNews, and Synced. She is actively involved in AI for social good, as the organizer of NLP for Positive Impact Workshops at ACL 2021, EMNLP 2022, and EMNLP 2024, Moral AI Workshop at NeurIPS 2023, and RobustML Workshop at ICLR 2021. To support the NLP research community, she organizes the ACL Year-Round Mentorship Program. To foster the causality research community, she organized the Tutorial on CausalNLP at EMNLP 2022, and served as the Publications Chair for the 1st conference on Causal Learning and Reasoning (CLeaR).
Talk Title: Socially responsible NLP via causal and moral principles
Abstract: We are currently navigating through an era marked by numerous social challenges: the COVID pandemic, climate change, as well as escalating concerns about the safety of large language models (LLMs). In response to these challenges, I ground my research on socially responsible NLP in causal and moral principles. Specifically, I use causal inference to benchmark existing LLMs’ reasoning ability, analyze the failure modes of LLMs, and interpret the relation between data collection and model learning. Further, I combine interdisciplinary knowledge from moral philosophy to design socially important moral questions to test LLMs, and propose standards for future models to be more morally safe. In this talk, I will introduce my research on these lines, and put forward an outlook for socially responsible NLP.
Bio: Misha Khodak is a PhD student at Carnegie Mellon University advised by Nina Balcan and Ameet Talwalkar. He studies foundations and applications of machine learning, with a particular focus on designing and meta-learning algorithms (from statistical estimators to numerical solvers to online policies) that can take advantage of multi-instance data. He has also led the push to develop automated machine learning methods for diverse tasks and has worked on model compression, neural architecture search, and natural language processing. Misha is a recipient of the Facebook PhD Fellowship and has interned at Microsoft Research, Google Research, the Lawrence Livermore National Lab, and the Princeton Plasma Physics Lab. Previously, he received an AB in Mathematics and an MSE in Computer Science from Princeton University.
Talk Title: ARUBA: Efficient and Adaptive Meta-learning with Provable Guarantees
Abstract: Meta-learning has recently emerged as an important direction for multi-task learning, dynamic environments, and federated settings. We present a theoretical framework for designing practical meta-learning methods that integrates natural formalizations of task-similarity with the extensive literature on online convex optimization and sequential prediction algorithms. Our approach, which works by learning surrogate losses bounding the within-task regret of base learning algorithms, enables the task-similarity to be learned adaptively and leads to straightforward derivations of average-case regret bounds that improve if the tasks are similar. We highlight how our theory can be extended to numerous settings, especially for deriving multi-task guarantees for bandit algorithms.
Bio: Yuanyuan Lei is a PhD student in the Department of Computer Science at Texas A&M University. Her research spans natural language processing, machine learning, and deep learning. More specifically, she works on opinion and argument mining, misinformation detection, and their applications to media bias and polarization. At the core of her research on media bias, she aims to build responsible natural language processing models to understand subjective bias like opinions, detect media framing bias, and identify misinformation or disinformation.
Talk Title: Sentence-level Media Bias Analysis through Content Structures Modeling
Abstract: News media plays a vast role not only by providing information, but also by selecting, packaging, and organizing the information to shape public opinions. Media outlets are becoming more partisan and polarized nowadays, and developing sophisticated models to detect media bias becomes necessary. Most prior work in the NLP community focused on detecting media bias either at the source level or the article level. However, articles consist of multiple sentences, and each sentence serves different purposes in narrating a news story. We aim to identify media bias at a more fine-grained sentence level, and pinpoint bias sentences that seek to implant ideological bias and manipulate public opinions. Sentence-level media bias can be very subtle and tends to be presented in a neutral and factual tone. Our research demonstrates that modeling the content structures of news articles can reveal such implicit ideological bias.
Bio: Ben Lengerich is a Postdoctoral Associate and Alana Fellow at MIT’s Computer Science and Artificial Intelligence Lab (CSAIL) and the Broad Institute of MIT and Harvard, where he is advised by Manolis Kellis. His research in machine learning and computational biology emphasizes the use of context-adaptive models to understand complex diseases and advance precision medicine. Through his work, Ben aims to bridge the gap between data-driven insights and actionable medical interventions. He holds a PhD in Computer Science and MS in Machine Learning from Carnegie Mellon University, where he was advised by Eric Xing. His work has been recognized with spotlight presentations at conferences including NeurIPS, ISMB, AMIA, and SMFM, and is currently financially supported by the Alana Foundation.
Talk Title: Decoding Real-World Evidence: Bridging GAMs and LLMs to Expose “Death by Round Numbers”
Abstract: Large language models (LLMs) are revolutionizing natural language processing, but combining their implicit knowledge with explicit statistical machine learning models is an ongoing challenge. To bridge these systems, we propose to use generalized additive models (GAMs). GAMs decompose complex outcomes into separable component functions, creating a modular structure that fits seamlessly into LLM context windows. As a result, GAMs can unleash the power of LLMs for traditional machine learning tasks.
We demonstrate an exciting application of this perspective: automated surprise finding. Real-world data science grapples with complications like hidden confounding, often necessitating manual model inspections by domain experts. For instance, treatment-based confounding in medical data may inadvertently teach data-driven systems to perpetuate inherent biases. However, by bridging GAMs with LLMs, we harness the vast knowledge reservoir within LLMs to auto-detect anomalies that appear to contradict domain expertise.
This approach not only improves machine learning models, but also reveals quirks of the underlying system. Using this combination of glass-box GAMs and foundational LLMs, we evaluate four datasets of real-world medical evidence, identifying two characteristic types of medical biases: (1) discontinuous risk at sharp treatment thresholds and (2) counter-causal paradoxes where aggressive treatment reduces perceived risk.
Finally, we present the TalkToEBM package to establish this bridge between GAMs and LLMs and offer a lens to automatically identify surprising effects in statistical models.
Bio: Lucy Li is a PhD student at the University of California, Berkeley, working on natural language processing (NLP), computational social science, cultural analytics, and AI fairness. She researches how social groups are discussed and represented in language models and textual data, such as textbooks, fiction, and online forums. She is passionate about bridging NLP with the humanities and social sciences, especially education and curriculum studies. She is supported by an NSF Graduate Research Fellowship, and has interned at Microsoft Research and the Allen Institute for AI, the latter of which awarded her Outstanding Intern of the Year.
Talk Title: Measuring Language By Social Groups in Natural Language Processing
Abstract: Language data embeds social identities, behaviors, and beliefs. That is, who we are is expressed through how we communicate. My research leverages natural language processing (NLP) methods to measure large-scale language patterns across social groups, and uses these measurements to answer sociolinguistic questions and inform model development. First, I’ll present studies quantifying community-specific words and meanings across two domains: online discussion forums and scholarly literature. In these studies, I leverage perspectives from sociolinguistics to relate communities’ use of distinctive language to various social factors, such as member loyalty, activity, network density, audience, and cross-community impact. Second, I’ll present an ongoing study analyzing the effects of large language model (LLM) pretraining data practices on text spanning a range of socioeconomic and geographic origins. Model developers often implement filters to extract “high-quality” data to train models. I’ll show whether notions of “quality” vary across popular LLMs’ filtering strategies, and what kinds of webtext are disproportionately removed during these curation processes. Finally, I will conclude by discussing challenges and questions raised by my research that point towards future directions for computational social science and NLP.
Bio: Paul Liang is a Ph.D. student in Machine Learning at CMU, advised by Louis-Philippe Morency and Ruslan Salakhutdinov. He studies the foundations of multimodal machine learning with applications in socially intelligent AI, understanding human and machine intelligence, natural language processing, healthcare, and education. His research has been recognized by the Siebel Scholars Award, Waibel Presidential Fellowship, Facebook PhD Fellowship, Center for ML and Health Fellowship, and 3 best-paper awards at NeurIPS workshops and ICMI. Outside of research, he loves teaching and advising, and received the Alan J. Perlis Graduate Student Teaching Award for instructing courses and organizing workshops and tutorials on multimodal machine learning.
Talk Title: Foundations of Multisensory Artificial Intelligence
Abstract: The study of multisensory intelligence aims to understand the computational principles of multisensory integration and design practical AI systems that can similarly learn from multiple sensory inputs such as text, speech, audio, video, real-world sensors, wearable devices, and medical data. These multisensory AI systems can support human physical, emotional, and social well-being, enable multimedia content processing, and enhance real-world autonomous agents, which hold great promise for impact in many scientific areas with practical benefits.
Multisensory intelligence brings unique challenges to machine learning given the heterogeneity and interactions often found between modalities. In this talk, I will discuss my recent research on the foundations of multisensory intelligence through a new theoretical framework formalizing how modalities interact with each other to give rise to new information for a task. These interactions are the basic building blocks in all multimodal problems, and their quantification enables users to understand their multimodal datasets and design principled approaches to learn these interactions. To further learn temporal interactions from sequential data, I will present cross-modal attention and multimodal transformers which now underpin many of today’s multimodal foundation models. Finally, I will discuss broader real-world applications of scaling up multisensory learning to many modalities and tasks: (1) aiding mental health practitioners in their treatment by predicting daily mood fluctuations in patients using multimodal smartphone data, (2) supporting doctors in cancer prognosis using genomics readings and pathology images, and (3) enabling robust control of physical robots using cameras and touch sensors.
Bio: Zhijian Liu is a Ph.D. candidate at MIT, advised by Song Han. His research focuses on efficient deep learning. He has developed efficient algorithms and systems for deep learning and applied them to computer vision, robotics, natural language processing, and scientific discovery. His research has been adopted by Microsoft, NVIDIA, Intel, and Waymo. He is a recipient of the Qualcomm Innovation Fellowship and the NVIDIA Graduate Fellowship. He was also recognized as a Rising Star in ML and Systems by MLCommons and a Rising Star in Data Science by UChicago and UCSD. Previously, he received his B.Eng. degree from Shanghai Jiao Tong University.
Talk Title: Efficient Deep Learning with Sparsity — Algorithms, Systems and Applications
Abstract: Deep learning has catalyzed advancements across numerous scientific and engineering fields. It is also a driving force behind many successful real-world applications, such as autonomous driving, large language models, and generative AI. Despite its remarkable accomplishments, the computational requirements of deep learning have surged dramatically. For instance, the size of large language models grew from 0.11B parameters in GPT-1 to 175B in GPT-3 within three years, an increase of over 1500X. However, the pace of hardware acceleration has slowed in the post-Moore era. This creates a large (and potentially growing) gap between the supply of and demand for AI computing.
In this talk, I will introduce my research efforts on efficient deep learning with sparsity. I will first talk about my efforts on developing efficient sparse algorithms and systems. I will mainly focus on TorchSparse, which effectively translates the theoretical savings from activation sparsity into practical speedups. It is three times faster than the leading industry solution from NVIDIA. I will then delve into my efforts on incorporating sparsity into different applications, such as computer vision, robotics, and high-energy physics. To conclude, I will provide a glimpse into some of my ongoing research projects and outline my long-term research vision.
Bio: Siddharth Mishra-Sharma is an IAIFI Fellow at the NSF AI Institute for Artificial Intelligence and Fundamental Interactions (IAIFI), affiliated with the Center for Theoretical Physics at MIT and the Department of Physics at Harvard. Prior to this, he was a postdoc at NYU’s Center for Cosmology and Particle Physics from 2018-2021 and obtained his Ph.D. in Physics from Princeton University in 2018. Siddharth is interested in developing methods that leverage deep learning specifically and differentiable programming more generally to accelerate searches for new physics using astrophysical and cosmological data at all observable scales.
Talk Title: Illuminating the dark Universe with probabilistic machine learning
Abstract: The next several years will witness an influx of astrophysical data that will enable us to accurately map out the distribution of matter in the Universe, image billions of stars and galaxies to unprecedented precision, and create the highest-resolution maps of the Milky Way to date. These observations may contain signatures of new physics, including hints about the nature of dark matter, offering significant discovery potential. At the same time, the complexity of the data and the presence of unknowable systematics pose significant challenges to robustly characterizing these signatures through conventional methods.
I will describe how overcoming these challenges will require a qualitative shift in our approach to statistical inference in astrophysics, bringing together several recent advances in generative modeling, differentiable programming, and simulation-based inference. I will showcase applications of these methods to diverse astrophysical systems, emphasizing how these will drive progress on key questions in astro- and particle-physics over the next decade.
Bio: Roshni Sahoo is a PhD student in Computer Science at Stanford University, advised by Stefan Wager. She studies data-driven decision making, with a focus on applications in public health and medicine. Her work contributes statistical methodologies for tackling challenges such as biased sample selection, heavy-tailed data, and uncertainty quantification. Her research is supported by a Spectrum Population Health Sciences Pilot Grant, NSF Graduate Research Fellowship, Stanford Data Science Scholar Award, and Stanford Ethics in Society Fellowship. Prior to Stanford, she received BS degrees in computer science and mathematics at MIT.
Talk Title: Learning from a Biased Sample
Abstract: The empirical risk minimization approach to data-driven decision making assumes that we can learn a decision rule from training data drawn under the same conditions as the ones we want to deploy it in. However, in a number of settings, we may be concerned that our training sample is biased, and that some groups (characterized by either observable or unobservable attributes) may be under- or over-represented relative to the general population; and in this setting empirical risk minimization over the training set may fail to yield rules that perform well at deployment. We propose a model of sampling bias called Gamma-biased sampling, where observed covariates can affect the probability of sample selection arbitrarily much but the amount of unexplained variation in the probability of sample selection is bounded by a constant factor. Applying the distributionally robust optimization framework, we propose a method for learning a decision rule that minimizes the worst-case risk incurred under a family of test distributions that can generate the training distribution under Gamma-biased sampling. We apply a result of Rockafellar and Uryasev to show that this problem is equivalent to an augmented convex risk minimization problem. We give statistical guarantees for learning a model that is robust to sampling bias via the method of sieves, and propose a deep learning algorithm whose loss function captures our robust learning target. We empirically validate our proposed method in simulations and a case study on ICU length of stay prediction.
Bio: Hua Shen is a postdoctoral research fellow at the University of Michigan, advised by Dr. David Jurgens, working on facilitating human-centered AI explanation and interaction in the fields of NLP and HCI. She obtained her PhD at Penn State (2019-2023), supervised by Kenneth Huang, and worked closely with Sherry Wu from CMU. Her research focused on helping humans explain and communicate with AI models (e.g., LLMs) via interactions or conversations and, moreover, enhancing AI models by aligning their behavior with human feedback. Her work received the Best Paper Honorable Mention Award at IUI 2023. She has been invited to talk about her research at CMU, Princeton, Google, Amazon, etc. Her research findings have also been covered by media outlets such as PSU News and TechXplore News. In addition, she conducted research on speech processing during her internships at Google AI and Amazon AI.
Talk Title: Towards Useful AI Interpretability for Humans via Interactive AI Explanations
Abstract: Although the plethora of eXplainable AI (XAI) approaches are validated to reflect the model behavior faithfully, how humans can understand and further use AI explanations still needs to be explored. Therefore, we conducted two human studies to investigate how useful AI explanations can be for end users in the Computer Vision and NLP fields, asking humans to analyze model failures in image classification tasks and to simulate model predictions in text classification tasks, respectively. Our results showed that explanations are not always helpful for human understanding in practical tasks of both fields. To gain insights into possible reasons, we examined the gaps between the status quo of XAI techniques and real-world user demands, by surveying 200+ XAI papers and comparing them with the practical user needs. As a result, we found that humans need diverse XAI types to interpret the whole AI system lifecycle. However, there is a lack of one-size-fits-all XAI techniques to cater to the diverse and dynamic human needs in practice. Hence, we propose a Conversational XAI prototype, ConvXAI, as a universal XAI dialogue interface that empowers users to ask various XAI questions across the AI lifecycle and receive explanations in diverse formats. We applied ConvXAI to improve human performance in human-AI co-writing tasks. The findings from two studies with 21 users show that ConvXAI is useful for humans in understanding AI feedback and improving writing productivity and quality. Overall, the studies contribute insights into the design space of facilitating useful XAI techniques for humans in practice.
Bio: I am a Ph.D. student in the Language Technologies Institute at Carnegie Mellon University, where I am fortunate to be advised by Prof. Yiming Yang. I received my M.S. degree in May 2021 and anticipate completing my Ph.D. studies in 2024. My academic journey began at Peking University, where I received a B.S. in Computer Science, advised by Prof. Zhi-Hong Deng. My research focuses on neural-symbolic reasoning, which entails synergistically leveraging the strengths of machine learning systems with symbolic systems such as knowledge graphs and combinatorial optimization solvers. More recently, my research interests have been oriented towards the alignment of Large Language Models (LLMs) and Large Multimodal Models (LMMs), with a special emphasis on improving reliability through scalable oversight (minimal human supervision) such as human-defined principles or factual feedback from real-world interactions.
Talk Title: Aligning Large Language and Multimodal Models with Scalable Oversight
Abstract: There has been an increasing desire to bridge the gap between what we expect AI systems to generate and what they actually produce. At the forefront of this exploration, we introduce novel methodologies for aligning Large Language Models (LLMs) and Large Multimodal Models (LMMs): Principle-Driven Self-Alignment (SELF-ALIGN), Self-Alignment with Principle-Following Reward Models (SALT), and Aligning Large Multimodal Models with Factually-Augmented RLHF (Fact-RLHF). Driven by the aspiration to diminish the constraints of exhaustive human supervision and to magnify the reliability of AI outputs, SELF-ALIGN ingeniously harnesses principle-driven reasoning with LLMs’ generative prowess, crafting content that resonates with human values. Building on this motivation, SALT evolves the alignment landscape further by seamlessly integrating minimal human guidance with reinforcement learning from synthetic preferences, offering a tantalizing glimpse into the future of self-aligned AI agents. Shifting our lens to the multimodal realm, where misalignment often translates into AI “hallucinations” that are inconsistent with the multimodal inputs, Fact-RLHF emerges as a general and scalable solution. By merging RLHF’s strength with factual augmentations, this method not only mitigates misalignments but also pioneers in setting robust standards for AI’s vision-language capabilities.
Bio: I am currently a Postdoctoral Scholar at the AI for Science Lab at the California Institute of Technology. I received my PhD in 2023 from the School of Engineering and Applied Sciences at Harvard University, where I was also an affiliate of the Center for Brain Science. During my PhD, I also worked at Amazon AI and Microsoft as a Research Intern. I obtained my BASc with distinction in 2017 from the Department of Electrical and Computer Engineering at the University of Waterloo, where I received a President’s Scholarship and multiple First in Class Engineering Scholarships. My research achievements are recognized by multiple distinguished awards and distinctions, including the Swartz Foundation Fellowship in Theoretical Neuroscience, the AWS Machine Learning Research Award and the Harvard Quantitative Biology Student Fellowship. My research advances artificial intelligence for science and engineering with a focus on computational and theoretical neuroscience.
Talk Title: Deep Interpretable Generative Learning for Science and Engineering
Abstract: Discriminative and generative AI represent two deep learning paradigms that have sparked a revolution in predicting and generating high-quality realistic images from text prompts. Nonetheless, discriminative learning lacks the capacity to generate data, while deep generative models face challenges in decoding capabilities. A key challenge, which could bring the next breakthrough in AI, lies in the unification of these two paradigms. Moreover, both deep learning approaches are data-hungry and thus do not perform well in data-scarce applications. Furthermore, deep learning suffers from low interpretability; no comprehensive framework exists to describe the construction of non-trivial representations and predictions by discriminative models. These drawbacks have posed significant barriers to the adoption of deep learning in applications where a) acquiring supervised data is expensive or infeasible, and b) goals extend beyond data fitting to attain scientific insights. Specifically, deep learning applications are fairly unexplored in fields with rich mathematical and optimization frameworks such as inverse problems, or those in which interpretability matters. This talk discusses the theory and applications of deep learning in data-limited or unsupervised inverse problems. These include applications in radar sensing, Poisson image denoising, and computational neuroscience.
Bio: Rose E. Wang is a Computer Science PhD student at Stanford University, advised by Prof. Dorottya Demszky and Diyi Yang. She received her B.Sc. in Computer Science at MIT working with Prof. Joshua Tenenbaum and Google Brain. She focuses on large-scale modeling of education interactions, such as 1-1 tutoring settings. She develops data science and machine learning methods to infer robust signals from interaction data for interventions. Her recent projects include modeling remediation practices of mathematics teachers with large language models. Her work is recognized by the NSF GRFP and best paper awards from CogSci and NeurIPS Cooperative AI.
Talk Title: Leveraging Large Language Models for Remediation in Mathematics Tutoring: Insights and Challenges
Abstract: In this talk, I will present our study of how experienced teachers and large language models (LLMs) remediate student mistakes in mathematics and outline the potential of LLMs to support novice tutors.
Scaling high-quality tutoring is a major challenge in education. Because of the growing demand, many platforms employ novice tutors who, unlike professional educators, struggle to effectively address student mistakes and thus fail to seize prime learning opportunities for students. In this work, we explore the potential for LLMs to assist math tutors in remediating student mistakes. We analyze thousands of tutoring sessions from novice tutors, which reveal poor pedagogical practices like immediately saying and explaining the solution when responding to student mistakes. We used these findings to build ReMath, a benchmark co-developed with experienced math teachers that deconstructs their thought process for remediation. We evaluate the performance of state-of-the-art instruct-tuned and dialog models on ReMath. Our findings suggest that although models consistently improve upon original tutor responses, we cannot rely on models alone to remediate mistakes. Providing models with the error type (e.g., the student is guessing) and strategy (e.g., simplify the problem) leads to a 75% improvement in the response quality over models without that information. Nonetheless, despite the improvement, the quality of the best model’s responses still falls short of experienced math teachers. Our work is the first to shed light on the potential and current limitations of using LLMs to provide high-quality learning experiences for both tutors and students at scale.
Bio: Yuqing Wang is a Ph.D. candidate in Mathematics at Georgia Institute of Technology, advised by Prof. Molei Tao. Her research focuses on the intersection of machine learning, optimization, sampling, (stochastic) dynamics, and computational math. Her interest lies in quantitatively understanding and improving the dynamics in machine learning, including the implicit bias of large learning rates and the acceleration of sampling dynamics. Before coming to Georgia Tech, Yuqing Wang received her B.S. in Computational Mathematics from Nankai University.
Talk Title: What creates edge of stability, balancing, and catapult
Abstract: When a large learning rate is used in gradient descent for nonconvex optimization, various phenomena that are not explainable by classical optimization theory can arise, including edge of stability, balancing, and catapult. Many theoretical works have tried to analyze these phenomena, but a high-level picture is still missing: it is unclear when and why they occur. In this talk, I will show that these phenomena are actually various tips of the same iceberg. They occur when the objective function has some good regularity. This regularity, together with the effect of a large learning rate in guiding gradient descent from sharp regions to flatter ones, leads to control of the largest eigenvalue of the Hessian, i.e., sharpness, along the GD trajectory, which results in these phenomena. This result is based on a nontrivial convergence analysis under a large learning rate for a family of nonconvex functions of various regularities without Lipschitz gradient, which is usually a default assumption in nonconvex optimization. In addition, it contains the first non-asymptotic result on the rate of convergence in this setting. Neural network experiments will also be presented to validate this result.
Bio: Jingfeng Wu is a postdoc at the Simons Institute at UC Berkeley, hosted by Prof. Peter Bartlett and Prof. Bin Yu. He earned his Ph.D. in Computer Science at Johns Hopkins University, advised by Prof. Vladimir Braverman. Before that, he obtained his B.S. in Mathematics and M.S. in Applied Math from Peking University.
His research interests are in the theory and algorithms of deep learning and related topics in machine learning, optimization, and statistics.
Talk Title: Theoretical Insights into Gradient Descent and Stochastic Gradient Descent in Deep Learning
Abstract: Gradient Descent (GD) and Stochastic Gradient Descent (SGD) are fundamental optimization algorithms in machine learning, but their behaviors sometimes defy intuitions from classic optimization and statistical learning theories. In deep learning, GD often exhibits local oscillations while still converging over time. Moreover, SGD-trained models generalize effectively even when overparameterized. In this talk, I will revisit the theories of GD and SGD for classic problems but in new scenarios motivated by deep learning, presenting two novel insights:
(1) For logistic regression with separable data, GD with an arbitrarily large stepsize minimizes empirical risk, potentially in a non-monotonic fashion.
(2) For linear regression and ReLU regression, one-pass SGD and its variants can achieve low excess risk, even in the overparameterized regime.
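The setting in insight (1) can be explored numerically with a small sketch: full-batch gradient descent on the logistic loss, recording the empirical risk at every step. The toy dataset, stepsize, and function names are assumptions made for illustration and are not taken from the talk.

```python
import numpy as np

def logistic_risk(w, X, y):
    """Empirical logistic risk: mean of log(1 + exp(-y_i * <x_i, w>))."""
    margins = y * (X @ w)
    return float(np.mean(np.logaddexp(0.0, -margins)))

def gd_logistic(X, y, stepsize, num_steps):
    """Run constant-stepsize GD from w = 0 and record the risk after each step."""
    w = np.zeros(X.shape[1])
    risks = [logistic_risk(w, X, y)]
    for _ in range(num_steps):
        margins = y * (X @ w)
        # sigmoid(-margin) = 1 / (1 + exp(margin)), computed in a stable way.
        weights = np.exp(-np.logaddexp(0.0, margins))
        grad = -np.mean((weights * y)[:, None] * X, axis=0)
        w = w - stepsize * grad
        risks.append(logistic_risk(w, X, y))
    return w, risks

# Hypothetical linearly separable data: both points have label +1 and are
# separated by the direction (0, 1).
X = np.array([[2.0, 1.0], [-1.0, 1.0]])
y = np.array([1.0, 1.0])
w, risks = gd_logistic(X, y, stepsize=10.0, num_steps=50)
```

Inspecting `risks` for different stepsizes and datasets shows whether the risk decreases at every step or only over time; the sketch makes no claim about which behavior occurs for a given configuration.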
Bio: Lily Xu is a computer science PhD student at Harvard developing AI techniques to address planetary health challenges, with a focus on biodiversity conservation and public health. Her research enables effective decision-making in these high-stakes settings using methods across machine learning, sequential planning, and causal inference. Her work building the PAWS system to predict poaching hotspots has been deployed in multiple countries and is being scaled globally through integration with SMART conservation software. Lily co-organizes the Mechanism Design for Social Good (MD4SG) research initiative and serves as AI Lead for the SMART Partnership. Her research has been recognized with best paper runner-up at AAAI, the INFORMS Doing Good with Good OR award, a Google PhD Fellowship, and a Siebel Scholarship.
Talk Title: High-stakes decisions from low-quality data: AI decision-making for planetary health
Abstract: Planetary health recognizes the inextricable link between human health and the health of our planet. Our planet’s growing crises include biodiversity loss, with animal population sizes declining by an average of 70% since 1970, and maternal mortality, with 1 in 49 girls in low-income countries dying from complications in pregnancy or birth. Overcoming these crises will require effectively allocating and managing our limited resources. My research develops data-driven AI decision-making methods to do so, overcoming the messy data ubiquitous in these settings. Here, I’ll present technical advances in multi-armed bandits, robust reinforcement learning, and causal inference, addressing research questions that emerged from on-the-ground challenges across conservation and maternal health. I’ll also discuss bridging the gap between research and practice, with anti-poaching field tests in Cambodia, field visits in Belize and Uganda, and large-scale deployment with SMART conservation software.
Bio: Yuzhe Yang is a PhD candidate in computer science at MIT. He received his B.S. with honors in EECS from Peking University. His research interests include machine learning, and AI for human disease, health and medicine. His works on AI-enabled biomarkers for Parkinson’s disease were named as Ten Notable Advances in 2022 by Nature Medicine, and Ten Crucial Advances in Movement Disorders in 2022 by The Lancet Neurology. His research has been published in Nature Medicine, Science Translational Medicine, NeurIPS, ICML, ICLR, CVPR, and UbiComp. His works have been recognized by the MathWorks Fellowship, Takeda Fellowship, Baidu PhD Scholarship, and media coverage from MIT Tech Review, Wall Street Journal, Forbes, BBC, The Washington Post, etc.
Talk Title: Learning to Assess Disease and Health At Your Home
Abstract: The future of healthcare lies in delivering comprehensive medical services to patients in their own homes. As the global population ages and chronic diseases become increasingly prevalent, objective, longitudinal and reliable health assessment at home becomes crucial for early detection and prevention of hospitalization. Advances in machine learning and smart sensors hold immense potential for transforming in-home healthcare. However, enabling these technologies for at-home clinical applications requires addressing several challenges, including discovering effective biomarkers with accessible vitals, learning from real-world sparse and biased health data, and making ML algorithms reliable for deployment across diverse environments and populations.
In this talk, I will present new learning methods with everyday devices for in-home healthcare that address these challenges. I will first describe a simple self-supervised framework for remote sensing of human vitals using only everyday smartphones. I will then introduce an AI-powered digital biomarker for Parkinson’s disease that detects the disease, estimates its severity, and tracks its progression using nocturnal breathing signals. Furthermore, I will showcase the potential of AI-based in-home assessment for various diseases and human health sensing, enabling remote monitoring of health-related conditions and timely care, and enhancing patient outcomes.
Bio: I am a 5th year PhD candidate in applied mathematics at the University of Colorado at Boulder, supported by an NSF Graduate Research Fellowship and advised by Aaron Clauset. My research applies a wide range of techniques from data science (from statistics and mathematical modeling to surveys and experimentation) toward the study of complex social systems, ranging from academic careers to human rights. During my PhD, I was a two-time research intern with the Microsoft Research NYC Computational Social Science group and a Data Science & Human Rights Fellow with the Human Rights Data Analysis Group. Previously, I worked as a software consultant at Pivotal Labs, as a software engineer at multiple startups (including on a DARPA project), and as a Data Science for Social Good Fellow at the University of Chicago.
Talk Title: Labor advantages drive the greater productivity of faculty at elite universities
Abstract: Faculty at prestigious institutions dominate scientific discourse, producing a disproportionate share of all research publications. Environmental prestige can drive such epistemic disparity, but the mechanisms by which it causes increased faculty productivity remain unknown. Here, we combine employment, publication, and federal survey data for 78,802 tenure-track faculty at 262 PhD-granting institutions in the American university system to show through multiple lines of evidence that the greater availability of funded graduate and postdoctoral labor at more prestigious institutions drives the environmental effect of prestige on productivity. In particular, greater environmental prestige leads to larger faculty-led research groups, which drive higher faculty productivity, primarily in disciplines with group collaboration norms. In contrast, productivity does not increase substantially with prestige for faculty publications without group members or for group members themselves. The disproportionate scientific productivity of elite researchers can be largely explained by their substantial labor advantage rather than inherent differences in talent.
Bio: Shufan Zhang is a PhD student in Computer Science at the University of Waterloo. His research interests include data privacy and security, on both theory and system aspects, as well as their intersections with database systems and machine learning. He has published in major data science conferences including SIGMOD, VLDB, ITCS, ICDCS, and journals such as IEEE TIT.
Talk Title: DProvDB: Differentially Private Query Processing with Multi-Analyst Provenance
Abstract: Recent years have witnessed the adoption of differential privacy (DP) in practical database systems like PINQ, FLEX, and PrivateSQL. Such systems allow data analysts to query sensitive data while providing a rigorous and provable privacy guarantee. However, the existing design of these systems does not distinguish between data analysts of different privilege levels or trust levels. This design can apportion the privacy budget unfairly among the data analysts if it treats them as a single entity, or waste the privacy budget if it treats them as non-colluding parties and answers their queries independently. In this paper, we propose DProvDB, a fine-grained privacy provenance framework for the multi-analyst scenario that tracks the privacy loss to each individual data analyst. Under this framework, when given a fixed privacy budget, we build algorithms that maximize the number of queries that can be answered accurately and apportion the privacy budget according to the privilege levels of the data analysts.
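To make the idea of per-analyst privacy accounting concrete, here is a minimal sketch in Python. It is not DProvDB's algorithm or API: the names PrivacyLedger, cap, charge, and noisy_count are hypothetical, the proportional-to-privilege split is an assumed policy chosen only for illustration, and the query is a simple count (sensitivity 1) answered with the standard Laplace mechanism.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PrivacyLedger:
    """Hypothetical per-analyst ledger; an illustration, not DProvDB's design."""
    total_epsilon: float                  # fixed overall privacy budget
    privileges: Dict[str, int]            # analyst -> privilege level (assumed positive)
    spent: Dict[str, float] = field(default_factory=dict)

    def cap(self, analyst: str) -> float:
        # Assumed policy: each analyst's cap is proportional to privilege level.
        return self.total_epsilon * self.privileges[analyst] / sum(self.privileges.values())

    def remaining(self, analyst: str) -> float:
        return self.cap(analyst) - self.spent.get(analyst, 0.0)

    def charge(self, analyst: str, epsilon: float) -> bool:
        # Record the privacy loss only if it fits within the analyst's cap.
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self.remaining(analyst) + 1e-12:
            return False
        self.spent[analyst] = self.spent.get(analyst, 0.0) + epsilon
        return True


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. Exp(1) draws follows Laplace(0, 1).
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def noisy_count(true_count: int, analyst: str, epsilon: float,
                ledger: PrivacyLedger, rng: random.Random) -> Optional[float]:
    # Refuse the query (return None) when the analyst's budget is exhausted.
    if not ledger.charge(analyst, epsilon):
        return None
    # A count query has sensitivity 1, so the Laplace scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    ledger = PrivacyLedger(total_epsilon=1.0, privileges={"alice": 3, "bob": 1})
    rng = random.Random(0)
    print(noisy_count(100, "bob", 0.2, ledger, rng))  # a noisy answer
    print(noisy_count(100, "bob", 0.2, ledger, rng))  # refused: exceeds bob's cap
```

In this sketch every query is answered with fresh, independent noise, so the caps simply sum to the total budget. The abstract identifies independent answering as wasteful, so this should be read only as a baseline illustrating what tracking privacy loss per analyst means.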
Bio: Chudi Zhong is a Ph.D. candidate in computer science at Duke University, advised by Cynthia Rudin. Her research focuses on developing interpretable machine learning algorithms and pipelines to facilitate human-model interaction for high-stakes decision-making problems. Her work has been published in top-tier conferences (NeurIPS/ICML) and was selected as a finalist for the INFORMS Data Mining Best Student Paper Award. She won 2nd place in the 2023 Bell Labs Prize. Prior to her Ph.D., Chudi earned her bachelor’s degree in Statistics from UNC-Chapel Hill and a master’s degree in Statistics from Duke.
Talk Title: Towards Trustworthy AI: Interpretable Machine Learning Algorithms that Produce All Good Models
Abstract: Machine learning has been increasingly deployed for high-stakes decisions that deeply impact people’s lives. However, not all models can be trusted. To ensure the safe and efficient utilization of machine learning models in the decision-making process, we change both model training and evaluation steps within the standard machine learning pipeline. In this talk, I will introduce our new paradigm, which shifts from finding a single optimal model to enumerating all good interpretable models. This paradigm allows users an unprecedented level of control over model choice among all models that are approximately equally good. I will present algorithms used to find the set of all good models, discuss how this set enables practitioners to explore alternative models that might have desirable properties beyond what could be expressed within a loss function, and show applications.