Belinda Li (MIT): CS and DSI Joint Colloquium
Monday, March 9
Time: 2:30pm – 3:30pm
Data Science Institute, Room 105
5460 S University Ave, Chicago, IL 60615
Abstract: Modern language models (LMs) are increasingly capable, yet they still suffer from persistent failures: they hallucinate facts, adapt poorly to users, and produce unfaithful explanations. Rather than viewing these failures as inevitable outcomes of neural networks, I present evidence that LMs learn to build structured internal models of the world, the user, and themselves, and that these can be leveraged to build more reliable agents. First, I use interpretability techniques to show that LMs indeed build latent representations of world state, and I characterize the algorithms they use to track state changes. Next, I augment LMs with external Bayesian frameworks for interactive user modeling, enabling them to proactively elicit and track user preferences. Finally, I develop training methods to equip LMs with self-models, enabling them to produce faithful explanations of their own computations. Together, these lines of work allow future AI systems to maintain coherent and updateable beliefs, to adapt to individual users, and to communicate their reasoning transparently to humans, pointing towards a future of collaborative AI systems which augment rather than replace human capabilities.
Bio: Belinda Z. Li is a PhD candidate at MIT EECS focused on building AI systems that have coherent and interpretable models of the world, the user, and themselves. Belinda has developed methods for understanding LMs’ implicit world models, eliciting user models through proactive question-asking, and training LMs to faithfully explain their internal computations. Belinda’s work aims to make AI systems more transparent, reliable, and amenable to human collaboration. Belinda is a recipient of a Rising Stars in EECS Award, a Clare Boothe Luce Fellowship, and an NDSEG Fellowship.