
Please join us for a Statistics and DSI joint colloquium.

Thursday, February 13
2:00pm – 3:00pm
Jones Hall 303
5747 S Ellis Avenue
Chicago, IL 60637

Abstract: The deployment of machine learning in high-stakes settings has raised fundamental questions about the reliability and fairness of black-box models. For example, does a model treat different groups equitably, or can we quantify model uncertainty before taking action on each prediction? While numerous assumption-lean methods appear to address these types of questions, their guarantees can often be misaligned with practitioners’ needs. My research aims to resolve the inherent tension of model-free statistical inference: the generic validity of such methods is appealing, but without a well-specified model, it is challenging to identify guarantees that are also practically useful. To make these observations concrete, this talk will primarily focus on a set of new conformal inference methods for obtaining validity guarantees on the output of large language models (LLMs). Prior work in language modeling identifies a subset of the text that satisfies a high-probability guarantee of factuality. These methods work by filtering a claim from the LLM’s original response if a scoring function evaluated on the claim fails to exceed some estimated threshold. Existing methods in this area suffer from two deficiencies. First, the guarantee is not conditionally valid: the trustworthiness of the filtering step may vary based on the topic of the response. Second, because the scoring function is imperfect, the filtering step can remove many valuable and accurate claims. Our work addresses both of these challenges via two new conformal prediction methods. First, we show how to issue an error guarantee that is both valid and adaptive: the guarantee remains well-calibrated even though it can depend on the prompt (e.g., so that the final output retains most claims). Second, we will show how to optimize the accuracy of the scoring function used in this procedure, e.g., by ensembling multiple scoring approaches. This is joint work with Isaac Gibbs and Emmanuel Candès.
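To fix ideas, the claim-filtering step described in the abstract can be sketched with a standard split-conformal calibration step. This is only an illustrative sketch in Python, not the speaker's method: the data layout (per-claim scores and factuality labels on a held-out calibration set) and the function names calibrate_threshold and filter_claims are assumptions made here for exposition.

import numpy as np

# Sketch: split-conformal calibration of a score threshold for claim filtering.
# Assumption: for each calibration response we have an array of claim scores
# (higher = more likely factual) and a boolean array marking factual claims.

def calibrate_threshold(cal_claim_scores, cal_claim_labels, alpha=0.1):
    """Return a threshold tau so that, for a new exchangeable response,
    all claims retained by the filter are factual with probability >= 1 - alpha
    (a marginal, not conditional, guarantee)."""
    conf_scores = []
    for scores, labels in zip(cal_claim_scores, cal_claim_labels):
        # Conformity score for a response: the largest score attained by any
        # NON-factual claim; filtering strictly above this value removes them all.
        false_scores = np.asarray(scores)[~np.asarray(labels)]
        conf_scores.append(false_scores.max() if false_scores.size else -np.inf)
    conf_scores = np.asarray(conf_scores)

    # Finite-sample-corrected (1 - alpha) empirical quantile.
    n = len(conf_scores)
    q_level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(conf_scores, q_level, method="higher")

def filter_claims(claims, scores, tau):
    """Keep only the claims whose score strictly exceeds the threshold."""
    return [c for c, s in zip(claims, scores) if s > tau]

The contributions highlighted in the talk go beyond this marginal guarantee, e.g., letting the threshold (and hence the guarantee) adapt to the prompt and optimizing the scoring function itself.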

Bio: John Cherian, Department of Statistics, Stanford University. I’m a fifth-year PhD student in statistics at Stanford University, where I’m grateful to be advised by Emmanuel Candès and supported by the John and Fannie Hertz Foundation. My research is motivated by fundamental questions about the reliability and fairness of black-box models. For example, does a model treat different groups equitably, or can we quantify model uncertainty before taking action on each prediction? When I’m not working on new methods for model-free inference, I apply these ideas to election forecasting as a consultant to The Washington Post.

Before starting my PhD, I spent three years at D.E. Shaw Research (“DESRES”), where I worked on improving polarizable force fields for all-atom simulations. I joined DESRES after earning my B.S. in Mathematical and Computational Science and M.S. in Statistics from Stanford University in 2017. You can reach me at jcherian at stanford dot edu. A copy of my CV is available here.
