Rose Wang
Bio: Rose E. Wang is a Computer Science PhD student at Stanford University, advised by Profs. Dorottya Demszky and Diyi Yang. She received her B.Sc. in Computer Science from MIT, where she worked with Prof. Joshua Tenenbaum and Google Brain. Her research focuses on large-scale modeling of educational interactions, such as one-on-one tutoring. She develops data science and machine learning methods to infer robust signals from interaction data and inform interventions. Her recent projects include modeling the remediation practices of mathematics teachers with large language models. Her work has been recognized by the NSF GRFP and by best paper awards at CogSci and NeurIPS Cooperative AI.
Talk Title: Leveraging Large Language Models for Remediation in Mathematics Tutoring: Insights and Challenges
Abstract: In this talk, I will present our study of how experienced teachers and large language models (LLMs) remediate student mistakes in mathematics, and outline the potential of LLMs to support novice tutors at scale.
Scaling high-quality tutoring is a major challenge in education. Because of growing demand, many platforms employ novice tutors who, unlike professional educators, struggle to effectively address student mistakes and thus miss prime learning opportunities for students. In this work, we explore the potential of LLMs to assist math tutors in remediating student mistakes. We analyze thousands of tutoring sessions from novice tutors, which reveal poor pedagogical practices such as immediately revealing and explaining the solution in response to a student mistake. We use these findings to build ReMath, a benchmark co-developed with experienced math teachers that deconstructs their thought process for remediation. We evaluate the performance of state-of-the-art instruction-tuned and dialog models on ReMath. Our findings suggest that although models consistently improve upon the original tutor responses, we cannot rely on models alone to remediate mistakes. Providing models with the error type (e.g., the student is guessing) and strategy (e.g., simplify the problem) leads to a 75% improvement in response quality over models without that information. Nonetheless, despite this improvement, the quality of the best model's responses still falls short of that of experienced math teachers. Our work is the first to shed light on the potential and current limitations of using LLMs to provide high-quality learning experiences for both tutors and students at scale.