
Bio: Paul Liang is a Ph.D. student in Machine Learning at CMU, advised by Louis-Philippe Morency and Ruslan Salakhutdinov. He studies the foundations of multimodal machine learning with applications in socially intelligent AI, understanding human and machine intelligence, natural language processing, healthcare, and education. His research has been recognized by the Siebel Scholars Award, Waibel Presidential Fellowship, Facebook PhD Fellowship, Center for ML and Health Fellowship, and 3 best-paper awards at NeurIPS workshops and ICMI. Outside of research, he loves teaching and advising, and received the Alan J. Perlis Graduate Student Teaching Award for instructing courses and organizing workshops and tutorials on multimodal machine learning.

Talk Title: Foundations of Multisensory Artificial Intelligence

Abstract: The study of multisensory intelligence aims to understand the computational principles of multisensory integration and to design practical AI systems that can similarly learn from multiple sensory inputs such as text, speech, audio, video, real-world sensors, wearable devices, and medical data. These multisensory AI systems can support human physical, emotional, and social well-being, enable multimedia content processing, and enhance real-world autonomous agents, holding great promise for impact across many scientific areas with practical benefits.

Multisensory intelligence brings unique challenges to machine learning given the heterogeneity of, and interactions between, modalities. In this talk, I will discuss my recent research on the foundations of multisensory intelligence through a new theoretical framework formalizing how modalities interact to give rise to new information for a task. These interactions are the basic building blocks of all multimodal problems, and quantifying them enables users to understand their multimodal datasets and to design principled approaches to learn them. To further capture temporal interactions in sequential data, I will present cross-modal attention and multimodal transformers, which now underpin many of today’s multimodal foundation models. Finally, I will discuss broader real-world applications of scaling up multisensory learning to many modalities and tasks: (1) aiding mental health practitioners by predicting daily mood fluctuations in patients from multimodal smartphone data, (2) supporting doctors in cancer prognosis using genomic readings and pathology images, and (3) enabling robust control of physical robots using cameras and touch sensors.
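For readers unfamiliar with the mechanism mentioned above, the following is a minimal sketch of cross-modal attention, in which one modality supplies the queries and another supplies the keys and values. The class name, dimensions, and text/audio pairing are illustrative assumptions for this sketch, not the speaker's implementation.

```python
# Hypothetical sketch of cross-modal attention: queries come from one modality
# (e.g., text) and keys/values from another (e.g., audio), so each text token
# attends over the audio sequence. All names and dimensions are illustrative.
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    def __init__(self, embed_dim: int = 256, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, text: torch.Tensor, audio: torch.Tensor) -> torch.Tensor:
        # text:  (batch, text_len,  embed_dim) -> queries
        # audio: (batch, audio_len, embed_dim) -> keys and values
        attended, _ = self.attn(query=text, key=audio, value=audio)
        # Residual connection + layer norm, as in a standard transformer block.
        return self.norm(text + attended)

# Toy usage: 8 text tokens attending over 20 audio frames.
layer = CrossModalAttention()
text = torch.randn(2, 8, 256)
audio = torch.randn(2, 20, 256)
out = layer(text, audio)  # shape: (2, 8, 256)
```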
