Gari Clifford (Emory) – Mythological Medical Machine Learning: Boosting the performance of a deep learning medical data classifier using realistic medical models
This event will take place both in person at John Crerar Library Building Room 390 and online via Zoom or YouTube.
Abstract: There is a myth in modern machine learning that, as the size of a database and the depth of a network increase, an algorithm's performance will continue to improve. This myth is particularly untrue for medical data, which require intensive curation to produce high-quality labels. As databases grow, the labels drop in quality or vanish altogether, and the data often become noisier, with rising levels of non-random missingness. Transfer learning is increasingly used to mitigate these problems, allowing algorithms to be tuned on smaller (or rarer) populations while leveraging information from much larger datasets. I’ll present an emerging paradigm in which an extensive model-generated database is inserted into the transfer learning process to help a deep learner explore a much larger and denser data distribution. Because a model can generate realistic data beyond the boundaries of the real data, it can help train the deep learner to extrapolate beyond the observed samples. Using cardiac time series data, I’ll demonstrate that this technique provides a significant performance boost. I’ll then discuss how general this approach is and how it can distinguish AI from machine learning.
Bio: Gari Clifford is a tenured Professor of Biomedical Informatics and Biomedical Engineering at Emory University and the Georgia Institute of Technology, and the Chair of the Department of Biomedical Informatics at Emory. His research team applies signal processing and machine learning to medicine to classify, track and predict health and illness. His research focuses on critical care, digital psychiatry, global health, mHealth, neuroinformatics and perinatal health, particularly in low- and middle-income country (LMIC) settings. After training in Theoretical Physics, he transitioned to Machine Learning and Engineering for his doctoral work at the University of Oxford in the 1990s. He subsequently joined MIT as a postdoctoral fellow and later became a Principal Research Scientist, managing the creation of the MIMIC II database, the largest open-access critical care database in the world. He later returned to Oxford as an Associate Professor of Biomedical Engineering, where he helped found its Sleep & Circadian Neuroscience Institute and served as Director of the Centre for Doctoral Training in Healthcare Innovation at the Oxford Institute of Biomedical Engineering. Gari is a strong supporter of commercial translation. He works closely with industry as an advisor to multiple companies, has co-founded and served as CTO of an MIT spin-out (MindChild Medical) since 2009, and has co-founded and served as CSO of Lifebell AI since 2020. Gari is a champion of open-access data and open-source software in medicine, particularly through his leadership of the PhysioNet/CinC Challenges and his contributions to the PhysioNet Resource. He is committed to developing sustainable solutions to healthcare problems in resource-poor locations, with much of his work focused in Guatemala.
Part of the Data Science Institute Distinguished Speaker Series:
Defining The Field of Data Science
As data science evolves from buzzword to a mature and singular field, its research questions dive deeper into the foundations of this new discipline. The Fall 2021 Distinguished Speaker Series convenes world-class experts actively exploring and expanding the fundamental methods and approaches that transform large and complex datasets into knowledge and action, fueling new applications in areas such as artificial intelligence, healthcare, and the social sciences. Join the new UChicago Data Science Institute for provocative talks and discussion that will illuminate the bedrock and promise of the flourishing field of data science.