Mengdi Huai
Bio: Mengdi Huai is a Ph.D. candidate in the Department of Computer Science at the University of Virginia, advised by Professor Aidong Zhang. Her research interests are in the general area of data mining and machine learning, with an emphasis on model transparency, security, privacy, and algorithm design. Her research has been published in international conferences and journals, including top venues in data mining and AI (KDD, AAAI, IJCAI, NeurIPS, WWW, ICDM, SDM, BIBM) and top journals (TKDD, NanoBioscience). She has received multiple awards, including selection as a Rising Star in EECS at MIT, the John A. Stankovic Research Award, the Sture G. Olsson Fellowship in Engineering, and the Best Paper Runner-Up Award at KDD 2020.
Talk Title: Malicious Attacks against Deep Reinforcement Learning Interpretations
Talk Abstract: Recent years have witnessed the rapid development of deep reinforcement learning (DRL), which incorporates deep learning into reinforcement learning and makes decisions directly from unstructured input data without manual engineering of the state space. However, the adoption of deep neural networks makes the decision-making process of DRL opaque and hard to interpret. Motivated by this, various interpretation methods for DRL have been proposed. These interpretation methods implicitly assume that they operate in a reliable and secure environment. However, given their data-driven nature, they are themselves potentially susceptible to malicious manipulation. Despite the prevalence of malicious attacks in other settings, no existing work studies the possibility and feasibility of attacks against DRL interpretations. To bridge this gap, my work investigates the vulnerability of DRL interpretation methods. Specifically, I introduce the first study of adversarial attacks against DRL interpretations and propose an optimization framework from which the optimal adversarial attack strategy can be derived. In addition, I study the vulnerability of DRL interpretation methods to model poisoning attacks and present an algorithmic framework that rigorously formulates the proposed poisoning attack. Finally, I conduct both theoretical analysis and extensive experiments to validate the effectiveness of the proposed malicious attacks against DRL interpretations.
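To make the attack setting concrete, the sketch below shows one common way an adversarial attack on a DRL interpretation can be set up: perturb the agent's observation so that a gradient-based saliency interpretation shifts toward an attacker-chosen target map while the policy's chosen action stays unchanged. This is only an illustrative sketch under assumptions (a toy discrete-action policy network, a simple gradient-saliency interpreter, an equally weighted two-term loss, and a PGD-style perturbation with an L-infinity budget); it is not the specific optimization framework presented in the talk.

import torch
import torch.nn as nn
import torch.nn.functional as F

def saliency(policy, obs, action):
    # Gradient-based saliency: |d logit_a / d obs| for the chosen action a.
    if not obs.requires_grad:
        obs = obs.clone().requires_grad_(True)
    logit = policy(obs)[0, action]
    grad, = torch.autograd.grad(logit, obs, create_graph=True)
    return grad.abs()

def attack_interpretation(policy, obs, target_map, eps=0.05, steps=50, lr=0.01):
    # PGD-style search for a small perturbation that pushes the saliency map
    # toward an attacker-chosen target while preserving the originally chosen action.
    with torch.no_grad():
        orig_action = policy(obs).argmax(dim=1).item()
    delta = torch.zeros_like(obs, requires_grad=True)
    for _ in range(steps):
        adv_obs = obs + delta
        sal = saliency(policy, adv_obs, orig_action)
        interp_loss = (sal - target_map).pow(2).mean()           # distort the interpretation
        action_loss = F.cross_entropy(policy(adv_obs),           # keep the action unchanged
                                      torch.tensor([orig_action]))
        loss = interp_loss + action_loss
        loss.backward()
        with torch.no_grad():
            delta -= lr * delta.grad.sign()   # signed gradient step
            delta.clamp_(-eps, eps)           # stay within the L-infinity budget
        delta.grad = None
    return (obs + delta).detach()

if __name__ == "__main__":
    policy = nn.Sequential(nn.Linear(8, 32), nn.Tanh(), nn.Linear(32, 4))  # toy policy network
    obs = torch.randn(1, 8)
    target = torch.zeros(1, 8)  # attacker goal: a flat, uninformative saliency map
    adv_obs = attack_interpretation(policy, obs, target)
    print(policy(obs).argmax(1).item(), policy(adv_obs).argmax(1).item())

In this hypothetical setup, the first loss term steers the interpretation toward the attacker's target while the second keeps the agent's behavior (and thus outward appearance) unchanged, which is what makes such attacks hard to detect from the policy's actions alone.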