
Bio: Chan Park is a fifth-year PhD student at the Language Technologies Institute at Carnegie Mellon University. She is advised by Professor Yulia Tsvetkov and is currently a visiting PhD student at the University of Washington. She is a recipient of a KFAS Overseas PhD Scholarship. Her research interests lie at the intersection of natural language processing and computational social science, and her work has been published in top-tier conferences and journals, including PNAS, EMNLP, ICWSM, and WWW. She has collaborated with the Washington Post and the Data for Black Lives organization to apply her research to real-world problems.

Talk Title: Challenges in Opinion Manipulation Detection: An Examination of Wartime Russian Media

Talk Abstract: Information warfare has been at the forefront of the 2022 Russia-Ukraine war, as all sides attempt to shape online narratives. NLP could be valuable for combating manipulation, but most prior work focuses on detecting extreme manipulation strategies like fake news and relies on supervised approaches with pre-annotated data, which is unavailable in emerging crises. We release a new dataset, VoynaSlov, containing 38M+ social media posts from Russian media and their public responses. We characterize VoynaSlov along three dimensions: media ownership (state-affiliated, independent), platform (Twitter, VKontakte), and time (pre-war, during-war). Drawing from political communication, we investigate subtle manipulation strategies by applying state-of-the-art NLP models to quantify agenda-setting, framing, and priming. Our methodology includes word frequencies, topic modeling, supervised classification, and consideration of metadata such as likes or shares. Each technique reveals interesting, yet sometimes conflicting, insights; for example, independent outlets are more likely to use war-related words, but state-affiliated outlets discuss war-related topics more. Throughout the talk, we discuss numerous limitations: topic model interpretability and instability, unclear framing typologies and domain-specificity, and issues surrounding validity and misuse for priming. Our dataset, initial investigation, and discussion of open technical challenges can facilitate the identification and mitigation of information manipulation tactics in ongoing crisis situations.
