Today’s younger generations have grown up immersed in a world of data and computing. So naturally, these students are eager to learn the skills to work with these technologies, applying the tools of computer and data science to build a better future for all.
That technological upbringing also allowed the 37 high school and undergraduate students of the CDAC Data & Computing Summer Lab to adapt when the second year of the program switched to remote due to the COVID-19 pandemic. Instead of gathering on the University of Chicago campus, students gathered in virtual meetings to work with their faculty mentors on research projects, hear talks from leading scientists, and form an international community with one another. Through their patience and persistence and the hard work of Summer Lab staff and mentors, a difficult and socially distant summer was transformed into a valuable learning experience for students just beginning their research careers.
The students’ talent and creativity are on full display in their final videos, short presentations that summarized their research accomplishments over the 10-week program. These 3-to-5-minute videos reflect the breadth of the Summer Lab research topics, from machine learning, theoretical computer science, natural language processing, and computer vision to high-energy physics, genetics, and the social sciences. In many cases, the short duration of the program was nevertheless enough for students to complete publication-quality work.
“The CDAC Summer Lab was a great experience for me to have exposure to the applications of computer science in other domains and gain technical knowledge,” said Aarthi Koripelly, an incoming University of Chicago student and Summer Lab Research Assistant in both 2019 and 2020, whose work was nominated for Best Undergraduate Poster awards at the Supercomputing 2020 conference. “My projects have helped me hone my research and communication skills in writing reports, presenting to others, and submitting to a conference, which would not have been possible without the opportunities CDAC has provided.”
Machine learning (ML) and artificial intelligence drive many technological innovations today, and several CDAC students spent their summer studying how to improve the accuracy and performance of these algorithms. But importantly, they also studied how to make machine learning approaches more fair, and more “human.” Ray Fregly examined how to use ML to help researchers understand how children acquire language by listening to their parents. Christine Jacinto and Daniel Serrano each studied methods for improving fairness and reducing bias in how ML is applied to real-world situations. And Jamar Sullivan worked on integrating human feedback into ML processes, improving how models are trained to extract important features from text or photographs.
Security and Misinformation
The dark side of technology is that it can offer malicious actors opportunities to access or manipulate data, with harmful effects. Summer Lab projects examined many dimensions of these threats, including the use of ML to detect anomalous network activity (Lia Troy), a tool for cloaking photos to interfere with non-consensual facial recognition tools (Jiawen Shen), and strategies for combating the spread of misinformation on social media sites such as WhatsApp (Jason Chee) and Twitter.
To produce tech safety guidelines for people participating in Black Lives Matter protests, Maia Boyd combined online advice articles and a survey of protestors. Julio Ramirez and Nikki Chakravarthy used data from the CDAC Internet of Things Lab and their own home installations to study the vulnerabilities of smart home devices such as speakers, lights, and appliances.
Almost every day brings new tools and methods for capturing knowledge from data, from text and time series to images and video. Summer Lab researchers worked with UChicago projects such as Chameleon, Globus, funcX, and GeoDa, finding new ways of processing data in the cloud (Akhil Kodumuri), using best practices for spatial data research (R.E. Stern), making scientific projects more easily expandable and reproducible (Avery Schwartz and Isabel Brunkan), and optimizing databases for research and discovery (Ryan Wong). Projects also developed strategies for automatically extracting information from large pools of scientific journal text, using dependency parsers or clustering methods (Aarthi Koripelly and Chimaobi Amanchukwu).
Applications and Education
All of these data and computing approaches are no longer confined to the field of computer science, as they have crossed over to help researchers in myriad fields of inquiry. In the physical sciences, Yair Atlas and Ishan Malhotra built and tested algorithms for a “self-driving” telescope that automatically directs cosmic surveys towards interesting astronomical phenomena, while Chinmaya Mahesh (with some pretty impressive accent work) created ML-driven triggers for spotting and saving potential discoveries within data from the Large Hadron Collider. In a project appropriate for the remote summer, Melissa Tovar helped adapt the Scratch Encore curriculum for early-age computer science education to the online learning environments many students are using this fall.
Other students created pipelines for studying the genetics of breast cancer (Arvind Krishnan), neighborhood segregation in Chicago (Felix Farb), and ancient cuneiform tablets from the Middle East (Grace Su), or for scraping PDFs to study access to medications for opioid use disorder (Olina Liang).
You can watch videos from almost all of the CDAC Data & Computing Summer Lab projects in our YouTube playlist; more will be added as results are published. For more on the Summer Lab program, visit the information hub or read our feature from June.