Building the wide open future of data science requires bringing new students into the fold today. And at the University of Chicago, for the third consecutive year, the CDAC Data & Computing Summer Lab serves as one of those gateways. Welcoming 55 high school, undergraduate, and master’s students to serve as research assistants on projects with more than 39 mentors and adding a new “social impact” track, it’s the largest year yet for the program designed to train and inspire the next generation of interdisciplinary computational and data scientists.
For ten weeks, students will work with their mentors building new wearable devices and robots, creating privacy and security protections against machine learning systems and online data trackers, and studying the quality of broadband access in Chicago communities. They will work with sociologists, astronomers, molecular engineers, and physicists on projects that blend data and computer science with scientific exploration and real-world applications. Social impact projects will partner with non-profit organizations collecting oceanographic data, monitoring human rights violations, and investigating international finance.
“At the University of Chicago, our goal is to define the field of data science so that it can fully reach its potential for enabling world-changing research and applications,” said David Uminsky, executive director of data science initiatives at UChicago. “The Summer Lab program is an integral part of that mission. We want to foster an inclusive research environment where students from all backgrounds gain hands-on research experience and build critical data science skills, creating a future generation of data scientists representative of the world we hope the field will positively impact.”
A Bigger, Broader Program
For the second consecutive year, Summer Lab programming will be virtual due to the lingering pandemic. But as with the summer of 2020, participants will make the best of the remote format, gathering online for weekly talks and panels, research group and cluster meetings, and social events. The structure also allows for long-distance participation, with research assistants attending the Summer Lab from California to North Carolina and Vietnam to Kosovo.
That reach is also reflected in the diversity of the 2021 cohort. The cohort is almost equally split between male- and female-identifying students (51 percent to 49 percent), and 22 percent of participants come from minority groups underrepresented in computing. One-fifth of the undergraduates are first-generation college students, and the 17 high schoolers and incoming undergraduates come from schools across the Chicagoland area, including Lane Tech, Kenwood Academy, and the Illinois Mathematics and Science Academy.
In addition to research experience, those students will also take part in an “on-ramp” workshop introducing them to basic computational skills, such as using data science libraries like NumPy and Pandas, version control with Git, and Jupyter notebooks. Weekly speaker events gather academics, national laboratory researchers, and Summer Lab alumni to discuss topics ranging from reproducibility in data science to natural language processing to broadband equity. A career development workshop illustrates the different paths that students can take with their talents and passion for science, and a symposium at summer’s end highlights videos produced by each participant, outlining their research accomplishments.
“What I see us doing even more this year is honing and expanding opportunities for students to build their academic, research, and team-building skill sets,” said Katie Rosengarten, Program Manager for CDAC and co-leader of the Summer Lab with Kyle Chard, Julia Lane, and David Uminsky. “We challenged ourselves to think of engaging talks for the program’s speaker series that help broaden students’ notions of data science, and I think we have a really good mix of new and interesting topics. And we’re continually ramping up the social programming: If the past year has taught us anything, it’s that virtual socializing doesn’t replace in-person, but it can still be a blast! The cohort experience is built both through collaborating on projects as part of a research team and through connections and having fun with games.”
New Track, New Applications
The largest addition to this year’s Summer Lab is the new Social Impact track, overseen by Uminsky, Daniel Grzenda, Mindi Mysliwiec, Kathryn Mattie, and Alyssa Szynal. Instead of working with UChicago research groups, students in this cluster will work directly with organizations such as the Schmidt Ocean Institute (SOI), the UN Office of the High Commissioner for Human Rights (UN – OHCHR), and Inclusive Development International (IDI). The track is also unique in the participation of master’s students, coming from programs such as the Master’s Program in Computer Science, the M.S. in Computational Analysis & Public Policy, and the Masters in Computational Social Science.
“There’s all these smart students coming from all over the country to work on research and impactful projects, and there are plenty of cutting-edge machine learning and data science applications from our nonprofit partners that could use student support,” Grzenda said. “Students get the benefit of experiencing research while also having a practical application, and at the same time we’re bringing master’s students together with undergraduates and high schoolers in multidisciplinary teams.”
Projects include building a video processing pipeline to help SOI search and filter video data collected by its underwater remotely operated vehicles and creating natural language processing algorithms to assess the reliability of sources in news articles about human rights. Other teams will help IDI expand a dashboard to monitor tree loss near palm oil mills in Indonesia and combine datasets from the World Bank and other development banks to assess the impact of major infrastructure projects on local communities.
In addition to adding a new dimension to Summer Lab, the Social Impact track also provides continuity with the Civic Data & Technology Clinic operated by CDAC and the Harris School of Public Policy. Partnerships established during the academic year can carry over into the summer, Grzenda said, while new summer projects may hand off their work to clinic students in the fall. The social impact cluster will also give Summer Lab participants and representatives from their partner organizations exposure to a wide variety of real-world applications.
“The clusters are where groups of students meet with the lab coordinator and talk about their projects all together,” Grzenda said. “So these students are going to get a full picture of nonprofit data work in human rights, energy, and marine technology. They’ll get three or four separate social impact organizations all together talking about the data science they’re doing for each.”
Expanding the Data Science Pipeline
From the beginning, a primary goal of the CDAC Summer Lab was to train and inspire future data scientists from a variety of backgrounds, in many cases providing them with their initial experience with computational tools and the research lifecycle. Three years in, the fruits of that effort are starting to show, with Summer Lab alumni advancing their data science studies and even returning to the program in advanced positions.
Yujie Tao discovered the Summer Lab program in 2020, as a graduating senior from the University of North Carolina at Chapel Hill. Seeking experience in human-computer interaction, she was drawn to the work of Pedro Lopes, assistant professor of computer science at UChicago and a Summer Lab mentor. Despite the remote format of last year’s program, Tao was able to collaborate with Lopes and other students on dextrEMS, a wearable haptic device that uses electrical muscle stimulation to manipulate a user’s fingers. In one application, the device can help people learn and perform sign language.
Tao’s experience was so positive that she stayed with the University of Chicago, joining the Predoctoral Masters in Computer Science program and continuing to work in Lopes’ Human-Computer Integration Lab. This year she has reunited with Summer Lab in the role of lab coordinator, helping new students work on HCI projects from near and far.
“One thing that I really like about this program is that it is quite interdisciplinary,” Tao said. “I’m in the HCI field but there are many other directions that people are pursuing. For a participant, it’s really great to see a program that embraces a wide range of disciplines. You’re just a step away from so many other research topics that could also inspire you a lot.”
Chimaobi Amanchukwu learned of the program as a high school junior in Houston, Texas, where he struggled to find computer science courses to pursue an interest in programming and research. Though intimidated at first to apply to a university program, he followed through, was accepted, and joined the 2020 Summer Lab, working on a natural language processing project with Globus Labs.
“The program really went out of its way to make me feel accepted and make me feel good and open to ask questions and to talk to everybody,” Amanchukwu said. “Even in the first week, I remember actually really liking the on-ramp program, the tutorial and learning process. It was just really interesting, and I wasn’t expecting to be taught at an internship.”
The experience was so positive that Amanchukwu applied not only to return to the CDAC Summer Lab in 2021, but also to enroll at the University of Chicago, where he will start as a first-year this coming fall. For his encore run as a summer research assistant, he’ll work with assistant professor Marshini Chetty, studying the privacy and data use policies of school districts using educational technologies.
“Before Summer Lab, I didn’t have much of a scope of what the computer science world was like,” Amanchukwu said. “I didn’t really know there was much of an industry for research and I didn’t know about a lot of the different projects that people are coming here to pursue. But a lot of the faculty and students have ideas that I’ve never even thought of before. I’m looking forward to learning more from other people this year.”