  • Overview

    Apply Register: Information Session 1/21/22

    2022 Program Dates: June 13th – August 19th, 2022


    A summer research opportunity for high school, undergraduate, and UChicago Masters students focusing on rigorous, applied, interdisciplinary data science research and rooted in a cohort community.

    The Data Science Institute Summer Lab program (launched in 2018 as the Data & Computing Summer Lab) is an immersive 10-week paid summer research program at the University of Chicago. In the program, high school, undergraduate, and UChicago Masters students are paired with a data science mentor in various domains, including: computer science, data science, social science, climate and energy policy, public policy, materials science, and biomedical research. Through this pairing the research assistant will engage with and hone their skills in research methodologies, practices, and teamwork. UChicago Masters students specifically, as well as eligible high school and undergraduate students, are paired with projects in the Social Impact Track (more details below). We encourage participation from a broad range of students, and require no prior research experience to apply.


    Students in the program are immersed in a research lab and given unparalleled, first-hand access to impactful, applied data science research. Students will gain not only an understanding of fundamental data science methodologies but specialized training within the application areas specific to their lab’s research thrust. Students are asked to practice communicating their research findings throughout the summer, culminating in final videos. The final videos are presented during an end-of-summer symposium, which is run like a professional conference and provides students a chance to field questions about their project and share the outcomes of their research projects. Students also engage in professional development and training that can help them prepare for future careers in data science and computing. Additionally, many alumni continue research work with their mentor after the program ends.


    In the program, students are welcomed into a cohort of their peers who represent diverse backgrounds, interests, and ambitions. Through near-peer mentoring, social gatherings, and group work on projects, students in this cohort not only become better-trained data and computational scientists, but also become better equipped to tackle any challenges ahead through their experience with group work and collaboration. Students meet weekly in small thematic groups called “clusters” to discuss progress, ask questions, and hear about each other’s projects.


    Broadening participation in data science, especially among underrepresented groups, is essential not only for equalizing opportunities but envisioning – and creating – a future that is truly representative of the world around us. Computational work is often stereotyped as people working alone writing code, when in reality data science is a team sport, inherently interdisciplinary, and in constant conversation with real-world issues to achieve measurable, meaningful impact. We aim to train and immerse students in the research lifecycle, and prepare them for critical transitions and sustained career paths.


    To supplement their research work, we provide an exciting array of programming for students during the summer. A highlight of the summer programming is a weekly speaker series featuring researchers at the forefront of data science. Speakers address topics ranging from their own unique and unconventional paths to data science research, to their innovative approaches to tackling important, impactful research questions. Students have the chance not only to hear from first-class speakers but also to introduce and be in conversation with them. In the 2021 program, we hosted 28 different speakers from a wide array of data science domains. You can watch select talks from the 2021 speaker series here.


    The Social Impact Track is an opportunity for students to work as part of a team on a data science project, with topics including energy, food and agriculture, human rights, and marine technology. The projects are scoped and run in coordination with organizations that have been awarded grants by the 11th Hour Project, a grant-making foundation serving the nonprofit community. Teams in the Social Impact Track serve as a centralized hub for software and data science for the organizations, providing both open-source and custom data-driven solutions. All student types – high school, undergraduate, and Masters students – are eligible to participate in the Social Impact Track. 1st year UChicago Masters students are only eligible for projects through the Social Impact Track.


    Summer Lab alumni have been co-authors on published papers and posters, created apps and software tools used by thousands of people, and pursued a variety of future paths within research and beyond. Check out the Project Profiles to learn more about previous student cohorts, and watch videos overviewing their summer research projects. Summer Lab alumna Aarthi Koripelly (‘19, ‘20) shared this about her experience in the program:

    Summer Lab was a great experience for me to have exposure to the applications of computer science in other domains and gain technical knowledge. My projects have helped me hone my research and communication skills in writing reports, presenting to others, and submitting to a conference, which would not have been possible without the opportunities the program has provided.

    Read about the 2021 Summer Lab program.

    2021 Summer Lab Cohort
  • Application


    Register: Information Session 1/21/22

    Application Timeline

    • General Application Deadline: February 20th, 2022, 11:59pm CT
    • Notification Deadline: by April 15, 2022
    • Program Dates: June 13th – August 19th, 2022
      • Chicago Public Schools (CPS) students may request a late start on June 15th.

    Application Overview

    • Research Areas of Interest + Skills Evaluation
      • Areas of interest: list up to 4 keyword areas, and explain your interest and relevant experience in those areas (300 words max)
      • Self-evaluation of core computational tools and skills
    • Resume (PDF Upload)
      • The resume should include: Education history, GPA, relevant courses; Relevant previous experiences (internships, online courses/certificates, programs, clubs, volunteer experience, etc.); Technical skills for computer science/data science (programming languages, libraries, tools, software, etc.); If relevant, links to websites, portfolios, GitHub pages, or LinkedIn pages.
    • Transcript (PDF Upload) – UChicago Masters students only
    • Short Answer Questions
      • Internship Goals: What are your big picture academic goals, and what role does this program in particular play in them? This can be broad and far-reaching, but it should be clear why data and computing research plays a role. (500 words max)
      • Cohort Community Statement: What does being part of a cohort look like to you? What value do you anticipate getting from being part of a cohort? What kind of communities are you a part of now, and what impact have they had on you? (300 words max)
      • Project Description: Describe a project you’ve undertaken. It can be a final project from a class, a side project, or one from a previous research program. In detail, describe:
        • (1) the goals of the project and your approach; (2) any tools (software, libraries, programming languages, etc.) you used and why you used them; (3) one challenge you faced and how you addressed it; (4) one achievement from the project of which you’re particularly proud; and (5) any outcomes or tangible results of the project (upload or link to any outcomes if relevant).
        • With this question, more value will be placed on how you approached the project, rather than how advanced or technical it is.
        • Previous program participants will be asked to complete this question in reference to their research project during the previous Summer Lab program.


    If you have any questions about your eligibility, please email

    • Student type/grade
      • High school: current freshman, sophomore, junior, or senior
      • Undergraduate: current freshman, sophomore, or junior
      • Masters: current UChicago 1st year Masters student
    • International students are eligible to apply so long as they have authorization to work in the United States, so that they can receive a stipend. In order to receive a stipend, all participants must provide the documentation noted here.
    • Participants must be available for the entire 10 weeks from June 13th – August 19th, 2022.
      • Chicago Public Schools (CPS) students may request a late start on June 15th.

    Review Criteria

    • Intellectual Curiosity: Evident interest in data science and the applied domain areas chosen.
    • Skills Baseline: Familiarity with at least one programming language, with self-evaluated skill ratings reflected in the CV, relevant coursework, or other experiences.
    • Initiative + Teamwork: Student has acted upon interest by pursuing available options and opportunities for computational and data science classes, training, and programs, and has successfully worked as part of a team before.
    • Research Aptitude: Creativity and curiosity, self-direction, goal-oriented and adaptable work ethic, resilient problem solving, time management and communication skills.
    • Program Fit: Clear why this program versus others is uniquely valuable to the applicant. Student is uniquely positioned to benefit from the program due to lack of access to similar programs at their home institution, or potential for growth.

    Due to the volume of applications we receive, we will be unable to provide individual feedback on applications that are not accepted.

  • Mentors

    Mentor Interest Form

    We are currently recruiting mentors for the 2022 DSI Summer Lab program. Read more below, and fill out this form by February 7th to indicate your interest.


    At the core of the Summer Lab program are the mentors, who welcome research assistants onto a project and supervise their growth and progress over the 10 weeks. Mentoring in the Summer Lab program is an opportunity to work with incredible students from across the country, join a community dedicated to training the next generation of data scientists, and engage in cutting-edge data science research. The 2021 mentor cohort represented 17 different departments, labs, and research units across UChicago and the National Labs, reflecting the interdisciplinary reach and potential of data science.


    • Access to a pre-reviewed applicant portfolio, tailored for your project’s specifications.
    • On-ramp workshop for research assistants to establish baseline data science skills (Python for data science, Git, Unix).
    • Auxiliary technical support for research assistants through weekly standup meetings and office hours.
    • Activities can contribute to broader impacts sections for grant proposals.
    • Cohort activities such as a weekly speaker series, social activities, and other programming.
    • Potential to continue working with students during the academic year.


    If you have any questions regarding your eligibility as a mentor, please feel free to contact Katie Rosengarten (

    • UChicago faculty member or researcher, or National Lab scientist or CASE affiliate.
    • Capacity to mentor 1-2 students.
    • Dedication to the program’s mission to help broaden participation in data science, and serving as a supportive, inclusive, and welcoming resource for students.
    • Technical Expertise & Mentoring Team
      • Mentors should either have sufficient computer science or data science expertise themselves, or should collaborate with a co-mentor who can provide such expertise. Our team can help identify potential co-mentors as needed.
      • Mentors should have at least 1 PhD student, postdoc, or researcher who can serve as a co-mentor and additional resource to student(s) on the project.
    • Data Science Project
      • Mentors should have one or more projects that fit within the umbrella of data science. We encourage interdisciplinary projects from other domains; however, each project must have a significant computational and data-driven component.
      • Projects should be pre-scoped for students. Our team can provide resources to help with this process.
      • Datasets and any necessary computing resources should be identified and readily available for students to access.


    • Participate during the program dates (June 13th – August 19th, 2022).
    • Meet with mentees at least 1x/week.
    • Complete a 1-hour minors training (required for any mentors working with minors).
  • Project Profiles

    See 2019-2020 Cohort Project Profiles here.

    2021 Cohort

    Mentor: Samantha Riesenfeld

    Project Title: Learning How to Identify Noisy Features from Persistent Homology

    Mentor: Samantha Riesenfeld

    Project Title: Topological Features in Drug Tolerant Cells

    Project Description: Using a published data set of scRNA-seq data in PC9 cells treated with Erlotinib, I used R and Python to identify characteristics of cells with prolonged treatment and acquired resistance to help learn the manifold of the data.

    Mentor: Eamon Duede

    Project Title: Evolution of Annoyingness

    Project Description: We analyzed the time and the sentiment (as a contributing factor to the emotion of irritation) of tweets from a data pipeline we had created using the Academic Twitter API. Leveraging machine learning algorithms, we found that the day of the month can be almost as predictive of sentiment as the tweet content.

    Mentor: Ravi Madduri

    Project Title: Machine Learning Mobile Applications for Health Promotion

    Project Description: This summer, alongside fellow cohort member Daniel Chechelnitsky, she worked with Dr. Ravi Madduri to create a mobile health app implementing machine learning models of disease prediction. Using Flutter, an open-source UI software development kit launched by Google, she designed and implemented the front-end and back-end components of the application, working with SQL databases and TensorFlowLite machine learning models among other things. The code for the final product can be found here.

    Mentor: Chenhao Tan

    Project Title: AI-Driven Tutorials

    Project Description: Leveraging artificial intelligence and medical imaging datasets, this project aims to create an educational tool that will help train future radiology students. I had the pleasure of contributing to the web development work (BE & FE) for this project.

    Mentors: Margaret Beale Spencer & Chris Graziul

    Project Title: Analysis of Police Broadcast Audio at Scale

    Project Description: With policing coming under greater scrutiny in recent years, researchers have begun to more thoroughly study the effects of contact between police and minority communities. Although data archives of hundreds of thousands of recorded Broadcast Police Communications (BPC) are openly available to the public, large-scale analysis of the language of policing has remained largely unexplored. While this research is critical in understanding a “pre-reflective” notion of policing, the large quantity of data presents numerous challenges in its organization and analysis.

    We conducted preliminary work toward enabling Speech Emotion Recognition (SER) in an analysis of the Chicago Police Department’s (CPD) BPC by demonstrating the pipelined creation of a datastore to enable a multimodal analysis of composed raw audio files.

    Mentors: Anjali Adukia & Teodora Szasz

    Project Title: Measuring Race and Gender Representation in Children’s Books Using Sentiment Analysis

    Project Description: We measured race and gender representation in award-winning children’s books using sentiment analysis. Our goal was to find the sentiment towards characters to understand how different racial groups or genders are represented in these books. Sharing our findings with teachers, librarians, and parents will help us move towards a more equitable society.

    Mentor: Marshini Chetty

    Project Title: Investigating Privacy Implications of Educational Technologies for School Children

    Project Description: Using Google Sheets and links scraped from school district websites, we compiled and analyzed data on student privacy. Using Plotly and Dash, we visualized our findings for display on a dashboard.

    Mentors: Daniel Grzenda & David Uminsky

    Project Title: Social Impact Track – Schmidt Ocean Institute, ROV Dive Processing

    Project Description: In collaboration with the Schmidt Ocean Institute, our team was tasked with contributing to the foundation of an open source oceanographic video processing pipeline. Our primary goal was to implement an unsupervised video summarization model which will produce highlight reels of underwater ROV dive videos. Our secondary goal was to produce a pipeline which will tag dive video frames with informative labels using a variety of pixel-based algorithms and models.

    Mentors: Daniel Grzenda & David Uminsky

    Project Title: Social Impact Track – PalmWatch

    Mentor: Ravi Madduri

    Project Title: Exploring Machine Learning Applications in Mobile Health Development

    Project Description: Mobile apps have real potential to help individuals understand their risks for various diseases and make better choices to lower those risks. Using Flutter, we developed a navigable and scalable mobile app environment, decided on the UI/UX design of the app, and implemented a static risk calculator (prostate cancer) and an image classification ML model (skin cancer). We also tried building and training our own regression models, but we were not able to deploy them in the Flutter framework.

    Mentor: Heather Zheng

    Project Title: Exploring POV Effect for Stealthy Adversarial Patch Generation

    Project Description: Facial recognition is becoming more popular nowadays, but how can we protect our privacy and prevent cameras from using images to recognize us? Eyewear that projects light onto the face, flashing in a distinct pattern, creates an effect called the “POV” effect: when a camera takes an image of your face, the image is distorted and recognition of the individual fails.

    Mentors: Daniel Grzenda & David Uminsky

    Project Title: Social Impact Track – Schmidt Ocean Institute, ROV Dive Processing

    Project Description: The Schmidt Ocean Institute uses a remotely operated robot to collect video footage of the ocean but needed ways to efficiently parse through this video. Our team developed unsupervised deep learning models to create highlight reels of the robot dives. We also used various machine learning techniques to create tags for notable aspects of the videos.

    Mentor: Sarah Sebo

    Project Title: Emotionally Intelligent Robots

    Project Description: Developing a machine learning model to predict psychological safety and inclusion for participants of a group conversation from audio-visual data. This could be a jumping-off point for social robots to behave according to group dynamics, and perhaps even to find ways to improve those dynamics.

    Mentors: Daniel Grzenda & David Uminsky

    Project Title: Social Impact Track – PalmWatch

    Project Description: We investigate whether top-level corporate commitments to sustainability are reflected down the supply chain, focusing on Indonesian palm oil production, which has nearly quadrupled in the past decade. Combining satellite datasets on deforestation and oil palm vegetation, we modeled the risk profile of individual palm suppliers.

    Mentors: Dylan Halpern & Julia Koschinsky

    Project Title: Web Geoda Development

    Project Description: WebGeoda is an open-source, fully client-side browser geospatial analysis tool that allows researchers with little to no coding experience to quickly develop and share visualizations. Built in ReactJS, it leverages the jsgeoda library to perform analysis in-browser without any server overhead costs.

    Mentor: Kyle Chard

    Project Title: Generalizing Metadata Extraction Workflows

    Mentors: Daniel Grzenda & David Uminsky

    Project Title: Social Impact Track – Development Bank Investment Tracker

    Mentors: Nick Feamster & Nicole Marwell

    Project Title: Mapping and Mitigating the Digital Divide

    Project Description: Building an Android app and REST API server to collect street-level network infrastructure data and store it in AWS S3.

    Mentors: Daniel Grzenda & David Uminsky

    Project Title: Social Impact Track – Human Rights Media Analysis

    Project Description: Working with the UN Human Rights Office (UN OHCHR), my team built a feature extraction and NLP classification pipeline that categorised the credibility level of news articles on human rights incidents. In the pipeline, we used scikit-learn, Hugging Face, spaCy, and gensim. The resulting pipeline will streamline the process of human rights analysis for UN analysts.


    Mentor: Kyle Chard

    Project Title: Foundry

    Mentors: Kate Keahey & Zhuo Zhen, Argonne National Laboratory

    Project Title: Bidirectional Edge Computing Research

    Project Description: Using Chameleon Cloud resources, I collected and interpreted a variety of network measurements to test possible network configurations between the edge and the cloud. Additionally, I wrote a pipeline over HTTP that allows edge devices to query machine learning models hosted in the cloud.

    Mentors: Daniel Grzenda & David Uminsky

    Project Title: Social Impact Track – Schmidt Ocean Institute, ROV Dive Processing


    Mentor: Blase Ur

    Project Title: Debugging Trigger Action Programming (TAP) in Smart Home Devices

    Project Description: Debugging Trigger Action Programming in Smart Home Devices. Developing software to be used by non-technical participants to help them fix any issues in existing programming rules for smart home devices.

    Mentor: Pedro Lopes

    Project Title: Batteryless Haptics

    Mentor: Sarah Sebo

    Project Title: Meaningful Conversations

    Mentor: Nick Feamster

    Project Title: IoT Activity Recognition Using Audio Data

    Mentor: Blase Ur

    Project Title: Improving Data Downloads

    Mentor: Bryon Aragam

    Project Title: Integrating Generative Models and Causal Inference with Applications in Fair Machine Learning

    Mentors: Giuseppe B. Cerati & Jeremy Hewes & Daniel Grzenda

    Project Title: Exa.TrkX

    Project Description: I worked on the Exa.TrkX project, which presents a graph neural network (GNN) technique for low-level reconstruction of neutrino interactions in a Liquid Argon Time Projection Chamber (LArTPC). Graphs describing particle interactions are formed by treating each detector hit as a node, with edges describing the relationships between hits. The model itself is a multihead attention message passing network which performs graph convolutions in order to label each node with a particle type.

    Mentor: Blase Ur

    Project Title: Debugging Trigger Action Programming (TAP) in Smart Home Devices

    Mentor: Pedro Lopes

    Project Title: InterventionEMS

    Mentors: Daniel Grzenda & David Uminsky

    Project Title: Social Impact Track – Development Bank Investment Tracker

    Project Description: Analyzed relationship between development bank investments and local complaints, facilitating financing processes and protecting human and environmental rights using data engineering and machine learning; Built automatic and continuous investment data collection mechanism with Google Cloud, created SQL database and APIs for data flow, scaffolded front-end webpages for public access, and generated auto-update graphs to provide insights on data trends.

    Mentor: Brian Nord

    Project Title: Deep Diagnostics of Convolutional Neural Networks

    Project Description: My project focused on how to efficiently access fundamental diagnostics to train and optimize CNNs. I investigated multiple diagnostics programs to determine how they function to help evaluate model performance. Testing these diagnostic tools supported my initial hypothesis that these programs didn’t offer easy access to the fundamental diagnostics of a model I was looking for. So I built a diagnostic package that cuts out extraneous features and, with them, the need for external resources or a deep knowledge of coding, to provide new and inexperienced users with the fundamental diagnostics they need.

    Mentor: Jai Yu

    Project Title: Behavior Modeling in Rats

    Mentors: Dylan Halpern & Julia Koschinsky

    Project Title: In-Browser Spatial Analytics: Observable Notebook + WebGeoda Scaffolding

    Project Description: This summer, I contributed to the creation of in-browser spatial analytics tools, which improve shareability and flexibility of geospatial research. With ObservableHQ, a Javascript environment, I built an interactive tutorial for exploring local spatial autocorrelation, a key concept in spatial econometrics. I also worked on WebGeoda, a browser version of Luc Anselin’s desktop GeoDa app, by creating various data analysis widgets for spatial autocorrelation.

    Mentors: Kyle Chard, Matt Baughman

    Project Title: AWS Spot Market Trends from 2018 and 2021

    Project Description: In late 2017, Amazon changed the spot market algorithm with the aim of decreasing price variability, increasing spot instance durability, and regularizing the market (Baughman et al., 2019). These changes have made it impossible to rely on the previous strategy of using supply and demand to make decisions. Our research compares 2021 spot market prices with those from 2018, and compares our findings with the results reported in Deconstructing the 2017 Changes to AWS Spot Market.

    Mentors: Giuseppe B. Cerati & Jeremy Hewes & Daniel Grzenda

    Project Title: Exa.TrkX: Improving Graph Neural Network Performance for Classifying Neutrino Interactions in MicroBooNE Data

    Project Description: The Exa.TrkX project presents a graph neural network (GNN) technique for low-level reconstruction of neutrino interactions in a Liquid Argon Time Projection Chamber (LArTPC). We discovered that the GNN model trained on DUNE simulation data performs quite poorly on the data from another neutrino detection experiment (MicroBooNE). After my colleague Kaushal modified the model architecture to detach the physical meaning from neutrino interaction graph edges, I explored new edge-forming techniques (such as Delaunay triangulation, KNN-graph, and radius graph) and retrained the model on MicroBooNE data, which resulted in 80% classification accuracy for physically meaningful interactions.

    Mentor: Junchen Jiang

    Project Title: Quality of Experience Personalization Project

    Mentors: Jean-Baptiste Reynier & Anna Woodard, Olopade Lab

    Project Title: Self-Supervised Deep Learning for Breast Cancer Risk Prediction

    Mentor: Nick Feamster

    Project Title: Understanding the Reliability of Encrypted DNS Resolvers

    Project Description: We studied the deployment of encrypted DNS outside of the mainstream resolvers by measuring DNS query response times and ping times for resolvers located across the world. We compared non-mainstream resolvers to mainstream resolvers, such as Google and Cloudflare, to better understand the reliability of the lesser known resolvers and the DNS encrypted ecosystem as a whole.

    Mentor: Chenhao Tan

    Project Title: Using AI to Improve Radiology Residence Process

    Mentor: Nicholas Marchio

    Project Title: Interactively Mapping Urban Human Development

    Mentor: Ben Zhao

    Project Title: Finding Physical Backdoors in Existing Datasets

    Project Description: Roma Bhattacharjee is a freshman at Princeton University. This summer, she worked with Professor Ben Zhao and Emily Wenger on a project regarding physical backdoor attacks in computer vision models. She developed an automated process using graph analysis techniques to uncover viable physical triggers in pre-existing object datasets for training backdoored models.

    Mentors: Kate Keahey & Zhuo Zhen, Argonne National Laboratory

    Project Title: Driving Autonomous Cars From Edge to Cloud with CHI@Edge

    Project Description: We created a cloud-based pipeline for driving autonomous cars via Chameleon’s CHI@edge testbed. Specifically, we developed base containers with libraries for access to a car’s interfaces and launched them onboard small, remote-control cars in addition to exploring the effect of different machine learning models on the performance of the car.

    Mentors: Daniel Grzenda & David Uminsky

    Project Title: Social Impact Track – PalmWatch

    Project Description: I built a model in Jupyter Lab that compares correlations between columns of risk scores, created an overlaid histogram of risk scores per mill type, computed the risk scores for mills from 2001-2019 using a function, and found the year each mill was certified. I used this certification column and the risk score columns to build a random forest, which predicts for every year whether or not each mill is certified. I also built a logistic regression to predict whether mills are certified and, if so, what type of certification they have.

    Mentor: Shan Lu

    Project Title: An IDE Plugin for Machine Learning Software Testing

    Project Description: This project is about the creation of a tool that helps developers use Machine Learning Cloud APIs correctly and more efficiently. The tool automatically generates test cases to thoroughly test an application’s use of Machine Learning Cloud APIs and identify many previously unknown inefficiencies or bugs.

    Mentors: Daniel Grzenda & David Uminsky

    Project Title: Social Impact Track – Human Rights Media Analysis

    Project Description: The ongoing pandemic disrupted the UN’s Office of Human Rights’ ability to conduct field monitoring, leading them to identify human rights incidents from news media. We implemented a Human Rights Media Analysis software tool which automates much of the early stages of data processing for the UNOHCHR. Our tool extracts features from a human rights report/news article and assigns a credibility score (low, medium or high) to the article.

    Mentors: Nick Feamster, Nicole Marwell, Guilherme Martins & Kyle MacMillan

    Project Title: Combating the Digital Divide

    Project Description: I worked with a team to build 100 devices, wrote a script to automate and speed up the flashing process for the devices, and built a script for querying data to find trends in the digital divide.

    Mentors: Daniel Grzenda & David Uminsky

    Project Title: Social Impact Track – PalmWatch

    Mentors: Brian Nord & Yuxin Chen

    Project Title: SPOKES: an End-to-End Simulation Facility for Spectroscopic Cosmological Surveys

    Project Description: I worked under Brian D. Nord, an astrophysicist and machine learning researcher at Fermi National Accelerator Laboratory, on an open-source Python package providing an end-to-end simulation facility for spectroscopic cosmological surveys called SPOKES. SPOKES is built upon an integrated infrastructure, modular functioning organization, coherent data handling, and fast data access. SPOKES is published on PyPI at

    Mentors: Daniel Grzenda & David Uminsky

    Project Title: Social Impact Track – Development Bank Investment Tracker

    Project Description: Xi is a Research Assistant working on the Development Bank Investment Tracker (DeBIT) project to leverage data science for advancing development bank project financing and complaints tracking. Partnered with Accountability Counsel and Inclusive Development International’s Follow the Money initiative, the DeBIT project hopes to hold government, financial institutions, and corporate actors in investment projects around the world accountable for human rights violations and environmental damage.

    Mentors: Daniel Grzenda & David Uminsky

    Project Title: Social Impact Track – Human Rights Media Analysis

    Project Description: I built a data pipeline of web scraping, data cleaning and NLP analysis to develop machine learning classification models predicting the credibility of news articles for the United Nations.

    Mentor: Lorenzo Orecchia

    Project Title: Local Spectral Method for Graph Clustering

    Project Description: The project used PLINK to clean and analyze a European gene dataset with 2000 samples and sought associations between genes and geographical locations based on spectral graph theory.

    Mentor: Jai Yu

    Project Title: Analyzing Rat Behavior

    Project Description: I used exploratory analysis to examine rats’ behavior and choices in different mazes. Further, I looked into pose analysis to pick out smaller behavioral patterns within the rats’ movements.

  • Staff

    Summer Lab Leadership

    Kyle Chard is a Research Assistant Professor in the Department of Computer Science at the University of Chicago and Argonne National Laboratory. He has been Program Director of the Data & Computing Summer Lab since its first iteration under CDAC in 2019, and previously oversaw the Summer Internship Program run by the former Computation Institute.

    He received his Ph.D. in Computer Science from Victoria University of Wellington in 2011. He co-leads the Globus Labs research group which focuses on a broad range of research problems in data-intensive computing and research data management. He currently leads projects related to parallel programming in Python, scientific reproducibility, and elastic and cost-aware use of cloud infrastructure.

    Julia Lane is the Executive Director for Research Partnerships & Strategy, responsible for shaping and executing the strategic vision of DSI, building new research partnerships and outreach strategies to foster interdisciplinary collaborations, and ensuring that the University continues to broaden applications of data science and computing approaches.

    Katie Rosengarten is Program Manager at the Data Science Institute, responsible for overseeing strategic partnerships, management, execution, and evaluation of student research engagement opportunities for early high school learners through PhD students.

    David Uminsky joined the University of Chicago in September 2020 as a senior research associate and Executive Director of Data Science. He was previously an associate professor of Mathematics and Executive Director of the Data Institute at University of San Francisco (USF). His research interests are in machine learning, signal processing, pattern formation, and dynamical systems.  David is an associate editor of the Harvard Data Science Review.  He was selected in 2015 by the National Academy of Sciences as a Kavli Frontiers of Science Fellow. He is also the founding Director of the BS in Data Science at USF and served as Director of the MS in Data Science program from 2014-2019. During the summer of 2018, David served as the Director of Research for the Mathematical Science Research Institute Undergrad Program on the topic of Mathematical Data Science.

    Before joining USF he was a combined NSF and UC President’s Fellow at UCLA, where he was awarded the Chancellor’s Award for outstanding postdoctoral research. He holds a Ph.D. in Mathematics from Boston University and a BS in Mathematics from Harvey Mudd College.

    Social Impact Track

    Launa is a software engineer responsible for executing Data Clinic projects with student teams in conjunction with the 11th Hour Project, as well as internal projects for the DSI. She received her bachelor’s degree in the humanities at Princeton University and her master’s degree in Computational Analysis and Public Policy at the University of Chicago. Prior to joining the University, she worked as an adult education instructor and then as a software consultant at a Microsoft partner company.

    Daniel Grzenda is a Staff Data Scientist supporting the partnership with the 11th Hour Project at the University. He engages with social impact organizations across the domains of energy, food and agriculture, human rights, and marine technology, to support applications of data and computer science. Daniel’s work is focused on increasing the data capacity of these organizations, lowering the barriers to mission driven data science, and empowering these organizations through the development of sustainable technical solutions.

    Mindi has experience managing, sourcing and scoping a portfolio of data science experiential learning partnerships across social impact organizations, corporate, civic and government entities. Mindi managed earlier iterations of our work with the 11th Hour Project at the University of San Francisco where she served as the Senior Director of Strategy and Operations of the Data Institute. Mindi is a member (inactive) of the California and Illinois bars and received her JD from UC Hastings and her BS in Political Science from Santa Clara University.

    Trevor is an 11th Hour Software Engineer with the DSI. He works with social impact organizations on implementing data science projects that further their mission. Additionally, Trevor mentors student teams for the Civic Data and Technology Clinic. Before the DSI, Trevor worked with computational materials science at Argonne National Laboratory and received his BS from MIT.

    2021 Lab Coordinators

    Kyle Macmillan is a PhD Student in Computer Science at the University of Chicago where he is advised by Professor Nick Feamster. He is a member of the NOISE Lab and is broadly interested in internet measurement, networks, and their applications in law and policy.

    He received his BSE in Electrical Engineering from Princeton University in 2020. His undergraduate thesis was advised by Professor Prateek Mittal.

    Matt is a second year CS PhD student in systems. His research focuses on cost-aware heterogeneous computing and he is part of the Globus Labs research group. Matt has helped mentor students over the past two years and joins the Summer Lab team formally this year as a graduate lab coordinator. Outside of his current research, Matt is an avid pilot and has a background in philosophy, law, and finance.

    Valerie Zhao is a 3rd-year PhD student in Computer Science at the University of Chicago, advised by Professor Blase Ur. Her work focuses on enhancing the experiences of non-technical users in programming smart homes and social robots. She graduated from Wellesley College in 2018, with a bachelor’s degree in Neuroscience and Computer Science with honors.

    Yujie is a predoctoral master’s student in CS and a member of the Human Computer Integration Lab. Her research focuses on human-computer interaction, wearables, and haptics. She is interested in devising novel interactive devices and understanding their potential to improve users’ wellbeing. As a part of the CDAC Summer Lab 2020 cohort, Yujie is excited to return to Summer Lab as a graduate lab coordinator this year to support aspiring students.

    Zhuokai Zhao is a PhD student in the Computer Science Department at the University of Chicago, advised by Prof. Gordon Kindlmann and Prof. Michael Maire. His work focuses on understanding, visualizing, and building equivariant machine learning systems. Zhuokai received his bachelor’s degree in Electrical Engineering from the University of Illinois at Urbana-Champaign and his master’s degree in Robotics from the Johns Hopkins University.

  • FAQ

    For more questions, contact Katie Rosengarten at


    Apply (UC Lab Schools Only)

    Register: Information Session 1/21/22

    • When is the application due?

      The 2022 application is due Sunday February 20th by 11:59pm CT. Late applications will not be considered for review.

      Please subscribe to the DSI Mailing List to receive notifications about the 2022 program and application.

    • Where can I apply?

      The application for the 2022 DSI Summer Lab program can be found here. If you have any issues accessing or submitting the form, please email Katie Rosengarten (

    • When will I be notified of my application decision?

      The 2022 application is due Sunday February 20th. Decision notifications will be sent out in early April 2022, no later than April 15th.

      Please subscribe to the DSI Mailing List to receive notifications about the 2022 program and application.

    • Are letters of recommendation required?

      We do not require letters of recommendation for the application.

    • Are there any program prerequisites?

      We do not require any previous research experience to participate in the program. Familiarity with at least one programming language (Python, Java, C++, etc.) is preferred, as well as relevant coursework in areas such as computer science, statistics, and math.

    • What grade or age of students are eligible to apply?

      The following students are eligible to apply to the DSI Summer Lab:

      • High School: current freshmen, sophomores, juniors, and seniors;
      • Undergraduate: current freshmen, sophomores, and juniors;
      • Masters: current 1st year UChicago Masters students.
    • Are international students eligible to apply?

      Yes, international students are eligible to apply. However, all students must be authorized to work in the United States and provide all necessary documentation in support of their stipend. To see the documentation required to process stipends, please consult this page. We recommend that all international students check with their home institution’s international affairs office to ensure that they qualify.

    • What are the stipend rates for the program?

      The stipend rates for the 2022 program are below:

      • High School stipend rate: $5,625
      • Undergraduate stipend rate: $6,000
      • Masters stipend rate: $6,375
    • How many hours a week is the program?

      While students are not required to log their hours, we expect each student to work roughly a full-time schedule each week (>37.5 hrs/wk), e.g. 8am-4pm, 9am-5pm, or 10am-6pm. Schedules should be discussed with and confirmed by program mentors.

    • Is housing provided?

      Unfortunately, we do not provide housing as part of the program.

    • Will the 2022 program be in person?

      We anticipate that the 2022 program will take place in person. If so, participants will be provided work space on the UChicago campus, and will meet often in the John Crerar Library, home to the Data Science Institute and the Computer Science Department. Unless otherwise agreed upon by their mentor and program leadership, all students are expected to work in the open research space provided — the goal of which is to foster problem-solving and engagement across projects, domains, ages, and skill sets.

      The program administration will consult the recommendations of the University of Chicago and UChicago Medicine to determine the safest format for the Summer Lab 2022 program. We successfully ran the 2021 program remotely, and are prepared to do so for 2022 if necessary.

      We will confirm the format of the program as soon as possible, no later than when decision notifications are sent out in April 2022.

    • What are the 2022 program dates?

      The 2022 program dates are June 13th – August 19th, 2022. Chicago Public Schools (CPS) students may request a late start on June 15th.

    • Where can I see past projects in the program?

      You can view profiles of past projects in the program on our Project Profiles page. Each profile includes details about the student’s mentor, a description of the project, and their final poster. Final videos from the 2021 cohort will be available shortly.

    • If admitted, how will I be paired with a project?

      On the application, we ask for your research areas of interest, as well as self-reported experience and expertise in relevant data science and computational skills and tools. During the application review process, in combination with your research goals and resume, we will use those self-assessments to determine your aptitude and eligibility for available research projects.

    • What is the Social Impact Track?

      The Social Impact Track is an opportunity for students to work as part of a team on a data science project, with topics ranging from energy, food and agriculture, and human rights to marine technology. The projects are scoped and run in coordination with organizations that have been awarded grants by the 11th Hour Project, a grantmaking foundation serving the nonprofit community. Teams in the Social Impact Track serve as a centralized hub for software and data science for the organizations, providing both open-source and custom data-driven solutions.

    • Who is eligible to participate in the Social Impact Track?

      All student types – high school, undergraduate, and Masters students – are eligible to participate in the Social Impact Track. 1st year UChicago Masters students are only eligible for projects through the Social Impact Track.

    • How can I indicate interest in the Social Impact Track?

      On the application, you will be able to answer “yes” or “no” to the question, “Are you interested in participating in the Social Impact Track?” Selecting “yes” does not limit you to projects within the Social Impact Track, but will flag your interest in, and potential eligibility for, projects within that Track.

  • Student Resources
    • DSI Mailing List
      • Subscribe to receive updates about the Data & Computing Summer Lab and other student research opportunities.
    • College Center for Research & Fellowships (CCRF)
      • CCRF supports undergraduates as they pursue transformative, educational experiences through scholarly undergraduate research and nationally competitive fellowships. Visit their website to subscribe to their weekly newsletter that highlights new research opportunities.
    • Department of Computer Science Job Board
      • Available to UChicago students as a resource for internship opportunities as well as part-time and full-time positions.

    Email us at with any student research-related questions.