CDAC Announces Inaugural Cohort of Data Science Discovery Projects
The Center for Data and Computing is proud to announce its first cohort of Data Science Discovery Grant recipients: 11 research projects and an exploratory convening that will expand collaborative data science activity at the University of Chicago.
Our inaugural call for proposals attracted applications from nearly every division, department, and professional school on campus, as well as collaborators from partner institutions such as the Toyota Technological Institute at Chicago and Fermilab. From this exceptional pool, the CDAC steering committee chose projects that forge new data science foundations and span the spectrum of domain applications, from studies of climate change, financial resilience, and high-energy physics to explorations of job skills, protein engineering, and the ethics of artificial intelligence.
You can read brief descriptions of each funded project below, and we will provide in-depth articles about the research over the next several months. Our next cycle of Data Science Discovery Grant funding will open in Fall 2019. To learn more about CDAC, attend our May Speaker Series to hear from leading data scientists about their groundbreaking work.
Computational Modeling to Quantify Social Determinants of Cardiovascular Disease
Corey Tabit, Medicine
Marynia Kolak, Center for Spatial Data Science
Elizabeth Tung, Medicine
Cardiovascular disease is the #1 killer of American adults. Preliminary data suggest that community social risk factors such as violent crime events, closures of food stores or pharmacies, and outages along critical mass transit lines are associated with increased risk for cardiovascular events and worsened control of cardiovascular risk factors such as blood pressure and cholesterol. The project will determine how community stressors relate to cardiovascular events and develop a prediction model and a prototype software tool to identify patients at highest risk for cardiovascular events during their periods of peak risk, so that healthcare providers can more efficiently target interventions to the patients who need them most.
Empowering Doctors in Areas with No-Internet Coverage with a Mobile Decision-making Interface
Andrey Rzhetsky, Medicine, Human Genetics
Pedro Lopes, Computer Science
A large sector of the medical profession in Nigeria lacks access to conventional decision-making systems because of inadequate internet infrastructure, and the country has no electronic medical records. This project will build a mobile system running on conventional Android devices that allows doctors in these areas to access basic medical records and perform queries (such as “side effects of diazepam”) over encrypted text messages, as well as receive real-time information on the spread of diseases such as contagious infections.
Racial Inequality in Financial Resilience
Peter Ganong, Harris School of Public Policy
Damon Jones, Harris School of Public Policy
Pascal Noel, Booth School of Business
How do income instability and access to resources shape racial inequality? This project will use administrative data to construct the first estimates of financial resilience by race. Using anonymized data from millions of bank accounts to measure income and spending on a day-to-day basis, the researchers will study how households cope with difficult economic shocks such as job loss, measuring financial resilience by whether household members cut back on grocery spending, skip doctors’ appointments, or miss bill payments. This project may uncover a new dimension of racial inequality, one with significant welfare and policy implications.
Predicting Shifts in Biological Growth Driven by Climate Change: A Geometric Deep Learning Approach
Tingran Gao, Statistics
David Jablonski, Geophysical Sciences
Understanding the impact of climate change on the diversity of global marine species has important economic value for fisheries and policymaking. Changes in water temperature, nutrients, and ecological associations can affect an organism’s physical form and growth, which are critical to determining where a species can survive. Using modern 3D imaging technology, this project will quantify large-scale variations in bivalve shell morphology and statistically analyze how they respond to global climate change. To do so, the researchers will develop highly efficient end-to-end geometric deep learning systems that can automatically extract latent semantic representations from millions of micro-CT scans.
N-body Networks for Jet Physics at the Energy Frontier
Risi Kondor, Computer Science
David Miller, Physics
Machine learning methods are becoming essential for high-energy physics experiments at facilities like the Large Hadron Collider at CERN. However, existing ML algorithms typically do not account for the underlying symmetry properties of the physical systems and interactions of interest, and must learn, often imperfectly, the phenomenological implications of any symmetries involved. This project will develop a new approach based on a novel type of covariant neural network architecture called N-body networks, along with a Lorentz-invariant neural network software library for systems of particles whose properties are invariant under generalized Lorentz transformations.
The Role of Rapid Bacterial Evolution in Human Health and Disease
A. Murat Eren, Medicine
Michael Yu, TTIC
The human body is home to an astonishing number of microbes, commonly referred to as the ‘microbiome’, which collectively encodes ten times more genes than the human genome itself. Although the human microbiome has been implicated in a multitude of diseases, the full set of biological mechanisms underlying these connections remains unexplored. This project will conduct a large-scale analysis of ‘plasmids’, small pieces of DNA that are not part of the bacterial chromosome and can be exchanged between different bacterial cells. Employing the most recent high-throughput sequencing strategies and machine learning approaches, the researchers will investigate relationships among plasmid genetics, microbial community structures, and human health.
A Data Processing Pipeline to Transcribe Broadcast Police Communications for Further Study
Margaret Beale Spencer, Comparative Human Development
Karen Livescu, TTIC
The procedural language law enforcement officers use to communicate with each other can help us understand the institutional, individual, and contextual factors driving the outcomes of policing incidents. Using a public archive of law enforcement radio communications in the City of Chicago (~45,000 hours of observation), the project will develop a data processing pipeline capable of transforming these broadcast communications into transcripts of discrete policing incidents. This novel data source will allow researchers to test for linguistic antecedents of adverse outcomes, such as uses of force, involving minority male youth.
Disentangling Visual Style and Content
Jason Salavon, Visual Arts
Greg Shakhnarovich, TTIC
How does a young child learn that a plush toy, a crude drawing, and a photograph are all distinct representations of the same content, such as a bear? Embedded in this question is a requirement to disentangle the content of visual input from its form of delivery. These concerns are important to many domains, including computer vision and the creation of visual culture. This project considers the problem of formalizing the concepts of ‘style’ and ‘content’ in images and video. Despite their importance, and our intuitive understanding of these distinctions, there are no compelling and technically useful definitions of these concepts. The researchers will investigate possible ways to define them and operationalize these definitions to improve the state of the art in style transfer and image analysis, as well as produce novel AI-driven visual artifacts.
Rational Protein Engineering using Data-Driven Generative Models
Andrew Ferguson, Pritzker School of Molecular Engineering
Rama Ranganathan, Biochemistry
The rational design of proteins to perform particular functions is a grand challenge in modern science with the potential to revolutionize medicine, materials, and engineering. The primary obstacle is the vast size of protein sequence space, which makes trial-and-error search intractable. This project will establish a platform encompassing a model for functional proteins that can be used to rationally design new candidates, a “gene machine” to prototype designs, and high-throughput quantitative assays to experimentally test the designed proteins. A computational-experimental feedback loop will supercharge the search through sequence space, covering the equivalent of many millions of years of natural evolution to discover novel synthetic proteins with unprecedented control over their functional properties.
Engineering Opportunity: Identifying Optimal Skill and Relationship Mixtures for Individual, Enterprise, and Regional Prosperity
James Evans, Sociology
Eamon Duede, Philosophy & CHSS
Lingfei Wu, Social Sciences
Matt Gee, Center for Data Science and Public Policy
Trends in automation and the digital reorganization of work pose challenges for those seeking to make strategic skilling and hiring decisions. Existing scholarship offers limited guidance for these decisions because it tends to model sectors of the knowledge economy as isolated modules rather than as parts of a connected, adaptive system. This project will leverage unprecedented access to the complete LinkedIn dataset to identify mixtures of knowledge, skills, and relationships that maximize income and future potential for individual participants in the knowledge economy. Combining these data with enterprise and regional data, the researchers will analyze natural experiments to evaluate predictions about marginal skill development and optimal skill sourcing, with the aim of yielding a wealth of intellectual, technological, and commercial innovations that will have a direct and positive impact on society.
Clinical Behavior Analysis via Unsupervised Learning
Heather Zheng, Computer Science
Samuel Volchenboum, Medicine
This study will apply the technique of unsupervised feature clustering to identify and model patient clinical behaviors from electronic health records. The project will allow medical providers to understand groups and patterns in patient clinical behavior, identify fundamental factors contributing to adverse medical events such as death and cardiac arrest, and even discover previously unknown clinical behaviors.
Exploratory Convening | When Technology Transforms Society: Considering the Societal and Ethical Impacts of Quantum Computing and AI
In partnership with the Chicago Quantum Exchange
Daniel Bowring, Fermilab
Chihway Chang, Astronomy and Astrophysics
Eamon Duede, Philosophy & CHSS
Brian Nord, Fermilab
Quantum computing and artificial intelligence are currently making significant technical progress, with commensurate interest from the public, media outlets, funding agencies, and corporate partners. Stakeholders frequently point to the potential of these technologies to “transform society,” but what does this mean in practice? Should we, as researchers, anticipate the social, political, and ethical consequences of our work and steer our research programs accordingly? Can we draw on scholarship in the social sciences and the humanities to inform our understanding of the distributional impacts of our programs? This workshop will explore these questions and develop collaborations across disciplines, institutions, and key stakeholders who may be able to help responsibly steer the evolution of these revolutionary technologies in ethical and socially beneficial ways.