Welcome to a new quarter, a new year, and a new decade! At the Center for Data and Computing, we’re excited to spend 2020 continuing to support and inspire interdisciplinary data science and artificial intelligence research at the University of Chicago. As an example, read about the latest round of Data Science Discovery Grants, funding a remarkable array of innovative projects that push science forward and combine expertise in powerful ways. 

In my first six months here at the University of Chicago, I’ve had the opportunity to meet many faculty, students, and researchers working at the frontier of data science in their respective fields across many corners of campus. My experiences so far have confirmed and even surpassed the impression that originally enticed me to come to UChicago. There is a thriving culture of collaboration, and people are eager to work across disciplines and departments. Because of this unique environment, UChicago offers the perfect substrate for CDAC to catalyze both interdisciplinary research collaborations and the development of data science as a new scholarly discipline.

In my experience, one of the most difficult challenges in interdisciplinary research is establishing a shared vocabulary and set of values. Data science in many ways offers a conduit for both. Scholars from economics to medicine understand regression and least-squares error. Engineers and computer scientists alike face issues around noise and error correction. Similarly, researchers who grapple with data face a common set of problems. Researchers in digital humanities, climate science, genomic medicine, and urban policy all struggle with data sampling, data representation, data cleaning and quality, and the ethics of automated decision making.

As machine learning models from random forests to deep neural nets become more popular and pervasive, the steps that take place before any model is applied, from data quality to feature representation, become even more important. These decisions can drastically affect the accuracy of the resulting model. And yet, unlike machine learning models, many of which are heavily optimized and in some cases well understood, the steps that precede the training and application of the models remain a black art. Within each of our respective domains, whether the data is a traffic capture from the Internet or readings from an environmental sensor, we face similar problems and challenges with data. Likewise, when research moves into practice, these models often must be deployed in settings where real-world concerns, from label accuracy to model drift, require careful attention.

By convening experts from a broad range of domains and disciplines and catalyzing new data science projects that bring researchers together across disciplines, CDAC helps establish a common language and purpose around the problems we all face when manipulating and analyzing data: the principles of the new field of “data science” that we at the University of Chicago will help define. We aim to make CDAC the nexus where researchers from computer science and statistics collaborate with domain scientists to develop advances both within their respective problem domains and in the general field of data science. In turn, these cross-disciplinary conversations will inspire new scientific inquiry, as researchers using similar approaches or studying the same data modalities in different disciplines discover common problems with data, allowing us all to develop the new science of deriving insights and knowledge from the growing sea of data.

In 2020, CDAC will continue to support these goals through both existing activities and new initiatives. The Discovery Grant program has just completed another cycle, bringing 8 new projects to the program for a total of 22 projects, in areas spanning high-energy physics, climate science, medical imaging, artistic representation, and much more. We also just concluded our Distinguished Lecture series, which brought the data science community across campus together for a series of exciting talks from accomplished social scientists, computer scientists, and software engineers who each brought a unique perspective on data science frontiers. 

To foster the next generation of data science researchers, we will continue our Data and Computing Summer Lab program for high school, undergraduate, and master’s students. In 2020, we will launch our new Data Science and Applied AI Postdoctoral Scholars Program, and open our new Internet of Things (IoT) Lab in the John Crerar Library building, where students and faculty can experiment with a wide range of “smart” devices and datasets, with applications ranging from Internet security and privacy to personal health.

Later this year, we are also planning to launch new initiatives in data journalism, open source software, and broadband access equity. In a follow-up series of articles, I will expand on each of these efforts in more detail; be sure to follow the CDAC blog and social media for updates as these exciting new initiatives develop.

If you are excited by our initiatives or if you have thoughts or ideas of your own, we invite you to visit us any time on the 2nd floor of the John Crerar Library building, or to get in touch with me or Julia Lane, CDAC executive director. We are excited about what CDAC has achieved so far with all of you and about what the future holds for data science research at the University of Chicago. I look forward to engaging with you more in the coming months.