First Two UChicago Data Science Majors Graduate, Enter Industry
This weekend, the University of Chicago Class of 2022 will graduate and celebrate convocation. Among those hundreds of students will be the first two recipients of the new undergraduate data science degree, Noor Elmasry and Alex Hayward. When the major launched last fall, Elmasry and Hayward immediately signed up, participating in the Data Science Clinic and taking courses on ethics, fairness, responsibility, and privacy in data science, machine learning, and data visualization. Both students immediately parlayed their degree and experience into prestigious positions at Google and Prudential. We talked to Noor and Alex before their graduation in early June.
Tell us what attracted you to data science and why you chose the major.
Noor Elmasry: I came in as a philosophy and physics double major. I really wanted to answer the big questions in life, and I felt like those two were particularly compatible. But then my interest grew into data science when I realized that a lot of the work I’d be doing with physics after graduation wouldn’t address those larger questions. I wanted to make a more immediate impact, and that led me to consider something like data science.
I realized it still has all the elements of scientific process and methodology. But the way that you define the questions makes it easier to come up with an answer or a resolution and address things on a higher turnaround. I guess just being an impatient person, I wanted to make an immediate impact, but obviously not at the expense of a rigorous scientific method. So I found data science to be a really well suited mix of what I wanted to find.
I still feel like philosophy and data science are incredibly compatible, and also quite similar. You have premises, definitions, and well-defined questions that you can assess whether or not you can answer given your tools.
Alex Hayward: Previously, I was a molecular engineering major, and I was fortunate enough to engage in a couple of research projects at the Pritzker School of Molecular Engineering and UChicago Medicine. Throughout these experiences, I grew a hunger for creating some kind of social impact. I think there are a lot of opportunities for that in molecular engineering, but my exposure to research really showed me the opportunities for impact throughout the data science lifecycle. We were defining a problem, figuring out how to collect or acquire data, really getting into the weeds of the data to figure out whether it’s the right type of data to answer our question or solve our problem, then running analyses and translating results to different kinds of audiences.
So I grew really excited about data science across the different research projects I was on and thought that gaining more foundational, theoretical background through the data science minor would strengthen my skills in the research environment. I used to joke with my friends that if there was a data science major, then I would switch to it immediately.
I think it’s one of the best decisions I’ve made during my undergraduate career. It’s been fascinating to see the parallels between the major and my research experiences, and I think that it’s prepared me tremendously for the field.
Tell us about the core data science sequence and how it helped move you along this path towards data science.
Alex Hayward: UChicago is well known for its foundational, theoretical courses across different fields. I think that the introduction to data science sequence definitely follows that path, in that it gives you the tools to learn how to break down a problem into different steps of the data science lifecycle. First, we ask if data science is the right type of approach to answer our question, defining our problem. Then we really dive step-by-step through the process of how to produce a data science result. I still sometimes look at the slides from those courses, because the other classes in the major draw so much from those introductory principles. I think that’s what will make the data science major really strong: learning how to decompose your question or problem from the very beginning and getting more advanced experiential work later on.
Noor Elmasry: One thing that I think was really influential is how they differentiated thinking like a data scientist versus thinking like a statistician or computer scientist. The intro course is really important to set up how you enter problems differently as a data scientist, and what framework of thinking applies to you more than someone else who might be focused on the user experience or back end.
Alex Hayward: We also have classes like data engineering to hone our database skills, as well as data science ethics which teaches us to think critically about the kinds of statistical models and tools we may use and what inferences we may draw from our experiments.
Emphasizing ethics from the start, whether you’re in the minor or major, has made this program really unique. We’re not only having discussions about privacy, responsibility, and other ethical principles within the data science industry, but we’re also learning how to implement quantitative methods that may reflect those principles in data science systems themselves. We got to explore what those values might look like by design and go beyond just a conversation about ethics in data science.
Noor Elmasry: Realizing, wow, there are actually quantitative measures for fairness, and being able to draw those connections with my other major in philosophy was really important. I also really enjoyed mathematical foundations of machine learning, partly just because I really enjoyed the learning environment and the professor [Eric Jonas], but also because I felt like previously I was either just doing a lot of linear algebra or talking at a high level about bias and machine learning, but this class really connected the two for me.
Alex Hayward: The faculty have been really engaging and excited about the data science that’s being done here at the university now. So I think I also owe it to them that we’re graduating, because they made it easy to love the data science program.
The Data Science Clinic is central to the major. Tell us about your experience in these courses.
Noor Elmasry: I worked with Hohonu on the water level prediction project. I really enjoyed the project. I think it speaks to the variety of different fields where data science can help supplement research or analysis. We were working with tides, and how to use specific algorithms that could take a time series that has cycles, not just within a day, but within a month or a year. Taking into consideration all of these parameters of the natural world into your algorithm was really exciting.
Initially, we were cleaning their pipeline. So all the nodes across the east and west coast as well as around Hawaii that gather information about water level, we had to identify nodes that were failing. So defining those different thresholds, and different tests, and then producing visualizations to show the improvement from our cleaning process. In the second quarter, we focused on prediction algorithms and comparing that to the national standard. Hohonu already has a pretty impressive prediction rate, and it was our job to increase that accuracy.
I also loved working with my team, I feel like I learned so much from them, because we all came from different backgrounds. And I really loved the workflow as well, being able to work on a real world problem, take code from other people and manage it. I think that just provided a lot of confidence that the skills that we’re learning in classes can carry over in really tangible ways.
Alex Hayward: I worked on the Prudential project throughout the fall and winter. Our project focused on what it means to use AI ethically, what kinds of ethical principles currently exist, and what ethical AI looks like in practice. Essentially, we were trying to build a tool or metric to evaluate the extent to which companies use data and AI ethically.
Our main limitation was that we could only use publicly available data. So we explored news media, as well as SEC filings to gain a better picture of how different companies might be interested in or using AI. One thing that I really appreciated from the start of the project was that there was, again, a focus on foundational principles. The first thing we did was dive into a deep literature review of what it means to use AI ethically. We created an ethical AI framework around which we built different AI systems to investigate ethical data and AI use.
It was a really formative experience to be thrown into what data science looks like in industry. When I was applying for different projects in the course, I was really curious about how the data science lifecycle looks from the industry side of things. We were lucky to be able to frequently meet with Prudential and gain their constant feedback. I learned a lot in terms of communicating with a client and being able to integrate their feedback in a timely manner. I also learned how to understand the value of your research question, and how to communicate that value and potential impact to others. I learned how to accessibly present data science results to diverse audiences, including technical and non-technical stakeholders.
One new experience that I had on the project was being able to engage in a lot of natural language processing (NLP). The structure of the clinic allows you to adapt and hone your skills in a really independent way. I was able to implement different NLP tasks and apply the theoretical concepts I learned in my data science coursework to a real-world project. It was the most transformative course of my undergraduate career.
How did your experience with the data science major prepare you for your post-graduation plans?
Alex Hayward: I’ll be starting as a data scientist at Prudential, our partner from the Data Science Clinic project.
Working on Prudential’s data science initiatives was very exciting. I felt lucky to experience how Prudential thinks about data science and learn about the kinds of problems they’re interested in exploring with data science. I also gained fantastic mentors that taught me more than I could have hoped for. The fact that they were even interested in this question of ‘what does ethical AI mean?’ set them apart from other teams. To invest in working jointly with a research institution to explore these problems shows how passionate Prudential is about data and AI ethics.
I learned so much from my mentors at Prudential, and their constant feedback led the project towards success. I was blown away by how strong Prudential’s support and communication was throughout the project. So I’m lucky that I got to experience what the team is like, who ultimately made it an easy decision to express interest in continuing with Prudential after graduation. It’s been an incredible experience to work with Prudential on the Clinic project, and I am so grateful to have the opportunity to continue with Prudential after graduation.
Noor Elmasry: I will start this summer as an associate business data scientist at Google.
I recognize that there still might be a small disconnect with how data science is being taught and how it’s being used in industry, just purely by virtue of the field being so new, and how fast the field is developing. So I really want to gain as many skills as I can, and continue learning. I think Google will be a really great place to do that.
But I think the clinic projects specifically showed me how relevant and valuable data science can be to social impact. That’s ultimately what I want to go into, and that’s something I’m looking forward to, but also just surrounding myself with people who are thinking about data science in the meantime.
As the first UChicago data science graduates, what advice do you have for students considering the major?
Noor Elmasry: I think a lot of people can get something out of data science, skill-wise, to help in whatever they’re passionate about, because I think it’s going to become an increasingly valuable and interdisciplinary tool. I think that even as a minor, it’s incredibly valuable for anthropology or economics or public policy majors; it’s going to be incredibly valuable in all forms, even just a couple classes. Because I think that there’s a lot of thoughts and general skills and application in those classes that are going to help with any other field of interest.
Alex Hayward: I totally agree, I think that any type of data science education nowadays is extremely valuable, even necessary. I think we’re just living in a data science world now. If you’re someone who is curious about how things work, you’re a relentless problem-solver, someone who won’t stop until they find the answer and is constantly eager to learn, then data science is a great program for you. Also, if you’re interested in research and want to be exposed to mission-driven, fast-paced teams, that has been a big part of my experience too.
The data science faculty have been pivotal to the choices I’ve made regarding what I want to do after graduation. They have been incredible mentors. Having exposure to that faculty knowledge and mentorship, whether that’s through a few data science courses or the major, is extremely valuable. My data science classes have been my favorite classes here, and that’s thanks in part to how passionate the professors are and their willingness to meet students where they’re at.