Photo of MS in Applied Data Science students at a CMIMI conference in 2023

Sampling Research

Students in the MS in Applied Data Science program now have a research-focused option for their capstone experience. At a time when collaborations between industry and academia are vital for innovation and unlocking new sources of value, applied data scientists are broadening their perspective and deepening their skillsets by gaining exposure to the research side of the discipline.

The University of Chicago’s MS in Applied Data Science has taken an important step to meet this demand by developing a new capstone track for students eager to try their hands at research.

While the capstone experience has traditionally involved working closely with industry partners to solve real-world business problems, the research capstone offers students an opportunity to collaborate with faculty on a leading-edge area of data science where methodologies and applications are still actively being investigated.

“The area of research is the key,” says Utku Pamuksuz, an associate clinical professor at UChicago who spearheaded the MS in Applied Data Science’s research capstone track. “We’re not conducting research on already well-studied areas but on the newest technologies where industry applications are still being worked out and discovered.”

The first batch of research projects focused on generative AI, an area that Pamuksuz says is calling out for research right now and full of opportunities for students to gain experience in research design.

“Our goal is to be as up-to-the-minute as possible when it comes to teaching and sharing content with our students,” he says. “GenAI is an area still full of open questions and we saw that a research-oriented approach where faculty work directly with students as they review papers, conduct experiments, and present at academic conferences would be the best way for students to understand these technologies at a fundamental level.”

Connecting Academia to Industry

For his part, Pamuksuz has straddled research and industry throughout his career. While earning his PhD in machine learning, he collaborated with professors in finance, marketing, and healthcare information systems to understand how these abstract methodologies connect to other disciplines.

“You can do data science and use machine learning tools on virtually anything,” he says. “That’s what really caught my eye: The research side and the applied side are in constant dialogue.”

Having led data science teams at several Fortune 500 companies and co-founded the machine-learning company Inference Analytics, Pamuksuz also continues to do his own research, attend conferences, and publish in journals. He says it gives him an advantage when teaching in the Applied Data Science classroom.

“I’ll show students how particular GenAI tools are being applied today, but then I’ll also highlight how the area is a live research area that’s still open to exploration and being looked at from various angles.”

For the first round of research projects, Pamuksuz and other faculty selected healthcare as the domain area. With solid data sets, innumerable potential use cases, and a positive social factor, the domain gives students an ideal instrument to more deeply understand and operationalize the foundational concepts driving the latest GenAI transformer models.

“Our goal was to work with students and get them to understand these models on the most basic levels, down even to figuring out what each single neuron is doing and how they’re interacting with each other and the specific journey leading to the output. These areas are still very poorly understood today.”

PhD-Lite Experience

For their research capstone project, Andrew Alvarez, Mary Erikson, and Tegan Keigher leveraged the potential of large language models (LLMs) to generate accurate results for CT radiology reports.

With up to fifty percent of radiologists’ time spent drafting reports today, transformer models that can automate the documenting process have the potential to save time and reduce the stress responsible for increasing burnout in recent years.

Having started their project in early 2023—just months after the groundbreaking release of ChatGPT—the team was particularly excited to dive into an area that was so timely and full of potential.

“It’s been really incredible to focus on such a central topic in the world right now and dissect the nuts and bolts taking place inside it,” says Erikson, an analytical lead at Google.

“At first the idea of building our own transformer model might’ve felt a little like we were biting off more than we could chew, but through reading the key research papers, learning in class, and talking with each other and Utku, we were able to get a handle on this new area and connect it with what we’d learned already in the program.”

Alvarez, an engagement manager at United Airlines, says that “doing something research-based gave us an entirely new experience, one that allowed us to pick up a different sort of skillset that’s still amenable to the workplace. It was something like a PhD-lite experience.”

“The capstone-related work has also helped me guide others taking their first steps into using transformer models,” Alvarez adds. “Automating areas that are putting a strain on our teams allows us to focus on the work that matters. That’s where I think these technologies will really shine.”

Frontiers of Research

A critical learning experience arrived when the team presented their research at Johns Hopkins University as part of the Conference on Machine Intelligence in Medical Imaging (CMIMI).

The selective research-focused conference gave them the opportunity to mingle with top experts in the field and view firsthand what was taking place on the frontiers of research.

Perhaps most significant were the questions the team received in response to their presentation.

Mary Erikson, Tegan Keigher, Utku Pamuksuz (left to right)

Andrew Alvarez, Mary Erikson, Conference Attendee, Tegan Keigher (left to right)

Ananth Prayaga, Evelyn Wu, Nitin Gupta (left to right)

“They asked us if we’d tried this approach, or looked at it in this way, or pre-processed data in that way,” says Keigher, a lead data scientist at United Airlines. “Learning about these other purely research-based approaches sparked ideas for us and got us thinking about how we could improve our project and test out different avenues.”

The team is now deepening their project by incorporating what they learned at the conference. Once they finalize their results, the plan is to submit their work to the ultimate research test: a peer-reviewed publication.

By Philip Baker