As the crowning achievement of the Master’s in Applied Data Science Program, the Capstone Project unites students with industry partners to solve real-world analytics problems.

At the Autumn Showcase—the largest in the history of the program—projects spanned industries, data types, and methodological approaches. Eight teams stood out for special commendation.

“The hard work, perseverance, and talent is clear from the truly impressive quality of the work,” said Greg Green, executive director for the Master’s in Applied Data Science program. “I was especially pleased to see how inventive the teams were in the solutions they developed for their industry partners.”

Learn more about the winning projects below.

Inference Analytics | Automated Clinical Report Annotation Through Weak Supervision
Vritti Gandhi, Samhita Penigalapati, Medha Yadav, Veda Kilaru
Advisor: Utku Pamuksuz

Gandhi, Penigalapati, Yadav, and Kilaru’s project used natural language processing (NLP) to convert unstructured text from radiology reports into a structured format suitable for supervised machine learning models. A critical aim of their work was to reduce the burden the current labor-intensive manual practice places on healthcare workers.

“We used NLP to perform medical named entity recognition and followed that with a multi-class classification model that automatically annotates the radiology reports with body part and sub-body part labels,” the team said. “In the end, we constructed a variety of models, but the most successful was a stacked BERT-based model. That one yielded test set F1 scores between 0.86 and 0.90 for layer 1, and scores between 0.63 and 0.85 for layer 2, depending on modality.”
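For readers unfamiliar with the metric the team cites, the following is a minimal sketch of how a per-class and macro-averaged F1 score can be computed for a multi-class labeling task. The label names and predictions are hypothetical and invented for illustration; the article does not say how the team averaged its scores or which tooling it used.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} for every label seen in y_true or y_pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label was wrong
            fn[truth] += 1  # true label was missed
    scores = {}
    for label in set(y_true) | set(y_pred):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical body-part labels, for illustration only.
y_true = ["chest", "chest", "abdomen", "head", "head", "head"]
y_pred = ["chest", "abdomen", "abdomen", "head", "head", "chest"]

scores = per_class_f1(y_true, y_pred)
overall = macro_f1(y_true, y_pred)
```

Because F1 balances precision and recall, it is a common choice when some labels are much rarer than others, as is often the case with sub-body-part categories.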

Inference Analytics | Implant Detection and Brand Prediction from Dental Images
Jack Chen, Zongyuan Yu, Fiona Fei, Zoey Zhou
Advisor: Michael Xiao

For their project, Chen, Yu, Fei, and Zhou worked with Inference Analytics (IANN) to improve how dentists identify the location and brand of dental implants. Because the current manual method is both time-consuming and error-prone, IANN is developing software that can automatically determine an implant’s location and brand.

“Our goal was to develop a machine learning algorithm using YOLO (You Only Look Once) to label the implant location,” the team said. “To detect the implant location and classify the brand, we used a hierarchical clustering technique.”

Inference Analytics | Factually Correct Impression Predictions
Lincy Qin, Kenneth Jin, Renyuan Liu, Allen Dong
Advisor: Utku Pamuksuz

Qin, Jin, Liu, and Dong’s project aimed to improve a tool developed by Inference Analytics that seeks to prevent burnout among radiologists and increase hospital productivity. To improve the company’s tool, which uses an LSTM and BERT-based abstractive summarization model, the team created a factual correctness module that can generate factually extracted impressions based on positive or negative answers to clinical questions.

“We also created a mapping that links parts of the body to clinical findings as a way to assist both physicians and patients visually understand the reports,” the team said. “Cumulatively, these steps can improve the radiology workflow and also provide crucial insights through a patient-friendly interface.”

CME Group | Reclassification of CME Market Sentiment Meter State Using Image Recognition
Xinglin Chen, Qiansheng Zhou, Zijun Wu, Azizha Zeinita
Advisor: Abid Ali

For their project, Chen, Zhou, Wu, and Zeinita worked with CME Group to improve real-time market predictions across a range of markets, including gold, treasuries, corn, and equities. CME currently uses a four-part market sentiment meter that classifies the mood of market participants as (1) complacent, (2) balanced, (3) anxious, or (4) conflicted.

“We used a convolutional neural network to enhance the model’s classification accuracy,” the team wrote. “In the end, our image recognition model is able to more accurately reclassify sentiment states after significant market events.”

Barchart | Corn and Soybean Yield Forecast
Bowen Liao, Zhaochen Ye, Dingying Zhang, Xin Ge
Advisor: Roger Moore

Liao, Ye, Zhang, and Ge worked with Barchart to enhance a derivative product that forecasts soybean and corn yields in the United States. The product is popular with customers because it gives them a view of the US crop yield forecast at the start of the growing season.

“With the USDA crop yield forecast as our benchmark,” the team said, “we concluded that using a random forest classification algorithm with the addition of drought index and crop condition data would significantly boost the existing model’s performance.”

Inference Analytics | Incidental Findings Analysis in Radiology Reports
Hyunjae Cho, Bhadri Vaidhyanathan, Jitak Ahn, Milan Toolsidas
Advisor: Utku Pamuksuz

Cho, Vaidhyanathan, Ahn, and Toolsidas developed an automatic detection method for incidental findings recorded during radiology imaging studies. Incidental findings are abnormal masses or lesions that are unrelated to the primary objective of the radiological exam and are recorded as free text in radiology reports. Although they are often overlooked, recorded incorrectly, or miscommunicated, incidental findings can be important for ensuring that patients receive appropriate and timely care.

“Our goal was to build an augmented labeled dataset of the semantic rules behind incidental classifications using weak supervision,” the team said. “We then used this dataset to develop a machine learning model using natural language processing (NLP) that would automatically detect and process incidental findings. These are then correlated with established medical guidelines using a rule-based algorithm to identify follow-up recommendations.”
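The general idea behind weak supervision can be sketched in a few lines: several noisy, hand-written rules (often called labeling functions) each vote on a sentence, and the votes are combined into a training label. The rules, keywords, and majority-vote combiner below are hypothetical and chosen only for illustration; the article does not describe the team’s actual rules, and real weak supervision frameworks typically replace the majority vote with a learned label model.

```python
import re

INCIDENTAL, NOT_INCIDENTAL, ABSTAIN = 1, 0, -1


# Hypothetical labeling functions: each one encodes a single noisy rule.
def lf_incidental_keyword(sentence):
    hit = re.search(r"\bincidental(ly)?\b", sentence, re.I)
    return INCIDENTAL if hit else ABSTAIN


def lf_lesion_terms(sentence):
    hit = re.search(r"\b(nodule|cyst|mass|lesion)\b", sentence, re.I)
    return INCIDENTAL if hit else ABSTAIN


def lf_negation(sentence):
    hit = re.search(r"\bno (evidence of|suspicious)\b", sentence, re.I)
    return NOT_INCIDENTAL if hit else ABSTAIN


LABELING_FUNCTIONS = [lf_incidental_keyword, lf_lesion_terms, lf_negation]


def weak_label(sentence):
    """Combine rule votes by simple majority; abstain on ties or no votes."""
    votes = [lf(sentence) for lf in LABELING_FUNCTIONS]
    pos = votes.count(INCIDENTAL)
    neg = votes.count(NOT_INCIDENTAL)
    if pos == neg:
        return ABSTAIN
    return INCIDENTAL if pos > neg else NOT_INCIDENTAL


# Invented example sentences, not taken from any real report.
examples = [
    "Incidentally noted 4 mm nodule in the right lower lobe.",
    "No evidence of acute fracture.",
    "Heart size is normal.",
]
labels = [weak_label(s) for s in examples]
```

Sentences on which the rules abstain or disagree are simply left unlabeled, so the resulting dataset is noisy but far cheaper to produce than one annotated by hand.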

UST | Predicting and Improving CMS Star Ratings
Melody Feng, Jason Lee, Chia-Min Chen, Bruce Liu
Advisor: Wendy Klusendorf

Feng, Lee, Chen, and Liu built a model that predicts the performance of Medicare Part C and Part D health plans under the Star Rating Program. Because star ratings influence how Medicare beneficiaries choose their plans, there is a significant incentive for health plans to improve their star ratings. Additionally, accurate next-year predictions of health plan performance allow better planning and allocation of resources.

“By using past performance and other data sources, we were able to predict overall and summary star ratings for the upcoming year for one third of the health plans we assessed,” the team said. “We were generally accurate to within about half a star. We also developed a recommendations framework that health plans can use to improve their star rating and we made that available through a Streamlit app.”
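To make the phrase “within about half a star” concrete, here is a minimal sketch of how such a check could be scored. The ratings below are invented for illustration, and the function names are hypothetical; the article does not describe the team’s evaluation procedure.

```python
def share_within(actual, predicted, tolerance=0.5):
    """Fraction of predictions that land within `tolerance` stars of the actual rating."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(abs(a - p) <= tolerance for a, p in zip(actual, predicted))
    return hits / len(actual)


def mean_absolute_error(actual, predicted):
    """Average size of the prediction error, in stars."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical overall star ratings for five plans (illustration only).
actual = [3.5, 4.0, 2.5, 4.5, 3.0]
predicted = [3.5, 3.5, 3.5, 4.5, 3.0]

within_half_star = share_within(actual, predicted)  # 4 of 5 plans
average_error = mean_absolute_error(actual, predicted)
```

Reporting both numbers is useful because a small average error can hide a few plans whose predictions miss by a full star or more.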

UChicago Medicine Radiology | Machine Learning for Automated Comparison of Neuroimaging
Jingyi Yun, Meng Yang, Ximan Wu, Jiaxu He
Advisor: Arnab Bose

Yun, Yang, Wu, and He used a segmentation algorithm to assist physicians and radiologists in their daily diagnosis of brain lesions. The large variety of types, sizes, and locations of lesions makes their detection a difficult and time-consuming process that requires the visual assessment of large quantities of medical images.

“Our study was based on 122 brain MRI scans from 18 patients and we used an automatic lesion segmentation algorithm,” the team said. “The U-Net Convolutional Neural Network performed a quantitative analysis of brain lesions that included identifying the lesion’s location and shape. In the end, it increases radiological efficiency by assisting physicians and radiologists to detect, characterize, and monitor brain lesions.”
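As an illustration of what quantifying a lesion’s location and shape can mean in practice, the sketch below summarizes a binary segmentation mask of the kind a segmentation network might produce after thresholding. The mask is a toy example and the summary statistics (pixel area, centroid, bounding box) are assumptions made for illustration; the article does not specify which measurements the team computed.

```python
def lesion_summary(mask):
    """Summarize a 2-D binary mask (list of rows of 0/1 values).

    Returns None when the mask contains no lesion pixels.
    """
    pixels = [
        (r, c)
        for r, row in enumerate(mask)
        for c, value in enumerate(row)
        if value
    ]
    if not pixels:
        return None
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    return {
        "area_px": len(pixels),                                    # size
        "centroid": (sum(rows) / len(pixels), sum(cols) / len(pixels)),  # location
        "bbox": (min(rows), min(cols), max(rows), max(cols)),      # extent
    }


# Toy 4x5 mask, for illustration only (1 = lesion pixel).
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
]
summary = lesion_summary(mask)
```

Comparing such summaries across scans taken at different times is one simple way to monitor whether a lesion is growing, shrinking, or shifting.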

Written by Philip Baker