When you’re standing in freezing Chicago weather with a broken arm, waiting for a bus that never seems to come, it’s more than an inconvenience. For Daniel Sa, a student in the University of Chicago’s MS in Applied Data Science program, that frustrating personal experience sparked an idea: use data science to help commuters better anticipate delays.

At Transit Hacks 2025, Daniel teamed up with classmates Jazil Kalim, Sakshi Bokil, and Karim Derbali to bring the idea to life. Together, they built a machine learning model that predicts CTA bus delays based on real-time weather conditions. The result? A second-place win and a project with real-world potential.

Hosted by UChicago Transit Enthusiasts & Explorers, Transit Hacks is an annual datathon that invites students to build innovative, tech-centered solutions focused on transportation in the Chicagoland area. From CTA and Metra to Divvy and university shuttle services, the competition encourages participants to apply data science, web development, or algorithmic tools to tackle real transit challenges.

TURNING FRUSTRATION INTO INNOVATION

“The idea came from something I’d tried in a previous hackathon that didn’t quite work out,” said Daniel. “This time, I wanted to see it through.”

Their goal: build a model that could detect potential slowdowns using features such as day of the week, hour of the day, and precipitation. The team trained a Random Forest classification model on weather conditions like temperature and snowfall, along with transit variables, to predict when delays were likely. The model achieved 93% recall, meaning it correctly identified 93% of actual delays, and a ROC AUC of 0.83, a measure of the model’s overall ability to distinguish between delayed and on-time buses (0.5 is chance level and 1.0 is perfect). These are strong results that suggest real potential for helping riders prepare in advance.
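For readers curious what such a pipeline looks like in practice, here is a minimal sketch using scikit-learn. It is not the team's code: the data is synthetic, the rule that generates the "delayed" label is invented for illustration, and the hyperparameters (`n_estimators`, `class_weight`) are assumptions. It only shows how a Random Forest classifier can be trained on features like those described and scored with recall and ROC AUC.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000

# Synthetic stand-ins for the kinds of features the article mentions.
X = np.column_stack([
    rng.integers(0, 7, n),     # day of week (0-6)
    rng.integers(0, 24, n),    # hour of day (0-23)
    rng.normal(30, 15, n),     # temperature, degrees F
    rng.exponential(0.2, n),   # snowfall, inches
    rng.exponential(0.1, n),   # precipitation, inches
])

# Invented labeling rule (an assumption, not real CTA behavior):
# rush hours and heavier snow or rain make a delay more likely.
rush_hour = np.isin(X[:, 1], [7, 8, 16, 17, 18])
score = 1.5 * rush_hour + 3.0 * X[:, 3] + 2.0 * X[:, 4] + rng.normal(0, 0.5, n)
y = (score > 1.2).astype(int)  # 1 = delayed, 0 = on time

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = RandomForestClassifier(
    n_estimators=200, class_weight="balanced", random_state=0
)
model.fit(X_train, y_train)

# Recall: share of actual delays that the model flags as delays.
recall = recall_score(y_test, model.predict(X_test))
# ROC AUC: how well predicted probabilities rank delayed above on-time trips.
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

print(f"recall={recall:.2f}, roc_auc={auc:.2f}")
```

The scores printed here describe only the synthetic data and say nothing about the team's reported 93% recall or 0.83 ROC AUC.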

They even started developing a Streamlit app that would allow users to input live or forecasted weather conditions and receive predicted delays in return, though that component is still a work in progress.

USING LESSONS FROM THE MS-ADS PROGRAM TO OVERCOME REAL-WORLD CHALLENGES

The hackathon wasn’t without hurdles. “We expected to use a clean API,” Daniel shared, “but it was down. We had to find and pre-process our own historical bus reliability data, and we only had about two weeks’ worth to work with.”

Another challenge? Merging datasets with mismatched timestamp granularities. “The weather data was hourly, and the bus data came in five-minute chunks. Pandas just wasn’t cutting it, so we brute-forced a solution,” Daniel explained. (Pandas is a Python library commonly used for organizing and analyzing data, but it can be slow when working with large or complex datasets.) “If we’d had more data, that approach would’ve taken way too long. But it worked in the short timeframe.”
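One common way to line up records at different time granularities is an "as-of" join, which pairs each five-minute bus record with the most recent hourly weather reading. The sketch below uses `pandas.merge_asof` on a few made-up rows. It is an illustration of the general problem, not the team's brute-force solution, and the column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical five-minute bus records (column names are illustrative).
bus = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2025-01-15 08:00", "2025-01-15 08:05", "2025-01-15 08:55",
        "2025-01-15 09:00", "2025-01-15 09:10",
    ]),
    "delay_minutes": [2.0, 4.5, 7.0, 1.0, 0.5],
})

# Hypothetical hourly weather observations.
weather = pd.DataFrame({
    "timestamp": pd.to_datetime(["2025-01-15 08:00", "2025-01-15 09:00"]),
    "temperature_f": [18.0, 21.0],
    "snowfall_in": [0.4, 0.0],
})

# merge_asof requires both frames to be sorted on the join key.
# direction="backward" attaches the latest weather row at or before
# each bus timestamp.
merged = pd.merge_asof(
    bus.sort_values("timestamp"),
    weather.sort_values("timestamp"),
    on="timestamp",
    direction="backward",
)

print(merged)
```

Each bus row keeps its own timestamp and gains the weather columns from the matching hour, so the 08:55 record picks up the 08:00 reading and the 09:10 record picks up the 09:00 reading.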

Daniel credits the machine learning coursework in the MS in Applied Data Science program with preparing him for these types of challenges. “It helped a lot,” he said. “We knew how to structure the model, evaluate performance, and iterate.”

ADVICE FOR FUTURE HACKATHON PARTICIPANTS

“Just do it,” Daniel said. “You don’t regret doing one of these things, especially if your alternative is going to be rotting on your afternoon.”
