
In a Data Engineering course taught by Steve Barry, a team of Master’s in Applied Data Science (MS-ADS) students built a full end-to-end data pipeline, from raw public datasets to an interactive, AI-powered interface. The project gave students hands-on experience designing the kind of data infrastructure that underpins real-world applications and developing the technical skills that work requires.

Working with Real-World Data Systems

The team (Adi Singh, Junny Choi, Wendy Xing, and Mike Peng) brought a mix of academic and professional backgrounds, with some students early in the program and others continuing part-time studies. They sourced data from multiple public datasets, including crime records, demographic information, police districts, community areas, business licenses, and historical weather data. Bringing these sources together required careful planning and preprocessing before any analysis or visualization could occur.

“The hardest part was the data cleaning,” noted Choi. “A lot of the data wasn’t necessary for our use case, so we had to rename columns, standardize naming conventions, and normalize everything before it could even go into the database.”
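The cleaning steps Choi describes (dropping unneeded columns, renaming, and standardizing naming conventions) can be illustrated with a minimal sketch. This is not the team's code: the column names, the sample record, and the helper functions `to_snake_case` and `clean_rows` are hypothetical, chosen only to show the general idea.

```python
import re


def to_snake_case(name: str) -> str:
    """Convert a raw column header to lowercase snake_case."""
    name = name.strip()
    # Insert an underscore at camelCase boundaries: "communityArea" -> "community_Area"
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    # Collapse any run of non-alphanumeric characters into a single underscore
    name = re.sub(r"[^0-9a-zA-Z]+", "_", name)
    return name.strip("_").lower()


def clean_rows(rows, keep):
    """Rename every column to snake_case and keep only the columns in `keep`."""
    cleaned = []
    for row in rows:
        renamed = {to_snake_case(key): value for key, value in row.items()}
        record = {}
        for column in keep:
            if column in renamed:
                value = renamed[column]
                # Trim stray whitespace from text values
                record[column] = value.strip() if isinstance(value, str) else value
        cleaned.append(record)
    return cleaned


# Hypothetical raw record with inconsistent headers and an unneeded column
raw = [
    {"Case Number": " HX123 ", "primaryType": "THEFT",
     "Community Area": "32", "Updated On": "unused"},
]

cleaned = clean_rows(raw, keep=["case_number", "primary_type", "community_area"])
print(cleaned)
```

Applying one naming rule to every source before loading means tables from different datasets can later be joined on consistently named columns.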

The team ultimately worked across multiple large tables, making deliberate decisions about schema design, keys, and relationships. These choices ensured the data could be queried efficiently and extended over time, mirroring how production data systems are built in industry.
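To make the role of keys and relationships concrete, here is a small sketch using Python's built-in `sqlite3` module. The article does not say which database the team used, and the table names, columns, and rows below are invented for illustration; the sketch only shows how a primary key, a foreign key, and an index support efficient, reliable queries.

```python
import sqlite3

# In-memory database for illustration only
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE community_area (
    area_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
CREATE TABLE incident (
    incident_id INTEGER PRIMARY KEY,
    area_id     INTEGER NOT NULL REFERENCES community_area(area_id),
    category    TEXT NOT NULL,
    occurred_on TEXT NOT NULL
);
-- Index the foreign key so joins and per-area filters stay fast as the table grows
CREATE INDEX idx_incident_area ON incident(area_id);
""")

# Hypothetical sample rows
conn.executemany(
    "INSERT INTO community_area (area_id, name) VALUES (?, ?)",
    [(1, "Area A"), (2, "Area B")],
)
conn.executemany(
    "INSERT INTO incident (incident_id, area_id, category, occurred_on) VALUES (?, ?, ?, ?)",
    [(1, 1, "theft", "2024-01-05"),
     (2, 1, "burglary", "2024-01-06"),
     (3, 2, "theft", "2024-01-07")],
)

# Count incidents per community area by joining on the shared key
counts = conn.execute("""
    SELECT a.name, COUNT(i.incident_id)
    FROM community_area AS a
    LEFT JOIN incident AS i ON i.area_id = a.area_id
    GROUP BY a.area_id
    ORDER BY a.name
""").fetchall()
print(counts)
```

Because `incident.area_id` must reference an existing community area, a row pointing at an unknown area is rejected at insert time, which keeps later joins trustworthy.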

Building the Pipeline

With the data prepared, the team designed a pipeline that moved from ingestion to interaction. The system included relational database modeling, API development, and an interactive dashboard that allowed users to explore the data dynamically and move between higher-level views and more detailed breakdowns, without requiring technical expertise.

The students also extended the pipeline by integrating an AI-powered Model Context Protocol (MCP) layer, allowing users to query the database using natural language.

“The point of the MCP is that you don’t have to write code,” said Singh. “You can just ask a question, and the system pulls the answer directly from the database.”

Learning Skills that Carry Forward Through Implementation

For several team members, skills such as API development and MCP integration were entirely new.

“A lot of this was new to me,” Singh shared. “I was teaching myself MCP while building the project, with help from the professor and TA. Being able to learn a new skill and immediately apply it made the project really rewarding.”

Rather than following a predefined template, students were responsible for making architectural decisions, troubleshooting challenges, and iterating on their design, experiences that closely resemble real-world data engineering work.

By the end of the project, students reported increased confidence across a range of applied data science skills, including data cleaning, database design, API development, AI-assisted data interaction, and data visualization.

“It was a great experience seeing how to build a project from zero,” Peng said. “An important skill for me was learning how to build APIs connected to the database. We used AI tools to help interact with the data, so the system could return answers directly. That’s really important, because in a future workplace it could help teams get information from large datasets more efficiently.”

“Non-technical communication was a big priority for me,” Singh shared. “We were doing all of these technical analyses, but being able to explain how the system works to people who don’t fully understand the technical side is really important, especially in a workplace where you’re working with stakeholders.”

Xing emphasized the collaborative nature of the work: “When we worked through the problem together, we were able to inspire more meaningful ideas as a group.”

Overall, the project moved beyond analysis, asking students to consider how data is structured, accessed, and communicated before insights appear on a screen. This assignment reflects the MS-ADS program’s emphasis on preparing students for the realities of applied data science work.
