Artificial intelligence (AI) is an increasingly essential tool for scientific discovery, helping researchers unlock new insights from growing pools of data. But AI also creates new barriers in science, because accessing, evaluating, and using advanced AI models in research often requires specialized technical expertise.
With a $3.5 million grant from the National Science Foundation, a team of scientists from the University of Chicago, Argonne National Laboratory, and MIT will seek to lower these barriers through a new framework inspired by one of the oldest technologies: gardens. Working with researchers from materials science, physics, and chemistry, the team, led by Ian Foster, will create “Model Gardens” that publish and curate AI models, link them with data and computing resources, and make it simple for users to test and deploy these powerful tools in their own studies.
The Garden team, which also includes Ben Blaiszik of UChicago and Argonne, Rafael Gomez-Bombarelli of MIT, Eliu Huerta of Argonne, and Rebecca Willett of UChicago, hopes that the platform will increase usage of AI models for research and education, advancing scientific discovery, reproducibility, and collaboration.
“The Garden is a place where a community can be built around AI in science,” said Blaiszik, Researcher in Data Science and Learning at Globus Labs. “We see Model Gardens as places where models in similar domains are kept, tended, and shared, and where researchers can discover a validated model that fits their needs and immediately run it with four lines of code, turning what used to require months of effort into minutes or seconds of effort.”
Today’s scientists use machine learning, neural networks, and other AI methods to make predictions, simulate complex phenomena, and design experiments. A chemist might train a model on molecular data to find promising candidates for a new drug treatment or materials for a new technology. Astronomers and high-energy physicists use AI to detect rare events in the massive amounts of data collected by their instruments and to guide where future experiments should look.
Creating these models requires access to large datasets and powerful computers, which makes the models difficult to replicate, share, and repurpose for outside studies. Model Gardens provides a repository where models can be linked to papers, testing metrics, known limitations, and code, as well as to computing and data storage resources through tools such as funcX and Globus. The combination creates “containers” where scientific models can be run easily from anywhere, accessing NSF-funded data and computing resources through the cloud.
“The NSF has been making important investments in AI through large AI institutes and other programs. Our Model Gardens infrastructure will serve to multiply the effectiveness of each of those centers,” said Foster, Arthur Holly Compton Distinguished Service Professor of Computer Science at UChicago and Distinguished Fellow and Senior Scientist at Argonne. “We want to make the data and the models from these many different efforts trivially accessible, and allow them to be combined in ways that are not possible today.”
The team will initially create Model Gardens containing published models for physics, materials science, and chemistry, partnering with scientists and companies in those communities to populate each garden and ensure that the platform meets the unique needs of each field. Collaborators include scientists working on projects at the Large Hadron Collider and at U.S. Department of Energy National Laboratories, exploring the frontiers of high-energy physics and designing new therapeutics and advanced materials.
After contributors upload their models to the garden, the framework will run automated tests and screens to validate their accuracy and performance, publishing all results to an information-rich web page describing a model’s features. The framework also generates containers and APIs so that other scientists can easily access the models and their linked data, often via just a few lines of programming code.
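The upload, validate, and publish flow described above can be pictured with a short Python sketch. This is an illustration only, not the Garden framework's actual code or API: the `ModelEntry` record, the `validate` and `summary_page` functions, the tolerance-based accuracy check, and the 0.9 threshold are all hypothetical names and choices made for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Sequence


@dataclass
class ModelEntry:
    """Hypothetical record for one contributed model (not the Garden schema)."""
    name: str
    predict: Callable[[float], float]          # the model itself, as a callable
    paper_doi: str = ""                        # link to the associated paper
    known_limitations: list[str] = field(default_factory=list)
    metrics: dict[str, float] = field(default_factory=dict)
    validated: bool = False


def validate(entry: ModelEntry,
             inputs: Sequence[float],
             expected: Sequence[float],
             tolerance: float = 0.1,
             threshold: float = 0.9) -> ModelEntry:
    """Run the model on held-out cases and record the fraction within tolerance.

    The accuracy definition and threshold are assumptions for illustration.
    """
    if not inputs or len(inputs) != len(expected):
        raise ValueError("inputs and expected must be non-empty and equal in length")
    hits = sum(abs(entry.predict(x) - y) <= tolerance
               for x, y in zip(inputs, expected))
    entry.metrics["accuracy"] = hits / len(inputs)
    entry.validated = entry.metrics["accuracy"] >= threshold
    return entry


def summary_page(entry: ModelEntry) -> str:
    """Render the recorded information as plain text, standing in for a web page."""
    lines = [
        f"Model: {entry.name}",
        f"Paper: {entry.paper_doi or 'not provided'}",
        f"Validated: {entry.validated}",
    ]
    lines += [f"Metric {key}: {value:.3f}" for key, value in entry.metrics.items()]
    lines += [f"Limitation: {item}" for item in entry.known_limitations]
    return "\n".join(lines)


if __name__ == "__main__":
    # A toy "model" that doubles its input, checked against three test cases.
    toy = ModelEntry(name="toy-doubler",
                     predict=lambda x: 2 * x,
                     known_limitations=["only tested on small positive inputs"])
    validate(toy, inputs=[1.0, 2.0, 3.0], expected=[2.0, 4.0, 6.0])
    print(summary_page(toy))
```

In this sketch, a model that misses too many test cases is still recorded, but its summary shows it as not validated, so a reader can see the metrics and limitations before deciding whether to use it.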
“A user can come to the Garden and see all that information at a glance,” Blaiszik said. “They can cite the model, they can learn about the model, they can contact the authors, and they can invoke the model themselves in a web environment, on leadership computing facilities, or on their own computer.”
The simple user interface also makes Model Gardens a powerful tool for education and outreach. By lowering the technical barriers and resource limitations for finding and deploying AI-based scientific models, the framework enables students to work directly with these methods, either as part of a course or in their own research projects.
“AI training is vital for the next generation of scientists,” said Willett, Professor of Statistics and Computer Science at UChicago and Faculty Director of AI at the Data Science Institute. “If we can make datasets and models trivially available, then AI methods become much more accessible to students. You can immediately see how to use different models, how they perform, and potential failure modes. It gives a student a much more tangible experience than they would get otherwise.”
Model Gardens was funded through the NSF Cyberinfrastructure for Sustained Scientific Innovation (CSSI) program, grant #2209892. Learn more about Model Gardens and other tools for scientific discovery, education, and community engagement in Ben Blaiszik’s keynote at the 2022 SciPy conference. Additional team members include Aristana Scourtas (UChicago), KJ Schmidt (UChicago), Logan Ward (Argonne/UChicago), and Ben Galewsky (UIUC).