PreScience: Forecasting the future of science end-to-end
A new benchmark from the Allen Institute for AI (AI2) and the University of Chicago is putting AI’s scientific imagination to the test. PreScience was developed by AI2 in collaboration with UChicago faculty including James Evans (Max Palevsky Professor of Sociology & Data Science; Director, Knowledge Lab; DSI Faculty Co-Director of Novel Intelligence), with support from the National Science Foundation.
Whereas existing AI evaluations of science focus on narrow slices of the research process (e.g., predicting a citation count or generating an abstract), PreScience takes on the full research lifecycle. Grounded in roughly 100,000 real papers, authors, and citation histories from arXiv, the benchmark captures the real-world dynamics of how scientific research unfolds. It breaks scientific advancement into four stages: predicting who will collaborate on a paper, identifying which prior work they’ll build on, generating the paper’s actual contribution, and forecasting how much impact it will have. Linked together, these stages form a multi-step “science simulator.”
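To make the four-stage structure concrete, here is a minimal sketch of how such stages could be chained into one simulation step. All function names, data structures, and placeholder heuristics below are hypothetical illustrations, not PreScience’s actual API or models.

```python
# Hypothetical sketch of chaining four stages (collaboration, references,
# contribution, impact) into one step of a "science simulator".
# Every name and heuristic here is illustrative, not from the benchmark.
from dataclasses import dataclass

@dataclass
class Paper:
    authors: list               # stage 1: who collaborated
    references: list            # stage 2: prior work built on
    contribution: str           # stage 3: the paper's claimed advance
    predicted_citations: int    # stage 4: forecasted impact

def predict_collaborators(corpus):
    # Stage 1 (placeholder): pair the two most prolific authors so far.
    counts = {}
    for p in corpus:
        for a in p.authors:
            counts[a] = counts.get(a, 0) + 1
    return sorted(counts, key=lambda a: -counts[a])[:2]

def predict_references(authors, corpus):
    # Stage 2 (placeholder): cite the team's highest-impact prior papers.
    own = [p for p in corpus if set(p.authors) & set(authors)]
    return sorted(own, key=lambda p: -p.predicted_citations)[:3]

def generate_contribution(authors, references):
    # Stage 3 (placeholder): a real system would generate text here.
    return f"Extends {len(references)} prior works by {', '.join(authors)}"

def forecast_impact(references):
    # Stage 4 (placeholder): crude proxy from the references' impact.
    return sum(r.predicted_citations for r in references) // max(len(references), 1)

def simulate_step(corpus):
    # Chain all four stages, then append the synthetic paper so that
    # later steps build on it -- the core idea of a rolling simulation.
    authors = predict_collaborators(corpus)
    refs = predict_references(authors, corpus)
    contribution = generate_contribution(authors, refs)
    paper = Paper(authors, refs, contribution, forecast_impact(refs))
    corpus.append(paper)
    return paper
```

Running `simulate_step` repeatedly on a seed corpus yields a growing synthetic literature, which is roughly how a multi-month simulation could be rolled forward.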
When the team behind PreScience chained all four tasks together into a 12-month simulation of AI research, they found the resulting synthetic corpus was less diverse and less novel than what human researchers actually produced over the same period. Even when given diverse inputs, AI models tend to converge on a narrower range of ideas than real scientists would explore. But part of the benchmark’s value is in surfacing that gap.
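The diversity gap described above can be quantified in many ways; one generic proxy (not necessarily the metric used in the PreScience report) is the average pairwise distance between papers’ vocabularies:

```python
# Generic illustration of corpus diversity as mean pairwise Jaccard
# distance over word sets. This is a simple proxy for how varied a set
# of abstracts is, not PreScience's actual measure.
from itertools import combinations

def jaccard_distance(a, b):
    # 1 - |intersection| / |union|; 0 = identical word sets, 1 = disjoint.
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return 1.0 - len(wa & wb) / len(wa | wb)

def corpus_diversity(abstracts):
    # Mean distance over all pairs; higher means a broader range of ideas.
    pairs = list(combinations(abstracts, 2))
    return sum(jaccard_distance(x, y) for x, y in pairs) / len(pairs)
```

Under a measure like this, a synthetic corpus that converges on near-duplicate phrasings scores lower than a human corpus spanning distinct topics.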
“If we can forecast where science is heading, we can also ask where it should be heading,” said Evans. PreScience “reveals the gap between the science we’re likely to get and the science we want and need,” he explained, and offers “a lever for steering discovery toward the emerging opportunities and futures we choose.”
The benchmark also highlights two persistent challenges in scientific forecasting: predicting initial collaborations between scholars and generating truly novel findings. Both have implications for the kinds of tools that could help researchers advance scientific discovery, from recommending co-authors to suggesting directions to explore.
The team sees PreScience as a living benchmark and plans to incorporate richer signals, such as institutional affiliations, funding sources, and multimodal scientific artifacts like figures and tables, as the project grows. The PreScience dataset, evaluation suite, and technical report are publicly available. To learn more, check out the AI2 blog.