Reproducibility is Not a Crisis. Now What? Next Steps for Advancing Computational and Data-enabled Science
“There is no crisis, but also no time for complacency,” said the chair of the National Academies of Sciences, Engineering, and Medicine (NASEM) committee on “Reproducibility and Replicability in Science” in May 2019 (https://vimeo.com/335923468). Questions about reproducibility have arisen around the transparency of computational methods and discovery. This is in part because data and compute resources are now used to drive scientific and engineering advances across a staggeringly broad range of academic disciplines and activities. In this talk, I present the reproducibility definitions that emerged from our NASEM committee deliberations. I also discuss an abstract framework for conceptualizing and advancing data science as a discipline, called the Lifecycle of Data Science (forthcoming in CACM). This framework integrates the disparate components of data-enabled discovery, from hardware provisioning to applications, to dissemination standards for verification and re-use, to ethics. In doing so, it brings into contextual focus salient issues such as computational reproducibility, standards and policy, and curricular development. I then present the “Knowledge Integrator,” an effort to conceptualize and enable the dissemination of reproducible research results based on the Lifecycle of Data Science and community engagement.
Victoria Stodden joined the School of Information Sciences as an associate professor in Fall 2014. She is a leading figure in the area of reproducibility in computational science, exploring how we can better ensure the reliability and usefulness of scientific results in the face of increasingly sophisticated computational approaches to research. Her work addresses a wide range of topics, including standards of openness for data and code sharing, legal and policy barriers to disseminating reproducible research, robustness in replicated findings, cyberinfrastructure to enable reproducibility, and scientific publishing practices. Stodden co-chairs the NSF Advisory Committee for CyberInfrastructure and is a member of the NSF Directorate for Computer and Information Science and Engineering (CISE) Advisory Committee. She also serves on the National Academies Committee on Responsible Science: Ensuring the Integrity of the Research Process.
Previously an assistant professor of statistics at Columbia University, Stodden taught courses in data science, reproducible research, and statistical theory and was affiliated with the Institute for Data Sciences and Engineering. She co-edited two books released in 2014—Privacy, Big Data, and the Public Good: Frameworks for Engagement published by Cambridge University Press and Implementing Reproducible Research published by Taylor & Francis. Stodden earned both her PhD in statistics and her law degree from Stanford University. She also holds a master’s degree in economics from the University of British Columbia and a bachelor’s degree in economics from the University of Ottawa.