In 2015, the LIGO experiment picked up the signal of two black holes merging 1.3 billion light-years from Earth. It was the first direct observation of gravitational waves and one of the most important physics discoveries in decades, earning its leaders the Nobel Prize just two years later. The experiment cost more than $1 billion, but when the scientists communicated their results to the public, they used software that cost $0: open source Python tools including NumPy, SciPy, matplotlib, and Jupyter notebooks.
“This was a dream come true for us,” said Project Jupyter co-founder Fernando Pérez in his October 8th talk at the Center for Data and Computing. “Fifteen years earlier, we were arguing like crazies in the desert that this Python thing is real and we can do science with it. Now, the most important result of the last decade in physics was achieved with these tools.”
Jupyter is a programming language-agnostic interactive notebook that allows researchers to combine code, text, mathematics, and results into shareable documents. A descendant of the IPython notebook (created by Pérez in graduate school as “my PhD procrastination/depression-control mechanism”), Jupyter is now a core tool for scientists using computation across all fields of inquiry.
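To make that concrete, a notebook interleaves prose like this article with live cells that run code from the scientific Python stack. The sketch below is purely illustrative, not code from the LIGO analysis: a hypothetical notebook cell that uses NumPy and SciPy to recover a weak periodic signal from noise, with all frequencies and parameters invented for the example.

```python
import numpy as np
from scipy import signal

# Illustrative analysis cell: dig a weak 300 Hz tone out of noise.
# All numbers here are made up for the example.
rng = np.random.default_rng(42)
fs = 4096                                     # sampling rate, Hz
t = np.arange(0, 4, 1 / fs)                   # 4 seconds of samples
tone = 0.5 * np.sin(2 * np.pi * 300 * t)      # weak periodic signal
data = tone + rng.normal(scale=2.0, size=t.size)  # buried in noise

# Band-pass filter around the band where we expect the signal.
sos = signal.butter(4, [250, 350], btype="bandpass", fs=fs, output="sos")
filtered = signal.sosfiltfilt(sos, data)

# Locate the dominant frequency with a periodogram.
freqs, power = signal.periodogram(filtered, fs=fs)
peak_freq = freqs[np.argmax(power)]
print(f"dominant frequency: {peak_freq:.1f} Hz")  # ~300 Hz
```

In a real notebook, a matplotlib plot of `filtered` and a paragraph interpreting it would sit in the same shareable document as this cell.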
But the software behind Jupyter and other scientific Python tools is only part of the open source puzzle, Pérez said. The Jupyter team has also worked through challenges of forming standards and protocols, the economic sustainability of a project with 1500 volunteer contributors, and the traditional incentives of science, which have been slow to adapt to the rise of open source software.
“On any dimension that you look at, the creation and construction of these open, collaborative tools and ideas is at odds with the incentive mechanisms we have throughout the system, from academic hiring and promotion and publishing and tenure to grant writing and awards,” Pérez said. “We managed to engineer a system of incentives that makes the right things the hardest things to do.”
Watch Pérez’s full talk, including information on his geosciences project Pangeo and the UChicago connection to the birth of matplotlib, below.
Scientific Open Source Software: Meat and Bits But Not Papers. Is it Real Work?
Open source software is now the backbone of computation across the sciences and, increasingly, education. Yet the creation of scientific software is not well recognized as part of the enterprise of science in terms of training, career paths, intellectual recognition, organizational support, or funding. In this talk, I’ll explore the challenges of this contradictory situation, from the perspective of someone who has spent almost 20 years building open source software and communities. I have lived (often precariously) a dual life of “real academic” and of open source developer and advocate, working on IPython, Project Jupyter and the Scientific Python ecosystem since 2001.
I will provide an overview of Project Jupyter, including its intellectual core, the open source community context that surrounds it, and some of its impact. This will help frame the second part of the talk, where I’ll try to open a conversation on the social and organizational challenges of creating and sustaining open, collaborative communities in the structure of research and education. The scientific, technical and community dynamics of projects like Jupyter present interesting challenges in the context of traditional scientific incentives (funding, publishing, hiring and promotion, etc.). I’ll briefly outline some of these but will mostly focus on some ideas that I hope can move the conversation forward in productive ways.
Tuesday, October 8, 2019
Fernando Pérez is an associate professor in Statistics at UC Berkeley and a Faculty Scientist in the Department of Data Science and Technology at Lawrence Berkeley National Laboratory. After completing a PhD in particle physics at the University of Colorado at Boulder, his postdoctoral research in applied mathematics centered on the development of fast algorithms for the solution of partial differential equations. Today, his research focuses on creating tools for modern computational research and data science across domain disciplines, with an emphasis on high-level languages, interactive and literate computing, and reproducible research. He created IPython while a graduate student in 2001 and co-founded its successor, Project Jupyter. The Jupyter team collaborates openly to create the next generation of tools for human-driven computational exploration, data analysis, scientific insight and education.
He is a National Academy of Sciences Kavli Frontiers of Science Fellow and a Senior Fellow and founding co-investigator of the Berkeley Institute for Data Science. He is a co-founder of the NumFOCUS Foundation, and a member of the Python Software Foundation. He is a recipient of the 2012 FSF Award for the Advancement of Free Software, and of the 2017 ACM Software System Award.