In 2015, the LIGO experiment picked up the signal of two black holes merging 1.3 billion light-years from Earth. It was the first direct observation of gravitational waves and one of the most important physics discoveries in decades, earning its leaders the Nobel Prize just two years later. The experiment cost more than $1 billion, but when the scientists communicated their results to the public, they used software that cost $0: open source Python tools including NumPy, SciPy, matplotlib, and Jupyter notebooks.
“This was a dream come true for us,” said Project Jupyter co-founder Fernando Pérez in his October 8th talk at the Center for Data and Computing. “Fifteen years earlier, we were arguing like crazies in the desert that this Python thing is real and we can do science with it. Now, the most important result of the last decade in physics was achieved with these tools.”
Jupyter is a programming language-agnostic interactive notebook that allows researchers to combine code, text, mathematics, and results into shareable documents. A descendant of the IPython notebook (created by Pérez in graduate school as “my PhD procrastination/depression-control mechanism”), Jupyter is now a core tool for scientists using computation across all fields of inquiry.
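To make that concrete, a notebook interleaves prose like this article with live cells that run code from the scientific Python stack. The sketch below is purely illustrative, not code from the LIGO analysis: a hypothetical notebook cell that uses NumPy and SciPy to recover a weak periodic signal from noise, with all frequencies and parameters invented for the example.

```python
import numpy as np
from scipy import signal

# Illustrative analysis cell: dig a weak 300 Hz tone out of noise.
# All numbers here are made up for the example.
rng = np.random.default_rng(42)
fs = 4096                                     # sampling rate, Hz
t = np.arange(0, 4, 1 / fs)                   # 4 seconds of samples
tone = 0.5 * np.sin(2 * np.pi * 300 * t)      # weak periodic signal
data = tone + rng.normal(scale=2.0, size=t.size)  # buried in noise

# Band-pass filter around the band where we expect the signal.
sos = signal.butter(4, [250, 350], btype="bandpass", fs=fs, output="sos")
filtered = signal.sosfiltfilt(sos, data)

# Locate the dominant frequency with a periodogram.
freqs, power = signal.periodogram(filtered, fs=fs)
peak_freq = freqs[np.argmax(power)]
print(f"dominant frequency: {peak_freq:.1f} Hz")  # ~300 Hz
```

In a real notebook, a matplotlib plot of `filtered` and a paragraph interpreting it would sit in the same shareable document as this cell.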
But the software behind Jupyter and other scientific Python tools is only part of the open source puzzle, Pérez said. The Jupyter team has also worked through challenges of forming standards and protocols, the economic sustainability of a project with 1500 volunteer contributors, and the traditional incentives of science, which have been slow to adapt to the rise of open source software.
“On any dimension that you look at, the creation and construction of these open, collaborative tools and ideas is at odds with the incentive mechanisms we have throughout the system, from academic hiring and promotion and publishing and tenure to grant writing and awards,” Pérez said. “We managed to engineer a system of incentives that makes the right things the hardest things to do.”
Watch Pérez’s full talk, including information on his geosciences project Pangeo and the UChicago connection to the birth of matplotlib, below.
Scientific Open Source Software: Meat and Bits But Not Papers. Is it Real Work?
Open source software is now the backbone of computation across the sciences and, increasingly, education. Yet the creation of scientific software is not well recognized as part of the enterprise of science in terms of training, career paths, intellectual recognition, organizational support, or funding. In this talk, I’ll explore the challenges of this contradictory situation, from the perspective of someone who has spent almost 20 years building open source software and communities. I have lived (often precariously) a dual life of “real academic” and of open source developer and advocate, working on IPython, Project Jupyter and the Scientific Python ecosystem since 2001.
I will provide an overview of Project Jupyter, including its intellectual core, the open source community context that surrounds it, and some of its impact. This will help frame the second part of the talk, where I’ll try to open a conversation on the social and organizational challenges of creating and sustaining open, collaborative communities in the structure of research and education. The scientific, technical and community dynamics of projects like Jupyter present interesting challenges in the context of traditional scientific incentives (funding, publishing, hiring and promotion, etc.). I’ll briefly outline some of these but will mostly focus on some ideas that I hope can move the conversation forward in productive ways.
Tuesday, October 8, 2019
Fernando Pérez is an associate professor in Statistics at UC Berkeley and a Faculty Scientist in the Department of Data Science and Technology at Lawrence Berkeley National Laboratory. After completing a PhD in particle physics at the University of Colorado at Boulder, his postdoctoral research in applied mathematics centered on the development of fast algorithms for the solution of partial differential equations. Today, his research focuses on creating tools for modern computational research and data science across domain disciplines, with an emphasis on high-level languages, interactive and literate computing, and reproducible research. He created IPython while a graduate student in 2001 and co-founded its successor, Project Jupyter. The Jupyter team collaborates openly to create the next generation of tools for human-driven computational exploration, data analysis, scientific insight and education.
He is a National Academy of Sciences Kavli Frontiers of Science Fellow and a Senior Fellow and founding co-investigator of the Berkeley Institute for Data Science. He is a co-founder of the NumFOCUS Foundation, and a member of the Python Software Foundation. He is a recipient of the 2012 FSF Award for the Advancement of Free Software, and of the 2017 ACM Software System Award.