Towards reproducible Jupyter notebooks
- Track: HPC, Big Data, and Data Science devroom
- Room: UB5.132
- Day: Sunday
- Start: 12:30
- End: 12:40
Jupyter has become a tool of choice for researchers who want to share a narrative and supporting code that their peers can re-run. This talk is about Jupyter’s Achilles’ heel: software deployment. I will present Guix-Jupyter, which aims to make notebooks self-contained and to support reproducible deployment.
Jupyter has become a tool of choice for researchers in data science and other fields. Jupyter Notebooks allow them to share a narrative and supporting code that their peers can re-run, which is why Jupyter is often considered a good tool for reproducible science.
However, Jupyter Notebooks do not describe their software dependencies, which significantly hinders reproducibility: What if your peer runs a different Python version? What if your notebook depends on a library that your peer hasn’t installed? What will happen if you try to run your notebook in a few years?
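To make this concrete, the sketch that follows inspects a small, made-up notebook in the nbformat v4 JSON layout. The notebook-level metadata typically records the kernel and the language version, but nothing lists the libraries the code cells import, let alone their versions. The notebook contents and the helper `imported_modules` are hypothetical illustrations, not part of Guix-Jupyter or Jupyter itself.

```python
import ast
import json

# A hypothetical notebook in the nbformat v4 JSON layout (illustrative content only).
NOTEBOOK_JSON = """
{
  "nbformat": 4,
  "nbformat_minor": 4,
  "metadata": {
    "kernelspec": {"name": "python3", "display_name": "Python 3", "language": "python"},
    "language_info": {"name": "python", "version": "3.7.4"}
  },
  "cells": [
    {"cell_type": "markdown", "metadata": {}, "source": ["# Analysis"]},
    {"cell_type": "code", "metadata": {}, "execution_count": 1, "outputs": [],
     "source": ["import numpy as np\\n", "from scipy import stats\\n", "x = np.arange(10)"]}
  ]
}
"""


def imported_modules(nb):
    """Collect the top-level module names imported by the notebook's code cells.

    Assumes plain Python cells: IPython magics such as %matplotlib would make
    ast.parse fail, and this sketch does not handle them.
    """
    modules = set()
    for cell in nb["cells"]:
        if cell["cell_type"] != "code":
            continue
        tree = ast.parse("".join(cell["source"]))
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                modules.update(alias.name.split(".")[0] for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                modules.add(node.module.split(".")[0])
    return modules


nb = json.loads(NOTEBOOK_JSON)
print(sorted(nb["metadata"]))        # ['kernelspec', 'language_info']
print(sorted(imported_modules(nb)))  # ['numpy', 'scipy']
```

The code depends on NumPy and SciPy, yet the only environment information the file carries is the kernel name and a Python version string; a peer re-running it has to guess the rest.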
All these issues are being addressed by tools such as Binder and its companion, repo2docker. These solutions, though, do not address what we think is the core issue: notebooks lack information about their software dependencies.
In this talk I will present our take on this problem: Guix-Jupyter. Guix-Jupyter allows users to annotate their notebooks with information about their run-time environment. Those annotations are interpreted, and Guix takes care of deploying the dependencies they describe. Furthermore, Guix-Jupyter runs code in an isolated environment (a container) to maximize reproducibility.
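The annotate-then-deploy idea can be sketched as follows. This is a minimal illustration under stated assumptions: the `;;env NAME <- PACKAGE ...` annotation syntax, the `Environment` class, and the helpers `parse_annotations` and `build_command` are all hypothetical and do not reproduce Guix-Jupyter’s actual syntax or implementation. The sketch collects package names from annotated cells and turns them into a `guix shell --container` command line, which asks Guix to deploy exactly those packages in an isolated container.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical annotation syntax, for illustration only (not Guix-Jupyter's):
#   ;;env NAME <- PACKAGE PACKAGE ...
ANNOTATION = re.compile(r"^;;env\s+(\S+)\s*<-\s*(.+)$")


@dataclass
class Environment:
    name: str
    packages: List[str] = field(default_factory=list)


def parse_annotations(cells: List[str]) -> Dict[str, Environment]:
    """Collect the packages requested by annotation lines in the cell sources."""
    envs: Dict[str, Environment] = {}
    for source in cells:
        for line in source.splitlines():
            match = ANNOTATION.match(line.strip())
            if match:
                name, packages = match.group(1), match.group(2).split()
                envs.setdefault(name, Environment(name)).packages.extend(packages)
    return envs


def build_command(env: Environment) -> List[str]:
    """Build a `guix shell --container` invocation for the declared packages."""
    return ["guix", "shell", "--container", *env.packages]


cells = [
    ";;env py <- python python-numpy\nimport numpy as np",
    "np.arange(3)",
]
envs = parse_annotations(cells)
print(build_command(envs["py"]))
# ['guix', 'shell', '--container', 'python', 'python-numpy']
```

The design point is that the dependency list travels inside the notebook itself, so whoever re-runs it gets the same declared environment instead of whatever happens to be installed locally.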
Guix-Jupyter is a work in progress, and we are eager to share our approach and get your feedback!
Speakers
Ludovic Courtès