Reproducibility and performance: why choose?
CPU tuning in GNU Guix
- Track: HPC, Big Data and Data Science devroom
- Room: UD2.120 (Chavanne)
- Day: Sunday
- Start: 12:50
- End: 13:00
- Video only: ud2120_chavanne
High-performance computing (HPC) is often seen as antithetical to “reproducibility”: one would have to choose between software that achieves high performance, and software that can be deployed in a reproducible fashion. This talk will discuss how GNU Guix lets users deploy software optimized for the target machines while preserving provenance tracking and reproducibility.
Why is HPC seen as antithetical to reproducibility in the first place? Maybe your cluster admins told you: if you want peak performance, you have to use the software stack that they or the hardware vendors installed and tailored specifically to this cluster, and recompile your code locally. Your software deployment then becomes tied to that machine and is hardly reproducible elsewhere.
However, by giving up on reproducibility, we would give up on verifiability, a foundation of the scientific process. How can we reconcile performance and reproducibility? Engineering work on performance portability has already proved fruitful, but some areas remain unaddressed when it comes to CPU tuning. This talk looks into package multi-versioning, a technique developed for GNU Guix, a tool for reproducible software deployment. We will show that it allows us to implement CPU tuning without compromising on reproducibility and provenance tracking.
Speakers
Ludovic Courtès