Keeping the HPC ecosystem working with Spack CI
- Track: HPC, Big Data and Data Science devroom
- Room: UD2.120 (Chavanne)
- Day: Sunday
- Start: 16:00
- End: 16:25
The Spack package manager is widely used by HPC sites, users, and developers to install HPC software, and the Spack project began offering a public binary cache in June 2022. The cache includes builds for x86_64, Power, and aarch64, as well as for AMD and NVIDIA GPUs and Intel's oneAPI compilers. The system currently handles nearly 40,000 builds per week to maintain a core set of Spack packages.
Keeping this many different stacks working continuously has been a challenge, and this talk will dive into the build infrastructure we use to make it happen. Spack is hosted on GitHub, but the CI system is orchestrated by GitLab CI in the cloud. Builds are automated and triggered by pull requests, with runners both in the cloud and on bare metal. We will talk about the architecture of the CI system, from the user-facing stack descriptions in YAML to backend services like Kubernetes, Karpenter, S3, and CloudFront, as well as the challenges of tuning runners for good build performance. We'll also talk about how we've implemented security in a completely PR-driven CI system, and the difficulty of serving all the relevant HPC platforms when most commits come from untrusted contributors. Finally, we'll talk about some of the architectural decisions in Spack itself that had to change to better support CI.
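For readers unfamiliar with the stack descriptions mentioned above: they are Spack environment files. The YAML below is a minimal, hypothetical sketch (the package list, mirror bucket, container image, and runner tags are placeholders, and the exact `ci` schema varies between Spack versions) of the kind of file from which `spack ci generate` produces a GitLab CI pipeline.

```yaml
# spack.yaml -- hypothetical, simplified stack description for CI
spack:
  specs:                # packages this stack should build
    - zlib-ng
    - hdf5 +mpi
  mirrors:
    buildcache: s3://example-spack-binaries    # placeholder bucket for built binaries
  ci:
    pipeline-gen:
      - build-job:
          image: ghcr.io/example/spack-ubuntu22.04   # placeholder build container
          tags: [spack, x86_64]                      # route jobs to matching GitLab runners
```

Roughly speaking, `spack ci generate` expands each spec in such an environment into a GitLab build job and can skip specs already present in the binary cache, which helps keep tens of thousands of weekly builds tractable.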
Speakers
Todd Gamblin