Online / 5 & 6 February 2022

HPC, Big Data, and Data Science devroom



Read the Call for Papers at https://lists.fosdem.org/pipermail/fosdem/2021q4/003315.html.

High Performance Computing (HPC) and Big Data are two important approaches to scientific computing. HPC typically deals with smaller, highly structured data sets and huge amounts of computation, while Big Data, not surprisingly, deals with gigantic, unstructured data sets and focuses on the I/O bottlenecks. With the Big Data trend unlocking access to an unprecedented amount of data, Data Science has emerged to tackle the problem of creating processes and approaches for extracting knowledge or insights from these data sets. Machine learning and predictive analytics algorithms have joined the family of more traditional HPC algorithms and are pushing the requirements of cluster and data scalability.

Free and Open Source communities have been the foundation of the HPC and Big Data communities for some time. In the HPC community, it should be no surprise that currently 100% of the Top500 supercomputers in the world run (some variant of) Linux. On the Big Data side, the Hadoop ecosystem has had a tremendous amount of Open Source contributions from a wide range of organizations coming together under the Apache Software Foundation.

Our goal is to bring the communities together, share expertise, learn how we can benefit from each other's work, and foster further joint research and collaboration. We welcome talks about Free and Open Source solutions to the challenges presented by large-scale computing, data management, and data analysis.

Event | Speakers | Start | End

Saturday

  Low-code data visualization and aggregation with OpenSearch Dashboards | Olena Kutsenko | 10:00 | 10:30
  Uncovering Arcon: A state-first Rust streaming analytics runtime | Max Meldrum | 10:30 | 11:00
  Build an Open Source Streaming Data Pipeline | Olena Kutsenko, Francesco Tisiot | 11:00 | 11:30
  Using OpenStack to reduce HPC service complexity ... no, that is not an oxymoron! | John Garbutt | 15:00 | 15:30
  Containers in HPC: State of Containers in HPC | Christian Kniep | 15:30 | 16:00
  This is The Way- A Crash Course on the Intricacies of Managing CPUs in K8s: From homogenous single-socket to heterogenous multi-socket clusters | Swati Sehgal, Marlow Weston | 16:00 | 16:30
  Making Apache Spark, Apache Mahout, Kubeflow, and Kubernetes Play Nice | Trevor Grant | 16:30 | 17:00

Sunday

  HPC for Social & Crime Science: Big Data in Police and Crime Research | Philipp M. Dau | 10:00 | 10:30
  SCIP: scalable cytometry image processing using Dask in a high performance computing environment: A software for distributed processing of bioimaging datasets | Maxim Lippeveld | 10:30 | 11:00
  Distributed Join Algorithms in CrateDB: How We Made Distributed Joins 23 Thousand Times Faster | Marija Selakovic | 11:00 | 11:30
  Multidimensional Bloom Filters: A Survey of What, When, Why | Claude Warren | 11:30 | 12:00
  Utilizing AMD GPUs: Tuning, programming models, and roadmap | Georgios Markomanolis | 12:00 | 12:30
  Exascale PMI on a heterogeneous sub-exascale Slurm cluster | Alex Domingo | 15:00 | 15:30
  Porting Signal processing algorithms to CuPy for precision measurement | Mamta Shukla | 15:30 | 16:00
  PIRA: Performance Instrumentation Refinement Automation | Jan-Patrick Lehr | 16:00 | 16:30
  WOODS: A set of Benchmarks for Out-of-Distribution Generalization in Time Series Tasks | Jean-Christophe Gagnon-Audet | 16:30 | 17:00
  Bringing together open source scientific software development for HPC and beginners | Jan-Patrick Lehr, Moritz Schwarzmeier | 17:00 | 17:30
  Open source tooling in High-Energy Physics Software | Valentin Volkl | 17:30 | 18:00