FOSDEM 2022

HPC, Big Data, and Data Science devroom

Room: D.hpc


Read the Call for Papers at https://lists.fosdem.org/pipermail/fosdem/2021q4/003315.html.

High Performance Computing (HPC) and Big Data are two important approaches to scientific computing. HPC typically deals with smaller, highly structured data sets and huge amounts of computation, while Big Data, not surprisingly, deals with gigantic, unstructured data sets and focuses on I/O bottlenecks. With the Big Data trend unlocking access to an unprecedented amount of data, Data Science has emerged to tackle the problem of creating processes and approaches for extracting knowledge or insights from these data sets. Machine learning and predictive analytics algorithms have joined the family of more traditional HPC algorithms and are pushing the requirements for cluster and data scalability.

Free and Open Source communities have been the foundation of the HPC and Big Data communities for some time. In the HPC community, it should be no surprise that currently 100% of the Top500 supercomputers in the world run (some variant of) Linux. On the Big Data side, the Hadoop ecosystem has received a tremendous amount of Open Source contributions from a wide range of organizations coming together under the Apache Software Foundation.

Our goal is to bring the communities together, share expertise, learn how we can benefit from each other's work and foster further joint research and collaboration. We welcome talks about Free and Open Source solutions to the challenges presented by large-scale computing, data management, and data analysis.

Event		Speakers	Start (CET)	End (CET)
Saturday
	Low-code data visualization and aggregation with OpenSearch Dashboards	Olena Kutsenko	10:00	10:30
	Uncovering Arcon: A state-first Rust streaming analytics runtime	Max Meldrum	10:30	11:00
	Build an Open Source Streaming Data Pipeline	Olena Kutsenko, Francesco Tisiot	11:00	11:30
	Using OpenStack to reduce HPC service complexity ... no, that is not an oxymoron!	John Garbutt	15:00	15:30
	Containers in HPC (State of Containers in HPC)	Christian Kniep	15:30	16:00
	This is The Way: A Crash Course on the Intricacies of Managing CPUs in K8s (From homogenous single-socket to heterogenous multi-socket clusters)	Swati Sehgal, Marlow Weston	16:00	16:30
	Making Apache Spark, Apache Mahout, Kubeflow, and Kubernetes Play Nice	Trevor Grant	16:30	17:00
Sunday
	HPC for Social & Crime Science (Big Data in Police and Crime Research)	Philipp M. Dau	10:00	10:30
	SCIP: scalable cytometry image processing using Dask in a high performance computing environment (A software for distributed processing of bioimaging datasets)	Maxim Lippeveld	10:30	11:00
	Distributed Join Algorithms in CrateDB (How We Made Distributed Joins 23 Thousand Times Faster)	Marija Selakovic	11:00	11:30
	Multidimensional Bloom Filters (A Survey of What, When, Why)	Claude Warren	11:30	12:00
	Utilizing AMD GPUs: Tuning, programming models, and roadmap	Georgios Markomanolis	12:00	12:30
	Exascale PMI on a heterogeneous sub-exascale Slurm cluster	Alex Domingo	15:00	15:30
	Porting Signal processing algorithms to CuPy for precision measurement	Mamta Shukla	15:30	16:00
	PIRA: Performance Instrumentation Refinement Automation	Jan-Patrick Lehr	16:00	16:30
	WOODS (A set of Benchmarks for Out-of-Distribution Generalization in Time Series Tasks)	Jean-Christophe Gagnon-Audet	16:30	17:00
	Bringing together open source scientific software development for HPC and beginners	Jan-Patrick Lehr, Moritz Schwarzmeier	17:00	17:30
	Open source tooling in High-Energy Physics Software	Valentin Volkl	17:30	18:00

FOSDEM22, online, 5 & 6 February 2022