Brussels / 4 & 5 February 2017

Schedule

HPC, Big Data and Data Science devroom



High Performance Computing (HPC) and Big Data are two important approaches to scientific computing. HPC typically deals with smaller, highly structured data sets and huge amounts of computation, while Big Data, not surprisingly, deals with gigantic, unstructured data sets and focuses on I/O bottlenecks. With the Big Data trend unlocking access to unprecedented amounts of data, Data Science has emerged to tackle the problem of creating processes and approaches for extracting knowledge and insights from these data sets. Machine learning and predictive analytics algorithms have joined the family of more traditional HPC algorithms and are pushing the requirements for cluster and data scalability.

Free and Open Source software has been the foundation of the HPC and Big Data communities for some time. In the HPC community, it should be no surprise that 488 of the Top500 supercomputers in the world run Linux. On the Big Data side, the Hadoop ecosystem has received a tremendous amount of Open Source contributions from a wide range of organizations coming together under the Apache Software Foundation.

Our goal is to bring these communities together, share expertise, learn how we can benefit from each other's work, and foster further joint research and collaboration. We welcome talks about Free and Open Source solutions to the challenges presented by large-scale computing, data management and data analysis.

Event | Speakers | Start | End

Saturday

Opening | Vasia Kalavri | 10:30 | 10:35
Portability of containers across diverse HPC resources with Singularity | Michael Bauer, César Gómez-Martín | 10:35 | 11:00
The birth of HPC Cuba: How supercomputing is being made available to all Cuban researchers using FOSS | Dieter Roefs, Hector Cruz Enriquez | 11:00 | 11:25
Optimized and reproducible HPC Software deployment with free software and GNU Guix | Ludovic Courtès, Pjotr Prins | 11:30 | 11:55
Reproducible HPC Software Installation on Cray Systems with EasyBuild | Guilherme Peretti-Pezzi | 12:00 | 12:25
Putting Your Jobs Under the Microscope using OGRT | Georg Rath | 12:30 | 12:55
Dask - extending Python data tools for parallel and distributed computing | Joris Van den Bossche | 13:00 | 13:25
Purely Functional GPU Programming with Futhark | Troels Henriksen | 13:30 | 13:55
The Marriage of Cloud, HPC and Containers | Adam Huffman | 14:00 | 14:10
Quickstart Big Data | Olaf Flebbe | 14:10 | 14:20
Extending Spark Machine Learning Pipelines: Going beyond wordcount with Spark ML | Holden Karau | 14:25 | 14:35
Using BigBench to compare Hive and Spark versions and features: BigBench in Hive and Spark | Nicolas Poggi, Alejandro Montero | 14:35 | 14:45
Making Wiki Gardening Tasks Easier Using Big Data and NLP | Bee Padalkar | 14:45 | 14:55
A field guide to the machine learning zoo | Theodore Vasiloudis | 15:00 | 15:25
Intelligently Collecting Data at the Edge: Intro to Apache NiFi and MiNiFi | Andy LoPresto | 15:30 | 15:55
Postgres MPP Data Warehousing joins Hadoop ecosystem: Making two elephants dance | Roman Shaposhnik | 16:00 | 16:25
BigPetStore on Spark and Flink: Implementing use cases on unified Big Data engines | Marton Balassi | 16:30 | 16:55
Democratizing Deep Learning with Tensorflow on Hops Hadoop | Jim Dowling, Gautier Berthou | 17:00 | 17:25
Kafka Streams and Protobuf: stream processing at trivago | Clemens Valiente | 17:30 | 17:55
Not less, Not more. Exactly Once Large-Scale Stream Processing in Action. | Paris Carbone | 18:00 | 18:25
Why you should care about SQL for big data and how Apache Calcite can help: #SQL4NoSQL | Christian Tzolov | 18:30 | 18:55
Closing | Vasia Kalavri | 18:55 | 19:00