HPC, Big Data and Data Science devroom

Room: AW1.126

High Performance Computing (HPC) and Big Data are two important approaches to scientific computing. HPC typically deals with smaller, highly structured data sets and huge amounts of computation, while Big Data, not surprisingly, deals with gigantic, unstructured data sets and focuses on the I/O bottlenecks. With the Big Data trend unlocking access to an unprecedented amount of data, Data Science has emerged to tackle the problem of developing processes and approaches for extracting knowledge or insights from these data sets. Machine learning and predictive analytics algorithms have joined the family of more traditional HPC algorithms and are pushing the requirements of cluster and data scalability.

Free and Open Source communities have been the foundation of the HPC and Big Data communities for some time. In the HPC community, it should be no surprise that 488 of the Top500 supercomputers in the world run Linux. On the Big Data side, the Hadoop ecosystem has had a tremendous amount of Open Source contributions from a wide range of organizations coming together under the Apache Software Foundation.

Our goal is to bring the communities together, share expertise, learn how we can benefit from each other's work and foster further joint research and collaboration. We welcome talks about Free and Open Source solutions to the challenges presented by large scale computing, data management and data analysis.

Event		Speakers	Start	End
Sunday
	Opening		09:00	09:05
	FlinkML: Large Scale machine learning for Apache Flink	Theodore Vasiloudis	09:05	09:30
	MADlib: Distributed In-Database Machine Learning for Fun and Profit	Frank McQuillan	09:30	09:55
	[AMENDMENT] Apache Bigtop: Roll your own Big Data Distribution	Olaf Flebbe	10:00	10:30
	Automating Big Data Benchmarking for Different Architectures	Nicolas Poggi	10:30	10:55
	hanythingondemand: easily creating on-the-fly Hadoop clusters (and more) on HPC systems	Ewan Higgs	11:00	11:25
	Timely dataflow in Rust: HPC performance with a dataflow programming model	Frank McSherry	11:30	11:55
	ClusterShell: Scalable command execution library and tools	Aurélien Degrémont	12:00	12:05
	Extracting Data from your Open Source Communities	Dawn Foster	12:05	12:10
	Reproducible and User-Controlled Package Management in HPC with GNU Guix	Ricardo Wurmus	12:10	12:15
	Scylla, a Cassandra-compatible NoSQL database at 2 million requests/s	Roman Shaposhnik	12:15	12:20
	Taxi trip analysis (DEBS grand-challenge) with Apache Geode (incubating)	William Markito	12:20	12:25
	OpenHPC: Community Building Blocks for HPC Systems	Karl W. Schulz	12:30	12:55
	XALT: Tracking User Jobs and Environments on a Supercomputer	Robert McLay	13:00	13:25
	Multi-host containerised HPC cluster: The new Docker networking put into action to spin up a SLURM cluster	Christian Kniep	13:30	13:55
	Parallel Inception: MPP databases ♥ GPGPU	Kyle Dunn	14:00	14:25
	Using Hadoop as a SQL Data Warehouse	Lei Chang	14:30	14:55
	ORCA: Query Optimization as a Service	Addison Huddy	15:00	15:25
	Big Data meets Fast Data: a scalable hybrid real-time transactional and analytics solution	William Markito	15:30	15:55
	Apache Flink: streaming done right	Till Rohrmann	16:00	16:25
	Streaming Architecture: Why Flow Instead of State?	Tugdual Grall	16:30	16:55
	Closing		16:55	17:00

FOSDEM16

Brussels / 30 & 31 January 2016
