Brussels / 30 & 31 January 2016

schedule

HPC, Big Data and Data Science devroom



High Performance Computing (HPC) and Big Data are two important approaches to scientific computing. HPC typically deals with smaller, highly structured data sets and huge amounts of computation, while Big Data, not surprisingly, deals with gigantic, unstructured data sets and focuses on I/O bottlenecks. With the Big Data trend unlocking access to an unprecedented amount of data, Data Science has emerged to tackle the problem of creating processes and approaches for extracting knowledge or insights from these data sets. Machine learning and predictive analytics algorithms have joined the family of more traditional HPC algorithms and are pushing the requirements of cluster and data scalability.

Free and Open Source communities have been the foundation of the HPC and Big Data communities for some time. In the HPC community, it should be no surprise that 488 of the Top500 supercomputers in the world run Linux. On the Big Data side, the Hadoop ecosystem has had a tremendous amount of Open Source contributions from a wide range of organizations coming together under the Apache Software Foundation.

Our goal is to bring the communities together, share expertise, learn how we can benefit from each other's work and foster further joint research and collaboration. We welcome talks about Free and Open Source solutions to the challenges presented by large scale computing, data management and data analysis.

Event Speakers Start End

Sunday

  Opening 09:00 09:05
  FlinkML: Large Scale machine learning for Apache Flink Theodore Vasiloudis 09:05 09:30
  MADlib: Distributed In-Database Machine Learning for Fun and Profit Frank McQuillan 09:30 09:55
  [AMENDMENT] Apache Bigtop: Roll your own Big Data Distribution Olaf Flebbe 10:00 10:30
  Automating Big Data Benchmarking for Different Architectures Nicolas Poggi 10:30 10:55
  hanythingondemand: easily creating on-the-fly Hadoop clusters (and more) on HPC systems Ewan Higgs 11:00 11:25
  Timely dataflow in Rust: HPC performance with a dataflow programming model Frank McSherry 11:30 11:55
  ClusterShell: Scalable command execution library and tools Aurélien Degrémont 12:00 12:05
  Extracting Data from your Open Source Communities Dawn Foster 12:05 12:10
  Reproducible and User-Controlled Package Management in HPC with GNU Guix Ricardo Wurmus 12:10 12:15
  Scylla, a Cassandra-compatible NoSQL database at 2 million requests/s Roman Shaposhnik 12:15 12:20
  Taxi trip analysis (DEBS grand-challenge) with Apache Geode (incubating) William Markito 12:20 12:25
  OpenHPC: Community Building Blocks for HPC Systems Karl W. Schulz 12:30 12:55
  XALT: Tracking User Jobs and Environments on a Supercomputer Robert McLay 13:00 13:25
  Multi-host containerised HPC cluster: The new Docker networking put into action to spin up a SLURM cluster Christian Kniep 13:30 13:55
  Parallel Inception: MPP databases ♥ GPGPU Kyle Dunn 14:00 14:25
  Using Hadoop as a SQL Data Warehouse Lei Chang 14:30 14:55
  ORCA: Query Optimization as a Service Addison Huddy 15:00 15:25
  Big Data meets Fast Data: an scalable hybrid real-time transactional and analytics solution William Markito 15:30 15:55
  Apache Flink: streaming done right Till Rohrmann 16:00 16:25
  Streaming Architecture: Why Flow Instead of State? Tugdual Grall 16:30 16:55
  Closing 16:55 17:00