Brussels / 4 & 5 February 2017

Schedule

HPC, Big Data and Data Science devroom



High Performance Computing (HPC) and Big Data are two important approaches to scientific computing. HPC typically deals with smaller, highly structured data sets and huge amounts of computation, while Big Data, not surprisingly, deals with gigantic, unstructured data sets and focuses on I/O bottlenecks. With the Big Data trend unlocking access to unprecedented amounts of data, Data Science has emerged to tackle the problem of creating processes and approaches for extracting knowledge and insights from these data sets. Machine learning and predictive analytics algorithms have joined the family of more traditional HPC algorithms and are pushing the requirements for cluster and data scalability.

Free and Open Source software has been the foundation of the HPC and Big Data communities for some time. In the HPC community, it should be no surprise that 488 of the Top500 supercomputers in the world run Linux. On the Big Data side, the Hadoop ecosystem has received a tremendous amount of Open Source contributions from a wide range of organizations coming together under the Apache Software Foundation.

Our goal is to bring these communities together, share expertise, learn how we can benefit from each other's work, and foster further joint research and collaboration. We welcome talks about Free and Open Source solutions to the challenges presented by large-scale computing, data management and data analysis.

Event | Speakers | Start | End

Saturday

Opening | Vasia Kalavri | 10:30 | 10:35
Portability of containers across diverse HPC resources with Singularity | Michael Bauer, César Gómez-Martín | 10:35 | 11:00
The birth of HPC Cuba: How supercomputing is being made available to all Cuban researchers using FOSS | Dieter Roefs, Hector Cruz Enriquez | 11:00 | 11:25
Optimized and reproducible HPC Software deployment with free software and GNU Guix | Ludovic Courtès, Pjotr Prins | 11:30 | 11:55
Reproducible HPC Software Installation on Cray Systems with EasyBuild | Guilherme Peretti-Pezzi | 12:00 | 12:25
Putting Your Jobs Under the Microscope using OGRT | Georg Rath | 12:30 | 12:55
Dask - extending Python data tools for parallel and distributed computing | Joris Van den Bossche | 13:00 | 13:25
Purely Functional GPU Programming with Futhark | Troels Henriksen | 13:30 | 13:55
The Marriage of Cloud, HPC and Containers | Adam Huffman | 14:00 | 14:10
Quickstart Big Data | Olaf Flebbe | 14:10 | 14:20
Extending Spark Machine Learning Pipelines: Going beyond wordcount with Spark ML | Holden Karau | 14:25 | 14:35
Using BigBench to compare Hive and Spark versions and features: BigBench in Hive and Spark | Nicolas Poggi, Alejandro Montero | 14:35 | 14:45
Making Wiki Gardening Tasks Easier Using Big Data and NLP | Bee Padalkar | 14:45 | 14:55
A field guide to the machine learning zoo | Theodore Vasiloudis | 15:00 | 15:25
Intelligently Collecting Data at the Edge: Intro to Apache NiFi and MiNiFi | Andy LoPresto | 15:30 | 15:55
Postgres MPP Data Warehousing joins Hadoop ecosystem: Making two elephants dance | Roman Shaposhnik | 16:00 | 16:25
BigPetStore on Spark and Flink: Implementing use cases on unified Big Data engines | Marton Balassi | 16:30 | 16:55
Democratizing Deep Learning with Tensorflow on Hops Hadoop | Jim Dowling, Gautier Berthou | 17:00 | 17:25
Kafka Streams and Protobuf: stream processing at trivago | Clemens Valiente | 17:30 | 17:55
Not less, Not more. Exactly Once Large-Scale Stream Processing in Action. | Paris Carbone | 18:00 | 18:25
Why you should care about SQL for big data and how Apache Calcite can help: #SQL4NoSQL | Christian Tzolov | 18:30 | 18:55
Closing | Vasia Kalavri | 18:55 | 19:00