Brussels / 3 & 4 February 2018


Scaling Deep Learning to hundreds of GPUs on HopsHadoop

Scaling out deep learning training is the major systems challenge facing the deep learning community. Backed by a high-performance distributed file system (HopsFS) and by support for GPU sharing and management in the cluster manager (HopsYarn), the HopsWorks platform provides different flavors of Tensorflow-as-a-Service and offers several ways to parallelize and scale out deep learning. In this talk we will present how data scientists can use Hops to perform parallel hyperparameter search, and how they can run traditional distributed Tensorflow on a big data cluster with the TensorflowOnSpark framework. In particular, we will focus on the latest generation of distributed Tensorflow architectures, which borrow their topology and communication patterns from the HPC field. In the Ring-AllReduce architecture, workers are organized in a ring and exchange gradient updates only with their neighbors, avoiding the communication bottleneck at the parameter server(s) that traditional distributed Tensorflow suffers from. Ring-AllReduce has been used by Facebook and IBM to reduce the training time on Imagenet from 2 weeks to roughly 45 minutes to 1 hour. Finally, we will show how you can recreate the popular game "Quick, Draw!" using HopsWorks and the Tensorflow services provided by the platform.
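
To make the Ring-AllReduce pattern mentioned above more concrete, here is a minimal single-process sketch in plain Python. It only simulates the ring: the function name `ring_allreduce`, the list-of-lists representation of per-worker gradients, and the chunking scheme are illustrative assumptions, not the code used by Hops or Tensorflow. Each of the N workers splits its gradient vector into N chunks; in N-1 scatter-reduce steps every worker passes one chunk to its right-hand neighbor, which adds it to its own copy, and in N-1 further allgather steps the fully summed chunks travel around the ring.

```python
def ring_allreduce(worker_grads):
    """Simulate Ring-AllReduce (sum) over a list of per-worker gradient vectors.

    Illustrative sketch only: a real system runs one process per GPU and
    sends chunks over the network instead of copying Python lists.
    """
    n = len(worker_grads)
    length = len(worker_grads[0])
    # Chunk k covers indices bounds[k] .. bounds[k + 1] - 1.
    bounds = [(k * length) // n for k in range(n + 1)]
    data = [list(g) for g in worker_grads]  # each worker's local buffer

    def chunk(k):
        return range(bounds[k], bounds[k + 1])

    def exchange(chunk_of, combine):
        # Snapshot all outgoing chunks first so that every transfer in a
        # step behaves as if it happened simultaneously.
        sends = []
        for i in range(n):
            k = chunk_of(i)
            sends.append((k, [data[i][j] for j in chunk(k)]))
        for i, (k, payload) in enumerate(sends):
            dst = (i + 1) % n  # right-hand neighbor in the ring
            for j, value in zip(chunk(k), payload):
                data[dst][j] = combine(data[dst][j], value)

    # Phase 1 (scatter-reduce): after n - 1 steps, worker i holds the
    # complete sum for chunk (i + 1) % n.
    for step in range(n - 1):
        exchange(lambda i: (i - step) % n, lambda own, recv: own + recv)

    # Phase 2 (allgather): circulate the finished chunks, overwriting
    # the stale partial sums on the receiving worker.
    for step in range(n - 1):
        exchange(lambda i: (i + 1 - step) % n, lambda own, recv: recv)

    return data


if __name__ == "__main__":
    grads = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
    for worker_id, summed in enumerate(ring_allreduce(grads)):
        print(worker_id, summed)
```

In this scheme each worker sends and receives 2(N-1) chunks of roughly 1/N of the gradient size, so the per-worker traffic stays close to twice the gradient size regardless of how many workers join the ring. With a parameter server, by contrast, the server's traffic grows with the number of workers.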

The code is available on Github at


Fabio Buso