hanythingondemand: easily creating on-the-fly Hadoop clusters (and more) on HPC systems
- Track: HPC, Big Data and Data Science devroom
- Room: AW1.126
- Day: Sunday
- Start: 11:00
- End: 11:25
hanythingondemand (or HOD for short) is a set of scripts to start services, for example a Hadoop cluster, from within another resource management system (e.g., Torque/PBS) on an HPC system. As such, it allows traditional users of HPC systems to experiment with Hadoop and other services, or to use them as a production setup if no dedicated setup is available. Besides Hadoop clusters, HOD can also create HBase databases and IPython notebooks, and set up a Spark environment.
In this talk, we will:
- motivate the need for a framework like HOD
- discuss its history (based on 'Hadoop On Demand')
- explain how it works
- showcase several use cases, including:
  - easily creating one or more Hadoop clusters on-the-fly for interactive use
  - running batch scripts on a Hadoop cluster (non-interactively)
  - spawning an IPython notebook with desired resources and connecting to it
HOD is available through https://github.com/hpcugent/hanythingondemand under a GPLv2 license. Detailed documentation is available at http://hod.readthedocs.org
Speakers
Ewan Higgs