BEGIN:VCALENDAR VERSION:2.0 PRODID:-//Pentabarf//Schedule 0.3//EN CALSCALE:GREGORIAN METHOD:PUBLISH X-WR-CALDESC;VALUE=TEXT:HPC, Big Data, and Data Science devroom X-WR-CALNAME;VALUE=TEXT:HPC, Big Data, and Data Science devroom X-WR-TIMEZONE;VALUE=TEXT:Europe/Brussels BEGIN:VEVENT METHOD:PUBLISH UID:7071@FOSDEM18@fosdem.org TZID:Europe-Brussels DTSTART:20180204T090000 DTEND:20180204T092500 SUMMARY:Installing software for scientists on a multi-user HPC system DESCRIPTION:
Before scientists can use HPC systems for their research, they need to get the tools and applications installed that they require.
Until recently, this was a (perhaps surprisingly for some) painful process, especially for scientists who lack sufficient experience with compiling software and dealing with dependencies.
Recently, several projects have emerged that aim to facilitate this process, each with a particular focus: performance, flexibility, reproducibility, ease of use, support for multiple platforms, etc.
In this talk, I would like to present an objective comparison of the different tools that are most prevalent currently, including:
Although I intend to focus on the use case of installing (scientific) software on multi-user HPC systems, I will also highlight particularly interesting features that fall outside that scope.
CLASS:PUBLIC STATUS:CONFIRMED CATEGORIES:HPC, Big Data, and Data Science URL:https:/fosdem.org/2018/schedule/2018/schedule/event/installing_software_for_scientists/ LOCATION:H.1302 (Depage) ATTENDEE;ROLE=REQ-PARTICIPANT;CUTYPE=INDIVIDUAL;CN="Kenneth Hoste":invalid:nomail END:VEVENT BEGIN:VEVENT METHOD:PUBLISH UID:7058@FOSDEM18@fosdem.org TZID:Europe-Brussels DTSTART:20180204T093000 DTEND:20180204T095500 SUMMARY:Binary packaging for HPC with Spack DESCRIPTION:Spack is a package manager for cluster users, developers, and administrators, rapidly gaining popularity in the HPC community. Like other HPC package managers, Spack was designed to build packages from source. However, we've recently added binary packaging capabilities, which pose unique challenges for HPC environments. Most binary distributions assume a lowest-common-denominator architecture, e.g. x86_64, and do not take advantage of vector instructions or architecture-specific features. Spack supports relocatable binaries for specific OS releases, target architectures, MPI implementations, and other very fine-grained build options.
This talk will introduce binary packaging in Spack and some of the open infrastructure we have planned for distributing packages. We'll talk about challenges to providing binaries for a combinatorially large package ecosystem, and what we're doing in Spack to address these problems. We'll also talk about challenges for implementing relocatable binaries with a multi-compiler system like Spack. Finally, we'll talk about how Spack integrates with the US exascale project's open source software release plan, and how this will help glue together the HPC OSS ecosystem as a whole.
CLASS:PUBLIC STATUS:CONFIRMED CATEGORIES:HPC, Big Data, and Data Science URL:https:/fosdem.org/2018/schedule/2018/schedule/event/llnl_spack/ LOCATION:H.1302 (Depage) ATTENDEE;ROLE=REQ-PARTICIPANT;CUTYPE=INDIVIDUAL;CN="Todd Gamblin":invalid:nomail END:VEVENT BEGIN:VEVENT METHOD:PUBLISH UID:7061@FOSDEM18@fosdem.org TZID:Europe-Brussels DTSTART:20180204T100000 DTEND:20180204T102500 SUMMARY:Tying software deployment to scientific workflows DESCRIPTION:Package management, container provisioning, and workflow execution are often viewed as related but separate activities. This talk is about using Guix to integrate reproducible software deployment in scientific workflows.
CLASS:PUBLIC STATUS:CONFIRMED CATEGORIES:HPC, Big Data, and Data Science URL:https:/fosdem.org/2018/schedule/2018/schedule/event/guix_workflows/ LOCATION:H.1302 (Depage) ATTENDEE;ROLE=REQ-PARTICIPANT;CUTYPE=INDIVIDUAL;CN="Ludovic Courtès":invalid:nomail END:VEVENT BEGIN:VEVENT METHOD:PUBLISH UID:7014@FOSDEM18@fosdem.org TZID:Europe-Brussels DTSTART:20180204T103000 DTEND:20180204T105500 SUMMARY:Combining CVMFS, Nix, Lmod, and EasyBuild at Compute Canada DESCRIPTION:One of the challenges in HPC is to deliver a consistent software stack that balances the needs of the system administrators with the needs of the users. This means running recent software on enterprise Linux distributions that ship older software. Traditionally this is accomplished using environment modules, which change environment variables such as $PATH to point to the software that is needed. At Compute Canada we have taken this further by distributing a complete user-level software stack with all needed libraries, including the GNU C library, but excluding any privileged components. I will describe our setup, which combines Nix for the bottom layer of base components, EasyBuild for the top layer of more scientifically inclined components, Lmod to implement environment modules, and the CernVM File System (CVMFS) to distribute it to Canadian supercomputers. Expected prior knowledge: knowing how to use the command line and environment variables.
CLASS:PUBLIC STATUS:CONFIRMED CATEGORIES:HPC, Big Data, and Data Science URL:https:/fosdem.org/2018/schedule/2018/schedule/event/computecanada/ LOCATION:H.1302 (Depage) ATTENDEE;ROLE=REQ-PARTICIPANT;CUTYPE=INDIVIDUAL;CN="Bart Oldeman":invalid:nomail END:VEVENT BEGIN:VEVENT METHOD:PUBLISH UID:6598@FOSDEM18@fosdem.org TZID:Europe-Brussels DTSTART:20180204T110000 DTEND:20180204T112500 SUMMARY:Behind the scenes of a FOSS-powered HPC cluster at UCLouvain DESCRIPTION:With the advent of the DevOps and Infrastructure as Code movements, tools have emerged that allow building a complete HPC solution from scratch based only on open source software. At UCLouvain, one of our clusters, and the services on which it depends, is built on a full FOSS stack. From the operating system to the deployment tools, monitoring, scheduling, and user software installation, everything is built from open source software that interoperates gracefully. For instance, for deployment/provisioning, we use a combination of Ansible and Salt, which we find work perfectly together even if they are often considered to be mutually exclusive. This talk will share our experience with making FOSS software co-operate smoothly and will offer our point of view on choosing the right tool for the right job. It will also present some of the contributions we have made to the open source community.
CLASS:PUBLIC STATUS:CONFIRMED CATEGORIES:HPC, Big Data, and Data Science URL:https:/fosdem.org/2018/schedule/2018/schedule/event/hpc_uclouvain/ LOCATION:H.1302 (Depage) ATTENDEE;ROLE=REQ-PARTICIPANT;CUTYPE=INDIVIDUAL;CN="Damien François":invalid:nomail END:VEVENT BEGIN:VEVENT METHOD:PUBLISH UID:6700@FOSDEM18@fosdem.org TZID:Europe-Brussels DTSTART:20180204T113000 DTEND:20180204T115500 SUMMARY:How DeepLearning can help to improve geospatial DataQuality , an OSM use case. DESCRIPTION:How can DeepLearning, and semantic segmentation, be an efficient way to detect and spot inconsistencies in an existing dataset? The OpenStreetMap dataset is taken as a use case.
CLASS:PUBLIC STATUS:CONFIRMED CATEGORIES:HPC, Big Data, and Data Science URL:https:/fosdem.org/2018/schedule/2018/schedule/event/deeplearning_osm/ LOCATION:H.1302 (Depage) ATTENDEE;ROLE=REQ-PARTICIPANT;CUTYPE=INDIVIDUAL;CN="Olivier Courtin":invalid:nomail END:VEVENT BEGIN:VEVENT METHOD:PUBLISH UID:6586@FOSDEM18@fosdem.org TZID:Europe-Brussels DTSTART:20180204T120000 DTEND:20180204T121000 SUMMARY:Modules v4 DESCRIPTION:Typically users initialize their shell environment when they log in to a system by setting environment information for every application they will reference during the session. The Modules project, also referred to as Environment Modules, provides a shell command named module
that simplifies shell initialization and lets users easily modify their environment during the session with configuration files called modulefiles.
The Modules project has a long history as its development was started in 1991. At that time, the concept of the module
command was laid down to dynamically and atomically enable environment configurations during a shell session. Since then, this concept has become a standard practice, especially among the scientific community, where people share the same computing resources but each have specific software and version requirements.
After an almost 5-year release hiatus, Modules is back in the environment management game with version 4. The intent is to further improve the modulefile standard and the module command capabilities, with proven concepts applied to similar fields like software package management.
After briefly explaining the root concept behind the module command, this talk will cover the major changes between versions 3.2 and 4 at both software and project level. Then focus will be put on some of the recent or upcoming new features:
* virtual modulefiles
* extend the module command at your site
* sharing code across different modulefiles
* dependencies management between modulefiles
* new ways to query or change the environment state
The audience for this talk is anyone who is interested in user environment management, from the system administrator, who has to provide access to a software catalog, to the end user of a shared computing system, who needs to juggle different workloads combining software elements.
CLASS:PUBLIC STATUS:CONFIRMED CATEGORIES:HPC, Big Data, and Data Science URL:https:/fosdem.org/2018/schedule/2018/schedule/event/modules_v4/ LOCATION:H.1302 (Depage) ATTENDEE;ROLE=REQ-PARTICIPANT;CUTYPE=INDIVIDUAL;CN="Xavier Delaruelle":invalid:nomail END:VEVENT BEGIN:VEVENT METHOD:PUBLISH UID:6171@FOSDEM18@fosdem.org TZID:Europe-Brussels DTSTART:20180204T121000 DTEND:20180204T122000 SUMMARY:Scale Out and Conquer: Architectural Decisions Behind Distributed In-Memory Systems DESCRIPTION:Distributed platforms, like Apache Ignite, rely on horizontal scalability. More machines in the cluster means greater performance of the application. Do we always get twice the speed after adding the second machine to the farm? Ten times faster after adding ten machines? Is that [always] true? What is the responsibility of the platform? And where do engineers’ responsibilities begin?
CLASS:PUBLIC STATUS:CONFIRMED CATEGORIES:HPC, Big Data, and Data Science URL:https:/fosdem.org/2018/schedule/2018/schedule/event/distributed_in_memory_systems/ LOCATION:H.1302 (Depage) ATTENDEE;ROLE=REQ-PARTICIPANT;CUTYPE=INDIVIDUAL;CN="Akmal Chaudhri":invalid:nomail END:VEVENT BEGIN:VEVENT METHOD:PUBLISH UID:7065@FOSDEM18@fosdem.org TZID:Europe-Brussels DTSTART:20180204T122000 DTEND:20180204T123000 SUMMARY:The Magnificent Modular Mahout DESCRIPTION:Open source big data engines as well as HPC libraries seem to be proliferating at an increasing rate. Technical debt can be incurred with statistical and machine learning algorithms that require a highly specialized knowledge of the algorithm at hand as well as the distributed engine / HPC library which the method has been written against. The Apache Mahout project presents a highly modular stack which introduces levels of abstraction between the mathematical implementation of the algorithm (an R-Like Scala DSL) and the execution of the code. Users are able to interchange Apache Spark, Apache Flink (batch), and H2O distributed engines, as well as ViennaCL for OpenCL on GPU and OpenMP, and CUDA native solvers. Users can also port high level algorithms to new distributed engines or native solvers by defining a handful of BLAS operations.
CLASS:PUBLIC STATUS:CONFIRMED CATEGORIES:HPC, Big Data, and Data Science URL:https:/fosdem.org/2018/schedule/2018/schedule/event/modular_mahout/ LOCATION:H.1302 (Depage) ATTENDEE;ROLE=REQ-PARTICIPANT;CUTYPE=INDIVIDUAL;CN="Trevor Grant":invalid:nomail END:VEVENT BEGIN:VEVENT METHOD:PUBLISH UID:6878@FOSDEM18@fosdem.org TZID:Europe-Brussels DTSTART:20180204T123000 DTEND:20180204T124000 SUMMARY:Tools for large-scale collection and analysis of source code repositories DESCRIPTION:There are tens of millions of Git repositories publicly available over the Internet, but what kind of tools would one need to be able to treat all this code as a Big Dataset? I will talk about new and existing OSS tools that were built and used to allow collection and analysis of millions of Git repositories on commodity hardware clusters.
CLASS:PUBLIC STATUS:CONFIRMED CATEGORIES:HPC, Big Data, and Data Science URL:https:/fosdem.org/2018/schedule/2018/schedule/event/large_scale_repo_analysis/ LOCATION:H.1302 (Depage) ATTENDEE;ROLE=REQ-PARTICIPANT;CUTYPE=INDIVIDUAL;CN="Alexander Bezzubov":invalid:nomail END:VEVENT BEGIN:VEVENT METHOD:PUBLISH UID:7064@FOSDEM18@fosdem.org TZID:Europe-Brussels DTSTART:20180204T124000 DTEND:20180204T125000 SUMMARY:Slurm in Action: Batch Processing for the 21st Century DESCRIPTION:This talk will give an overview of how we use Slurm to schedule the workloads of over 6000 scientists at NERSC, while providing high throughput, ease of use and ultimately user satisfaction. With the emergence of data-intensive applications it was necessary to update the classic scheduling infrastructure to handle things like user-defined software stacks (read: containers), data movement and storage provisioning. We did all of this and more through facilities provided by Slurm. In addition to these features we will discuss priority management and quality of service and how that can greatly improve the user experience of computational infrastructures.
CLASS:PUBLIC STATUS:CONFIRMED CATEGORIES:HPC, Big Data, and Data Science URL:https:/fosdem.org/2018/schedule/2018/schedule/event/slurm/ LOCATION:H.1302 (Depage) ATTENDEE;ROLE=REQ-PARTICIPANT;CUTYPE=INDIVIDUAL;CN="Georg Rath":invalid:nomail END:VEVENT BEGIN:VEVENT METHOD:PUBLISH UID:6939@FOSDEM18@fosdem.org TZID:Europe-Brussels DTSTART:20180204T130000 DTEND:20180204T132500 SUMMARY:The Julia programming language DESCRIPTION:The Julia programming language is a high-level language, primarily developed for scientific computing. It uses just-in-time compilation to get a performance level that is comparable to C/C++. It was designed to overcome the “two-language problem”, where a proof-of-concept in a high-level language needs to be translated to a compiled language by specialists to get the required performance. In this talk, the main features of the language will be highlighted from the perspective of a “convert” coming from C++ and with a focus on scientific programming aspects. As an application, arrays and the work on making parts of the Trilinos library available will be discussed.
CLASS:PUBLIC STATUS:CONFIRMED CATEGORIES:HPC, Big Data, and Data Science URL:https:/fosdem.org/2018/schedule/2018/schedule/event/julia_trilinos_integration/ LOCATION:H.1302 (Depage) ATTENDEE;ROLE=REQ-PARTICIPANT;CUTYPE=INDIVIDUAL;CN="Bart Janssens":invalid:nomail END:VEVENT BEGIN:VEVENT METHOD:PUBLISH UID:6900@FOSDEM18@fosdem.org TZID:Europe-Brussels DTSTART:20180204T133000 DTEND:20180204T135500 SUMMARY:Does data security rule out high performance? DESCRIPTION:Traditionally HPC systems assume they are in a secure, isolated environment and as many barriers as possible are removed, in order to achieve the highest possible performance. While these assumptions may still hold for traditional simulation codes, many HPC clusters are now used for heterogeneous workloads. Such workloads increasingly involve the integration of input data from a variety of sources, notably in the life sciences. Scientists are now operating at the population scale, where datasets are ultimately derived from real people. In this talk we discuss some of the restrictions placed on the usage of such datasets, how those restrictions interfere with the goal of high performance computing, and some alternative strategies that meet the data requirements while not hobbling the speed of analytical workloads.
CLASS:PUBLIC STATUS:CONFIRMED CATEGORIES:HPC, Big Data, and Data Science URL:https:/fosdem.org/2018/schedule/2018/schedule/event/data_security_vs_hpc/ LOCATION:H.1302 (Depage) ATTENDEE;ROLE=REQ-PARTICIPANT;CUTYPE=INDIVIDUAL;CN="Adam Huffman":invalid:nomail END:VEVENT BEGIN:VEVENT METHOD:PUBLISH UID:6835@FOSDEM18@fosdem.org TZID:Europe-Brussels DTSTART:20180204T140000 DTEND:20180204T142500 SUMMARY:CrateDB: A Search Engine or a Database? Both! DESCRIPTION:In this talk, I will give an introduction to CrateDB, its architecture, and demo a few things that people have built with it.
CLASS:PUBLIC STATUS:CONFIRMED CATEGORIES:HPC, Big Data, and Data Science URL:https:/fosdem.org/2018/schedule/2018/schedule/event/cratedb/ LOCATION:H.1302 (Depage) ATTENDEE;ROLE=REQ-PARTICIPANT;CUTYPE=INDIVIDUAL;CN="Maximilian Michels":invalid:nomail END:VEVENT BEGIN:VEVENT METHOD:PUBLISH UID:6595@FOSDEM18@fosdem.org TZID:Europe-Brussels DTSTART:20180204T143000 DTEND:20180204T145500 SUMMARY:Scaling Deep Learning to hundreds of GPUs on HopsHadoop DESCRIPTION:Scaling out deep learning training is the big systems challenge for the deep learning community. Backed by a high performance distributed file system (HopsFS) and the support for GPU sharing and management in the cluster manager (HopsYarn), the HopsWorks platform provides different flavors of Tensorflow-as-a-Service and offers several possibilities for parallelizing and scaling out deep learning. In this talk we are going to present how data scientists can use Hops to perform parallel hyperparameter searching, or how they can run traditional distributed Tensorflow on a big data cluster with the TensorflowOnSpark framework. In particular, during the talk, we are going to focus on the latest generation of distributed Tensorflow architectures, which borrow topology and communication patterns from the HPC field. In the Ring-AllReduce architecture, workers are organized in a ring topology and communicate gradient updates without incurring the communication bottleneck with the parameter server(s) that traditional distributed Tensorflow suffers from. Ring-AllReduce has been used by Facebook and IBM to reduce the training time on Imagenet from 2 weeks to ~45 minutes/1 hour. Finally, we will show how you can recreate the popular game "Quick, Draw!" using HopsWorks and Tensorflow services provided by the platform.
The code is available on Github at https://github.com/hopshadoop
CLASS:PUBLIC STATUS:CONFIRMED CATEGORIES:HPC, Big Data, and Data Science URL:https:/fosdem.org/2018/schedule/2018/schedule/event/hopshadoop/ LOCATION:H.1302 (Depage) ATTENDEE;ROLE=REQ-PARTICIPANT;CUTYPE=INDIVIDUAL;CN="Fabio Buso":invalid:nomail END:VEVENT BEGIN:VEVENT METHOD:PUBLISH UID:6714@FOSDEM18@fosdem.org TZID:Europe-Brussels DTSTART:20180204T150000 DTEND:20180204T152500 SUMMARY:AI on Microcontrollers DESCRIPTION:The deployment of deep learning technology today is normally limited to GPU clusters, due to its computational requirements. For AI to be truly ubiquitous, its cost and energy efficiency need to be improved. With the recent developments made in algorithms and MCUs, we introduce a deep-learning inferencing framework which runs TensorFlow models on MCU-powered devices (with Mbed). In comparison to GPUs and mobile CPUs, MCU-based devices are much more cost and power efficient. We believe this will open a new paradigm for AI and edge computing.
CLASS:PUBLIC STATUS:CONFIRMED CATEGORIES:HPC, Big Data, and Data Science URL:https:/fosdem.org/2018/schedule/2018/schedule/event/utensor/ LOCATION:H.1302 (Depage) ATTENDEE;ROLE=REQ-PARTICIPANT;CUTYPE=INDIVIDUAL;CN="Neil Tan":invalid:nomail END:VEVENT BEGIN:VEVENT METHOD:PUBLISH UID:6432@FOSDEM18@fosdem.org TZID:Europe-Brussels DTSTART:20180204T153000 DTEND:20180204T155500 SUMMARY:Productionizing Spark ML Pipelines with the Portable Format for Analytics DESCRIPTION:The common perception of machine learning is that it starts with data and ends with a model. In real-world production systems, the traditional data science and machine learning workflow of data preparation, feature engineering and model selection, while important, is only one aspect. A critical missing piece is the deployment and management of models, as well as the integration between the model creation and deployment phases.
This is particularly challenging in the case of deploying Apache Spark ML pipelines for low-latency scoring, since the Spark runtime is ill-suited to the needs of real-time predictive applications. In this talk I will introduce the Portable Format for Analytics (PFA) for portable, open and standardized deployment of data science pipelines and analytic applications. I will also introduce and evaluate Aardpfark, a library I have created for exporting Spark ML pipelines to PFA, as well as compare it to other open-source alternatives available in the community.
CLASS:PUBLIC STATUS:CONFIRMED CATEGORIES:HPC, Big Data, and Data Science URL:https:/fosdem.org/2018/schedule/2018/schedule/event/portable_format_analytics/ LOCATION:H.1302 (Depage) ATTENDEE;ROLE=REQ-PARTICIPANT;CUTYPE=INDIVIDUAL;CN="Nick Pentreath":invalid:nomail END:VEVENT BEGIN:VEVENT METHOD:PUBLISH UID:6358@FOSDEM18@fosdem.org TZID:Europe-Brussels DTSTART:20180204T160000 DTEND:20180204T162500 SUMMARY:Accelerating Big Data Outside of the JVM DESCRIPTION:Many popular big data technologies (such as Apache Spark, BEAM, Flink, and Kafka) are built in the JVM, and many interesting tools are built in other languages (ranging from Python to CUDA). For simple operations the cost of copying the data can quickly dominate, and in complex cases can limit our ability to take advantage of specialty hardware. This talk explores how improved formats are being integrated to reduce these hurdles to co-operation.
CLASS:PUBLIC STATUS:CONFIRMED CATEGORIES:HPC, Big Data, and Data Science URL:https:/fosdem.org/2018/schedule/2018/schedule/event/big_data_outside_jvm/ LOCATION:H.1302 (Depage) ATTENDEE;ROLE=REQ-PARTICIPANT;CUTYPE=INDIVIDUAL;CN="Holden Karau":invalid:nomail END:VEVENT BEGIN:VEVENT METHOD:PUBLISH UID:6725@FOSDEM18@fosdem.org TZID:Europe-Brussels DTSTART:20180204T163000 DTEND:20180204T165500 SUMMARY:Nexmark A unified benchmarking suite for data-intensive systems with Apache Beam DESCRIPTION:NEXMark is an unpublished research paper that introduced a benchmarking suite for streaming systems. The Apache Beam community implemented (and enhanced) the examples of this paper as a series of benchmarks on top of Beam that can be run on different open source distributed processing engines e.g. Apache Spark, Apache Flink, etc. This talk discusses this experience and expects to engage new contributors to bring more ideas so we can eventually have a unified and semantically rich benchmarking standard for batch and streaming data-intensive systems a la TPC.
CLASS:PUBLIC STATUS:CONFIRMED CATEGORIES:HPC, Big Data, and Data Science URL:https:/fosdem.org/2018/schedule/2018/schedule/event/nexmark_benchmarking_suite/ LOCATION:H.1302 (Depage) ATTENDEE;ROLE=REQ-PARTICIPANT;CUTYPE=INDIVIDUAL;CN="Ismaël Mejía":invalid:nomail END:VEVENT END:VCALENDAR