Brussels / 30 & 31 January 2016


Gradoop: Scalable Graph Analytics with Apache Flink

At Leipzig University, we develop Gradoop [1], a framework for distributed, declarative graph analytics on top of Apache Flink [2]. Gradoop is designed around the so-called Extended Property Graph Model (EPGM) and supports semantically rich, schema-free graph data. In this model, a database consists of multiple property graphs, which we call logical graphs. These graphs are application-specific subsets of shared vertex and edge sets. The EPGM provides operators for both single graphs and collections of graphs. Because operators may in turn return single graphs or graph collections, they can be chained, which enables the definition of analytical workflows in a declarative way.
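To make the data model above more concrete, here is a minimal Python sketch of the idea: logical graphs that reference subsets of shared vertex and edge sets, plus one operator on single graphs and one on graph collections. It is an illustration only and not Gradoop's API. The names `GraphDatabase`, `LogicalGraph`, `combine` and `select`, as well as the sample data, are hypothetical, and nothing here is distributed or based on Apache Flink.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LogicalGraph:
    """Hypothetical logical graph: a labelled subset of the shared ids."""
    label: str
    vertex_ids: frozenset
    edge_ids: frozenset


@dataclass
class GraphDatabase:
    """Hypothetical database holding the shared vertex and edge sets."""
    vertices: dict = field(default_factory=dict)  # id -> (label, properties)
    edges: dict = field(default_factory=dict)     # id -> (source, target, label)


def combine(a, b):
    """Single-graph operator (graph x graph -> graph): union of both subsets."""
    return LogicalGraph(
        f"{a.label}+{b.label}",
        a.vertex_ids | b.vertex_ids,
        a.edge_ids | b.edge_ids,
    )


def select(collection, predicate):
    """Collection operator (collection -> collection): keep matching graphs."""
    return [g for g in collection if predicate(g)]


# Shared vertex and edge sets (made-up sample data).
db = GraphDatabase(
    vertices={
        1: ("Person", {"name": "Alice"}),
        2: ("Person", {"name": "Bob"}),
        3: ("Forum", {"title": "Graphs"}),
    },
    edges={10: (1, 2, "knows"), 11: (1, 3, "memberOf")},
)

# Two logical graphs that overlap in vertex 1.
community = LogicalGraph("Community", frozenset({1, 2}), frozenset({10}))
forum = LogicalGraph("Forum", frozenset({1, 3}), frozenset({11}))

# Operators compose: the output of one is valid input for the next.
merged = combine(community, forum)
large = select([community, forum, merged], lambda g: len(g.vertex_ids) > 2)

print(sorted(merged.vertex_ids))
print([g.label for g in large])
```

Because each operator returns a graph or a graph collection, the result of `combine` can be passed straight into `select`. This composability is what the abstract refers to as declarative analytical workflows.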

In my talk, I would like to give an overview of Gradoop, the EPGM, and its operators, and to show how Apache Flink helps us by presenting a subset of our operator implementations. Furthermore, I will illustrate the usefulness of Gradoop with an analytical use case from the business intelligence domain.

Gradoop is open source and licensed under GPLv3. The Gradoop source code and brief documentation can be found on GitHub [3]; a more detailed explanation of the data model and our operators can be found in a technical report [4].

[1] [2] [3] [4]

Intended audience and goals of the talk

The target audience is analysts and developers who already have some knowledge of graph systems, e.g. semantically expressive graph database systems like Neo4j or generic graph processing systems like Apache Giraph. The goal is to present our concept of combining those two approaches to allow powerful graph analytics in a distributed way.

Previous talks about Gradoop

Flink Forward, October 2015:


Martin Junghanns