Brussels / 30 & 31 January 2016


Arabesque: A Distributed Graph Mining Platform

Arabesque offers an elegant solution to the difficult problem of graph mining: it lets users express graph mining algorithms easily and distributes the computation efficiently.

Distributed data processing platforms such as MapReduce and Pregel (Giraph, GraphLab, GraphX) have substantially simplified the design and deployment of certain classes of distributed graph analytics algorithms, such as computing PageRank, single-source shortest paths (SSSP), and connected components (CC).

However, these platforms are not a good match for graph mining problems, such as frequent subgraph mining (FSM) in a labeled graph or finding cliques. Given an input graph, these problems require exploring a very large number of subgraphs (embeddings) and finding patterns that match some “interestingness” criteria defined by the user. The number of embeddings that the system needs to explore grows exponentially with the size of the embeddings. For instance, even for a tiny graph of ~4000 vertices, we can reach 1.7 billion embeddings when we consider embeddings of size 6.
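To give a feel for how quickly the number of embeddings grows, here is a minimal single-machine sketch in Python. It is an illustration only, not Arabesque code: the function names and the toy complete graph are assumptions made for this example, and an embedding is taken to be a connected, vertex-induced subgraph represented by its vertex set.

```python
def complete_graph(n):
    """Adjacency sets for a toy complete graph on n vertices (illustration only)."""
    return {v: {u for u in range(n) if u != v} for v in range(n)}


def count_embeddings(adj, max_size):
    """Count connected vertex-induced embeddings of each size up to max_size."""
    # Start from the single-vertex embeddings.
    level = {frozenset([v]) for v in adj}
    counts = {1: len(level)}
    for size in range(2, max_size + 1):
        # Extend every embedding by one neighbouring vertex.
        # Storing frozensets in a set removes duplicate embeddings.
        level = {emb | {u} for emb in level for v in emb for u in adj[v] if u not in emb}
        counts[size] = len(level)
    return counts


if __name__ == "__main__":
    print(count_embeddings(complete_graph(10), 5))
```

On this 10-vertex toy graph the counts per size are 10, 45, 120, 210 and 252. Holding every level in memory, as this sketch does, is exactly what stops working on real inputs.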

In this talk, we present Arabesque, the first distributed data processing platform for implementing graph mining algorithms. Arabesque automates the process of exploring a very large number of embeddings and defines a high-level computational model, Think Like an Embedding (TLE), which simplifies the development of scalable graph mining algorithms. Arabesque-based applications require only a handful of lines of code, scale to trillions of embeddings, and in some cases represent the first available distributed solutions. Arabesque runs on top of Giraph/Hadoop, but uses the TLE programming model instead of the Think Like a Vertex (TLV) model.
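The "Think Like an Embedding" idea can be sketched on a single machine as follows. This is a hedged illustration in Python, not Arabesque's actual API: `explore`, `is_clique` and `find_cliques` are hypothetical names, and the sketch assumes the application supplies two callbacks, one that decides whether an embedding is worth keeping and extending, and one that consumes each surviving embedding. The exploration loop stands in for the part a platform would automate and distribute.

```python
def explore(adj, filter_fn, process_fn, max_size):
    """Explore embeddings level by level, pruning with filter_fn (hypothetical driver)."""
    level = {frozenset([v]) for v in adj}
    for size in range(1, max_size + 1):
        # Keep only embeddings the application considers interesting.
        survivors = [emb for emb in level if filter_fn(adj, emb)]
        for emb in survivors:
            process_fn(emb)
        if size == max_size:
            break
        # Extend each surviving embedding by one neighbouring vertex.
        level = {emb | {u} for emb in survivors for v in emb for u in adj[v] if u not in emb}


def is_clique(adj, emb):
    """Application-side filter: every pair of vertices in emb must be adjacent."""
    return all(u in adj[v] for v in emb for u in emb if u != v)


def find_cliques(adj, max_size):
    """Collect every clique with at most max_size vertices."""
    found = []
    explore(adj, is_clique, found.append, max_size)
    return found


if __name__ == "__main__":
    # A triangle (0, 1, 2) with a pendant vertex 3 attached to vertex 2.
    graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    for clique in find_cliques(graph, 3):
        print(sorted(clique))
```

Pruning is safe here because every subset of a clique is itself a clique, so discarding a non-clique embedding can never lose a larger clique. The application-specific part is just the filter and the callback passed to `explore`.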


Georgos Siganos