Brussels / 3 & 4 February 2018


Cypher for Apache Spark


Graph pattern matching is one of the most interesting and challenging operations in graph analytics. Query languages such as openCypher, implemented in systems like Neo4j, SAP HANA Graph and Redis Graph, make it intuitive to define graph patterns, including structural and semantic predicates.

Currently, graph query languages are most prominent in graph database systems such as Neo4j. However, we think that many systems can benefit from having such a language in their toolbox. One of these systems is Apache Spark, one of the most popular open source frameworks for the distributed processing of large data volumes in complex analytical workloads. To bring the benefits of Cypher from the graph database realm into the world of Big Data, we at Neo4j started developing Cypher for Apache Spark (CAPS). CAPS is primarily focused on graph-powered data integration and graph analytical query workloads within the Spark ecosystem. CAPS is also our testbed for Cypher language extensions specified in the openCypher project, such as multiple graphs, graph construction and transformation, and query composition.

In our talk, we motivate use cases for CAPS and give an overview of its new querying capabilities, which we demonstrate using Apache Spark and Apache Zeppelin. We also briefly present the internal architecture of CAPS, highlighting the main differences between Neo4j and CAPS.

Intended audience and goal of the talk

Developers and analysts who want to express graph queries on distributed graphs



Martin Junghanns
Max Kießling