Brussels / 2 & 3 February 2019

From Zero to Portability

Apache Beam's Journey to Cross-Language Data Processing


Apache Beam is a programming model for composing parallel and distributed data processing jobs. Once composed, these jobs run on various execution engines such as Apache Flink, Apache Spark, or Google Cloud Dataflow. But Apache Beam's vision goes beyond running on multiple execution engines.

Like many other Apache projects, Beam first used Java as its API language. Unsatisfied with the status quo, Beam developers launched the portability project to enable other languages to run with Beam. Currently, Beam has Java, Python, and Go APIs. That means users are not restricted to the Java ecosystem but can use their favorite Python libraries, such as NumPy or TensorFlow, with Apache Beam.

Ultimately, these languages won't just coexist in Apache Beam; they will complement each other in cross-language data processing jobs. For example, a job can read from Kafka with the Java connector and then process the data in Python.
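The Kafka example can be pictured with a toy model. The sketch below is not Beam's API; it only illustrates the idea of a pipeline described in a language-neutral way, where each stage names the language that should execute it and data crosses stage boundaries in a neutral encoding. Every name in it (`WORKERS`, `PIPELINE`, `run`, the two stage functions) is hypothetical, and the "Java" stage is simulated in Python so that the example runs with the standard library alone.

```python
import json

# Hypothetical stand-ins for per-language workers. In a real cross-language
# system each would run in its own process for its language; here both are
# plain Python functions so the sketch stays self-contained.
def java_kafka_read(_):
    # Pretend these records came from a Kafka topic via a Java connector.
    return ["click:home", "click:cart", "view:home"]

def python_count_by_page(records):
    # Count how often each page appears in "event:page" records.
    counts = {}
    for record in records:
        page = record.split(":")[1]
        counts[page] = counts.get(page, 0) + 1
    return counts

# Registry mapping (language, transform name) to the code that executes it.
WORKERS = {
    ("java", "kafka_read"): java_kafka_read,
    ("python", "count_by_page"): python_count_by_page,
}

# Language-neutral pipeline description: an ordered list of stages,
# each tagged with the language that should execute it.
PIPELINE = [
    {"language": "java", "transform": "kafka_read"},
    {"language": "python", "transform": "count_by_page"},
]

def run(pipeline):
    data = None
    for stage in pipeline:
        worker = WORKERS[(stage["language"], stage["transform"])]
        output = worker(data)
        # Round-trip through JSON to mimic a language-neutral encoding
        # at the boundary between stages.
        data = json.loads(json.dumps(output))
    return data

result = run(PIPELINE)
print(result)
```

The design point is that the pipeline description itself carries no language-specific code, only references to transforms, which is what lets stages written in different languages be combined in one job.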

In this talk, we will learn how Beam supports multiple languages and why it can be a good idea to combine them in data processing jobs.

Speakers

Maximilian Michels
