Brussels / 30 & 31 January 2016

Streaming Architecture: Why Flow Instead of State?


Batch processing was, until recently, the standard model for big data. This is largely due to the strong influence of the original MapReduce implementation in Hadoop and the difficulty of replacing MapReduce within the original Hadoop framework.

More recently, there has been a shift to streaming architectures built on tools such as Apache Spark and Apache Kafka. These architectures offer surprisingly large benefits in simplicity and robustness, but they also differ markedly from previous message queuing designs. The changes in these new systems allow far higher scalability and make fault tolerance relatively simple to achieve while maintaining good latency.

In this talk I will describe the key design tools and best-practice techniques used in modern systems, including percolators, the big-data oscilloscope, replayable queues, state-point queuing and universal micro-architectures. The benefits that I will highlight include

  • a decrease in total system complexity

  • flexible throughput/latency trade-offs

  • fault tolerance without the difficulties of the lambda architecture and

  • easy debuggability

I will detail several Apache projects that attempt to support flow computing.
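
As a rough illustration of the replayable queue idea mentioned above, here is a minimal in-memory sketch in Python. It is not taken from the talk and does not use any Apache project's API: the names ReplayableLog and count_words are hypothetical, and the sketch assumes that a replayable queue is an append-only log where reading never deletes records and each consumer tracks its own offset.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ReplayableLog:
    """Hypothetical in-memory stand-in for a replayable queue."""
    _records: List[Any] = field(default_factory=list)

    def append(self, record: Any) -> int:
        """Append a record and return its offset in the log."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset: int, max_records: int = 100) -> List[Any]:
        """Return records starting at offset; reading never removes them."""
        return self._records[offset:offset + max_records]


def count_words(log: ReplayableLog, start: int = 0) -> Dict[str, int]:
    """Rebuild derived state (word counts) by replaying the log from start."""
    counts: Dict[str, int] = {}
    offset = start
    while True:
        batch = log.read(offset)
        if not batch:
            return counts
        for word in batch:
            counts[word] = counts.get(word, 0) + 1
        offset += len(batch)


log = ReplayableLog()
for w in ["flow", "state", "flow"]:
    log.append(w)

first = count_words(log)   # normal processing
second = count_words(log)  # simulated restart: replay from offset 0
```

Because the log keeps every record, a consumer that loses its state can in principle rebuild it by reading again from an earlier offset, which is one way such designs can simplify fault tolerance and debugging.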

Speakers

Tugdual Grall

Links