Streaming Architecture: Why Flow Instead of State?
- Track: HPC, Big Data and Data Science devroom
- Room: AW1.126
- Day: Sunday
- Start: 16:30
- End: 16:55
Batch processing has been, until recently, the standard model for big data. This is largely due to the strong influence of the original MapReduce implementation in Hadoop and the difficulty of replacing MapReduce in the original Hadoop framework.
More recently, there has been a shift to streaming architectures using tools such as Apache Spark and Kafka. These architectures offer surprisingly large benefits in terms of simplicity and robustness, but they also differ markedly from previous message queuing designs. The changes in these new systems allow far greater scalability and make fault tolerance relatively simple to achieve while maintaining good latency.
In this talk I will describe the key design tools and best-practice techniques used in modern systems, including percolators, the big-data oscilloscope, replayable queues, state-point queuing and universal micro-architectures. The benefits that I will highlight include:
a decrease in total system complexity
flexible throughput/latency trade-offs
fault tolerance without the difficulties of the lambda architecture and
easy debuggability
I will detail several Apache projects that attempt to support flow computing.
Speakers
Tugdual Grall