Faster Spark SQL: Adaptive Query Execution in Spark v3
- Track: HPC, Big Data and Data Science devroom
- Room: D.hpc
- Day: Saturday
- Start: 11:00
- End: 11:30
Over the years, there have been extensive efforts to improve Apache Spark SQL performance. This talk introduces the new Adaptive Query Execution (AQE) framework and shows how it can automatically improve query performance. AQE uses query runtime statistics to dynamically guide Spark's execution while queries are running. The talk covers the main features of AQE and gives examples of how it can improve on the previous static query plans. Finally, we'll present the significant improvements we have seen on the TPC-DS benchmark with AQE.
Examples of the new runtime optimizations include selecting the right join type (broadcast-hash-join vs. sort-merge-join), dealing with data skew, and automatically selecting the number of shuffle (reducer) partitions.
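As a rough illustration of how these optimizations are switched on, the sketch below collects the Spark 3 SQL settings that correspond to the three features and renders them as `spark-submit` arguments. The configuration keys are standard Spark SQL options; the values are illustrative assumptions rather than recommendations from the talk, and `to_submit_flags` is a hypothetical helper written for this example.

```python
# Spark SQL settings related to the AQE features named above.
# Keys are standard Spark 3 options; values are illustrative assumptions.
AQE_CONF = {
    # Master switch for Adaptive Query Execution.
    "spark.sql.adaptive.enabled": "true",
    # Merge small shuffle partitions after a stage finishes.
    "spark.sql.adaptive.coalescePartitions.enabled": "true",
    # Target size AQE aims for when coalescing or splitting partitions.
    "spark.sql.adaptive.advisoryPartitionSizeInBytes": "64MB",
    # Split heavily skewed partitions in sort-merge joins.
    "spark.sql.adaptive.skewJoin.enabled": "true",
    # Size limit (bytes) below which a join side may be broadcast.
    "spark.sql.autoBroadcastJoinThreshold": "10485760",
}


def to_submit_flags(conf):
    """Render a config dict as a list of spark-submit --conf arguments."""
    flags = []
    for key, value in sorted(conf.items()):
        flags += ["--conf", f"{key}={value}"]
    return flags


if __name__ == "__main__":
    # Print the flags so they can be pasted after `spark-submit`.
    print(" ".join(to_submit_flags(AQE_CONF)))
```

The same key/value pairs could also be set on a running session, for example through `spark.conf.set(key, value)` in PySpark, assuming a `SparkSession` named `spark` is available.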
Speakers
Nicolas Poggi