Data Analytics with MySQL, Apache Spark and Apache Drill

Track: MySQL and Friends devroom
Room: H.1309 (Van Rijn)
Day: Saturday
Start: 16:05
End: 16:25

Apache Spark is a cluster computing framework, similar to Apache Hadoop. There are a number of tasks where MySQL does not perform well: for example, MySQL is not a massively parallel system, and a single query will only utilize one CPU core. Spark, on the other hand, is designed to be massively parallel. In addition, because Spark is a cluster framework, you can easily add more compute nodes so that it can use more resources and scale.

Apache Drill is a similar project that aims to make data discovery easier. For example, it allows you to join data from sources such as MySQL, MongoDB, flat files, and other RDBMSs.
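As a rough illustration of such a cross-source join, the sketch below sends one SQL statement to Drill's REST query endpoint using only the Python standard library. It is not taken from the talk: the storage plugin names `mysql` and `mongo`, the `shop` database, the `customers` and `orders` tables, their columns, and the default `http://localhost:8047` address are all assumptions that depend on how a particular Drill instance is configured.

```python
import json
import urllib.request

# Hypothetical names: `mysql` and `mongo` are storage plugins as they might be
# configured in Drill; `shop`, `customers`, `orders` and the column names are
# placeholders for illustration only.
CROSS_SOURCE_QUERY = """
SELECT c.customer_name, COUNT(*) AS order_count
FROM mysql.shop.customers AS c
JOIN mongo.shop.orders AS o ON o.customer_id = c.id
GROUP BY c.customer_name
"""


def build_drill_request(sql, base_url="http://localhost:8047"):
    """Build a POST request for Drill's REST query endpoint (/query.json)."""
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/query.json",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_drill_query(sql, base_url="http://localhost:8047"):
    """Send the query to a running Drill instance and return the result rows."""
    with urllib.request.urlopen(build_drill_request(sql, base_url)) as resp:
        return json.load(resp)["rows"]
```

Calling `run_drill_query(CROSS_SOURCE_QUERY)` only works against a running Drill instance in which both storage plugins are enabled and point at real data.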

In this talk I will demonstrate how to use Apache Spark together with MySQL for data analysis. I will show how Apache Spark aggregates data (Wikipedia pageview statistics) and stores the result set in MySQL. I will also show how to use Apache Spark with multiple sources and join virtual tables from MySQL, flat files and even MongoDB.
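The aggregate-and-store workflow could look roughly like the following PySpark sketch. It is not the speakers' code. It assumes each pageview line has four space-separated fields (project, page title, view count, response size); the function names, the `pageview_totals` table, and the JDBC driver class are illustrative choices (older MySQL connectors use `com.mysql.jdbc.Driver` instead).

```python
def parse_pageview_line(line):
    """Parse one line, assumed to look like 'project page_title views bytes'.

    Returns (project, title, views) or None for malformed lines.
    """
    parts = line.strip().split(" ")
    if len(parts) != 4:
        return None
    project, title, views, _size = parts
    try:
        return project, title, int(views)
    except ValueError:
        return None


def aggregate_to_mysql(input_path, jdbc_url, user, password):
    """Sum page views per project with Spark and write the totals to MySQL."""
    # Imported here so the parser above can be used without Spark installed.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("pageview-totals").getOrCreate()

    # Read the raw files in parallel and drop lines that fail to parse.
    rows = (
        spark.sparkContext.textFile(input_path)
        .map(parse_pageview_line)
        .filter(lambda row: row is not None)
    )
    views = spark.createDataFrame(rows, ["project", "title", "views"])

    # The aggregation runs across all available executors.
    totals = views.groupBy("project").agg(F.sum("views").alias("total_views"))

    # Store the (much smaller) result set in MySQL over JDBC.
    totals.write.jdbc(
        url=jdbc_url,
        table="pageview_totals",  # hypothetical table name
        mode="overwrite",
        properties={
            "user": user,
            "password": password,
            "driver": "com.mysql.cj.jdbc.Driver",  # assumed connector version
        },
    )
    spark.stop()
```

A call such as `aggregate_to_mysql("pageviews/*.gz", "jdbc:mysql://localhost:3306/wikistats", "spark", "secret")` uses placeholder values throughout, and the MySQL JDBC connector has to be available on the Spark classpath for the write to succeed.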

Speakers

	Sveta Smirnova
	Alexander Rubin

FOSDEM17

Brussels / 4 & 5 February 2017
