| Speakers | Isabel Drost |
|---|---|
| Day | Sunday |
| Room | Janson |
| Start time | 14:00 |
| End time | 14:45 |
| Duration | 00:45 |
| Event type | Podium |
| Track | Scalability |
| Language | English |
| Media | Video (DIVX) |
Storage has become ever cheaper in recent years: one terabyte of hard disk space currently costs less than 100 Euros. As a result, a growing number of businesses have started collecting and digitizing data. Customer transaction logs, news articles published over decades, and crawls of parts of the world wide web are only a few of the use cases that produce large amounts of data. But with petabytes of data at your fingertips, the question arises of how to make ad-hoc as well as continuous processing efficient.
The goal of Apache Hadoop is to make large-scale data analysis easy. Hadoop implements a distributed filesystem based on the ideas behind GFS, the Google File System. With Map/Reduce it provides an easy way to implement parallel algorithms.
After motivating the need for a distributed processing library, the talk gives an introduction to Hadoop, detailing its strengths and weaknesses, and shows how to quickly get your own Map/Reduce jobs up and running. The talk closes with an overview of the Hadoop ecosystem.
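To give a flavour of what such a job looks like, below is a minimal sketch of the classic word-count example in Java, written against the `org.apache.hadoop.mapreduce` API of more recent Hadoop releases (the exact classes have shifted between versions, and this example is illustrative rather than taken from the talk): the mapper emits a (word, 1) pair for every token, the framework groups the pairs by word, and the reducer sums the counts. The input and output HDFS paths are hypothetical command-line arguments.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every token in the input split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reduce phase: sum the counts collected for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input dir
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output dir
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Packaged into a jar, a job like this would typically be launched with something along the lines of `hadoop jar wordcount.jar WordCount /input /output`.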