| Speakers | |
|---|---|
| Isabel Drost | |
| Schedule | |
|---|---|
| Day | Sunday |
| Room | Janson |
| Start time | 14:00 |
| End time | 14:45 |
| Duration | 00:45 |
| Info | |
|---|---|
| Event type | Podium |
| Track | Scalability |
| Language | English |
| Media | |
|---|---|
| Video (DIVX) | |
Storage has become ever cheaper in recent years: currently, one terabyte of hard disk space costs less than 100 Euros. As a result, a growing number of businesses have started collecting and digitizing data. Customer transaction logs, news articles published over decades, and crawls of parts of the world wide web are only a few of the use cases that produce large amounts of data. But with petabytes of data at your fingertips, the question arises of how to make ad-hoc as well as continuous processing efficient.
The goal of Apache Hadoop is to make large-scale data analysis easy. Hadoop implements a distributed filesystem based on the ideas behind GFS, the Google File System. With Map/Reduce it provides an easy way to implement parallel algorithms.
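As a rough illustration of the filesystem side, the sketch below uses Hadoop's Java `FileSystem` API to write a file to HDFS and read it back. The path and contents are made up for the example; the cluster address comes from whatever Hadoop configuration is on the classpath.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsHelloWorld {
  public static void main(String[] args) throws Exception {
    // Picks up the default filesystem (e.g. an hdfs:// URI) from the
    // Hadoop configuration files found on the classpath.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    Path file = new Path("/tmp/hello.txt");  // illustrative path

    // Write a small file; HDFS splits larger files into blocks and
    // replicates each block across several datanodes.
    try (FSDataOutputStream out = fs.create(file, true)) {
      out.write("hello, distributed world\n".getBytes(StandardCharsets.UTF_8));
    }

    // Read the file back through the same API.
    try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(fs.open(file), StandardCharsets.UTF_8))) {
      System.out.println(reader.readLine());
    }
  }
}
```

The same `FileSystem` abstraction also works against the local filesystem, which makes it easy to test code on a laptop before pointing it at a cluster.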
After motivating the need for a distributed processing library, the talk gives an introduction to Hadoop, detailing its strengths and weaknesses. It shows how to quickly get your own Map/Reduce jobs up and running. The talk closes with an overview of the Hadoop ecosystem.
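To give a feel for what getting a Map/Reduce job up and running looks like, here is a sketch of the standard word-count example written against the `org.apache.hadoop.mapreduce` Java API; it is not code from the talk, just the canonical example. The mapper emits a count of one per token, and the reducer sums the counts per word.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: runs in parallel on splits of the input, emitting (word, 1) pairs.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer tokens = new StringTokenizer(value.toString());
      while (tokens.hasMoreTokens()) {
        word.set(tokens.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: receives all counts emitted for one word and sums them.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable value : values) {
        sum += value.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  // Driver: wires mapper and reducer together and submits the job to the cluster.
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);  // local pre-aggregation on the map side
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Packaged into a jar, a job like this is submitted with the `hadoop jar` command, passing the input and output directories as arguments; the framework takes care of splitting the input, scheduling map and reduce tasks across the cluster, and re-running tasks that fail.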