FOSDEM '10 is a free and non-commercial event organized by the community, for the community. Its goal is to provide Free and Open Source developers a place to meet. No registration necessary.

Speakers
Isabel Drost
Schedule
Day: Sunday
Room: Janson
Start time: 14:00
End time: 14:45
Duration: 00:45
Info
Event type: Podium
Track: Scalability
Language: English
Media
Video (DIVX)
Large scale data analysis made easy - Apache Hadoop

The goal of Apache Hadoop is to make large-scale data analysis easy. Hadoop implements a distributed filesystem based on the ideas behind GFS, the Google File System. With Map/Reduce, it provides an easy way to implement parallel algorithms.
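To make the Map/Reduce model concrete, here is a minimal single-process sketch in Python of the three phases a job goes through: map, shuffle (grouping values by key) and reduce. It is an illustration only and not Hadoop's own Java API. The function names `run_map_reduce`, `word_count_mapper` and `sum_reducer` are hypothetical. Everything here runs sequentially in memory, whereas Hadoop runs many mappers and reducers in parallel across a cluster.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def run_map_reduce(
    records: Iterable[str],
    mapper: Callable[[str], Iterable[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], int],
) -> Dict[str, int]:
    """Run map, shuffle and reduce sequentially in one process (illustration only)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    # Map phase: every record may emit any number of (key, value) pairs.
    for record in records:
        for key, value in mapper(record):
            # Shuffle phase: collect all values that share a key.
            groups[key].append(value)
    # Reduce phase: one reducer call per key, visiting keys in sorted order.
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


def word_count_mapper(line: str) -> Iterator[Tuple[str, int]]:
    # Hypothetical example mapper: emit (word, 1) for every lower-cased token.
    for word in line.lower().split():
        yield word, 1


def sum_reducer(word: str, counts: List[int]) -> int:
    # Hypothetical example reducer: all counts for one word arrive together.
    return sum(counts)


if __name__ == "__main__":
    lines = ["the quick brown fox", "the lazy dog", "The fox"]
    print(run_map_reduce(lines, word_count_mapper, sum_reducer))
    # {'brown': 1, 'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}
```

The mapper and reducer contain no coordination logic. Each call sees only one record or one key, so a framework is free to spread the calls across many machines.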

Storage has become ever cheaper in recent years: one terabyte of hard disk space currently costs less than 100 euros. As a result, a growing number of businesses have started collecting and digitizing data. Customer transaction logs, news articles published over decades, and crawls of parts of the World Wide Web are just a few of the use cases that produce large amounts of data. But with petabytes of data at your fingertips, the question arises of how to make both ad-hoc and continuous processing efficient.

After motivating the need for a distributed library, the talk introduces Hadoop, detailing its strengths and weaknesses. It then shows how to quickly get your own Map/Reduce jobs up and running, and closes with an overview of the Hadoop ecosystem.
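Hadoop Streaming offers one quick way to get a first job running without writing Java. It runs any executable as the mapper or the reducer. Input records arrive as lines on stdin, the mapper writes tab-separated key/value lines to stdout, and Hadoop sorts those lines by key before they reach the reducer.

The sketch below computes the total amount per customer. It assumes a hypothetical transaction log with one `customer_id,amount` record per line. The script name `job.py` and its `map`/`reduce` arguments are also assumptions. The exact command for submitting a streaming job depends on the Hadoop version, so that command is not shown.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Map 'customer_id,amount' lines (hypothetical log format) to 'customer_id<TAB>amount'."""
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) != 2:
            continue  # skip malformed records instead of failing the whole job
        customer, amount = (part.strip() for part in parts)
        try:
            float(amount)
        except ValueError:
            continue  # skip records whose amount is not a number
        yield f"{customer}\t{amount}"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum amounts per customer; assumes lines arrive sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    # groupby only merges adjacent lines, which is why the input must be sorted.
    for customer, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(float(amount) for _, amount in group)
        yield f"{customer}\t{total:.2f}"


if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1] in ("map", "reduce"):
    # Hypothetical CLI: "python job.py map" or "python job.py reduce".
    stage = mapper if sys.argv[1] == "map" else reducer
    for record in stage(sys.stdin):
        print(record)
```

You can test the same script locally before submitting it to a cluster, using a pipeline such as `cat transactions.csv | python job.py map | LC_ALL=C sort | python job.py reduce` (the file name is hypothetical). The `sort` step stands in for Hadoop's shuffle. Setting `LC_ALL=C` forces byte-order sorting, so that lines with equal keys end up adjacent.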

Other events at the same time:

When | Event | Track | Where
13:00-14:30 | LPI exam session 4 | Certification | Guillissen
13:30-14:15 | Front end performance | Drupal | H.2214
13:45-14:15 | 10x performance improvements - A case study | MySQL | AW1.121
13:45-14:30 | The Semantic Desktop, SPARQL and You! | CrossDesktop | H.1309
13:45-14:15 | ParallelFx, bringing Mono applications in the multicore era | Mono | H.2213
13:45-14:30 | Debian and Ubuntu | Distributions | H.1308
14:00-14:45 | Postgresql: Lists and Recursion and Trees (oh my) | Database | Chavanne
14:00-14:15 | Open-source software: Blaming the unknown, or a constructive approach to technology | Lightning Talks | Ferrer
14:00-14:30 | OpenSound System v4 port to Haiku | Alt-OS | AW1.105
14:00-14:45 | MDB and MDBX: Open Source SimpleDB Projects based on GTM | NoSQL | AW1.120
14:00-14:30 | Objective-C 2.0: libobjc2 and Clang, current status, plans for the future | GNUstep | AW1.117
14:00-14:45 | The free software desktop’s graphics driver stack | X.org | AW1.124
14:00-14:30 | Explore Jetpack | Mozilla | H.1301
14:00-15:00 | Rockbox: open source firmware replacement for music players | Embedded | Lameere
14:00-14:45 | Because the License Matters: BSD as the Foundation for Commercial Point of Sale Applications | BSD | AW1.126
14:15-14:30 | Kaizendo.org: Customizing schoolbooks the free software way | Lightning Talks | Ferrer
14:15-14:45 | Correcting replication data drift with Maatkit | MySQL | AW1.121
14:15-15:00 | How to setup the perfect development environment | Drupal | H.2214
14:30-14:45 | The Wiki for Open Technologies: How to share your projects and knowledge | Lightning Talks | Ferrer
14:30-15:15 | Nepomuk | CrossDesktop | H.1309
14:30-15:00 | Generating Driver Source Code with Rathaxes | Alt-OS | AW1.105
14:30-15:30 | Building The Virtual Babel: Mono In Second Life | Mono | H.2213
14:30-15:15 | Mozilla Drumbeat in Europe | Mozilla | H.1301