Schedule: Kettle: extracting, transforming and loading data

Speakers
	Matt Casters
Schedule
Day	Saturday
Room	Ferrer
Start time	16:40
End time	16:55
Duration	00:15
Info
Event type	Lightning-Talk
Track	Lightning Talks
Language	English
Media
Slides (PDF)
Video (Ogg/Theora)

Kettle: extracting, transforming and loading data

With this lightning talk, Matt wants to take a first stab at bridging the gap between two worlds: the general open source world and the Open Source Business Intelligence world. Unfortunately, it is not possible to explain the whole field of BI in 15 minutes, so the focus will be on Matt's area of expertise: ETL, and more specifically the Kettle project.

Kettle is essentially an extremely fast tool for extracting data from many different sources, such as web pages, Excel documents, databases, you name it. (E = Extraction in ETL)

We can transparently grab that data and then manipulate, transform or process that data in just about any possible way you can imagine. (T = Transformation in ETL)

Then, after combining data, changing it, mapping values and whatnot, you can store or load the result in whatever format or medium you like, including text files and all possible databases. (L = Loading in ETL)
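The three steps above can be sketched in a few lines of plain Python. This is not Kettle code (Kettle is driven from a graphical interface); it is only a minimal illustration of the extract, transform, load pattern, using the standard library. The sample data, the table name `traffic` and the function names are all hypothetical and chosen for the example.

```python
import csv
import io
import sqlite3

# Hypothetical source data standing in for a text file (not real measurements).
RAW_CSV = """sensor_id,timestamp,vehicles
A12,2000-01-01 16:40, 42
A12,2000-01-01 16:41,
B07,2000-01-01 16:40,17
"""


def extract(text):
    """E: read rows from a CSV source into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """T: trim whitespace, convert types and drop rows with a missing count."""
    cleaned = []
    for row in rows:
        count = row["vehicles"].strip()
        if not count:
            continue  # skip incomplete measurements
        cleaned.append(
            (row["sensor_id"].strip(), row["timestamp"].strip(), int(count))
        )
    return cleaned


def load(rows, connection):
    """L: store the cleaned rows in a database table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS traffic "
        "(sensor_id TEXT, ts TEXT, vehicles INTEGER)"
    )
    connection.executemany("INSERT INTO traffic VALUES (?, ?, ?)", rows)
    connection.commit()


def run_etl(text, connection):
    """Chain the three steps and return (row count, total vehicles)."""
    load(transform(extract(text)), connection)
    return connection.execute(
        "SELECT COUNT(*), SUM(vehicles) FROM traffic"
    ).fetchone()
```

Calling `run_etl(RAW_CSV, sqlite3.connect(":memory:"))` runs the whole chain against an in-memory SQLite database. Keeping each step as its own function mirrors the way an ETL tool chains separate steps, so any one of them can be swapped out without touching the others.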

Some examples:

Loading data from text files or XML files into a database
Export of databases to text files or other databases
Import of data into databases from sources ranging from text files to Excel sheets
Data migration between database applications
Exploration of data in existing databases (tables, views, etc.)
Information enrichment by looking up data in various information stores (databases, text files, Excel sheets and more)

Data cleaning by applying complex conditions in data transformations
Application integration
Data warehouse population with built-in support for slowly changing dimensions, junk dimensions and much, much more

To do all these things we provide an easy-to-use graphical user interface. For a screenshot, see http://www.pentaho.com/images/transformation_screenshot.png

One of the first things done with Kettle (and by myself, 4-5 years ago) was capturing the traffic data for the Flanders Traffic Centre (Vlaams Verkeerscentrum) in Wilrijk: http://www.verkeerscentrum.be/verkeersinfo/default

This source data is updated every minute and loaded from thousands of sources across the state. It is referenced, cleaned and loaded into a single terabyte database containing years of history and billions of rows of data. That database is then used for traffic pattern analyses, traffic density reports, planning of road works, etc.
