Schedule: Kettle: extracting, transforming and loading data
Field | Value |
---|---|
Speaker | Matt Casters |
Day | Saturday |
Room | Ferrer |
Start time | 16:40 |
End time | 16:55 |
Duration | 00:15 |
Event type | Lightning-Talk |
Track | Lightning Talks |
Language | English |
With this lightning talk, Matt wants to take a first stab at bridging the gap between two worlds: the general open source world and the Open Source Business Intelligence (BI) world. Since it is not possible to explain the whole field of BI in 15 minutes, the talk will focus on Matt's area of expertise: ETL, and the Kettle project more specifically.
Kettle is essentially an extremely fast tool for extracting data from many different sources: web pages, Excel documents, databases, you name it (E = Extraction in ETL).
Kettle can transparently grab that data and then manipulate, transform or otherwise process it in just about any way you can imagine (T = Transformation in ETL).
Then, after combining the data, changing it, mapping values and so on, you can store (load) the result in whatever format or medium you like, including text files and practically any database (L = Loading in ETL).
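To make the three ETL stages concrete, here is a minimal sketch in plain Python. It is not Kettle's API (Kettle transformations are normally designed graphically rather than written as code); it only illustrates the extract, transform and load steps. The CSV data, the `customer` table and the country-code mapping are hypothetical examples, and an in-memory SQLite database stands in for the target database.

```python
import csv
import io
import sqlite3

# Hypothetical source data standing in for a text file (the "E" in ETL).
RAW_CSV = """id,name,country
1, Alice ,be
2,Bob,NL
3,,de
"""

# Hypothetical lookup used for value mapping during the "T" step.
COUNTRY_NAMES = {"BE": "Belgium", "NL": "Netherlands", "DE": "Germany"}


def extract(text):
    """Extract: read rows from CSV text as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim names, drop rows without a name, map country codes to names."""
    out = []
    for row in rows:
        name = row["name"].strip()
        if not name:
            continue  # simple data-cleaning rule: skip incomplete records
        code = row["country"].strip().upper()
        out.append((int(row["id"]), name, COUNTRY_NAMES.get(code, "Unknown")))
    return out


def load(rows, conn):
    """Load: write the transformed rows into a database table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer "
        "(id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
    )
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT * FROM customer ORDER BY id").fetchall())
# prints [(1, 'Alice', 'Belgium'), (2, 'Bob', 'Netherlands')]
```

Keeping each stage as a separate function mirrors the way an ETL tool chains independent steps, so a different source or target can be swapped in without touching the transformation logic.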
Some examples:
- Loading of data from text files and XML files into a database
- Export of databases to text files or to other databases
- Import of data into databases from sources ranging from text files to Excel sheets
- Data migration between database applications
- Exploration of data in existing databases (tables, views, etc.)
- Information enrichment by looking up data in various information stores
- Data cleaning by applying complex conditions in data transformations
- Application integration
- Data warehouse population with built-in support for slowly changing dimensions
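A slowly changing dimension is a data warehouse dimension whose attributes change over time, for example a customer who moves to another city. One common way to handle this is "type 2": instead of overwriting the old value, the current version is closed and a new version is added, so history is preserved. The sketch below illustrates that generic technique in plain Python. It does not show how Kettle implements its built-in support, and the `DimRow` record, the `scd2_upsert` function, the customer key and the dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class DimRow:
    """One version of a member of a hypothetical customer dimension."""
    surrogate_key: int
    natural_key: str
    city: str
    valid_from: date
    valid_to: Optional[date]  # None marks the current version


def scd2_upsert(dim: List[DimRow], natural_key: str, city: str, as_of: date) -> List[DimRow]:
    """Type 2 update: close the current version and append a new one if the attribute changed."""
    current = next(
        (r for r in dim if r.natural_key == natural_key and r.valid_to is None), None
    )
    if current is not None and current.city == city:
        return dim  # attribute unchanged: keep the existing version
    if current is not None:
        current.valid_to = as_of  # end the old version instead of overwriting it
    next_key = max((r.surrogate_key for r in dim), default=0) + 1
    dim.append(DimRow(next_key, natural_key, city, as_of, None))
    return dim


dim: List[DimRow] = []
scd2_upsert(dim, "C001", "Brussels", date(2007, 1, 1))
scd2_upsert(dim, "C001", "Brussels", date(2007, 6, 1))  # no change, so no new row
scd2_upsert(dim, "C001", "Ghent", date(2008, 1, 1))  # change: old row closed, new row added
for row in dim:
    print(row)
```

After these three calls the dimension holds two versions of customer `C001`: the Brussels row, valid until 2008-01-01, and the current Ghent row, so facts recorded before the move can still be reported against the old city.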