Field | Value |
---|---|
Speaker | Michaël Figuière |
Day | Saturday |
Room | AW1.124 |
Capacity | 59 |
Start time | 17:30 |
End time | 18:00 |
Duration | 00:30 |
Track | Data Analytics devroom |
A real-time search engine with Lucene and S4
An introduction to the S4 data stream processing framework and its application to real-time indexing for the Apache Lucene search engine.
Search engines have been around for a while, but only recently has attention turned to searching real-time content. To make this possible, the whole indexing pipeline has to run in real time: both the data processing and the insertion into the index itself. Lucene has been extended to support the latter, but the former still has to be handled.
S4 is an emerging technology from Yahoo that simplifies real-time distributed data processing. The goal of this presentation is to show how S4 can run expensive pre-processing on a stream of incoming data right before it is indexed, thus providing a powerful real-time search capability.
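As a rough illustration of the pipeline shape described above (pre-process each incoming event, then insert it into an index that is searchable immediately), here is a minimal sketch in plain Python. It uses neither S4 nor Lucene: `RealTimeIndex`, `preprocess`, and `run_pipeline` are hypothetical names invented for this example, the tokenizer merely stands in for the "expensive pre-processing", and everything runs in a single process rather than being distributed.

```python
import re
from collections import defaultdict


class RealTimeIndex:
    """Toy in-memory inverted index (hypothetical, not the Lucene API).

    A document becomes searchable as soon as add() returns.
    """

    def __init__(self):
        self._postings = defaultdict(set)  # term -> set of document ids

    def add(self, doc_id, terms):
        for term in terms:
            self._postings[term].add(doc_id)

    def search(self, term):
        # .get() avoids creating empty posting lists for unknown terms.
        return sorted(self._postings.get(term.lower(), set()))


def preprocess(text):
    """Stand-in for the expensive pre-processing step: lowercase and tokenize."""
    return re.findall(r"[a-z0-9]+", text.lower())


def run_pipeline(stream, index):
    """Consume (doc_id, text) events one at a time: pre-process, then index."""
    for doc_id, text in stream:
        index.add(doc_id, preprocess(text))
        yield doc_id  # hand control back after each event is indexed


index = RealTimeIndex()
events = [(1, "Lucene real-time search"), (2, "S4 stream processing")]
pipeline = run_pipeline(iter(events), index)

next(pipeline)                 # only the first event has been processed
print(index.search("lucene"))  # [1]
print(index.search("stream"))  # []

next(pipeline)                 # the second event arrives
print(index.search("stream"))  # [2]
```

The generator yields after every event so that the example can show a query being answered between two arrivals, which is the property a real-time search engine is after. In a real deployment the pre-processing would be spread across many nodes and the index would be a Lucene index rather than a dictionary.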
Expected audience experience: average knowledge of Lucene
Additional note:
The talk will briefly introduce the S4 philosophy. The rest of the presentation will be driven by a real-world example: the development of a real-time search engine.
Next (up to 3) talks in the same room (AW1.124):
When | Event | Track |
---|---|---|
18:00-18:15 | How Seeks let you do your Web search at home | Data Analytics |
18:15-18:30 | Graph databases, the Web of Data storage engines | Data Analytics |
18:30-19:00 | Comparing Scalable NOSQL Databases: Functionality and Measurements | Data Analytics |