Schedule: Strigi internals

Jos van den Oever
Day Sunday
Room H.1302
Start time 10:00
End time 10:45
Duration 00:45
Event type Podium
Track CrossDesktop
Language English
Slides (PDF)
Slides (OOo)
Strigi internals

Strigi, the fastest and smallest desktop search engine, combines stream-based content analysis with an abstract index interface. Revolutionary tools like deepfind, deepgrep, the Strigi daemon and a universal metadata extractor for the free desktop are the result.

Strigi introduces a new way of looking at metadata and file formats that enables the creation of very efficient tools for improving the way users handle their data. It does so by using simple C++ code with very few dependencies.

Two concepts are central: stream-based data analysis and an abstract index interface. The former provides a standardized way of looking at metadata and embedded files. It allows reading arbitrarily deeply nested files with very low CPU and memory consumption. Embedded files of all types are handled in the same way, whether they are email attachments, zip file entries, files in an rpm or deb package, or pictures embedded in a PDF file. This interface enables applications like deepfind and deepgrep, versions of 'find' and 'grep' that also descend into arbitrarily deeply nested files.
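
The idea of treating every embedded file like a directory entry can be illustrated with a small sketch. This is not Strigi's code or API: it is a Python toy limited to zip archives, and deep_find and make_zip are hypothetical helper names. For brevity it buffers each nested archive in memory, which is precisely the cost a stream-based analyzer is designed to avoid.

```python
import io
import zipfile


def deep_find(data: bytes, prefix: str = "") -> list[str]:
    """Return the paths of all entries in a zip archive, descending into nested zips.

    Hypothetical helper, not Strigi's API. Each nested archive is buffered
    in memory for brevity; a streaming analyzer would avoid that.
    """
    paths = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for name in archive.namelist():
            path = prefix + name
            paths.append(path)
            if name.endswith("/"):
                continue  # directory entry, nothing to read
            payload = archive.read(name)
            # Treat any entry that is itself a zip archive like a directory.
            if zipfile.is_zipfile(io.BytesIO(payload)):
                paths.extend(deep_find(payload, path + "/"))
    return paths


def make_zip(entries: dict[str, bytes]) -> bytes:
    """Build an in-memory zip from a name -> content mapping (demo helper)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, content in entries.items():
            archive.writestr(name, content)
    return buffer.getvalue()


# Demo data: a zip that contains a text file and another zip.
inner = make_zip({"notes.txt": b"hello"})
outer = make_zip({"readme.txt": b"top level", "backup.zip": inner})
for path in deep_find(outer):
    print(path)
```

Run on the demo data, this lists readme.txt, backup.zip and backup.zip/notes.txt, so the nested entry gets a path just like a file in a subdirectory would. Supporting other containers (tar, email, rpm) would mean adding more format detectors behind the same recursive loop.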

The latter, the common interface for indexes, is also essential for deepfind and deepgrep. In addition, it enables a universal metadata extractor that can greatly speed up the indexing phase of any desktop search engine. On top of that, it keeps the Strigi desktop search daemon independent of its storage backend.
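
A rough sketch of what such an abstract index interface buys: the analyzer writes fields to an interface and never learns which backend sits behind it. All names here (IndexWriter, MemoryIndex, PrintingIndex, analyze) are assumptions made for illustration and are not taken from Strigi's libraries.

```python
from abc import ABC, abstractmethod


class IndexWriter(ABC):
    """Hypothetical abstract index interface; names are illustrative, not Strigi's."""

    @abstractmethod
    def add_field(self, path: str, field: str, value: str) -> None:
        """Record one metadata field for one (possibly nested) file."""


class MemoryIndex(IndexWriter):
    """Backend that stores fields in a dict, standing in for a real search index."""

    def __init__(self) -> None:
        self.documents = {}

    def add_field(self, path, field, value):
        self.documents.setdefault(path, {})[field] = value


class PrintingIndex(IndexWriter):
    """Backend that stores nothing and prints each field: a bare metadata extractor."""

    def add_field(self, path, field, value):
        print(f"{path}\t{field}\t{value}")


def analyze(path: str, content: bytes, index: IndexWriter) -> None:
    """Toy analyzer: it only sees the interface, never the backend."""
    index.add_field(path, "size", str(len(content)))
    extension = path.rsplit(".", 1)[-1] if "." in path else ""
    index.add_field(path, "extension", extension)


# The same analyzer call feeds two unrelated backends.
memory = MemoryIndex()
analyze("backup.zip/notes.txt", b"hello", memory)
analyze("backup.zip/notes.txt", b"hello", PrintingIndex())
```

Swapping MemoryIndex for PrintingIndex changes nothing in analyze, which is the property that lets one analysis library serve a search daemon, a command-line tool, or another project's indexer.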

This presentation explains the technology that makes Strigi as fast as it is and shows how other projects can benefit by using the programs and libraries Strigi provides.