Schedule: Mole -- Infrastructure for Managing Information

Speakers
Jeroen Van Wolffelaar
Schedule
Day Saturday
Room AW1.125
Start time 15:15
End time 16:15
Duration 01:00
Info
Event type Podium
Track Debian
Language English
Media
Slides (OOo)
Video (Ogg/Theora)
Video (DIVX)
Video (Ogg/Theora/Low quality)
Mole -- Infrastructure for Managing Information

This talk describes "mole", a set of scripts that together form an infrastructure for managing various sorts of information.

The major properties of mole are:

  • A means to accept submitted information, with optional access control and moderation
  • A means to store both transient and constant information ("data")
  • A means to retrieve this information quickly and efficiently in both micro-queries and as whole datasets
  • A means to coordinate the generation of information
That is mole in very general terms. To make this a bit more concrete, consider one example:
  • Mole would accept (by mail, HTTP POST, or otherwise) build logs from building source packages in current unstable/testing, in either a standard or a specifically tweaked environment
  • It would store them in its database in an efficient form, also keeping a configurable number of older versions
  • It would maintain a list of source packages that don't have such a log yet, or whose log is older than a configurable amount of time, and provide that list as a todo-list to worker machines that can do such rebuilding
  • Mole would automatically keep track of assigned work so that duplicate work is prevented
  • People can query the results for specific packages, for example via a web interface, but one can also retrieve the full database
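The build-log example above can be sketched in a few lines of Python. This is only an illustrative in-memory model, not mole's actual code: the class name `BuildLogJob`, its methods, and the parameters `keep_versions`, `max_age` and `claim_timeout` are all assumptions made for this sketch. It shows the four ideas from the list: accepting logs, keeping a limited history, deriving a todo-list, and handing out claims so that two workers do not rebuild the same package.

```python
import time


class BuildLogJob:
    """Hypothetical in-memory model of one mole-style job (not mole's real code)."""

    def __init__(self, packages, keep_versions=3, max_age=7 * 86400, claim_timeout=3600):
        self.packages = set(packages)
        self.keep_versions = keep_versions    # how many logs to retain per package
        self.max_age = max_age                # seconds before a log counts as stale
        self.claim_timeout = claim_timeout    # seconds before an unfinished claim expires
        self.logs = {}                        # package -> [(timestamp, log), ...], newest first
        self.claims = {}                      # package -> time the claim was handed out

    def submit(self, package, log, now=None):
        """Store a new log, drop versions beyond keep_versions, release the claim."""
        now = time.time() if now is None else now
        history = self.logs.setdefault(package, [])
        history.insert(0, (now, log))
        del history[self.keep_versions:]
        self.claims.pop(package, None)

    def todo(self, now=None):
        """Packages with no log or a stale log that no worker currently holds."""
        now = time.time() if now is None else now
        result = []
        for package in sorted(self.packages):
            history = self.logs.get(package)
            stale = not history or now - history[0][0] > self.max_age
            claimed_at = self.claims.get(package)
            claimed = claimed_at is not None and now - claimed_at <= self.claim_timeout
            if stale and not claimed:
                result.append(package)
        return result

    def claim(self, now=None):
        """Hand the next todo item to a worker, or None if nothing is pending."""
        now = time.time() if now is None else now
        pending = self.todo(now)
        if not pending:
            return None
        self.claims[pending[0]] = now
        return pending[0]

    def latest(self, package):
        """Micro-query: the newest log for one package, or None."""
        history = self.logs.get(package)
        return history[0][1] if history else None
```

A worker would call `claim()`, rebuild the returned package, and then `submit()` the log. The claim timeout is one possible way to recover from workers that disappear; the real system may handle this differently.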
This approach is very powerful because mole "jobs" can be stacked. To continue the above example, a different job could keep track of logs that have not been 'judged' yet, have a worker judge them as "successful/not-successful", and store those qualifications in a result table.

Additionally, data providers do not need to be computer programs; they can also be humans. For example, the job could be "file bugs on failed logs" or "mark a failed log as 'wrongly failed'". Data can also be about non-package things, such as bugs or mirrors. It can also be something other than quality tests: examples would be mere extraction of data such as .desktop files for auto-installers, SLOC counts for funny statistics, or (user-supplied) screenshots and reviews.

The current implementation is functional but still a little rough. It mostly lacks documentation, needs some scalability tweaks, and, perhaps most importantly, does not yet have an easy query interface such as a web interface.
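The stacking of jobs described above can be illustrated with a small Python sketch. It assumes, purely for illustration, that the output of the build-log job is a dictionary from package name to log text and that the result table is another dictionary. The function names and the judging rule (looking for the phrase "build succeeded") are hypothetical and not taken from mole.

```python
def unjudged(logs, verdicts):
    """Todo-list of the stacked job: packages whose log has no verdict yet."""
    return sorted(pkg for pkg in logs if pkg not in verdicts)


def judge(log):
    """Hypothetical judging rule; a human worker could supply the verdict instead."""
    return "successful" if "build succeeded" in log.lower() else "not-successful"


def run_judging_job(logs, verdicts, worker=judge):
    """Let a worker judge every pending log and record it in the result table."""
    for pkg in unjudged(logs, verdicts):
        verdicts[pkg] = worker(logs[pkg])
    return verdicts


def needs_bug_report(verdicts, handled):
    """A third stacked job: failed logs that nobody has filed or dismissed yet."""
    return sorted(pkg for pkg, verdict in verdicts.items()
                  if verdict == "not-successful" and pkg not in handled)
```

Each function consumes the table produced by the previous one, which is the sense in which the jobs stack. Because `worker` is just a callable, the same todo-list could be served to a program or presented to a person.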