FOSDEM '10 is a free and non-commercial event organized by the community, for the community. Its goal is to provide Free and Open Source developers a place to meet. No registration necessary.

Speaker: Matthieu Fertre
Day: Sunday
Room: Ferrer
Start time: 10:00
End time: 10:15
Duration: 00:15
Event type: Lightning Talk
Track: Lightning Talks
Language: English
Kerrighed: Flexible distributed checkpoint/restart

Process checkpointing consists of saving the state of a running process so that the process can be restarted at any later time. Uses include fault tolerance, job suspension to free memory resources, and live migration of processes across physical machines. Checkpoint services range from those that checkpoint only single processes to those that checkpoint full operating systems, including processes, file systems, socket states, etc. This talk will present Kerrighed's application checkpoint/restart and show its advantages in flexibility over other checkpoint services.
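To make the save-then-restart idea concrete, here is a minimal user-level sketch in Python. It is not Kerrighed's mechanism, which works in the kernel on real processes; the `Counter` class and the `checkpoint` and `restart` helpers are hypothetical names chosen for illustration, and the "process state" is reduced to a single integer.

```python
import os
import pickle
import tempfile


class Counter:
    """Toy stand-in for a running process; its whole state is one integer."""

    def __init__(self, value=0):
        self.value = value

    def step(self):
        self.value += 1


def checkpoint(obj, path):
    """Save obj's state to path (hypothetical helper, not a Kerrighed API).

    The state is written to a temporary file first and then renamed, so a
    crash during the write does not corrupt an earlier checkpoint.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(obj.__dict__, f)
    os.replace(tmp_path, path)


def restart(cls, path):
    """Rebuild an object of type cls from the state saved at path."""
    obj = cls.__new__(cls)  # skip __init__; the state comes from the file
    with open(path, "rb") as f:
        obj.__dict__.update(pickle.load(f))
    return obj


def demo():
    """Run two steps, checkpoint, run one more, then restart from the file."""
    path = os.path.join(tempfile.mkdtemp(), "counter.ckpt")
    counter = Counter()
    counter.step()
    counter.step()
    checkpoint(counter, path)
    counter.step()  # work done after the checkpoint is not saved
    restored = restart(Counter, path)
    return counter.value, restored.value


if __name__ == "__main__":
    print(demo())
```

The restored object resumes from the state at checkpoint time, not from the latest state. A real checkpoint service must also capture resources this toy ignores, such as open files, sockets, and memory mappings.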

Kerrighed is a Single System Image operating system for clusters. It presents a cluster of standard PCs as a single SMP machine.

Kerrighed is implemented as an extension to the Linux operating system (a set of modules and a patch to the kernel). The current development version is based on Linux 2.6.30.

The main available features are:

  • Cluster-wide process management with customizable load balancing over the cluster (through process migration and remote forking)
  • Cluster-wide shared memory
  • Application checkpointing
  • Node addition/removal