Brussels / 1 & 2 February 2014

Reduce the Storage Consumption of Your Storage Clusters with RozoFS

The Flexible Distributed File System, based on an Erasure Code


Distributed storage systems such as RozoFS let you adapt your system's resources to evolving demand, but data protection entails heavy storage consumption.

This talk should interest those who care about the storage consumption of their clusters, which is directly linked to energy consumption and architecture cost.

Erasure coding (EC) is a technique that provides the same data protection and availability as traditional block replication while significantly reducing storage usage (e.g. by up to 50%). EC does have a drawback: encoding and decoding usually involve complex computations. However, the Mojette transform, which RozoFS uses as its erasure code, computes quickly because it relies on simple additions. Work is under way to open up EC-based systems to data-intensive applications.
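The "up to 50%" figure can be checked with a small worked example. The sketch below compares plain replication with an ideal erasure code that tolerates the same number of failures. The parameters k = 4 and n = 6 and the function names are assumptions chosen for illustration, and the calculation ignores any per-block overhead a real code may add.

```python
from fractions import Fraction


def replication_overhead(failures_tolerated: int) -> Fraction:
    """Raw bytes stored per byte of user data with plain replication."""
    # Surviving f failures requires f + 1 full copies of the data.
    return Fraction(failures_tolerated + 1)


def erasure_overhead(k: int, n: int) -> Fraction:
    """Raw bytes stored per byte of user data with an ideal (k, n) erasure code."""
    # n blocks are stored for every k blocks of data; n - k losses are tolerated.
    return Fraction(n, k)


# Hypothetical parameters (not taken from the talk): tolerate 2 failures.
k, n = 4, 6
rep = replication_overhead(n - k)  # 3 full copies
ec = erasure_overhead(k, n)        # 6 blocks stored per 4 blocks of data
saving = 1 - ec / rep

print(f"replication: {float(rep)}x, erasure code: {float(ec)}x, saving: {float(saving):.0%}")
```

Under these assumptions, replication stores three times the data and the erasure code one and a half times, which is where a saving of one half comes from.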

The growth of global storage is alarming. IDC's Digital Universe study [1] forecasts that the global amount of data will reach 40 zettabytes (ZB) by 2020. Data protection plays a major role in this storage consumption.

The Mojette transform [2] is a mathematical tool from the University of Nantes that computes 'n' redundant projection blocks from 'k' information blocks. Any 'k' blocks among the 'n' are sufficient to retrieve the original data, so the transform behaves like an erasure code. By distributing these 'n' projection blocks over networked storage nodes, RozoFS [3] can tolerate 'n-k' node failures (including disk, network and server failures). While providing the same data protection and availability as traditional block replication [4], this technique significantly reduces the storage capacity required (e.g. by up to 50%). Erasure coding does have a drawback, since it usually involves complex computations. The Mojette transform, however, computes quickly because it relies on simple additions. RozoFS has many important characteristics for a distributed storage system, such as:

  • scalability: clusters of storage nodes can be added on demand;
  • openness: compatible with different protocols (CIFS, NFS, ...), Amazon S3, Hadoop, ...;
  • transparency: users manage their files exactly as usual;
  • management: a tool is provided to make administration tasks easier.
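To make the "simple additions" behind the Mojette transform concrete, here is a minimal sketch of the forward projection step only. It assumes the 'k' information blocks are laid out as the rows of a grid and that each projection uses a direction (p, 1). The function name, the integer arithmetic, the chosen directions and the bin-index sign convention are all illustrative assumptions, not RozoFS's actual implementation, and rebuilding the data from any 'k' projections is not shown.

```python
from typing import Dict, List


def mojette_projection(grid: List[List[int]], p: int, q: int = 1) -> List[int]:
    """Sum the grid elements along the discrete direction (p, q).

    The element at row l, column c is added to bin p*l + q*c, shifted so
    that the lowest bin index is 0. Only additions are involved.
    """
    rows, cols = len(grid), len(grid[0])
    # Smallest raw bin index over the grid, used as the shift.
    lowest = min(0, p * (rows - 1)) + min(0, q * (cols - 1))
    n_bins = abs(p) * (rows - 1) + abs(q) * (cols - 1) + 1
    bins = [0] * n_bins
    for l, row in enumerate(grid):
        for c, value in enumerate(row):
            bins[p * l + q * c - lowest] += value
    return bins


# Hypothetical example: k = 2 information blocks of 4 symbols, n = 3 projections.
blocks = [[1, 2, 3, 4],
          [5, 6, 7, 8]]
directions = [-1, 0, 1]
projections: Dict[int, List[int]] = {
    p: mojette_projection(blocks, p) for p in directions
}

for p, bins in projections.items():
    print(f"p = {p:2d}: {bins}")
```

Each projection is one block to place on a separate storage node. In this sketch the slanted projections hold 5 bins for 4-symbol rows, so a projection can be slightly longer than an information block.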

[1] http://www.emc.com/collateral/analyst-reports/idc-the-digital-universe-in-2020.pdf
[2] JeanPierre Guédon and Nicolas Normand, http://link.springer.com/chapter/10.1007%2F978-3-540-31965-88
[3] http://www.rozofs.com/
[4] Hakim Weatherspoon and John D. Kubiatowicz, http://oceanstore.cs.berkeley.edu/publications/papers/pdf/erasureiptps.pdf

Speakers

Dimitri Pertin
