File systems unfit as distributed storage backends: lessons from ten years of Ceph evolution

Aghayev et al., SOSP'19

Ten years of hard-won lessons packed into just 17 pages (13 if you don’t count the references!) makes this paper extremely good value for your time. It’s also a fabulous example of recognising and challenging implicit assumptions.

Azure Data Lake Store: a hyperscale distributed file service for big data analytics

Douceur et al., SIGMOD'17

Today’s paper takes us inside Microsoft Azure's distributed file service, the Azure Data Lake Store (ADLS). ADLS is the successor to an internal file system called Cosmos, and marries Cosmos semantics with HDFS, supporting both Cosmos and ...

The design, implementation and deployment of a system to transparently compress hundreds of petabytes of image files for a file storage service

Horn et al., NSDI'17

When I first started reading, I thought this paper was going to be about a new compression format Dropbox had introduced for JPEG images. And it is about that, ...

Redundancy does not imply fault tolerance: analysis of distributed storage reactions to single errors and corruptions

Ganesan et al., FAST 2017

It's a tough life being the developer of a distributed datastore. Thanks to the wonderful work of Kyle Kingsbury (aka @aphyr) and his efforts on Jepsen.io, awareness of data loss and related issues in ...