File systems unfit as distributed storage backends: lessons from ten years of Ceph evolution

File systems unfit as distributed storage backends: lessons from 10 years of Ceph evolution, Aghayev et al., SOSP'19. Ten years of hard-won lessons packed into just 17 pages (13 if you don’t count the references!) makes this paper extremely good value for your time. It’s also a fabulous example of recognising and challenging implicit assumptions. …

Towards web-based delta synchronization for cloud storage systems

Towards web-based delta synchronization for cloud storage systems, Xiao et al., FAST’18. If you use Dropbox (or an equivalent service) to synchronise files between your Mac or PC and the cloud, then it uses an efficient delta-sync (rsync) protocol to upload only the parts of a file that have changed. If you use a web …

Barrier-enabled IO stack for Flash storage

Barrier-enabled IO stack for flash storage, Won et al., FAST’18. The performance of Flash storage has benefited greatly from concurrency and parallelism: for example, multi-channel controllers, large caches, and deep command queues. At the same time, the time to program an individual Flash cell has stayed fairly static (and even become slightly worse in …

Protocol aware recovery for consensus-based storage

Protocol aware recovery for consensus-based storage, Alagappan et al., FAST’18. Following on from their excellent previous work on ‘All file systems are not created equal’ (well worth a read if you haven’t encountered it yet), in this paper the authors look at how well some of our most reliable protocols, those used in …

Azure Data Lake Store: a hyperscale distributed file service for big data analytics

Azure data lake store: a hyperscale distributed file service for big data analytics, Douceur et al., SIGMOD'17. Today's paper takes us inside Microsoft Azure's distributed file service called the Azure Data Lake Store (ADLS). ADLS is the successor to an internal file system called Cosmos, and marries Cosmos semantics with HDFS, supporting both Cosmos and …

The design, implementation and deployment of a system to transparently compress hundreds of petabytes of image files for a file storage service

The design, implementation, and deployment of a system to transparently compress hundreds of petabytes of image files for a file storage service, Horn et al., NSDI'17. When I first started reading, I thought this paper was going to be about a new compression format Dropbox had introduced for JPEG images. And it is about that, …