A higher order estimate of the optimum checkpoint interval for restart dumps - Daly 2004 TL;DR: if you know how long it takes your system to create a checkpoint/snapshot (δ), and you know the expected mean-time between failures (M), then set the checkpoint interval to be √(2δM) - δ. OK, I grant that today's paper … Continue reading A higher order estimate of the optimum checkpoint interval for restart dumps
Tag: Algorithms and data structures
Handy algorithms and data structures – e.g., for approximations
Detecting Termination of Distributed Computations Using Markers
Detecting Termination of Distributed Computations Using Markers - Misra 1983 There's an intriguing line in the Distributed GraphLab paper that caught my eye: "Termination is evaluated using distributed consensus algorithm described in [Ref]." Today's choice is the paper by Misra in 1983 that describes this distributed termination detection algorithm. The solution is similar in spirit … Continue reading Detecting Termination of Distributed Computations Using Markers
A Bridging Model for Parallel Computation
A Bridging Model for Parallel Computation - Valiant 1990 We've seen a lot of references to the 'Bulk Synchronous Parallel' model over the last two weeks. When it was conceived by Valiant in 1990 though, it was intended as a much more general model than simply an abstraction to support graph processing. As the von … Continue reading A Bridging Model for Parallel Computation
Scaling Concurrent Log-Structured Data Stores
Scaling Concurrent Log-Structured Data Stores - Golan-Gueta et al. 2015 Key-value stores based on log-structured merge trees are everywhere. The original design was intended to mitigate slow disk I/O. Once this is achieved, as we scale to more and more cores the authors find that in-memory contention now becomes the bottleneck (see yesterday's piece on … Continue reading Scaling Concurrent Log-Structured Data Stores
Distributed Snapshots: Determining Global States of Distributed Systems
Distributed Snapshots: Determining Global States of Distributed Systems - Chandy & Lamport 1985. What state is your distributed system in? In the absence of a universal clock, is that even a well-formed question? And if you could take a distributed snapshot of system state, would that be useful? Through an algorithm that has simply become … Continue reading Distributed Snapshots: Determining Global States of Distributed Systems
Asynchronized Concurrency: The Secret to Scaling Concurrent Search Data Structures
Asynchronized Concurrency: The Secret to Scaling Concurrent Search Data Structures - David et al. 2015 Linked Lists, Hash Tables, Skip Lists, Binary Search Trees... these data structures are core to many programs. This paper studies such search data structures, supporting search, insert, and remove operations. In particular, the authors look at concurrent versions of these … Continue reading Asynchronized Concurrency: The Secret to Scaling Concurrent Search Data Structures
A Comprehensive study of Convergent and Commutative Replicated Data Types
A comprehensive study of Convergent and Commutative Replicated Data Types - Shapiro et al. 2011 This is the third of five Desert Island Paper choices from Jonas Bonér, and it continues the theme of avoiding coordination overhead in a principled manner whenever you can. As we saw yesterday, there are trade-offs between consistency, failure tolerance, … Continue reading A Comprehensive study of Convergent and Commutative Replicated Data Types
RIPQ: Advanced photo caching on flash for Facebook
RIPQ: Advanced Photo Caching on Flash for Facebook - Tang et al. 2015 It's three for the price of one with this paper: we get to deepen our understanding of the characteristics of flash, examine a number of priority queue and caching algorithms, and get a glimpse into what's behind an important part of Facebook's … Continue reading RIPQ: Advanced photo caching on flash for Facebook
Mergeable persistent data structures
Mergeable persistent data structures - Farinier et al. 2014 Irmin is part of the MirageOS project that was the subject of yesterday's paper, where it is also the basis for a Git-like persistent file system used for the OS. What if you could version-control a (mutable) persistent data structure, inspect its history, clone a remote … Continue reading Mergeable persistent data structures
A Hitchhiker’s Guide to Fast and Efficient Data Reconstruction in Erasure-coded Data Centers
A Hitchhiker's guide to fast and efficient data reconstruction in erasure-coded data centers - Rashmi et al. So far this week we've looked at a programming languages paper and a systems paper, so for today I thought it would be fun to look at an algorithm-based paper. HDFS enables horizontally scalable low-cost storage for the … Continue reading A Hitchhiker’s Guide to Fast and Efficient Data Reconstruction in Erasure-coded Data Centers