A higher order estimate of the optimum checkpoint interval for restart dumps

June 10, 2015 ~ Adrian Colyer

A higher order estimate of the optimum checkpoint interval for restart dumps - Daly 2004. TL;DR: if you know how long it takes your system to create a checkpoint/snapshot (δ), and you know the expected mean time between failures (M), then set the checkpoint interval to √(2δM) − δ. OK, I grant that today's paper ...
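As a quick worked example of the TL;DR rule, here is a minimal Python sketch. The function name `checkpoint_interval` and the sample figures (a 5-minute checkpoint, a 24-hour MTBF) are hypothetical, and the sketch implements only the simple √(2δM) − δ rule quoted above, not necessarily the full higher-order estimate derived in the paper. δ and M just need to be expressed in the same time unit.

```python
import math


def checkpoint_interval(delta: float, mtbf: float) -> float:
    """Return the checkpoint interval sqrt(2 * delta * M) - delta.

    delta: time taken to write one checkpoint.
    mtbf:  expected mean time between failures (M).
    Both arguments (and the result) share the same time unit.
    """
    if delta <= 0 or mtbf <= 0:
        raise ValueError("delta and mtbf must be positive")
    # sqrt(2*delta*M) - delta is only positive when delta < 2M,
    # so refuse inputs outside that range rather than return nonsense.
    if delta >= 2 * mtbf:
        raise ValueError("rule requires delta < 2 * mtbf")
    return math.sqrt(2 * delta * mtbf) - delta


if __name__ == "__main__":
    # Hypothetical system: 5-minute checkpoints, 24 h (1440 min) MTBF.
    print(checkpoint_interval(5, 1440))  # prints 115.0
```

With δ = 5 minutes and M = 1440 minutes, √(2 · 5 · 1440) = √14400 = 120, so the sketch suggests checkpointing roughly every 115 minutes.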

Detecting Termination of Distributed Computations Using Markers

June 9, 2015 ~ Adrian Colyer

Detecting Termination of Distributed Computations Using Markers - Misra 1983. There's an intriguing line in the Distributed GraphLab paper that caught my eye: "Termination is evaluated using distributed consensus algorithm described in [Ref]." Today's choice is Misra's 1983 paper, which describes this distributed termination detection algorithm. The solution is similar in spirit ...

A Bridging Model for Parallel Computation

June 8, 2015 ~ Adrian Colyer

A Bridging Model for Parallel Computation - Valiant 1990. We've seen a lot of references to the 'Bulk Synchronous Parallel' model over the last two weeks. When Valiant conceived it in 1990, though, it was intended as a much more general model than simply an abstraction to support graph processing. As the von ...

Scaling Concurrent Log-Structured Data Stores

April 30, 2015 ~ Adrian Colyer

Scaling Concurrent Log-Structured Data Stores - Golan-Gueta et al. 2015. Key-value stores based on log-structured merge trees are everywhere. The original design was intended to mitigate slow disk I/O. Once that is achieved, the authors find that as we scale to more and more cores, in-memory contention becomes the bottleneck (see yesterday's piece on ...

Distributed Snapshots: Determining Global States of Distributed Systems

April 22, 2015 ~ Adrian Colyer

Distributed Snapshots: Determining Global States of Distributed Systems - Chandy & Lamport 1985. What state is your distributed system in? In the absence of a universal clock, is that even a well-formed question? And if you could take a distributed snapshot of system state, would that be useful? Through an algorithm that has simply become ...

Asynchronized Concurrency: The Secret to Scaling Concurrent Search Data Structures

April 17, 2015 ~ Adrian Colyer

Asynchronized Concurrency: The Secret to Scaling Concurrent Search Data Structures - David et al. 2015. Linked lists, hash tables, skip lists, binary search trees... these data structures are core to many programs. This paper studies such search data structures, which support search, insert, and remove operations. In particular, the authors look at concurrent versions of these ...

A Comprehensive Study of Convergent and Commutative Replicated Data Types

March 18, 2015 ~ Adrian Colyer

A comprehensive study of Convergent and Commutative Replicated Data Types - Shapiro et al. 2011. This is the third of five Desert Island Paper choices from Jonas Bonér, and it continues the theme of avoiding coordination overhead in a principled manner whenever you can. As we saw yesterday, there are trade-offs between consistency, failure tolerance, ...
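To make "convergent" concrete, here is a minimal sketch of a state-based grow-only counter (commonly called a G-Counter), one of the simplest replicated data types of this kind. The class and method names are hypothetical and this is an illustration under those assumptions, not code from the paper: each replica increments only its own slot, and merging takes the element-wise maximum, so replicas agree once they have exchanged state, whatever the order.

```python
from typing import Dict


class GCounter:
    """Hypothetical state-based grow-only counter (one slot per replica)."""

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.counts: Dict[str, int] = {}

    def increment(self, n: int = 1) -> None:
        if n < 0:
            raise ValueError("a grow-only counter cannot decrease")
        # A replica only ever writes to its own slot.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def value(self) -> int:
        # The counter's value is the sum of every replica's slot.
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Element-wise max is commutative, associative and idempotent,
        # so merges can arrive in any order, or more than once.
        for rid, c in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), c)


if __name__ == "__main__":
    a, b = GCounter("a"), GCounter("b")
    a.increment(2)   # concurrent updates on two replicas
    b.increment(3)
    a.merge(b)
    b.merge(a)
    a.merge(b)       # a repeated merge changes nothing
    print(a.value(), b.value())  # prints 5 5
```

No replica ever waits on another to increment; coordination is replaced by a merge function whose algebraic properties guarantee convergence.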

RIPQ: Advanced Photo Caching on Flash for Facebook

February 27, 2015 ~ Adrian Colyer

RIPQ: Advanced Photo Caching on Flash for Facebook - Tang et al. 2015. It's three for the price of one with this paper: we get to deepen our understanding of the characteristics of flash, examine a number of priority queue and caching algorithms, and get a glimpse into what's behind an important part of Facebook's ...

Mergeable persistent data structures

January 14, 2015 ~ Adrian Colyer

Mergeable persistent data structures - Farinier et al. 2014. Irmin is part of the MirageOS project (the subject of yesterday's paper), where it also serves as the basis for a Git-like persistent file system used by the OS. What if you could version-control a (mutable) persistent data structure, inspect its history, clone a remote ...

A Hitchhiker’s Guide to Fast and Efficient Data Reconstruction in Erasure-coded Data Centers

December 17, 2014 ~ Adrian Colyer

A Hitchhiker's guide to fast and efficient data reconstruction in erasure-coded data centers - Rashmi et al. So far this week we've looked at a programming languages paper and a systems paper, so for today I thought it would be fun to look at an algorithm-based paper. HDFS enables horizontally scalable low-cost storage for the ...

the morning paper

a random walk through Computer Science research, by Adrian Colyer

Algorithms and data structures