PSync: A Partially Synchronous Language for Fault-Tolerant Distributed Algorithms

PSync: A Partially Synchronous Language for Fault-Tolerant Distributed Algorithms - Drăgoi et al. 2016. Last month we looked at the RAMCloud team's design pattern for building distributed, concurrent, fault-tolerant modules. Today's paper goes one step beyond a pattern and introduces a domain-specific language called PSync with the goal of unifying the modeling, programming, and verification …

Panopticon: An Omniscient Lock Broker for Efficient Distributed Transactions in the Datacenter

Panopticon: An Omniscient Lock Broker for Efficient Distributed Transactions in the Datacenter - Tasci & Demirbas, 2015. Today we return to the theme of distributed transactions, with a paper that won a best paper award from IEEE Big Data in 2015. Panopticon is a centralized lock broker (like Chubby and ZooKeeper) that manages distributed (decentralized) …

Petuum: A New Platform for Distributed Machine Learning on Big Data

Petuum: A New Platform for Distributed Machine Learning on Big Data - Xing et al. 2015. How do you perform machine learning with big models (big here could mean hundreds of billions of parameters!) over big data sets (terabytes or petabytes)? Take, for example, state-of-the-art image recognition systems that have embraced large-scale …

Experience with Rules-Based Programming for Distributed, Concurrent, Fault-Tolerant Code

Experience with Rules-Based Programming for Distributed, Concurrent, Fault-Tolerant Code - Stutsman et al. 2015. As we saw in yesterday's paper, the authors of RAMCloud settled on a very effective design pattern for writing distributed, concurrent, fault-tolerant (DCFT) modules within their system. They call this pattern 'rules-based programming': a collection of (condition, action) pairs that can …

FIT: A Distributed Database Performance Trade-off

FIT: A Distributed Database Performance Trade-off - Faleiro & Abadi, 2015. If the CAP FITs... This paper presents the FIT trade-off for distributed transactions: you can have any two of Fairness, (strong) Isolation, and Throughput, but not all three. This also implies that you can have both strong isolation and high throughput! As a consequence of …

Minimizing Faulty Executions of Distributed Systems

Minimizing Faulty Executions of Distributed Systems - Scott et al. Now that we've spent a couple of days looking at test case minimization for sequential systems, we're ready to tackle Colin Scott et al.'s paper on doing the same for executions of distributed systems. This is the paper that describes the core system behind Colin's …