HCloud: Resource-efficient provisioning in shared cloud systems - Delimitrou & Kozyrakis, ASPLOS '16 Do you use the public cloud? If so, I'm pretty confident you're going to find today's paper really interesting. Delimitrou & Kozyrakis study the provisioning strategies that provide the best balance between performance and cost. The sweet spot, it turns out, is …
Tag: Distributed Systems
Optimizing Distributed Actor Systems for Dynamic Interactive Services
Optimizing Distributed Actor Systems for Dynamic Interactive Services - Newell et al. 2016 I'm sure many of you have heard of the Orleans distributed actor system, which was used to build some of the systems supporting Microsoft's online Halo game. Halo Presence is an interactive application which implements presence services for a multi-player game running …
GeePS: Scalable deep learning on distributed GPUs with a GPU-specialized parameter server
GeePS: Scalable deep learning on distributed GPUs with a GPU-specialized parameter server - Cui et al. 2016 (EuroSys 2016) We know that deep learning is well suited to GPUs since it has inherent parallelism. But so far this has mostly been limited to either a single GPU (e.g. using Caffe) or to specially built distributed …
Delta State Replicated Data Types
Delta State Replicated Data Types - Almeida et al. 2016 You know when you want to use CRDTs for their convergence properties, but the amount of state you're required to pass around gets out of hand? In this paper, Almeida et al. show how to retain the advantages of state-based CRDTs, but with much smaller …
Maglev: A Fast and Reliable Software Network Load Balancer
Maglev: A Fast and Reliable Software Network Load Balancer - Eisenbud et al. 2016 Maglev is Google's software load balancer used within all their datacenters. It offers greater scalability and availability than hardware load balancers, enables quick iteration, and is much easier to upgrade. Maglev is just another distributed system running on the commodity …
Distributed TensorFlow with MPI
Distributed TensorFlow with MPI - Vishnu et al. 2016 A short early release paper to close out the week, which looks at how to support machine learning and data mining (MLDM) with Google's TensorFlow in a distributed setting. The paper also contains some good background on TensorFlow itself as well as MPI - …
Distributed Consistency and Session Anomalies
Since we've spent the last couple of days sketching anomaly diagrams and looking at isolation levels, I wanted to finish the week off with a quick recap of session anomalies and consistency levels for distributed stores. In terms of papers, I've drawn primary material for this from: Highly Available Transactions: Virtues and Limitations, and Linearizability …
The Heard-Of Model: Computing in Distributed Systems with Benign Failures
The Heard-Of Model: Computing in Distributed Systems with Benign Failures - Charron-Bost & Schiper 2007 We briefly touched on the Heard-Of model last week when we looked at PSync. It's really very elegant, so today I thought it would be good to take a closer look. The traditional view of fault-tolerant distributed systems makes the following …
Chapar: Certified Causally Consistent Distributed Key-Value Stores
Chapar: Certified Causally Consistent Distributed Key-Value Stores - Lesani et al. 2016 Another POPL '16 paper today. The Chapar framework provides for modular verification of causal consistency for both causally consistent key-value store implementations and for client programs written to use them. §1 also wins the prize for best use of emojis in a research …
‘Cause I’m Strong Enough: Reasoning About Consistency Choices in Distributed Systems
'Cause I'm Strong Enough: Reasoning About Consistency Choices in Distributed Systems - Gotsman et al. 2016 With apologies for the longer write-up today, I've tried to stick right to the heart of the matter, but even that takes quite some explanation... We've looked at the theme of coordination avoidance before - instead of uniformly applying …