Gray failure: the Achilles' heel of cloud-scale systems Huang et al., HotOS'17 If you're going to fail, fail properly dammit! All this limping along in degraded mode, doing your best to mask problems, turns out to be one of the key causes of major availability breakdowns and performance anomalies in cloud-scale systems. Today's HotOS'17 paper …
Tag: Distributed Systems
Hybrids on Steroids: SGX-based high-performance BFT
Hybrids on Steroids: SGX-based high performance BFT Behl et al., EuroSys'17 Byzantine fault tolerance (BFT) is the kind of fault-tolerance designed to withstand not just process crashes and network problems, but also active adversaries trying to break the system, as well as storage and memory corruptions and so on. We've taken a look at BFT …
Online reconstruction of structural information from datacenter logs
Online reconstruction of structural information from datacenter logs Chothia et al., EuroSys'17 Today's choice brings together a couple of themes that we've previously looked at on The Morning Paper: recovering system information from log files, and dataflows for stream processing. On log files (and tracing), see for example Dapper, the MysteryMachine, lprof, and Pivot tracing. …
An empirical study on the correctness of formally verified distributed systems
An empirical study on the correctness of formally verified distributed systems Fonseca et al., EuroSys'17 "Is your distributed system bug free?" "I formally verified it!" "Yes, but is your distributed system bug free?" There's a really important discussion running through this paper - what does it take to write bug-free systems software? I have a …
Efficient memory disaggregation with Infiniswap
Efficient memory disaggregation with Infiniswap Gu et al., NSDI '17 If we move performance numbers onto a human scale (let 1ns of processor time = 1 second of human time) then it's easier to get an intuition - for me at least - of the relative cost of different operations. In this world, it takes …
CherryPick: Adaptively unearthing the best cloud configurations for big data analytics
CherryPick: Adaptively unearthing the best cloud configurations for big data analytics Alipourfard et al., NSDI'17 For big data analytics jobs, especially recurring jobs, finding a good cloud configuration (number and type of machines, CPU, memory, disk and network options) can make a big difference to overall cost and runtimes. Likewise, a poor choice can seriously …
vCorfu: A cloud-scale object store on a shared log
vCorfu: A cloud-scale object store on a shared log Wei et al., NSDI'17 vCorfu builds on the idea of a distributed shared log that we looked at yesterday with CORFU, to construct a distributed object store. We show that vCorfu outperforms Cassandra, a popular state-of-the-art NoSQL store, while providing strong consistency (opacity, read-own-writes), efficient transactions, …
Corfu: A distributed shared log
Corfu: A distributed shared log Balakrishnan et al., ACM TOCS, 2013 (If you experience any difficulty in accessing the pdf in the above link please let me know, it should be open for you on the ACM DL. UPDATE, many readers are still seeing a paywall for the above paper link, here's an alternative open …
HopFS: Scaling hierarchical file system metadata using NewSQL databases
HopFS: Scaling hierarchical file system metadata using NewSQL databases Niazi et al., FAST 2017 If you're working with big data and Hadoop, this one paper could repay your investment in The Morning Paper many times over (ok, The Morning Paper is free - but you do pay with your time to read it). You know …
Incremental consistency guarantees for replicated objects
Incremental consistency guarantees for replicated objects Guerraoui et al., OSDI 2016 We know that there's a price to be paid for strong consistency in terms of higher latencies and reduced throughput. We also know that there's a price to be paid for weaker consistency in terms of application correctness and / or programmer difficulty. Furthermore, …