Gray failure: the Achilles’ heel of cloud-scale systems

Huang et al., HotOS'17. If you're going to fail, fail properly dammit! All this limping along in degraded mode, doing your best to mask problems, turns out to be one of the key causes of major availability breakdowns and performance anomalies in cloud-scale systems. Today's HotOS'17 paper …

Hybrids on Steroids: SGX-based high-performance BFT

Behl et al., EuroSys'17. Byzantine fault tolerance (BFT) is the kind of fault-tolerance designed to withstand not just process crashes and network problems, but also active adversaries trying to break the system, as well as storage and memory corruptions and so on. We've taken a look at BFT …

Online reconstruction of structural information from datacenter logs

Chothia et al., EuroSys'17. Today's choice brings together a couple of themes that we've previously looked at on The Morning Paper: recovering system information from log files, and dataflows for stream processing. On log files (and tracing), see for example Dapper, the MysteryMachine, lprof, and Pivot tracing. …

An empirical study on the correctness of formally verified distributed systems

Fonseca et al., EuroSys'17. "Is your distributed system bug free?" "I formally verified it!" "Yes, but is your distributed system bug free?" There's a really important discussion running through this paper - what does it take to write bug-free systems software? I have a …

CherryPick: Adaptively unearthing the best cloud configurations for big data analytics

Alipourfard et al., NSDI'17. For big data analytics jobs, especially recurring jobs, finding a good cloud configuration (number and type of machines, CPU, memory, disk and network options) can make a big difference to overall cost and runtimes. Likewise, a poor choice can seriously …

vCorfu: A cloud-scale object store on a shared log

Wei et al., NSDI'17. vCorfu builds on the idea of a distributed shared log that we looked at yesterday with CORFU, to construct a distributed object store. We show that vCorfu outperforms Cassandra, a popular state-of-the-art NoSQL store, while providing strong consistency (opacity, read-own-writes), efficient transactions, …

HopFS: Scaling hierarchical file system metadata using NewSQL databases

Niazi et al., FAST 2017. If you're working with big data and Hadoop, this one paper could repay your investment in The Morning Paper many times over (ok, The Morning Paper is free - but you do pay with your time to read it). You know …

Incremental consistency guarantees for replicated objects

Guerraoui et al., OSDI 2016. We know that there's a price to be paid for strong consistency in terms of higher latencies and reduced throughput. We also know that there's a price to be paid for weaker consistency in terms of application correctness and / or programmer difficulty. Furthermore, …