RPCValet: NI-driven tail-aware balancing of µs-scale RPCs

RPCValet: NI-driven tail-aware balancing of µs-scale RPCs Daglis et al., ASPLOS'19 Last week we learned about the [increased tail-latency sensitivity of microservices based applications with high RPC fan-outs. Seer uses estimates of queue depths to mitigate latency spikes on the order of 10-100ms, in conjunction with a cluster manager. Today’s paper choice, RPCValet, operates at … Continue reading RPCValet: NI-driven tail-aware balancing of µs-scale RPCs

Understanding real-world concurrency bugs in Go

Understanding real-world concurrency bugs in Go Tu, Liu et al., ASPLOS'19 The design of a programming (or data) model not only makes certain problems easier (or harder) to solve, but also makes certain classes of bugs easier (or harder) to create, detect, and subsequently fix. Today’s paper choice studies concurrency mechanisms in Go. Before we … Continue reading Understanding real-world concurrency bugs in Go

Seer: leveraging big data to navigate the complexity of performance debugging in cloud microservices

Seer: leveraging big data to navigate the complexity of performance debugging in cloud microservices Gan et al., ASPLOS'19 Last time around we looked at the DeathStarBench suite of microservices-based benchmark applications and learned that microservices systems can be especially latency sensitive, and that hotspots can propagate through a microservices architecture in interesting ways. Seer is … Continue reading Seer: leveraging big data to navigate the complexity of performance debugging in cloud microservices

An open-source benchmark suite for microservices and their hardware-software implications for cloud & edge systems

An open-source benchmark suite for microservices and their hardware-software implications for cloud & edge systems Gan et al., ASPLOS'19 Microservices are well known for producing ‘death star’ interaction diagrams like those shown below, where each point on the circumference represents an individual service, and the lines between them represent interactions. Systems built with lots of … Continue reading An open-source benchmark suite for microservices and their hardware-software implications for cloud & edge systems

Distributed consensus revised – Part III

Distributed consensus revised (part III) Howard, PhD thesis With all the ground work laid, the second half of the thesis progressively generalises the Paxos algorithm: weakening the quorum intersection requirements; reusing intersections to allow decisions to be reached with fewer participants; weakening the value selection rules; and sharing phases to take best advantage of the … Continue reading Distributed consensus revised – Part III

Distributed consensus revised – Part II

Distributed consensus revised (part II) Howard, PhD thesis In today’s post we’re going to be looking at chapter 3 of Dr Howard’s thesis, which is a tour ("systematisation of knowledge", SoK) of some of the major known revisions to the classic Paxos algorithm. Negative responses (NACKs) In classic Paxos acceptors only send replies to proposer … Continue reading Distributed consensus revised – Part II

Keeping master green at scale

Keeping master green at scale Ananthanarayanan et al., EuroSys'19 This paper provides a fascinating look at a key part of Uber’s software delivery machine. With a monorepo, and many thousands of engineers concurrently committing changes, keeping the build green, and keeping commit-to-live latencies low, is a major challenge. This paper introduces a change management system … Continue reading Keeping master green at scale

Teaching rigorous distributed systems with efficient model checking

Teaching rigorous distributed systems with efficient model checking Michael et al., EuroSys'19 On the surface you might think today’s paper selection an odd pick. It describes the labs environment, DSLabs, developed at the University of Washington to accompany a course in distributed systems. During the ten week course, students implement four different assignments: an exactly-once … Continue reading Teaching rigorous distributed systems with efficient model checking