Moment-based quantile sketches for efficient high cardinality aggregation queries

Moment-based quantile sketches for efficient high cardinality aggregation queries Gan et al., VLDB'18 Today we’re temporarily pausing our tour through some of the OSDI’18 papers in order to look at a great sketch-based data structure for quantile queries over high-cardinality aggregates. That’s a bit of a mouthful so let’s jump straight into an example of … Continue reading Moment-based quantile sketches for efficient high cardinality aggregation queries

Noria: dynamic, partially-stateful data-flow for high-performance web applications

Noria: dynamic, partially-stateful data-flow for high-performance web applications Gjengset, Schwarzkopf et al., OSDI'18 I have way more margin notes for this paper than I typically do, and that’s a reflection of my struggle to figure out what kind of thing we’re dealing with here. Noria doesn’t want to fit neatly into any existing box! We’ve … Continue reading Noria: dynamic, partially-stateful data-flow for high-performance web applications

RobinHood: tail latency aware caching – dynamic reallocation from cache-rich to cache-poor

RobinHood: tail latency aware caching - dynamic reallocation from cache-rich to cache-poor Berger et al., OSDI'18 It’s time to rethink everything you thought you knew about caching! My mental model goes something like this: we have a set of items that probably follow a power-law of popularity. We have a certain finite cache capacity, and … Continue reading RobinHood: tail latency aware caching – dynamic reallocation from cache-rich to cache-poor

Maelstrom: mitigating datacenter-level disasters by draining interdependent traffic safely and efficiently

Maelstrom: mitigating datacenter-level disasters by draining interdependent traffic safely and efficiently Veeraraghavan et al., OSDI'18 Here’s a really valuable paper detailing four plus years of experience dealing with datacenter outages at Facebook. Maelstrom is the system Facebook use in production to mitigate and recover from datacenter-level disasters. The high level idea is simple: drain traffic … Continue reading Maelstrom: mitigating datacenter-level disasters by draining interdependent traffic safely and efficiently

LegoOS: a disseminated, distributed OS for hardware resource disaggregation

LegoOS: a disseminated, distributed OS for hardware resource disaggregation Shan et al., OSDI'18 One of the interesting trends in hardware is the proliferation and importance of dedicated accelerators as general purposes CPUs stopped benefitting from Moore’s law. At the same time we’ve seen networking getting faster and faster, causing us to rethink some of the … Continue reading LegoOS: a disseminated, distributed OS for hardware resource disaggregation

Orca: differential bug localization in large-scale services

Orca: differential bug localization in large-scale services Bhagwan et al., OSDI'18 Earlier this week we looked at REPT, the reverse debugging tool deployed live in the Windows Error Reporting service. Today it’s the turn of Orca, a bug localisation service that Microsoft have in production usage for six of their large online services. The focus … Continue reading Orca: differential bug localization in large-scale services

REPT: reverse debugging of failures in deployed software

REPT: reverse debugging of failures in deployed software Cui et al., OSDI'18 REPT (‘repeat’) won a best paper award at OSDI’18 this month. It addresses the problem of debugging crashes in production software, when all you have available is a memory dump. In particular, we’re talking about debugging Windows binaries. To effectively understand and fix … Continue reading REPT: reverse debugging of failures in deployed software