Procella: unifying serving and analytical data at YouTube

Procella: unifying serving and analytical data at YouTube Chattopadhyay et al., VLDB'19 Academic papers aren’t usually set to music, but if they were the chorus of Queen’s "I want it all (and I want it now...)" seems appropriate here. Anchored in the primary use case of supporting Google’s YouTube business, what we’re looking at here … Continue reading Procella: unifying serving and analytical data at YouTube

Experiences with approximating queries in Microsoft’s production big-data clusters

Experiences with approximating queries in Microsoft’s production big-data clusters Kandula et al., VLDB'19 I’ve been excited about the potential for approximate query processing in analytic clusters for some time, and this paper describes its use at scale in production. Microsoft’s big data clusters have 10s of thousands of machines, and are used by thousands of … Continue reading Experiences with approximating queries in Microsoft’s production big-data clusters

DDSketch: a fast and fully-mergeable quantile sketch with relative-error guarantees

DDSketch: a fast and fully-mergeable quantile sketch with relative-error guarantees Masson et al., VLDB'19 Datadog handles a ton of metrics - some customers have endpoints generating over 10M points per second! For response times (latencies) reporting a simple metric such as ‘average’ is next to useless. Instead we want to understand what’s happening at different … Continue reading DDSketch: a fast and fully-mergeable quantile sketch with relative-error guarantees

SLOG: serializable, low-latency, geo-replicated transactions

SLOG: serializable, low-latency, geo-replicated transactions Ren et al., VLDB'19 SLOG is another research system motivated by the needs of the application developer (aka, user!). Building correct applications is much easier when the system provides strict serializability guarantees. Strict serializability reduces application code complexity and bugs, since it behaves like a system that is running on … Continue reading SLOG: serializable, low-latency, geo-replicated transactions

IPA: invariant-preserving applications for weakly consistent replicated databases

IPA: invariant-preserving applications for weakly consistent replicated databases Balegas et al., VLDB'19 IPA for developers, happy days! Last we week looked at automating checks for invariant confluence, and extending the set of cases where we can show that an object is indeed invariant confluent. I’m not going to re-cover that background in this write-up, so … Continue reading IPA: invariant-preserving applications for weakly consistent replicated databases

Interactive checks for coordination avoidance

Interactive checks for coordination avoidance Whittaker & Hellerstein et al., VLDB'19 I am so pleased to see a database systems paper addressing the concerns of the application developer! To the developer, a strongly consistent system behaves exactly like a single-threaded system running on a single node, so reasoning about the behaviour of the system is … Continue reading Interactive checks for coordination avoidance

Snuba: automating weak supervision to label training data

Snuba: automating weak supervision to label training data Varma & Ré, VLDB 2019 This week we’re moving on from ICML to start looking at some of the papers from VLDB 2019. VLDB is a huge conference, and once again I have a problem because my shortlist of "that looks really interesting, I’d love to read … Continue reading Snuba: automating weak supervision to label training data

Learning to prove theorems via interacting with proof assistants

Learning to prove theorems via interacting with proof assistants Yang & Deng, ICML'19 Something a little different to end the week: deep learning meets theorem proving! It’s been a while since we gave formal methods some love on The Morning Paper, and this paper piqued my interest. You’ve probably heard of Coq, a proof management … Continue reading Learning to prove theorems via interacting with proof assistants

Statistical foundations of virtual democracy

Statiscal foundations of virtual democracy Kahng et al., ICML'19 This is another paper on the theme of combining information and making decisions in the face of noise and uncertainty - but the setting is quite different to those we’ve been looking at recently. Consider a food bank that receives donations of food and distributes it … Continue reading Statistical foundations of virtual democracy