Procella: unifying serving and analytical data at YouTube Chattopadhyay et al., VLDB'19 Academic papers aren’t usually set to music, but if they were, the chorus of Queen’s "I want it all (and I want it now...)" seems appropriate here. Anchored in the primary use case of supporting Google’s YouTube business, what we’re looking at here … Continue reading Procella: unifying serving and analytical data at YouTube
Experiences with approximating queries in Microsoft’s production big-data clusters
Experiences with approximating queries in Microsoft’s production big-data clusters Kandula et al., VLDB'19 I’ve been excited about the potential for approximate query processing in analytic clusters for some time, and this paper describes its use at scale in production. Microsoft’s big data clusters have tens of thousands of machines, and are used by thousands of … Continue reading Experiences with approximating queries in Microsoft’s production big-data clusters
DDSketch: a fast and fully-mergeable quantile sketch with relative-error guarantees
DDSketch: a fast and fully-mergeable quantile sketch with relative-error guarantees Masson et al., VLDB'19 Datadog handles a ton of metrics - some customers have endpoints generating over 10M points per second! For response times (latencies), reporting a simple metric such as ‘average’ is next to useless. Instead we want to understand what’s happening at different … Continue reading DDSketch: a fast and fully-mergeable quantile sketch with relative-error guarantees
SLOG: serializable, low-latency, geo-replicated transactions
SLOG: serializable, low-latency, geo-replicated transactions Ren et al., VLDB'19 SLOG is another research system motivated by the needs of the application developer (aka, user!). Building correct applications is much easier when the system provides strict serializability guarantees. Strict serializability reduces application code complexity and bugs, since it behaves like a system that is running on … Continue reading SLOG: serializable, low-latency, geo-replicated transactions
IPA: invariant-preserving applications for weakly consistent replicated databases
IPA: invariant-preserving applications for weakly consistent replicated databases Balegas et al., VLDB'19 IPA for developers, happy days! Last week we looked at automating checks for invariant confluence, and extending the set of cases where we can show that an object is indeed invariant confluent. I’m not going to re-cover that background in this write-up, so … Continue reading IPA: invariant-preserving applications for weakly consistent replicated databases
Choosing a cloud DBMS: architectures and tradeoffs
Choosing a cloud DBMS: architectures and tradeoffs Tan et al., VLDB'19 If you’re moving an OLAP workload to the cloud (AWS in the context of this paper), what DBMS setup should you go with? There’s a broad set of choices including where you store the data, whether you run your own DBMS nodes or use … Continue reading Choosing a cloud DBMS: architectures and tradeoffs
Interactive checks for coordination avoidance
Interactive checks for coordination avoidance Whittaker & Hellerstein, VLDB'19 I am so pleased to see a database systems paper addressing the concerns of the application developer! To the developer, a strongly consistent system behaves exactly like a single-threaded system running on a single node, so reasoning about the behaviour of the system is … Continue reading Interactive checks for coordination avoidance
Snuba: automating weak supervision to label training data
Snuba: automating weak supervision to label training data Varma & Ré, VLDB 2019 This week we’re moving on from ICML to start looking at some of the papers from VLDB 2019. VLDB is a huge conference, and once again I have a problem because my shortlist of "that looks really interesting, I’d love to read … Continue reading Snuba: automating weak supervision to label training data
Learning to prove theorems via interacting with proof assistants
Learning to prove theorems via interacting with proof assistants Yang & Deng, ICML'19 Something a little different to end the week: deep learning meets theorem proving! It’s been a while since we gave formal methods some love on The Morning Paper, and this paper piqued my interest. You’ve probably heard of Coq, a proof management … Continue reading Learning to prove theorems via interacting with proof assistants
Statistical foundations of virtual democracy
Statistical foundations of virtual democracy Kahng et al., ICML'19 This is another paper on the theme of combining information and making decisions in the face of noise and uncertainty - but the setting is quite different to those we’ve been looking at recently. Consider a food bank that receives donations of food and distributes it … Continue reading Statistical foundations of virtual democracy