Fine-grained, secure and efficient data provenance on blockchain systems

September 16, 2019 (updated May 25, 2020) ~ Adrian Colyer ~ 4 Comments

Fine-grained, secure and efficient data provenance on blockchain systems Ruan et al., VLDB'19 We haven’t covered a blockchain paper on The Morning Paper for a while, and today’s choice won the best paper award at VLDB’19. The goal here is to enable smart contracts to be written in which the contract logic depends on the ... Continue Reading

Declarative recursive computation on an RDBMS

September 13, 2019 (updated May 25, 2020) ~ Adrian Colyer ~ 2 Comments

Declarative recursive computation on an RDBMS... or, why you should use a database for distributed machine learning Jankov et al., VLDB'19 If you think about a system like Procella that’s combining transactional and analytic workloads on top of a cloud-native architecture, extensions to SQL for streaming, dataflow-based materialized views (see e.g. Naiad, Noria, Multiverses, ... Continue Reading

Procella: unifying serving and analytical data at YouTube

September 11, 2019 (updated May 25, 2020) ~ Adrian Colyer ~ 25 Comments

Procella: unifying serving and analytical data at YouTube Chattopadhyay et al., VLDB'19 Academic papers aren’t usually set to music, but if they were, the chorus of Queen’s "I want it all (and I want it now...)" seems appropriate here. Anchored in the primary use case of supporting Google’s YouTube business, what we’re looking at here ... Continue Reading

Experiences with approximating queries in Microsoft’s production big-data clusters

September 9, 2019 (updated May 25, 2020) ~ Adrian Colyer ~ 2 Comments

Experiences with approximating queries in Microsoft’s production big-data clusters Kandula et al., VLDB'19 I’ve been excited about the potential for approximate query processing in analytic clusters for some time, and this paper describes its use at scale in production. Microsoft’s big data clusters have 10s of thousands of machines, and are used by thousands of ... Continue Reading

DDSketch: a fast and fully-mergeable quantile sketch with relative-error guarantees

September 6, 2019 (updated May 25, 2020) ~ Adrian Colyer ~ 1 Comment

DDSketch: a fast and fully-mergeable quantile sketch with relative-error guarantees Masson et al., VLDB'19 Datadog handles a ton of metrics - some customers have endpoints generating over 10M points per second! For response times (latencies), reporting a simple metric such as ‘average’ is next to useless. Instead we want to understand what’s happening at different ... Continue Reading

SLOG: serializable, low-latency, geo-replicated transactions

September 4, 2019 (updated May 25, 2020) ~ Adrian Colyer ~ 1 Comment

SLOG: serializable, low-latency, geo-replicated transactions Ren et al., VLDB'19 SLOG is another research system motivated by the needs of the application developer (aka, user!). Building correct applications is much easier when the system provides strict serializability guarantees. Strict serializability reduces application code complexity and bugs, since it behaves like a system that is running on ... Continue Reading

IPA: invariant-preserving applications for weakly consistent replicated databases

September 2, 2019 (updated May 25, 2020) ~ Adrian Colyer ~ 2 Comments

IPA: invariant-preserving applications for weakly consistent replicated databases Balegas et al., VLDB'19 IPA for developers, happy days! Last week we looked at automating checks for invariant confluence, and extending the set of cases where we can show that an object is indeed invariant confluent. I’m not going to re-cover that background in this write-up, so ... Continue Reading

Choosing a cloud DBMS: architectures and tradeoffs

August 30, 2019 (updated May 25, 2020) ~ Adrian Colyer ~ 3 Comments

Choosing a cloud DBMS: architectures and tradeoffs Tan et al., VLDB'19 If you’re moving an OLAP workload to the cloud (AWS in the context of this paper), what DBMS setup should you go with? There’s a broad set of choices including where you store the data, whether you run your own DBMS nodes or use ... Continue Reading

Interactive checks for coordination avoidance

August 28, 2019 (updated May 25, 2020) ~ Adrian Colyer ~ 4 Comments

Interactive checks for coordination avoidance Whittaker & Hellerstein, VLDB'19 I am so pleased to see a database systems paper addressing the concerns of the application developer! To the developer, a strongly consistent system behaves exactly like a single-threaded system running on a single node, so reasoning about the behaviour of the system is ... Continue Reading

Snuba: automating weak supervision to label training data

August 26, 2019 (updated May 25, 2020) ~ Adrian Colyer ~ 1 Comment

Snuba: automating weak supervision to label training data Varma & Ré, VLDB 2019 This week we’re moving on from ICML to start looking at some of the papers from VLDB 2019. VLDB is a huge conference, and once again I have a problem because my shortlist of "that looks really interesting, I’d love to read ... Continue Reading

the morning paper

a random walk through Computer Science research, by Adrian Colyer

Author: Adrian Colyer