Fine-grained, secure and efficient data provenance on blockchain systems Ruan et al., VLDB'19 We haven’t covered a blockchain paper on The Morning Paper for a while, and today’s choice won the best paper award at VLDB’19. The goal here is to enable smart contracts to be written in which the contract logic depends on the …
Author: adriancolyer
Declarative recursive computation on an RDBMS
Declarative recursive computation on an RDBMS... or, why you should use a database for distributed machine learning Jankov et al., VLDB'19 If you think about a system like Procella that’s combining transactional and analytic workloads on top of a cloud-native architecture, extensions to SQL for streaming, dataflow-based materialized views (see e.g. Naiad, Noria, Multiverses, …
Procella: unifying serving and analytical data at YouTube
Procella: unifying serving and analytical data at YouTube Chattopadhyay et al., VLDB'19 Academic papers aren’t usually set to music, but if they were, the chorus of Queen’s "I want it all (and I want it now...)" would seem appropriate here. Anchored in the primary use case of supporting Google’s YouTube business, what we’re looking at here …
Experiences with approximating queries in Microsoft’s production big-data clusters
Experiences with approximating queries in Microsoft’s production big-data clusters Kandula et al., VLDB'19 I’ve been excited about the potential for approximate query processing in analytic clusters for some time, and this paper describes its use at scale in production. Microsoft’s big data clusters have tens of thousands of machines, and are used by thousands of …
DDSketch: a fast and fully-mergeable quantile sketch with relative-error guarantees
DDSketch: a fast and fully-mergeable quantile sketch with relative-error guarantees Masson et al., VLDB'19 Datadog handles a ton of metrics: some customers have endpoints generating over 10M points per second! For response times (latencies), reporting a simple metric such as ‘average’ is next to useless. Instead we want to understand what’s happening at different …
SLOG: serializable, low-latency, geo-replicated transactions
SLOG: serializable, low-latency, geo-replicated transactions Ren et al., VLDB'19 SLOG is another research system motivated by the needs of the application developer (aka, user!). Building correct applications is much easier when the system provides strict serializability guarantees. Strict serializability reduces application code complexity and bugs, since it behaves like a system that is running on …
IPA: invariant-preserving applications for weakly consistent replicated databases
IPA: invariant-preserving applications for weakly consistent replicated databases Balegas et al., VLDB'19 IPA for developers, happy days! Last week we looked at automating checks for invariant confluence, and extending the set of cases where we can show that an object is indeed invariant confluent. I’m not going to re-cover that background in this write-up, so …
Choosing a cloud DBMS: architectures and tradeoffs
Choosing a cloud DBMS: architectures and tradeoffs Tan et al., VLDB'19 If you’re moving an OLAP workload to the cloud (AWS in the context of this paper), what DBMS setup should you go with? There’s a broad set of choices including where you store the data, whether you run your own DBMS nodes or use …
Interactive checks for coordination avoidance
Interactive checks for coordination avoidance Whittaker & Hellerstein, VLDB'19 I am so pleased to see a database systems paper addressing the concerns of the application developer! To the developer, a strongly consistent system behaves exactly like a single-threaded system running on a single node, so reasoning about the behaviour of the system is …
Snuba: automating weak supervision to label training data
Snuba: automating weak supervision to label training data Varma & Ré, VLDB 2019 This week we’re moving on from ICML to start looking at some of the papers from VLDB 2019. VLDB is a huge conference, and once again I have a problem because my shortlist of "that looks really interesting, I’d love to read …