Declarative recursive computation on an RDBMS

Declarative recursive computation on an RDBMS... or, why you should use a database for distributed machine learing Jankov et al., VLDB'19 If you think about a system like Procella that’s combining transactional and analytic workloads on top of a cloud-native architecture, extensions to SQL for streaming, dataflow based materialized views (see e.g. Naiad, Noria, Multiverses, ... Continue Reading

Experiences with approximating queries in Microsoft’s production big-data clusters

Experiences with approximating queries in Microsoft’s production big-data clusters Kandula et al., VLDB'19 I’ve been excited about the potential for approximate query processing in analytic clusters for some time, and this paper describes its use at scale in production. Microsoft’s big data clusters have 10s of thousands of machines, and are used by thousands of ... Continue Reading

IPA: invariant-preserving applications for weakly consistent replicated databases

IPA: invariant-preserving applications for weakly consistent replicated databases Balegas et al., VLDB'19 IPA for developers, happy days! Last we week looked at automating checks for invariant confluence, and extending the set of cases where we can show that an object is indeed invariant confluent. I’m not going to re-cover that background in this write-up, so ... Continue Reading

One SQL to rule them all: an efficient and syntactically idiomatic approach to management of streams and tables

One SQL to rule them all: an efficient and syntactically idiomatic approach to management of streams and tables Begoli et al., SIGMOD'19 In data processing it seems, all roads eventually lead back to SQL! Today’s paper choice is authored by a collection of experts from the Apache Beam, Apache Calcite, and Apache Flink projects, outlining ... Continue Reading

Calvin: fast distributed transactions for partitioned database systems

Calvin: fast distributed transactions for partitioned database systems Thomson et al., SIGMOD'12 Earlier this week we looked at Amazon’s Aurora. Today it’s the turn of Calvin, which is notably used by FaunaDB (strictly “_FaunaDB uses patent-pending technology inspired by Calvin...”). As the paper title suggests, the goal of Calvin is to put the ACID back ... Continue Reading