Declarative recursive computation on an RDBMS

September 13, 2019 / October 10, 2019 ~ adriancolyer ~ 2 Comments

Declarative recursive computation on an RDBMS... or, why you should use a database for distributed machine learning, Jankov et al., VLDB'19. If you think about a system like Procella that’s combining transactional and analytic workloads on top of a cloud-native architecture, extensions to SQL for streaming, dataflow-based materialized views (see e.g. Naiad, Noria, Multiverses, …

Procella: unifying serving and analytical data at YouTube

September 11, 2019 / September 8, 2019 ~ adriancolyer ~ 13 Comments

Procella: unifying serving and analytical data at YouTube, Chattopadhyay et al., VLDB'19. Academic papers aren’t usually set to music, but if they were, the chorus of Queen’s "I want it all (and I want it now...)" seems appropriate here. Anchored in the primary use case of supporting Google’s YouTube business, what we’re looking at here …

Experiences with approximating queries in Microsoft’s production big-data clusters

September 9, 2019 / September 8, 2019 ~ adriancolyer ~ 1 Comment

Experiences with approximating queries in Microsoft’s production big-data clusters, Kandula et al., VLDB'19. I’ve been excited about the potential for approximate query processing in analytic clusters for some time, and this paper describes its use at scale in production. Microsoft’s big data clusters have tens of thousands of machines, and are used by thousands of …

IPA: invariant-preserving applications for weakly consistent replicated databases

September 2, 2019 / September 1, 2019 ~ adriancolyer ~ 2 Comments

IPA: invariant-preserving applications for weakly consistent replicated databases, Balegas et al., VLDB'19. IPA for developers, happy days! Last week we looked at automating checks for invariant confluence, and extending the set of cases where we can show that an object is indeed invariant confluent. I’m not going to re-cover that background in this write-up, so …

Choosing a cloud DBMS: architectures and tradeoffs

August 30, 2019 / August 24, 2019 ~ adriancolyer ~ 3 Comments

Choosing a cloud DBMS: architectures and tradeoffs, Tan et al., VLDB'19. If you’re moving an OLAP workload to the cloud (AWS in the context of this paper), what DBMS setup should you go with? There’s a broad set of choices including where you store the data, whether you run your own DBMS nodes or use …

Interactive checks for coordination avoidance

August 28, 2019 ~ adriancolyer ~ 4 Comments

Interactive checks for coordination avoidance, Whittaker & Hellerstein, VLDB'19. I am so pleased to see a database systems paper addressing the concerns of the application developer! To the developer, a strongly consistent system behaves exactly like a single-threaded system running on a single node, so reasoning about the behaviour of the system is …

One SQL to rule them all: an efficient and syntactically idiomatic approach to management of streams and tables

July 3, 2019 ~ adriancolyer ~ 5 Comments

One SQL to rule them all: an efficient and syntactically idiomatic approach to management of streams and tables, Begoli et al., SIGMOD'19. In data processing, it seems, all roads eventually lead back to SQL! Today’s paper choice is authored by a collection of experts from the Apache Beam, Apache Calcite, and Apache Flink projects, outlining …

Fast key-value stores: an idea whose time has come and gone

June 24, 2019 / June 21, 2019 ~ adriancolyer ~ 2 Comments

Fast key-value stores: an idea whose time has come and gone, Adya et al., HotOS'19. No controversy here! Adya et al. would like you to stop using Memcached and Redis, and start building 11-factor apps. Factor VI in the 12-factor app manifesto, "Execute the app as one or more stateless processes," to be dropped and …

Towards multiverse databases

June 17, 2019 / June 13, 2019 ~ adriancolyer ~ 16 Comments

Towards multiverse databases, Marzoev et al., HotOS'19. A typical backing store for a web application contains data for many users. The application makes queries on behalf of an authenticated user, but it is up to the application itself to make sure that the user only sees data they are entitled to see. Any frontend can …

Calvin: fast distributed transactions for partitioned database systems

March 29, 2019 ~ adriancolyer ~ 4 Comments

Calvin: fast distributed transactions for partitioned database systems, Thomson et al., SIGMOD'12. Earlier this week we looked at Amazon’s Aurora. Today it’s the turn of Calvin, which is notably used by FaunaDB (strictly “FaunaDB uses patent-pending technology inspired by Calvin...”). As the paper title suggests, the goal of Calvin is to put the ACID back …