Declarative recursive computation on an RDBMS... or, why you should use a database for distributed machine learning Jankov et al., VLDB'19 If you think about a system like Procella that’s combining transactional and analytic workloads on top of a cloud-native architecture, extensions to SQL for streaming, dataflow-based materialized views (see e.g. Naiad, Noria, Multiverses, … Continue reading Declarative recursive computation on an RDBMS
Tag: Datastores
Databases of all shapes and sizes.
Procella: unifying serving and analytical data at YouTube
Procella: unifying serving and analytical data at YouTube Chattopadhyay et al., VLDB'19 Academic papers aren’t usually set to music, but if they were the chorus of Queen’s "I want it all (and I want it now...)" seems appropriate here. Anchored in the primary use case of supporting Google’s YouTube business, what we’re looking at here … Continue reading Procella: unifying serving and analytical data at YouTube
Experiences with approximating queries in Microsoft’s production big-data clusters
Experiences with approximating queries in Microsoft’s production big-data clusters Kandula et al., VLDB'19 I’ve been excited about the potential for approximate query processing in analytic clusters for some time, and this paper describes its use at scale in production. Microsoft’s big data clusters have 10s of thousands of machines, and are used by thousands of … Continue reading Experiences with approximating queries in Microsoft’s production big-data clusters
IPA: invariant-preserving applications for weakly consistent replicated databases
IPA: invariant-preserving applications for weakly consistent replicated databases Balegas et al., VLDB'19 IPA for developers, happy days! Last week we looked at automating checks for invariant confluence, and extending the set of cases where we can show that an object is indeed invariant confluent. I’m not going to re-cover that background in this write-up, so … Continue reading IPA: invariant-preserving applications for weakly consistent replicated databases
Choosing a cloud DBMS: architectures and tradeoffs
Choosing a cloud DBMS: architectures and tradeoffs Tan et al., VLDB'19 If you’re moving an OLAP workload to the cloud (AWS in the context of this paper), what DBMS setup should you go with? There’s a broad set of choices including where you store the data, whether you run your own DBMS nodes or use … Continue reading Choosing a cloud DBMS: architectures and tradeoffs
Interactive checks for coordination avoidance
Interactive checks for coordination avoidance Whittaker & Hellerstein et al., VLDB'19 I am so pleased to see a database systems paper addressing the concerns of the application developer! To the developer, a strongly consistent system behaves exactly like a single-threaded system running on a single node, so reasoning about the behaviour of the system is … Continue reading Interactive checks for coordination avoidance
One SQL to rule them all: an efficient and syntactically idiomatic approach to management of streams and tables
One SQL to rule them all: an efficient and syntactically idiomatic approach to management of streams and tables Begoli et al., SIGMOD'19 In data processing it seems, all roads eventually lead back to SQL! Today’s paper choice is authored by a collection of experts from the Apache Beam, Apache Calcite, and Apache Flink projects, outlining … Continue reading One SQL to rule them all: an efficient and syntactically idiomatic approach to management of streams and tables
Fast key-value stores: an idea whose time has come and gone
Fast key-value stores: an idea whose time has come and gone Adya et al., HotOS'19 No controversy here! Adya et al. would like you to stop using Memcached and Redis, and start building 11-factor apps. Factor VI in the 12-factor app manifesto, "Execute the app as one or more stateless processes," to be dropped and … Continue reading Fast key-value stores: an idea whose time has come and gone
Towards multiverse databases
Towards multiverse databases Marzoev et al., HotOS'19 A typical backing store for a web application contains data for many users. The application makes queries on behalf of an authenticated user, but it is up to the application itself to make sure that the user only sees data they are entitled to see. Any frontend can … Continue reading Towards multiverse databases
Calvin: fast distributed transactions for partitioned database systems
Calvin: fast distributed transactions for partitioned database systems Thomson et al., SIGMOD'12 Earlier this week we looked at Amazon’s Aurora. Today it’s the turn of Calvin, which is notably used by FaunaDB (strictly “FaunaDB uses patent-pending technology inspired by Calvin...”). As the paper title suggests, the goal of Calvin is to put the ACID back … Continue reading Calvin: fast distributed transactions for partitioned database systems