Amazon Aurora: on avoiding distributed consensus for I/Os, commits, and membership changes

March 27, 2019 ~ adriancolyer

Amazon Aurora: on avoiding distributed consensus for I/Os, commits, and membership changes, Verbitski et al., SIGMOD’18
This is a follow-up to the paper we looked at earlier this week on the design of Amazon Aurora. I’m going to assume a level of background knowledge from that work and skip over the parts of this paper …

Amazon Aurora: design considerations for high throughput cloud-native relational databases

March 25, 2019 ~ adriancolyer

Amazon Aurora: design considerations for high throughput cloud-native relational databases, Verbitski et al., SIGMOD’17
Werner Vogels recently published a blog post describing Amazon Aurora as their fastest growing service ever. That post provides a high-level overview of Aurora and then links to two SIGMOD papers for further details. Also of note is the recent …

Veritas: shared verifiable databases and tables in the cloud

January 30, 2019 ~ adriancolyer

Veritas: shared verifiable databases and tables in the cloud, Allen et al., CIDR’19
Two (or more) parties want to transact based on the sharing of information (e.g. current offers). In order to have trust in the system and provide a foundation for resolving disputes, we’d like a tamperproof and immutable audit log of all shared …

The case for network-accelerated query processing

January 28, 2019 ~ adriancolyer

The case for network-accelerated query processing, Lerner et al., CIDR’19
Datastores continue to advance on a number of fronts. Some of those that come to mind are adapting to faster networks (e.g. ‘FARM: Fast Remote Memory’) and persistent memory (see e.g. ‘Let’s talk about storage and recovery methods for non-volatile memory database systems’), deeply integrating …

Design continuums and the path toward self-designing key-value stores that know and learn

January 21, 2019 ~ adriancolyer

Design continuums and the path toward self-designing key-value stores that know and learn, Idreos et al., CIDR’19
We’ve seen systems that help to select the best data structure from a pre-defined set of choices (e.g. ‘Darwinian data structure selection’), systems that synthesise data structure implementations given an abstract specification (‘Generalized data structure synthesis’), systems that …

Towards a hands-free query optimizer through deep learning

January 18, 2019 ~ adriancolyer

Towards a hands-free query optimizer through deep learning, Marcus & Papaemmanouil, CIDR’19
Where the SageDB paper stopped (at the exploration of learned models to assist in query optimisation), today’s paper choice picks up, looking exclusively at the potential to apply learning (in this case deep reinforcement learning) to build a better optimiser. Why reinforcement learning? …

SageDB: a learned database system

January 16, 2019 ~ adriancolyer

SageDB: a learned database system, Kraska et al., CIDR’19
About this time last year, a paper entitled ‘The case for learned index structures’ (part I, part II) generated a lot of excitement and debate. Today’s paper choice builds on that foundation, putting forward a vision where learned models pervade every aspect of a database system. …

ApproxJoin: approximate distributed joins

November 9, 2018 ~ adriancolyer

ApproxJoin: approximate distributed joins, Le Quoc et al., SoCC’18
GitHub: https://ApproxJoin.github.io
The join is a fundamental data processing operation and has been heavily optimised in relational databases. When you’re working with large volumes of unstructured data, though, say with a data processing framework such as Flink or Spark, joins become distributed and much more expensive. …

Sharding the shards: managing datastore locality at scale with Akkio

November 5, 2018 ~ adriancolyer

Sharding the shards: managing datastore locality at scale with Akkio, Annamalai et al., OSDI’18
In Harry Potter, the Accio Summoning Charm summons an object to the caster of the spell, sometimes transporting it over a significant distance. At Facebook, Akkio summons data to a datacenter with the goal of improving data access locality for clients. …

Noria: dynamic, partially-stateful data-flow for high-performance web applications

October 29, 2018 ~ adriancolyer

Noria: dynamic, partially-stateful data-flow for high-performance web applications, Gjengset, Schwarzkopf et al., OSDI’18
I have way more margin notes for this paper than I typically do, and that’s a reflection of my struggle to figure out what kind of thing we’re dealing with here. Noria doesn’t want to fit neatly into any existing box! We’ve …