Seeing is believing: a client-centric specification of database isolation

November 30, 2020 / October 20, 2025 ~ adriancolyer

Seeing is believing: a client-centric specification of database isolation, Crooks et al., PODC’17. Last week we looked at Elle, which detects isolation anomalies by setting things up so that the inner workings of the database, in the form of the direct serialization graph (DSG), can be externally recovered. Today’s paper choice, ‘Seeing is believing’, also deals …

Helios: hyperscale indexing for the cloud & edge – part 1

October 26, 2020 / October 19, 2025 ~ adriancolyer

Helios: hyperscale indexing for the cloud & edge, Potharaju et al., PVLDB’20. On the surface this is a paper about fast data ingestion from high-volume streams, with indexing to support efficient querying. As a production system within Microsoft capturing around a quadrillion events and indexing 16 trillion search keys per day it would be interesting in its own right, …

Characterizing, modeling, and benchmarking RocksDB key-value workloads at Facebook

March 11, 2020 / March 8, 2020 ~ adriancolyer ~ 2 Comments

Characterizing, modeling, and benchmarking RocksDB key-value workloads at Facebook, Cao et al., FAST'20. You get good at what you practice. Or in the case of key-value stores, what you benchmark. So if you want to design a system that will offer good real-world performance, it's really useful to have benchmarks that accurately represent real-world workloads. …

Building an elastic query engine on disaggregated storage

March 9, 2020 / March 8, 2020 ~ adriancolyer ~ 1 Comment

Building an elastic query engine on disaggregated storage, Vuppalapati et al., NSDI'20. This paper describes the design decisions behind the Snowflake cloud-based data warehouse. As the saying goes, 'all snowflakes are special' - but what is it exactly that's special about this one? When I think about cloud-native architectures, I think about disaggregation (enabling each resource type …

AnyLog: a grand unification of the Internet of things

February 24, 2020 / February 22, 2020 ~ adriancolyer ~ 3 Comments

AnyLog: a grand unification of the Internet of Things, Abadi et al., CIDR'20. The Web provides decentralised publishing and direct access to unstructured data (searching / querying that data has turned out to be a pretty centralised affair in practice though). AnyLog wants to do for structured (relational) data what the Web has done for …

Extending relational query processing with ML inference

February 21, 2020 / February 16, 2020 ~ adriancolyer ~ 1 Comment

Extending relational query processing with ML inference, Karanasos et al., CIDR'20. This paper provides a little more detail on the concrete work that Microsoft is doing to embed machine learning inference inside an RDBMS, as part of their vision for Enterprise Grade Machine Learning. The motivation is not that inference will perform better inside the database, but …

Narrowing the gap between serverless and its state with storage functions

January 29, 2020 / January 24, 2020 ~ adriancolyer ~ 12 Comments

Narrowing the gap between serverless and its state with storage functions, Zhang et al., SoCC'19. "Narrowing the gap" was runner-up in the SoCC'19 best paper awards. While being motivated by serverless use cases, there's nothing especially serverless about the key-value store, Shredder, this paper reports on. Shredder's novelty lies in a new implementation of an …

Benchmarking spreadsheet systems

December 6, 2019 / December 1, 2019 ~ adriancolyer ~ 14 Comments

Benchmarking spreadsheet systems, Rahman et al., Preprint. A recent TwThread drew my attention to this pre-print paper. When spreadsheets were originally conceived, data and formulas were input by hand and so everything operated at human scale. Increasingly we’re dealing with larger and larger datasets (for example, data imported via CSV files) and spreadsheets …

Even more amazing papers at VLDB 2019 (that I didn’t have space to cover yet)

September 20, 2019 / September 15, 2019 ~ adriancolyer ~ 1 Comment

We’ve been covering papers from VLDB 2019 for the last three weeks, and next week it will be time to mix things up again. There were so many interesting papers at the conference this year though that I haven’t been able to cover nearly as many as I would like. So today’s post is a …

Updating graph databases with Cypher

September 18, 2019 / September 15, 2019 ~ adriancolyer ~ 4 Comments

Updating graph databases with Cypher, Green et al., VLDB'19. This is the story of a great collaboration between academia, industry, and users of the Cypher graph querying language as created by Neo4j. Beyond Neo4j, Cypher is also supported in SAP HANA Graph, RedisGraph, AgensGraph, and Memgraph. Cypher for Apache Spark, and Cypher over Gremlin …