Seeing is believing: a client-centric specification of database isolation

November 30, 2020 ~ Adrian Colyer

Seeing is believing: a client-centric specification of database isolation, Crooks et al., PODC’17. Last week we looked at Elle, which detects isolation anomalies by setting things up so that the inner workings of the database, in the form of the direct serialization graph (DSG), can be externally recovered. Today’s paper choice, ‘Seeing is believing’, also ...

Helios: hyperscale indexing for the cloud & edge – part 1

October 26, 2020 ~ Adrian Colyer

Helios: hyperscale indexing for the cloud & edge, Potharaju et al., PVLDB'20. On the surface this is a paper about fast data ingestion from high-volume streams, with indexing to support efficient querying. As a production system within Microsoft capturing around a quadrillion events and indexing 16 trillion search keys per day, it would be interesting ...

Characterizing, modeling, and benchmarking RocksDB key-value workloads at Facebook

March 11, 2020 ~ Adrian Colyer

Characterizing, modeling, and benchmarking RocksDB key-value workloads at Facebook, Cao et al., FAST'20. You get good at what you practice. Or in the case of key-value stores, what you benchmark. So if you want to design a system that will offer good real-world performance, it's really useful to have benchmarks that accurately represent real-world workloads. ...

Building an elastic query engine on disaggregated storage

March 9, 2020 ~ Adrian Colyer

Building an elastic query engine on disaggregated storage, Vuppalapati et al., NSDI'20. This paper describes the design decisions behind the Snowflake cloud-based data warehouse. As the saying goes, 'all snowflakes are special', but what exactly is special about this one? When I think about cloud-native architectures, I think about disaggregation (enabling each resource type ...

AnyLog: a grand unification of the Internet of Things

February 24, 2020 ~ Adrian Colyer

AnyLog: a grand unification of the Internet of Things, Abadi et al., CIDR'20. The Web provides decentralised publishing and direct access to unstructured data (though searching and querying that data has turned out to be a pretty centralised affair in practice). AnyLog wants to do for structured (relational) data what the Web has done for ...

Extending relational query processing with ML inference

February 21, 2020 ~ Adrian Colyer

Extending relational query processing with ML inference, Karanasos et al., CIDR'20. This paper provides a little more detail on the concrete work that Microsoft is doing to embed machine learning inference inside an RDBMS, as part of their vision for Enterprise Grade Machine Learning. The motivation is not that inference will perform better inside the database, but ...

Narrowing the gap between serverless and its state with storage functions

January 29, 2020 ~ Adrian Colyer

Narrowing the gap between serverless and its state with storage functions, Zhang et al., SoCC'19. "Narrowing the gap" was runner-up in the SoCC'19 best paper awards. Although the work is motivated by serverless use cases, there's nothing especially serverless about the key-value store this paper reports on, Shredder. Shredder's novelty lies in a new implementation of an ...

Benchmarking spreadsheet systems

December 6, 2019 ~ Adrian Colyer

Benchmarking spreadsheet systems, Rahman et al., preprint. A recent tweet thread from Aditya Parameswaran drew my attention to this pre-print paper. When spreadsheets were originally conceived, data and formulas were input by hand, so everything operated at human scale. Increasingly we’re dealing with larger and larger datasets, for example data imported via CSV ...

Even more amazing papers at VLDB 2019 (that I didn’t have space to cover yet)

September 20, 2019 ~ Adrian Colyer

We’ve been covering papers from VLDB 2019 for the last three weeks, and next week it will be time to mix things up again. There were so many interesting papers at the conference this year, though, that I haven’t been able to cover nearly as many as I would like. So today’s post is a ...

Updating graph databases with Cypher

September 18, 2019 ~ Adrian Colyer

Updating graph databases with Cypher, Green et al., VLDB'19. This is the story of a great collaboration between academia, industry, and users of the Cypher graph query language created by Neo4j. Beyond Neo4j, Cypher is also supported in SAP HANA Graph, RedisGraph, AgensGraph, and Memgraph. Cypher for Apache Spark, and Cypher over Gremlin ...

the morning paper

a random walk through Computer Science research, by Adrian Colyer

Datastores