Seeing is believing: a client-centric specification of database isolation, Crooks et al., PODC’17. Last week we looked at Elle, which detects isolation anomalies by setting things up so that the inner workings of the database, in the form of the direct serialization graph (DSG), can be externally recovered. Today’s paper choice, ‘Seeing is believing’, also deals …
Tag: Datastores
Databases of all shapes and sizes.
Helios: hyperscale indexing for the cloud & edge – part 1
Helios: hyperscale indexing for the cloud & edge, Potharaju et al., PVLDB’20 On the surface this is a paper about fast data ingestion from high-volume streams, with indexing to support efficient querying. As a production system within Microsoft capturing around a quadrillion events and indexing 16 trillion search keys per day, it would be interesting in its own right, …
Characterizing, modeling, and benchmarking RocksDB key-value workloads at Facebook
Characterizing, modeling, and benchmarking RocksDB key-value workloads at Facebook, Cao et al., FAST'20 You get good at what you practice. Or in the case of key-value stores, what you benchmark. So if you want to design a system that will offer good real-world performance, it's really useful to have benchmarks that accurately represent real-world workloads. …
Building an elastic query engine on disaggregated storage
Building an elastic query engine on disaggregated storage, Vuppalapati et al., NSDI'20 This paper describes the design decisions behind the Snowflake cloud-based data warehouse. As the saying goes, 'all snowflakes are special' - but what is it exactly that's special about this one? When I think about cloud-native architectures, I think about disaggregation (enabling each resource type …
AnyLog: a grand unification of the Internet of things
AnyLog: a grand unification of the Internet of Things, Abadi et al., CIDR'20 The Web provides decentralised publishing and direct access to unstructured data (searching / querying that data has turned out to be a pretty centralised affair in practice though). AnyLog wants to do for structured (relational) data what the Web has done for …
Extending relational query processing with ML inference
Extending relational query processing with ML inference, Karanasos et al., CIDR'20 This paper provides a little more detail on the concrete work that Microsoft is doing to embed machine learning inference inside an RDBMS, as part of their vision for Enterprise Grade Machine Learning. The motivation is not that inference will perform better inside the database, but …
Narrowing the gap between serverless and its state with storage functions
Narrowing the gap between serverless and its state with storage functions, Zhang et al., SoCC'19 "Narrowing the gap" was runner-up in the SoCC'19 best paper awards. While being motivated by serverless use cases, there's nothing especially serverless about the key-value store, Shredder, this paper reports on. Shredder's novelty lies in a new implementation of an …
Benchmarking spreadsheet systems
Benchmarking spreadsheet systems, Rahman et al., Preprint A recent Twitter thread drew my attention to this pre-print paper. When spreadsheets were originally conceived, data and formulae were input by hand and so everything operated at human scale. Increasingly we’re dealing with larger and larger datasets (for example, data imported via CSV files), and spreadsheets …
Even more amazing papers at VLDB 2019 (that I didn’t have space to cover yet)
We’ve been covering papers from VLDB 2019 for the last three weeks, and next week it will be time to mix things up again. There were so many interesting papers at the conference this year though that I haven’t been able to cover nearly as many as I would like. So today’s post is a …
Updating graph databases with Cypher
Updating graph databases with Cypher, Green et al., VLDB'19 This is the story of a great collaboration between academia, industry, and users of the Cypher graph querying language as created by Neo4j. Beyond Neo4j, Cypher is also supported in SAP HANA Graph, RedisGraph, AgensGraph, and Memgraph. Cypher for Apache Spark, and Cypher over Gremlin …