Chronix: Long term storage and retrieval technology for anomaly detection in operational data

Chronix: Long term storage and retrieval technology for anomaly detection in operational data Lautenschlager et al., FAST 2017 Chronix (http://www.chronix.io/) is a time-series database optimised to support anomaly detection. It supports a multi-dimensional generic time series data model and has built-in high level functions for time series operations. Chronix also uses a scheme called "Date-Delta-Compaction" (DDC) …

Redundancy does not imply fault tolerance: analysis of distributed storage reactions to single errors and corruptions

Redundancy does not imply fault tolerance: analysis of distributed storage reactions to single errors and corruptions Ganesan et al., FAST 2017 It's a tough life being the developer of a distributed datastore. Thanks to the wonderful work of Kyle Kingsbury (aka, @aphyr) and his efforts on Jepsen.io, awareness of data loss and related issues in …

HopFS: Scaling hierarchical file system metadata using NewSQL databases

HopFS: Scaling hierarchical file system metadata using NewSQL databases Niazi et al., FAST 2017 If you're working with big data and Hadoop, this one paper could repay your investment in The Morning Paper many times over (ok, The Morning Paper is free - but you do pay with your time to read it). You know …

How good are query optimizers, really?

How good are query optimizers, really? Leis et al., VLDB 2015 Last week we looked at cardinality estimation using index-based sampling, evaluated using the Join Order Benchmark. Today's choice is the paper that introduces the Join Order Benchmark (JOB) itself. It's a great evaluation paper, and along the way we'll learn a lot about mainstream …

Cardinality estimation done right: index-based join sampling

Cardinality estimation done right: Index-based join sampling Leis et al., CIDR 2017 Let's finish up our brief look at CIDR 2017 with something closer to the core of database systems research - query optimisation. For good background on this topic a great place to start is Selinger's 1979 …

Self-driving database management systems

Self-driving database management systems Pavlo et al., CIDR 2017 We've previously seen many papers looking into how distributed and database systems technologies can support machine learning workloads. Today's paper choice explores what happens when you do it the other way round - i.e., embed machine learning into a DBMS in order to continuously optimise its …

Generic attacks on secure outsourced databases

Generic Attacks on Secure Outsourced Databases Kellaris et al., CCS 2016 Here’s a really interesting paper that helps to set some boundaries around what we can expect from encrypted databases in the cloud. Independently of the details of any one system (or encryption scheme), the authors look at what data it is possible to recover …

Scaling Spark in the real world: performance and usability

Scaling Spark in the real world: performance and usability Armbrust et al., VLDB 2015 A short and easy paper from the Databricks team to end the week. Given the pace of development in the Apache Spark world, a paper published in 2015 about enhancements to Spark will of course be a little dated. But this …

Replex: A scalable, highly available multi-index data store

Replex: A scalable, highly available multi-index data store Tai et al., USENIX 2016 Today’s choice won a best paper award at USENIX this year. Replex addresses the problem of key-value stores in which you also want to have an efficient query capability by values other than the primary key. … NoSQL databases achieve scalability by …