Chronix: Long term storage and retrieval technology for anomaly detection in operational data
Lautenschlager et al., FAST 2017
Chronix (http://www.chronix.io/) is a time-series database optimised to support anomaly detection. It supports a multi-dimensional generic time series data model and has built-in high-level functions for time series operations. Chronix also uses a scheme called "Date-Delta-Compaction" (DDC) …
Tag: Datastores
Databases of all shapes and sizes.
Redundancy does not imply fault tolerance: analysis of distributed storage reactions to single errors and corruptions
Ganesan et al., FAST 2017
It's a tough life being the developer of a distributed datastore. Thanks to the wonderful work of Kyle Kingsbury (aka, @aphyr) and his efforts on Jepsen.io, awareness of data loss and related issues in …
HopFS: Scaling hierarchical file system metadata using NewSQL databases
Niazi et al., FAST 2017
If you're working with big data and Hadoop, this one paper could repay your investment in The Morning Paper many times over (OK, The Morning Paper is free, but you do pay with your time to read it). You know …
How good are query optimizers, really?
Leis et al., VLDB 2015
Last week we looked at cardinality estimation using index-based sampling, evaluated using the Join Order Benchmark. Today's choice is the paper that introduces the Join Order Benchmark (JOB) itself. It's a great evaluation paper, and along the way we'll learn a lot about mainstream …
Cardinality estimation done right: index-based join sampling
Leis et al., CIDR 2017
Let's finish up our brief look at CIDR 2017 with something closer to the core of database systems research: query optimisation. For good background on this topic a great place to start is Selinger's 1979 …
Ground: A data context service
Hellerstein et al., CIDR 2017
An unfortunate consequence of the disaggregated nature of contemporary data systems is the lack of a standard mechanism to assemble a collective understanding of the origin, scope, and usage of the data they manage. Put more bluntly, many organisations have only a fuzzy picture …
Self-driving database management systems
Pavlo et al., CIDR 2017
We've previously seen many papers looking into how distributed and database systems technologies can support machine learning workloads. Today's paper choice explores what happens when you do it the other way round, i.e., embed machine learning into a DBMS in order to continuously optimise its …
Generic attacks on secure outsourced databases
Kellaris et al., CCS 2016
Here’s a really interesting paper that helps to set some boundaries around what we can expect from encrypted databases in the cloud. Independently of the details of any one system (or encryption scheme), the authors look at what data it is possible to recover …
Scaling Spark in the real world: performance and usability
Armbrust et al., VLDB 2015
A short and easy paper from the Databricks team to end the week. Given the pace of development in the Apache Spark world, a paper published in 2015 about enhancements to Spark will of course be a little dated. But this …
Replex: A scalable, highly available multi-index data store
Tai et al., USENIX 2016
Today’s choice won a best paper award at USENIX this year. Replex addresses the problem of key-value stores in which you also want to have an efficient query capability by values other than the primary key. … NoSQL databases achieve scalability by …
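The problem Replex starts from can be made concrete with a toy sketch. This is a hypothetical illustration, not Replex's design: `PartitionedStore`, `toy_hash`, and the `city` attribute are made-up names. It assumes a key-value store hash-partitioned by primary key. A point lookup by key touches one partition, but a query on any other attribute has to visit every partition unless a secondary index, partitioned by that attribute's value, is kept in sync on every write.

```python
from collections import defaultdict


def toy_hash(s: str, n: int) -> int:
    """Hypothetical deterministic partitioner (Python's str hash() is salted per process)."""
    return sum(s.encode("utf-8")) % n


class PartitionedStore:
    """Hypothetical KV store partitioned by primary key, plus a secondary
    index on one attribute that is itself partitioned by attribute value."""

    def __init__(self, num_partitions: int, indexed_attr: str):
        self.n = num_partitions
        self.attr = indexed_attr
        self.data = [{} for _ in range(num_partitions)]
        self.index = [defaultdict(set) for _ in range(num_partitions)]

    def put(self, key: str, row: dict) -> None:
        p = toy_hash(key, self.n)
        old = self.data[p].get(key)
        if old is not None:
            # Remove the stale index entry before overwriting the row.
            old_val = old[self.attr]
            self.index[toy_hash(old_val, self.n)][old_val].discard(key)
        self.data[p][key] = row
        val = row[self.attr]
        # Every write also updates the index partition owning this value.
        self.index[toy_hash(val, self.n)][val].add(key)

    def get(self, key: str):
        """Primary-key lookup: exactly one partition."""
        return self.data[toy_hash(key, self.n)].get(key)

    def scan_by_attr(self, value: str):
        """No index: visit every partition. Returns (rows, partitions_touched)."""
        rows = [r for part in self.data for r in part.values() if r[self.attr] == value]
        return rows, self.n

    def lookup_by_attr(self, value: str):
        """With the index: one index partition, then point reads for matches."""
        ip = toy_hash(value, self.n)
        keys = self.index[ip].get(value, set())
        touched = {ip} | {toy_hash(k, self.n) for k in keys}
        rows = [self.get(k) for k in sorted(keys)]
        return rows, len(touched)
```

For example, after `store = PartitionedStore(4, "city")` and a few `put` calls, `store.scan_by_attr("Paris")` always reports all 4 partitions touched, while `store.lookup_by_attr("Paris")` touches only the index partition plus the partitions holding matching rows. The price is the extra index maintenance inside `put`, which is the kind of write and storage overhead a multi-index store has to manage.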