Omid, reloaded: scalable and highly-available transaction processing

March 17, 2017 (updated November 11, 2019) ~ Adrian Colyer ~ Leave a comment

Omid, reloaded: scalable and highly-available transaction processing Shacham et al., FAST '17 Omid is a transaction processing service powering web-scale production systems at Yahoo that digest billions of events per day and push them into a real-time index. It's also been open-sourced and is currently incubating at Apache as the Apache Omid project. What's interesting ... Continue Reading

Enlightening the I/O path: A holistic approach for application performance

March 13, 2017 (updated November 11, 2019) ~ Adrian Colyer ~ 12 Comments

Enlightening the I/O Path: A holistic approach for application performance Kim et al., FAST '17 Lots of applications contain a mix of foreground and background tasks. Since we're at the file system level here, for application, think Redis, MongoDB, PostgreSQL and so on. Typically user requests are considered foreground tasks, and tasks such as housekeeping, ... Continue Reading

Explaining outputs in modern data analytics

February 1, 2017 (updated November 11, 2019) ~ Adrian Colyer ~ 9 Comments

Explaining outputs in modern data analytics Chothia et al., ETH Zurich Technical Report, 2016 Yesterday we touched on some of the difficulties of explanation in the context of machine learning, and last week we looked at some of the extensions to ExSPAN to track network provenance. Lest you be under any remaining misapprehension that explanation ... Continue Reading

How good are query optimizers, really?

January 30, 2017 (updated November 11, 2019) ~ Adrian Colyer ~ 2 Comments

How good are query optimizers, really? Leis et al., VLDB 2015 Last week we looked at cardinality estimation using index-based sampling, evaluated using the Join Order Benchmark. Today's choice is the paper that introduces the Join Order Benchmark (JOB) itself. It's a great evaluation paper, and along the way we'll learn a lot about mainstream ... Continue Reading

Cardinality estimation done right: index-based join sampling

January 27, 2017 (updated November 11, 2019) ~ Adrian Colyer ~ 2 Comments

Cardinality estimation done right: Index-based join sampling Leis et al., CIDR 2017 Let's finish up our brief look at CIDR 2017 with something closer to the core of database systems research - query optimisation. For good background on this topic a great place to start is Selinger's 1979 ... Continue Reading

SnappyData: A unified cluster for streaming, transactions, and interactive analytics

January 18, 2017 (updated November 11, 2019) ~ Adrian Colyer ~ 4 Comments

SnappyData: A unified cluster for streaming, transactions, and interactive analytics Mozafari et al., CIDR 2017 Update: fixed broken paper link, thanks Zteve. On Monday we looked at Weld which showed how to combine disparate data processing and analytic frameworks using a common underlying IR. Yesterday we looked at Peloton that adapts to mixed OLTP and ... Continue Reading

Self-driving database management systems

January 17, 2017 (updated November 11, 2019) ~ Adrian Colyer ~ 3 Comments

Self-driving database management systems Pavlo et al., CIDR 2017 We've previously seen many papers looking into how distributed and database systems technologies can support machine learning workloads. Today's paper choice explores what happens when you do it the other way round - i.e., embed machine learning into a DBMS in order to continuously optimise its ... Continue Reading

Weld: A common runtime for high performance data analytics

January 16, 2017 (updated November 11, 2019) ~ Adrian Colyer ~ 8 Comments

Weld: A common runtime for high performance data analytics Palkar et al., CIDR 2017 This is the first in a series of posts looking at papers from CIDR 2017. See yesterday's post for my conference overview. We have a proliferation of data and analytics libraries and frameworks - for example, Spark, TensorFlow, MXNet, NumPy, Pandas, ... Continue Reading

Existential Consistency: Measuring and Understanding Consistency at Facebook

October 19, 2015 ~ Adrian Colyer ~ 2 Comments

Existential Consistency: Measuring and Understanding Consistency at Facebook - Lu et al. 2015 At the core of this paper is an analysis of the number of anomalies seen in Facebook's production system for clients of TAO, which is impressively low under normal operation (0.0004%) - to interpret that number of course, we'll have to dig ... Continue Reading

Fast Database Restarts at Facebook

September 14, 2015 ~ Adrian Colyer ~ Leave a comment

Fast Database Restarts at Facebook - Goel et al. 2014 In security, you're only as secure as your weakest link in the chain. When it comes to agility, you're only as fast as your slowest link in the chain. Updating and evolving a stateless middle tier is usually pretty quick, but what if you need ... Continue Reading

the morning paper

a random walk through Computer Science research, by Adrian Colyer

Datastores