Dependency-driven analytics: a compass for uncharted data oceans

Mavlyutov et al., CIDR 2017. Like yesterday's paper, today's paper considers what to do when you simply have too much data to be able to process it all. Forget data lakes, we're in data ocean territory now. This is a problem Microsoft faced with their large clusters …

Prioritizing attention in fast data: principles and promise

Bailis et al., CIDR 2017. Today it's two for the price of one as we get a life lesson in addition to a wonderfully thought-provoking piece of research. I'm sure you'd all agree that we're drowning in information - so much content being pumped out all of …

SnappyData: A unified cluster for streaming, transactions, and interactive analytics

Mozafari et al., CIDR 2017. Update: fixed broken paper link, thanks Zteve. On Monday we looked at Weld, which showed how to combine disparate data processing and analytic frameworks using a common underlying IR. Yesterday we looked at Peloton, which adapts to mixed OLTP and …

Self-driving database management systems

Pavlo et al., CIDR 2017. We've previously seen many papers looking into how distributed and database systems technologies can support machine learning workloads. Today's paper choice explores what happens when you do it the other way round - i.e., embed machine learning into a DBMS in order to continuously optimise its …

Weld: A common runtime for high performance data analytics

Palkar et al., CIDR 2017. This is the first in a series of posts looking at papers from CIDR 2017. See yesterday's post for my conference overview. We have a proliferation of data and analytics libraries and frameworks - for example, Spark, TensorFlow, MXNet, NumPy, Pandas, …

Innovation, experience-based insight and vision at CIDR ’17

Last week was CIDR 2017, the biennial Conference on Innovative Data Systems Research. CIDR encourages authors to take a whole system perspective and especially values "innovation, experience-based insight, and vision." That's a very good match with the attributes of papers I like to cover on The Morning Paper. So what innovation, insight, and vision does …

Incremental consistency guarantees for replicated objects

Guerraoui et al., OSDI 2016. We know that there's a price to be paid for strong consistency in terms of higher latencies and reduced throughput. We also know that there's a price to be paid for weaker consistency in terms of application correctness and/or programmer difficulty. Furthermore, …