Self-driving database management systems

Self-driving database management systems Pavlo et al., CIDR 2017. We've previously seen many papers looking into how distributed and database systems technologies can support machine learning workloads. Today's paper choice explores what happens when you do it the other way round - i.e., embed machine learning into a DBMS in order to continuously optimise its …

Weld: A common runtime for high performance data analytics

Weld: A common runtime for high performance data analytics Palkar et al., CIDR 2017. This is the first in a series of posts looking at papers from CIDR 2017. See yesterday's post for my conference overview. We have a proliferation of data and analytics libraries and frameworks - for example, Spark, TensorFlow, MXNet, NumPy, Pandas, …

Innovation, experience-based insight and vision at CIDR ’17

Last week was CIDR 2017, the biennial Conference on Innovative Data Systems Research. CIDR encourages authors to take a whole system perspective and especially values "innovation, experience-based insight, and vision." That's a very good match with the attributes of papers I like to cover on The Morning Paper. So what innovation, insight, and vision does …

Incremental consistency guarantees for replicated objects

Incremental consistency guarantees for replicated objects Guerraoui et al., OSDI 2016. We know that there's a price to be paid for strong consistency in terms of higher latencies and reduced throughput. We also know that there's a price to be paid for weaker consistency in terms of application correctness and/or programmer difficulty. Furthermore, …

Adaptive logging: optimizing logging and recovery costs in distributed in-memory databases

Adaptive Logging: Optimizing logging and recovery costs in distributed In-memory databases Yao et al., SIGMOD 2016. This is a paper about the trade-offs between transaction throughput and database recovery time. Intuitively, for example, you can do a little more work on each transaction (lowering throughput) in order to reduce the time it takes to recover …

Apache Hadoop YARN: Yet another resource negotiator

Apache Hadoop YARN: Yet Another Resource Negotiator Vavilapalli et al., SoCC 2013. The opening section of Prof. Demirbas' reading list is concerned with programming the datacenter, aka 'the Datacenter Operating System' - though I can't help but think of Mesosphere when I hear that latter phrase. There are four papers: in publication order these are …

“A Distributed Systems Seminar Reading List,” Spring 2017 edition

Update: links giving 404s were too confusing, so I've removed links to not-yet-published posts and will add them back in at the end of the week! Last year we looked at Murat Demirbas' Distributed systems seminar reading list for Spring 2016. Now of course it's 2017 and Prof. Demirbas has a new list of papers …

Strategic attentive writer for learning macro-actions

Strategic attentive writer for learning macro-actions Vezhnevets et al. (Google DeepMind), NIPS 2016. Baldrick may have a cunning plan, but most Deep Q Networks (DQNs) just react to what's immediately in front of them and what has come before. That is, at any given time step they propose the best action to take there and …