Omid reloaded: scalable and highly-available transaction processing

Omid, reloaded: scalable and highly-available transaction processing Shacham et al., FAST '17 Omid is a transaction processing service powering web-scale production systems at Yahoo that digest billions of events per day and push them into a real-time index. It's also been open-sourced and is currently incubating at Apache as the Apache Omid project. What's interesting … Continue reading Omid reloaded: scalable and highly-available transaction processing

Dependency-driven analytics: a compass for uncharted data oceans

Dependency-driven analytics: a compass for uncharted data oceans Mavlyutov et al. CIDR 2017 Like yesterday's paper, today's paper considers what to do when you simply have too much data to be able to process it all. Forget data lakes, we're in data ocean territory now. This is a problem Microsoft faced with their large clusters … Continue reading Dependency-driven analytics: a compass for uncharted data oceans

SnappyData: A unified cluster for streaming, transactions, and interactive analytics

SnappyData: A unified cluster for streaming, transactions, and interactive analytics Mozafari et al., CIDR 2017 Update: fixed broken paper link, thanks Zteve. On Monday we looked at Weld which showed how to combine disparate data processing and analytic frameworks using a common underlying IR. Yesterday we looked at Peloton that adapts to mixed OLTP and … Continue reading SnappyData: A unified cluster for streaming, transactions, and interactive analytics

Weld: A common runtime for high performance data analytics

Weld: A common runtime for high performance data analytics Palkar et al. CIDR 2017 This is the first in a series of posts looking at papers from CIDR 2017. See yesterday's post for my conference overview. We have a proliferation of data and analytics libraries and frameworks - for example, Spark, TensorFlow, MxNet, Numpy, Pandas, … Continue reading Weld: A common runtime for high performance data analytics

Innovation, experience-based insight and vision at CIDR ’17

Last week was CIDR 2017, the biennial Conference on Innovative Data Systems Research. CIDR encourages authors to take a whole system perspective and especially values "innovation, experience-based insight, and vision." That's a very good match with the attributes of papers I like to cover on The Morning Paper. So what innovation, insight, and vision does … Continue reading Innovation, experience-based insight and vision at CIDR ’17

Optimizing Distributed Actor Systems for Dynamic Interactive Services

Optimizing Distributed Actor Systems for Dynamic Interactive Services - Newell et al. 2016 I'm sure many of you have heard of the Orleans distributed actor system, that was used to build some of the systems supporting Microsoft's online Halo game. Halo Presence is an interactive application which implements presence services for a multi-player game running … Continue reading Optimizing Distributed Actor Systems for Dynamic Interactive Services

GeePS: Scalable deep learning on distributed GPUs with a GPU-specialized parameter server

GeePS: Scalable deep learning on distributed GPUs with a GPU-specialized parameter server - Cui et al. 2016 (EuroSys 2016) We know that deep learning is well suited to GPUs since it has inherent parallelism. But so far this has mostly been limited to either a single GPU (e.g. using Caffe) or to specially built distributed … Continue reading GeePS: Scalable deep learning on distributed GPUs with a GPU-specialized parameter server