On the duality of resilience and privacy - Crowcroft '15 Somewhat of a philosophical start to the week, as Jon Crowcroft argues for greater privacy through some of the same mechanisms that give systems greater resilience. Plus, it includes this quote: It is a truth universally acknowledged that centralized cloud services …
Specialized Evolution of the General Purpose CPU
Specialized Evolution of the General Purpose CPU - Rajwar et al. 2015 This is the last in a series of five posts highlighting papers from the recent CIDR'15 conference. Today's choice is the keynote talk. If you like this kind of subject matter, see also the excellent 'What's new in CPUs since the 80s and …
Impala: a modern, open-source SQL engine for Hadoop
Impala: A modern, open-source SQL engine for Hadoop - Kornacker et al. 2015 (Cloudera*) This is post 4 of 5 in a series looking at the latest research from CIDR'15. Also in the series so far this week: 'The missing piece in complex analytics', 'WANalytics: analytics for a geo-distributed, data intensive world', and 'Liquid: …
Liquid: Unifying nearline and offline big data integration
Liquid: Unifying Nearline and Offline Big Data Integration - Fernandez et al. 2015 This is post 3 of 5 in a series looking at the latest research from the CIDR'15 conference. Also in the series so far this week: 'The missing piece in complex analytics' and 'WANalytics: analytics for a geo-distributed, data intensive world'. …
WANalytics: Analytics for a geo-distributed, data intensive world
WANalytics: analytics for a geo-distributed data intensive world - Vulimiri et al. 2015 "...data is born distributed; we only control data replication and distributed execution strategies." This is true for so many sources of data. Combine this with Dave McCrory's observation that 'Data has Gravity' (i.e. it attracts applications and other data processing workloads to …
The Missing Piece in Complex Analytics
The Missing Piece in Complex Analytics: Low latency scalable model management and serving with Velox - Crankshaw et al. 2015. Analytics at scale can be used to create statistical models for making predictions about the world, but once the data scientists and analysts have done their initial work and a model has been built and …
Introducing CIDR’15 week on The Morning Paper
The data systems research community are a smart bunch, although it's not their research and papers I'm referring to here. Many conferences move around, but not the Conference on Innovative Data Systems Research (CIDR). CIDR has found a rather nice venue "on the Pacific Ocean, just south of Monterey", and decided to stick there. Schedule …
Recursive Programming
Recursive Programming - Dijkstra 1960 (Updated link to one that is not behind a paywall; thanks to Graham Markall for the catch.) This paper deals with something we take so much for granted that it's hard to imagine a time when it had yet to be introduced to the world. That time …
On Distributed Communications Networks
On Distributed Communications Networks - Baran 1962 Way before it became fashionable to build large-scale distributed systems out of relatively unreliable commodity hardware, Baran was investigating the properties a network built in such a way might have. This paper is very much of its time, and all the more enjoyable for it: want to know …
Eraser: A dynamic data race detector for multi-threaded programs
Eraser: A dynamic data race detector for multi-threaded programs - Savage et al. 1997 Debugging a multithreaded program can be difficult. Simple errors in synchronization can produce timing-dependent data races that can take weeks or months to track down. Eraser dynamically detects data races in multi-threaded programs. There are two basic approaches to doing this, …
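Eraser is known for its lockset approach: for each shared variable v it keeps a candidate set C(v) of locks, and on every access it intersects C(v) with the locks the accessing thread currently holds. If C(v) ever becomes empty, no single lock consistently protects v, and a potential race is reported. Below is a minimal sketch of that basic refinement in Python, assuming lock and access events are reported through explicit method calls. The class and method names are hypothetical, and the sketch deliberately omits the per-variable state machine (for initialization and read-shared data) that the full Eraser algorithm adds on top.

```python
from collections import defaultdict


class LocksetDetector:
    """Hypothetical sketch of the basic lockset refinement.

    Omits the per-variable state machine and read/write distinction
    that the full Eraser algorithm layers on top.
    """

    def __init__(self):
        self.held = defaultdict(set)  # thread id -> locks currently held
        self.candidates = {}          # shared variable -> candidate lockset C(v)
        self.racy = set()             # variables whose C(v) became empty

    def acquire(self, thread, lock):
        self.held[thread].add(lock)

    def release(self, thread, lock):
        self.held[thread].discard(lock)

    def access(self, thread, var):
        held = self.held[thread]
        if var not in self.candidates:
            # C(v) conceptually starts as "every lock"; intersecting that with
            # the locks held at the first access leaves exactly those locks.
            self.candidates[var] = set(held)
        else:
            # Refine: keep only locks held on every access seen so far.
            self.candidates[var] &= held
        if not self.candidates[var]:
            # No single lock consistently protects var: potential race.
            self.racy.add(var)


if __name__ == "__main__":
    d = LocksetDetector()
    # x is always accessed while holding "mu", so C(x) stays {"mu"}.
    d.acquire("T1", "mu")
    d.access("T1", "x")
    d.release("T1", "mu")
    d.acquire("T2", "mu")
    d.access("T2", "x")
    d.release("T2", "mu")
    # y is guarded by different locks on different accesses, so C(y) empties.
    d.acquire("T1", "a")
    d.access("T1", "y")
    d.release("T1", "a")
    d.acquire("T2", "b")
    d.access("T2", "y")
    d.release("T2", "b")
    print(sorted(d.racy))  # ['y']
```

Note that this check does not depend on the two accesses to y actually overlapping in time: the inconsistent locking discipline alone is enough to flag it, which is what lets a lockset detector catch races that a particular execution happened not to trigger.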