Musketeer: all for one, one for all in data processing systems - Gog et al. 2015 For between 40% and 80% of the jobs submitted to MapReduce systems, you'd be better off just running them on a single machine... It was EuroSys 2015 last week, and a great new crop of papers was presented. Gog et al. … Continue reading Musketeer – Part I : What’s the best data processing system?
The Scalable Commutativity Rule: Designing Scalable Software for Multicore Processors
The Scalable Commutativity Rule: Designing Scalable Software for Multicore Processors - Clements et al. 2013 The way you design your interface (API) has a significant impact on the scalability you can achieve with any implementation. Clements et al. define the Scalable Commutativity Rule - which will look familiar to those who study distributed systems - …
From the Aether to the Ethernet – Attacking the Internet using Broadcast Digital Television
From the Aether to the Ethernet - Attacking the Internet using Broadcast Digital Television - Oren & Keromytis 2014 Before reading any further, please ensure you are in a carpeted area or other soft ground. Your jaw may hit the floor a few times when you hear what Oren & Keromytis have to tell us, …
Distributed Snapshots: Determining Global States of Distributed Systems
Distributed Snapshots: Determining Global States of Distributed Systems - Chandy & Lamport 1985. What state is your distributed system in? In the absence of a universal clock, is that even a well-formed question? And if you could take a distributed snapshot of system state, would that be useful? Through an algorithm that has simply become …
Declarative Interaction Design for Data Visualization
Declarative Interaction Design for Data Visualization - Satyanarayan et al. 2015 We've looked at the power of declarative approaches before when it comes to data and distribution (The Declarative Imperative, Bloom, Edelweiss, and of course let's not forget SQL itself!); today's paper applies a declarative approach to interactive data visualizations. With thanks to Dion Almaer …
Making Sense of Performance in Data Analytics Frameworks
Making Sense of Performance in Data Analytics Frameworks - Ousterhout et al. 2015 We all know the causes of poor performance in big data analytics workloads: network I/O, disk I/O, and straggler tasks. Ousterhout et al. set out to try and quantify this, and found that what we think we know isn't necessarily so. Yet …
Asynchronized Concurrency: The Secret to Scaling Concurrent Search Data Structures
Asynchronized Concurrency: The Secret to Scaling Concurrent Search Data Structures - David et al. 2015 Linked Lists, Hash Tables, Skip Lists, Binary Search Trees... these data structures are core to many programs. This paper studies such search data structures, supporting search, insert, and remove operations. In particular, the authors look at concurrent versions of these …
ApproxHadoop: Bringing Approximations to MapReduce Frameworks
ApproxHadoop: Bringing Approximations to MapReduce Frameworks - Goiri et al. 2015 Yesterday we saw how including networking concerns in scheduling decisions can increase throughput for MapReduce jobs (and Storm topologies) by ~30%. Today we look at an even more effective strategy for getting the most out of your Hadoop cluster: doing less work! On one …
Cross-layer scheduling in cloud systems
Cross-layer scheduling in cloud systems - Alkaff et al. 2015 This paper was presented last month at the 2015 International Conference on Cloud Engineering, and explores what happens when you coordinate application scheduling with network route allocation via SDN (hence: cross-layer scheduling). With clusters of 30 nodes, the authors demonstrate results that can improve the …
Mojim: A Reliable and Highly-Available Non-Volatile Memory System
Mojim: A Reliable and Highly-Available Non-Volatile Memory System - Zhang et al. 2015 This is the second in a series of posts looking at the latest research from the recently held ASPLOS '15 conference. It seems like we've been anticipating NVMM (non-volatile main memory) for a while now, and there has been plenty of research …