StreamScope: Continuous reliable distributed processing of big data streams

StreamScope: Continuous Reliable Distributed Processing of Big Data Streams - Lin et al. NSDI '16 An emerging trend in big data processing is to extract timely insights from continuous big data streams with distributed computation running on a large cluster of machines. Examples of such data streams include those from sensors, mobile devices, and on-line ...

Uncovering bugs in Distributed Storage Systems during Testing (not in production!)

Uncovering bugs in Distributed Storage Systems during Testing (not in production!) - Deligiannis et al. 2016 We interviewed technical leaders and senior managers in Microsoft Azure regarding the top problems in distributed system development. The consensus was that one of the most critical problems today is how to improve testing coverage so that bugs can ...

Helping Developers Help Themselves: Automatic Decomposition of Code Review Changes

Helping Developers Help Themselves: Automatic Decomposition of Code Review Changes - Barnett et al. 2015 Earlier this week we saw that pull requests with well organised commits are strongly preferred by integrators. Unfortunately, developers often make changes that incorporate multiple bug fixes, feature additions, refactorings, etc. These result in changes that are both large and ...

FaRM: Fast Remote Memory

FaRM: Fast Remote Memory - Dragojevic, et al. 2014 Yesterday we looked at Facebook's graph store, TAO, that can handle a billion reads/sec and millions of writes/sec. In today's choice a team from Microsoft Research reimplemented TAO, and beat those numbers by an order of magnitude! FaRM’s per-machine throughput of 6.3 million operations per second is ...

WANalytics: Analytics for a geo-distributed, data intensive world

WANalytics: analytics for a geo-distributed data intensive world - Vulimiri et al. 2015 ...data is born distributed; we only control data replication and distributed execution strategies. This is true for so many sources of data. Combine this with Dave McCrory's observation that 'Data has Gravity' (i.e. it attracts applications and other data processing workloads to ...