Sharing-aware outlier analytics over high-volume data streams

Cao et al. SIGMOD 2016. With yesterday’s preliminaries on skyline queries out of the way, it’s time to turn our attention to the Sharing-aware Outlier Processing (SOP) algorithm of Cao et al. The challenge that SOP addresses is that of building a stream-based outlier detection system that can ...

Realtime data processing at Facebook

Chen et al. SIGMOD 2016. ‘Realtime Data Processing at Facebook’ provides us with a great high-level overview of the systems Facebook have built to support real-time workloads. At the heart of the paper is a set of five key design decisions for building such systems, together with an explanation of ...

StreamScope: Continuous reliable distributed processing of big data streams

Lin et al. NSDI '16. An emerging trend in big data processing is to extract timely insights from continuous big data streams with distributed computation running on a large cluster of machines. Examples of such data streams include those from sensors, mobile devices, and on-line ...

Asynchronous Complex Analytics in a Distributed Dataflow Architecture

Gonzalez et al. 2015. Here's a theme we've seen before: the programming model offered by large scale distributed systems doesn't always lend itself to efficient algorithms for solving certain classes of problems. In today's paper, Gonzalez et al. examine the growing gap between efficient machine learning ...

The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing

Akidau et al. (Google), 2015. With thanks to William Vambenepe for suggesting this paper via Twitter. Google Cloud Dataflow reached GA last week, and the team behind Cloud Dataflow have a paper accepted at VLDB'15 ...