Asynchronous Distributed Snapshots for Distributed Dataflows

Carbone et al. 2015
The team behind Apache Flink and data Artisans are a smart group of folks. Their recent blog post, "High-throughput, low-latency, and exactly-once stream processing with Apache Flink", is well worth reading and has a good description of the evolution of streaming architectures, the …

The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing

Akidau et al. (Google) 2015
With thanks to William Vambenepe for suggesting this paper via Twitter. Google Cloud Dataflow reached GA last week, and the team behind Cloud Dataflow have a paper accepted at VLDB'15 …

Lasp: A language for distributed, coordination-free programming

Meiklejohn & Van Roy 2015
* Update: fixed typo in Chris' surname above. *
With thanks to Colin Barrett for suggesting today's choice, and to Chris Meiklejohn for providing a link to a paywall-free preprint of the paper. Christopher Meiklejohn recently announced he is leaving Basho to …

PerfBlower: Quickly Detecting Memory-Related Performance Problems via Amplification

Fang et al. 2015
Another ECOOP '15 paper, and definitely something with immediate pragmatic utility. PerfBlower finds heap-related performance problems during regular test runs (not exhaustive performance tests) by amplifying the effects of small issues to make them visible. The user provides details of classes of …

Streams à la carte: Extensible pipelines with object algebras

Biboudis et al. 2015
Streaming APIs are popping up everywhere, allowing the programmer to express streaming computations such as:

    int sum = IntStream.of(v)
                       .filter(x -> x % 2 == 0)
                       .map(x -> x * x)
                       .sum();

On examining the streaming libraries in Java, Scala, and …
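
For readers who want to run the pipeline quoted in the excerpt, here is a minimal sketch. The excerpt does not show how `v` is declared, so this assumes it is an `int[]`; the class name `EvenSquares` and the method name `sumOfEvenSquares` are made up for illustration and do not come from the paper.

```java
import java.util.stream.IntStream;

// Hypothetical wrapper class; only the pipeline itself comes from the excerpt.
class EvenSquares {

    // Sums the squares of the even elements of v.
    // Assumption: v is an int[] (its declaration is not shown in the excerpt).
    static int sumOfEvenSquares(int[] v) {
        return IntStream.of(v)
                .filter(x -> x % 2 == 0) // keep even values only
                .map(x -> x * x)         // square each remaining value
                .sum();                  // terminal operation: triggers evaluation
    }

    public static void main(String[] args) {
        int[] v = {1, 2, 3, 4};
        // Even values are 2 and 4, so the result is 4 + 16.
        System.out.println(sumOfEvenSquares(v)); // prints 20
    }
}
```

Nothing is computed until the terminal `sum()` call; `filter` and `map` only describe the pipeline.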