One SQL to rule them all: an efficient and syntactically idiomatic approach to management of streams and tables

One SQL to rule them all: an efficient and syntactically idiomatic approach to management of streams and tables Begoli et al., SIGMOD'19. In data processing, it seems, all roads eventually lead back to SQL! Today's paper choice is authored by a collection of experts from the Apache Beam, Apache Calcite, and Apache Flink projects, outlining …

Enabling signal processing over data streams

Enabling signal processing over data streams Nikolic et al., SIGMOD'17. If you're processing data coming from networks of sensors and devices, then it's not uncommon to use a mix of relational and signal processing operations. Data analysts use relational operators, for example, to group signals by different data sources or join signals with historical …
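To make the "mix of relational and signal processing operations" concrete, here is a minimal sketch in plain Python. It is not the system described in the paper: the `(source_id, timestamp, value)` tuple layout, the hypothetical helper names `group_by_source` and `moving_average`, and the choice of a trailing moving-average filter as the signal operation are all illustrative assumptions. A relational group-by splits the readings per source, and a signal filter then runs over each time-ordered series.

```python
from collections import defaultdict


def group_by_source(readings):
    """Relational step (hypothetical helper): group (source_id, timestamp, value)
    tuples by source and return each source's values ordered by timestamp."""
    groups = defaultdict(list)
    for source, ts, value in readings:
        groups[source].append((ts, value))
    return {src: [v for _, v in sorted(pts)] for src, pts in groups.items()}


def moving_average(signal, window):
    """Signal-processing step (hypothetical helper): trailing moving average,
    i.e. a simple FIR filter with equal weights 1/window."""
    out = []
    total = 0.0
    for i, x in enumerate(signal):
        total += x
        if i >= window:
            total -= signal[i - window]  # drop the sample leaving the window
        if i >= window - 1:
            out.append(total / window)  # emit once the window is full
    return out


# Illustrative readings from two assumed sensors, interleaved as they might arrive.
readings = [
    ("sensor-a", 1, 10.0), ("sensor-b", 1, 3.0),
    ("sensor-a", 2, 12.0), ("sensor-b", 2, 5.0),
    ("sensor-a", 3, 14.0), ("sensor-b", 3, 7.0),
]
smoothed = {src: moving_average(sig, 2) for src, sig in group_by_source(readings).items()}
print(smoothed)  # {'sensor-a': [11.0, 13.0], 'sensor-b': [4.0, 6.0]}
```

Grouping before filtering matters: applying the moving average to the interleaved stream would blend samples from different sensors into one meaningless signal.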

Complete event trend detection in high-rate data streams

Complete Event Trend detection in high-rate event streams Poppe et al., SIGMOD'17. Today's paper choice looks at the tricky problem of detecting Complete Event Trends (CET) in high-rate event streams. CET detection is useful in fraud detection, health care analytics, stock trend analytics, and other similar scenarios that look for complex patterns in event streams. Detecting …

Dhalion: self-regulating stream processing in Heron

Dhalion: Self-regulating stream processing in Heron Floratou et al., VLDB 2017. Dhalion follows on nicely from yesterday's paper looking at the modular architecture of Heron, and aims to reduce the "complexity of configuring, managing, and deploying" streaming applications. In particular, streaming applications deployed as Heron topologies, although the authors are keen to point out the …

Explaining outputs in modern data analytics

Explaining outputs in modern data analytics Chothia et al., ETH Zurich Technical Report, 2016. Yesterday we touched on some of the difficulties of explanation in the context of machine learning, and last week we looked at some of the extensions to ExSPAN to track network provenance. Lest you be under any remaining misapprehension that explanation …

Prioritizing attention in fast data: principles and promise

Prioritizing attention in fast data: principles and promise Bailis et al., CIDR 2017. Today it's two for the price of one, as we get a life lesson in addition to a wonderfully thought-provoking piece of research. I'm sure you'd all agree that we're drowning in information: so much content being pumped out all of …