Schools where I live are now breaking up for summer, and it’s time for The Morning Paper summer recess too. Over the last term, we’ve covered 67 papers and a broad range of topics. InfoQ are kindly working on another “Quarterly Review” publication (see here for the previous edition). As ever it’s hard to choose … Continue reading End of Term, and the power of compound interest
Month: July 2016
Time-adaptive sketches (Ada sketches) for summarizing data streams
Time-adaptive sketches (Ada Sketches) for Summarizing Data Streams Shrivastava et al. SIGMOD 2016 More algorithm fun today, and again in the context of data streams. It’s the 3 V’s of big data, but not as you know it: Volume, Velocity, and Var… Volatility. Volatility here refers to changing patterns in the data over time, and … Continue reading Time-adaptive sketches (Ada sketches) for summarizing data streams
Range thresholding on streams
Range thresholding on streams Qiao et al. SIGMOD 2016 It’s another streaming paper today, also looking at how to efficiently handle a large volume of concurrent queries over a stream, and also claiming a significant performance breakthrough of several orders of magnitude. We’re looking at a different type of query though, known as a range … Continue reading Range thresholding on streams
Sharing-aware outlier analytics over high-volume data streams
Sharing-aware outlier analytics over high-volume data streams Cao et al. SIGMOD 2016 With yesterday’s preliminaries on skyline queries out of the way, it’s time to turn our attention to the Sharing-aware Outlier Processing (SOP) algorithm of Cao et al. The challenge that SOP addresses is that of building a stream-based outlier detection system that can … Continue reading Sharing-aware outlier analytics over high-volume data streams
Progressive skyline computation in database systems
Progressive skyline computation in database systems Papadias et al. SIGMOD 2003 I’m still working through some of the papers from SIGMOD 2016 (as some of you spotted, that was the unifying theme for last week). But today I’m jumping back to 2003 to provide some context for a streaming analytics paper we’ll be looking at … Continue reading Progressive skyline computation in database systems
Spheres of influence for more effective viral marketing
Spheres of influence for more effective viral marketing Mehmood et al. SIGMOD ’16 In viral marketing the idea is to spread awareness of a brand or campaign by exploiting pre-existing social networks. The received wisdom is that by targeting a few influential individuals, they will be able to spread your marketing message to a large … Continue reading Spheres of influence for more effective viral marketing
DBSherlock: A performance diagnostic tool for transactional databases
DBSherlock: A performance diagnostic tool for transactional databases Yoon et al. SIGMOD ’16 …tens of thousands of concurrent transactions competing for the same resources (e.g. CPU, disk I/O, memory) can create highly non-linear and counter-intuitive effects on database performance. If you’re a DBA responsible for figuring out what’s going on, this presents quite a challenge. … Continue reading DBSherlock: A performance diagnostic tool for transactional databases
Ambry: LinkedIn’s scalable geo-distributed object store
Ambry: LinkedIn’s scalable geo-distributed object store Noghabi et al. SIGMOD ’16 Ambry is LinkedIn’s blob store, designed to handle the demands of a modern social network: Hundreds of millions of users continually upload and view billions of diverse massive media objects, from photos and videos to documents. These large media objects, called blobs, are uploaded … Continue reading Ambry: LinkedIn’s scalable geo-distributed object store
Goods: organizing Google’s datasets
Goods: organizing Google’s datasets Halevy et al. SIGMOD 2016 You can (try and) build a data cathedral. Or you can build a data bazaar. By data cathedral I’m referring to a centralised Enterprise Data Management solution that everyone in the company buys into and pays homage to, making a pilgrimage to the EDM every time … Continue reading Goods: organizing Google’s datasets
Realtime data processing at Facebook
Realtime Data Processing at Facebook Chen et al. SIGMOD 2016 ‘Realtime Data Processing at Facebook’ provides us with a great high-level overview of the systems Facebook have built to support real-time workloads. At the heart of the paper is a set of five key design decisions for building such systems, together with an explanation of … Continue reading Realtime data processing at Facebook