Capturing and enhancing in situ system observability for failure detection
Capturing and enhancing in situ system observability for failure detection Huang et al., OSDI'18 The central idea in this paper is simple and brilliant. The place where we have the most relevant information about the health of a process or thread is in the clients that call it. Today the state of the practice is … Continue reading Capturing and enhancing in situ system observability for failure detection
Category: Uncategorized
Automatic discovery of tactics in spatio-temporal soccer match data
Automatic discovery of tactics in spatio-temporal soccer match data Decroos et al., KDD'18 Here’s a fun paper to end the week. Data collection from sporting events is now widespread. This fuels an endless thirst for team and player statistics. In terms of football (which shall refer to the game of soccer throughout this write-up) that … Continue reading Automatic discovery of tactics in spatio-temporal soccer match data
Detecting spacecraft anomalies using LSTMs and nonparametric dynamic thresholding
Detecting spacecraft anomalies using LSTMs and nonparametric dynamic thresholding Hundman et al., KDD'18 How do you effectively monitor a spacecraft? That was the question facing NASA’s Jet Propulsion Laboratory as they looked forward towards exponentially increasing telemetry data rates for Earth Science satellites (e.g., around 85 terabytes/day for a Synthetic Aperture Radar satellite). Spacecraft are … Continue reading Detecting spacecraft anomalies using LSTMs and nonparametric dynamic thresholding
Online parameter selection for web-based ranking problems
Online parameter selection for web-based ranking problems Agarwal et al., KDD'18 Last week we looked at production systems from Facebook, Airbnb, and Snap Inc.; today it’s the turn of LinkedIn. This paper describes the system and model that LinkedIn use to determine the items to be shown in a user’s feed: It replaces previous hand-tuning … Continue reading Online parameter selection for web-based ranking problems
I know you’ll be back: interpretable new user clustering and churn prediction on a mobile social application
I know you’ll be back: interpretable new user clustering and churn prediction on a mobile social application Yang et al., KDD'18 Churn rates (how fast users abandon your app / service) are really important in modelling a business. If the churn rate is too high, it’s hard to maintain growth. Since acquiring new customers is … Continue reading I know you’ll be back: interpretable new user clustering and churn prediction on a mobile social application
Customized regression model for Airbnb dynamic pricing
Customized regression model for Airbnb dynamic pricing Ye et al., KDD'18 This paper details the methods that Airbnb use to suggest prices to listing hosts (hosts ultimately remain in control of pricing on the Airbnb platform). The proposed strategy model has been deployed in production for more than 1 year at Airbnb. The launch of … Continue reading Customized regression model for Airbnb dynamic pricing
Rosetta: large scale system for text detection and recognition in images
Rosetta: large scale system for text detection and recognition in images Borisyuk et al., KDD'18 Rosetta is Facebook’s production system for extracting text (OCR) from uploaded images. In the last several years, the volume of photos being uploaded to social media platforms has grown exponentially to the order of hundreds of millions every day, presenting … Continue reading Rosetta: large scale system for text detection and recognition in images
Columnstore and B+ tree – are hybrid physical designs important?
Columnstore and B+ tree – are hybrid physical designs important? Dziedzic et al., SIGMOD'18 Earlier this week we looked at the design of column stores and their advantages for analytic workloads. What should you do though if you have a mixed workload including transaction processing, decision support, and operational analytics? Microsoft SQL Server supports hybrid … Continue reading Columnstore and B+ tree – are hybrid physical designs important?
The design and implementation of modern column-oriented database systems
The design and implementation of modern column-oriented database systems Abadi et al., Foundations and Trends in Databases, 2012 I came here by following the references in the Smoke paper we looked at earlier this week. "The design and implementation of modern column-oriented database systems" is a longer piece at 87 pages, but it’s good value-for-time. … Continue reading The design and implementation of modern column-oriented database systems
Smoke: fine-grained lineage at interactive speed
Smoke: fine-grained lineage at interactive speed Psallidas et al., VLDB'18 Data lineage connects the input and output data items of a computation. Given a set of output records, a backward lineage query selects a subset of the output records and asks "which input records contributed to these results?" A forward lineage query selects a subset … Continue reading Smoke: fine-grained lineage at interactive speed