Automatic discovery of tactics in spatio-temporal soccer match data

Decroos et al., KDD'18 Here’s a fun paper to end the week. Data collection from sporting events is now widespread. This fuels an endless thirst for team and player statistics. In terms of football (which shall refer to the game of soccer throughout this write-up) that …

Detecting spacecraft anomalies using LSTMs and nonparametric dynamic thresholding

Hundman et al., KDD'18 How do you effectively monitor a spacecraft? That was the question facing NASA’s Jet Propulsion Laboratory as they looked forward towards exponentially increasing telemetry data rates for Earth Science satellites (e.g., around 85 terabytes/day for a Synthetic Aperture Radar satellite). Spacecraft are …

Online parameter selection for web-based ranking problems

Agarwal et al., KDD'18 Last week we looked at production systems from Facebook, Airbnb, and Snap Inc.; today it’s the turn of LinkedIn. This paper describes the system and model that LinkedIn use to determine the items to be shown in a user’s feed: It replaces previous hand-tuning …

I know you’ll be back: interpretable new user clustering and churn prediction on a mobile social application

Yang et al., KDD'18 Churn rates (how fast users abandon your app / service) are really important in modelling a business. If the churn rate is too high, it’s hard to maintain growth. Since acquiring new customers is …

Customized regression model for Airbnb dynamic pricing

Ye et al., KDD'18 This paper details the methods that Airbnb use to suggest prices to listing hosts (hosts ultimately remain in control of pricing on the Airbnb platform). The proposed strategy model has been deployed in production for more than 1 year at Airbnb. The launch of …

Unsupervised anomaly detection via variational auto-encoder for seasonal KPIs in web applications

Xu et al., WWW'18 (If you don’t have ACM Digital Library access, the paper can be accessed either by following the link above directly from The Morning Paper blog site, or from the WWW 2018 proceedings page). Today’s paper examines the problem of …

Dynamic word embeddings for evolving semantic discovery

Yao et al., WSDM’18 One of the most popular posts on this blog is my introduction to word embeddings with word2vec (‘The amazing power of word vectors’). In today’s paper choice Yao et al. introduce a lovely extension that enables you to track how the meaning of words …

Can you trust the trend? Discovering Simpson’s paradoxes in social data

Alipourfard et al., WSDM’18 In ‘Same stats, different graphs,’ we saw some compelling examples of how summary statistics can hide important underlying patterns in data. Today’s paper choice shows how you can detect instances of Simpson’s paradox, thus revealing the presence of interesting subgroups, …

Putting data in the driver’s seat: optimising earnings for on-demand ride hailing

Chaudhari et al., WSDM’18 (The link above is to the ACM Digital Library official version, which may not grant you access when clicked in your email client, but should do if you visit via the blog itself.) There is something deeply rooted in …

Tracing fake news footprints: characterizing social media messages by how they propagate

Wu & Liu, WSDM’18 This week we’ll be looking at some of the papers from WSDM’18. To kick things off I’ve chosen a paper tackling the problem of detecting fake news on social media. One of the challenges here is that fake news …