Popularity prediction of Facebook videos for higher quality streaming
Popularity prediction of Facebook videos for higher quality streaming Tang et al., USENIX ATC’17 Suppose I could grant you access to a clairvoyance service, which could make one class of predictions about your business with perfect accuracy. What would you want to know, and what difference would knowing that make to your business? …
Tag: Data Science
Same stats, different graphs: generating datasets with varied appearance and identical statistics through simulated annealing
Same stats, different graphs: generating datasets with varied appearance and identical statistics through simulated annealing Matejka & Fitzmaurice, CHI’17 Today’s paper choice is inspired by the keynote that Prof. Miriah Meyer gave at the recent Velocity conference in London, ‘Why an interactive picture is worth a thousand numbers.’ She made a wonderful and …
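The paper’s core trick is easy to sketch: repeatedly jitter individual points, and keep a move only if the summary statistics, rounded to two decimal places, are unchanged (the full method additionally anneals the point cloud toward a target shape, such as the famous Datasaurus). A minimal Python version of just that invariant, with function names of my own:

```python
import random
import statistics

def summary_stats(xs, ys):
    """Mean, standard deviation, and Pearson correlation, rounded to 2 d.p."""
    n = len(xs)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return tuple(round(v, 2) for v in (mx, my, sx, sy, cov / (sx * sy)))

def perturb(xs, ys, iters=5000, step=0.1, seed=0):
    """Jitter one point at a time, keeping a move only when the rounded
    statistics are unchanged (the paper additionally biases accepted
    moves toward a target shape via simulated annealing)."""
    rng = random.Random(seed)
    target = summary_stats(xs, ys)
    xs, ys = list(xs), list(ys)
    for _ in range(iters):
        i = rng.randrange(len(xs))
        old = xs[i], ys[i]
        xs[i] += rng.gauss(0, step)
        ys[i] += rng.gauss(0, step)
        if summary_stats(xs, ys) != target:
            xs[i], ys[i] = old  # revert: the statistics must stay fixed
    return xs, ys

# Scatter 50 points, then walk them around while pinning the statistics.
seed_rng = random.Random(1)
xs = [seed_rng.uniform(0, 100) for _ in range(50)]
ys = [seed_rng.uniform(0, 100) for _ in range(50)]
new_xs, new_ys = perturb(xs, ys)
```

The result is a visually different dataset whose rounded mean, standard deviation, and correlation exactly match the original — which is why summary statistics alone can never tell you what a dataset looks like.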
Detecting credential spearphishing attacks in enterprise settings
Detecting credential spearphishing attacks in enterprise settings Ho et al., USENIX Security 2017 The Lawrence Berkeley National Laboratory (LBNL) has developed and deployed a new system for detecting credential spearphishing attacks (highly targeted attacks against individuals within the organisation). As with many anomaly detection systems, there are challenges in keeping the false positive rate acceptable (not …
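One piece of the paper that lends itself to a sketch is its directed anomaly scoring (DAS), which ranks events without labelled training data: an event scores highly when it is at least as suspicious as many other events in every feature dimension. A toy version, assuming plain numeric features where larger means more suspicious (the real detector derives richer features from enterprise logs):

```python
def das_scores(events):
    """Toy directed anomaly scoring: an event's score is the number of
    other events it dominates, i.e. is at least as suspicious as in
    *every* feature dimension (larger value = more suspicious here)."""
    def dominates(a, b):
        return all(fa >= fb for fa, fb in zip(a, b))
    n = len(events)
    return [sum(i != j and dominates(events[i], events[j]) for j in range(n))
            for i in range(n)]

# Three events described by two suspiciousness features each:
# only the middle event dominates another, so it scores highest.
scores = das_scores([(1, 1), (2, 2), (0, 3)])
```

Under a fixed alert budget, only the top-scoring events go to analysts — which is how a scheme like this keeps the false positive burden manageable.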
ActiveClean: Interactive data cleaning for statistical modeling
ActiveClean: Interactive data cleaning for statistical modeling Krishnan et al., VLDB 2016 Yesterday we saw that one of the key features of a machine learning platform is support for data analysis, transformation and validation of datasets used as inputs to the model. In the TFX paper, the authors reference ActiveClean as an example of data …
Google Vizier: A service for black-box optimization
Google Vizier: a service for black-box optimization Golovin et al., KDD'17 We finished up last week by looking at the role of an internal (or external) experimentation platform. In today's paper Google remind us that such experimentation is just one form of optimisation. Google Vizier is an internal Google service for optimising pretty much anything. …
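The client’s view of such a service boils down to a suggest/report loop: ask for trial parameters, evaluate them, report the measurement back. Here is a toy stand-in using plain random search — the class and method names are illustrative, not Vizier’s actual API, and Vizier’s own algorithms are far more sophisticated:

```python
import random

class RandomSearchOptimizer:
    """Toy stand-in for a black-box optimization service: `suggest` a
    trial from the search space, `report` its objective value, repeat.
    (Illustrative names only -- not Vizier's actual API.)"""

    def __init__(self, space, seed=0):
        self.space = space              # {param: (low, high)}
        self.rng = random.Random(seed)
        self.trials = []                # (params, objective) pairs

    def suggest(self):
        return {p: self.rng.uniform(lo, hi)
                for p, (lo, hi) in self.space.items()}

    def report(self, params, objective):
        self.trials.append((params, objective))

    def best(self):
        return min(self.trials, key=lambda t: t[1])

# Minimise a toy objective over two hyperparameters.
opt = RandomSearchOptimizer({"x": (-5, 5), "y": (-5, 5)})
for _ in range(200):
    trial = opt.suggest()
    opt.report(trial, trial["x"] ** 2 + trial["y"] ** 2)
best_params, best_value = opt.best()
```

The point of the service framing is that the thing behind `suggest` can be swapped (random search, Bayesian optimisation, …) without the client changing at all.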
The evolution of continuous experimentation in software product development
The evolution of continuous experimentation in software product development Fabijan et al., ICSE'17 (Author personal version here) If you've been following along with the A/B testing related papers this week and thinking "we should probably do more of that in my company," then today's paper choice is for you. Anchored in experiences at Microsoft, the …
Peeking at A/B tests: continuous monitoring without pain
Peeking at A/B tests: why it matters, and what to do about it Johari et al., KDD'17 and Continuous monitoring of A/B tests without pain: optional stopping in Bayesian testing Deng, Lu, et al., CEUR'17 Today we have a double header: two papers addressing the challenge of monitoring ongoing experiments. Early stopping in traditional A/B …
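To make the Bayesian side concrete: with Beta priors on each variant’s conversion rate, a quantity like P(rate_B > rate_A) is a posterior statement you can recompute at every peek — the kind of metric Deng, Lu et al. argue tolerates optional stopping far better than a repeatedly checked p-value. A Monte Carlo sketch (my own function name, not either paper’s exact procedure):

```python
import random

def prob_b_beats_a(a_conv, a_n, b_conv, b_n, draws=20000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under independent
    Beta(1, 1) priors on each variant's conversion rate. A posterior
    quantity like this can be re-evaluated every time new data arrives
    (a sketch, not either paper's exact procedure)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + a_conv, 1 + a_n - a_conv)
        rate_b = rng.betavariate(1 + b_conv, 1 + b_n - b_conv)
        hits += rate_b > rate_a
    return hits / draws

# 100/1000 conversions on A versus 150/1000 on B.
p = prob_b_beats_a(100, 1000, 150, 1000)
```

With a 10% versus 15% conversion rate at n=1000 each, the posterior probability that B is better comes out close to 1.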
An efficient bandit algorithm for real-time multivariate optimization
An efficient bandit algorithm for real-time multivariate optimization Hill et al., KDD'17 Aka, "How Amazon improved conversion by 21% in a single week!" Yesterday we saw the hard-won wisdom on display in 'seven rules of thumb' recommending that experiments be kept simple and only test one thing at a time, otherwise interpreting the results can get really …
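The paper’s bandit uses Thompson sampling over a Bayesian linear model of widget combinations, with a hill-climbing search over layouts. Stripped down to a handful of fixed layouts with Bernoulli rewards, the explore/exploit loop looks like this (the rates and names are made up for illustration):

```python
import random

def thompson_sampling(true_rates, rounds=5000, seed=0):
    """Bernoulli Thompson sampling over a few candidate layouts: sample
    a plausible conversion rate from each arm's Beta posterior, play the
    argmax, update the posterior. (The paper's model is richer -- a
    Bayesian linear model over widget combinations -- but the loop is
    the same.)"""
    rng = random.Random(seed)
    k = len(true_rates)
    wins, losses = [1] * k, [1] * k   # Beta(1, 1) prior per arm
    pulls = [0] * k
    for _ in range(rounds):
        arm = max(range(k), key=lambda i: rng.betavariate(wins[i], losses[i]))
        reward = rng.random() < true_rates[arm]   # simulated conversion
        wins[arm] += reward
        losses[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

# Hypothetical layouts converting at 2%, 3%, and 10%.
pulls = thompson_sampling([0.02, 0.03, 0.10])
```

With those simulated rates the loop quickly concentrates most traffic on the best layout — traffic allocation shifts during the experiment rather than after it, which is the bandit’s advantage over a fixed A/B split.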
Seven rules of thumb for web site experimenters
Seven rules of thumb for web site experimenters Kohavi et al., KDD'14 Following yesterday's 12 metric interpretation pitfalls, today we're looking at 7 rules of thumb for designing web site experiments. There's a little bit of duplication here, but the paper is packed with great real world examples, and there is some very useful new …
A dirty dozen: twelve common metric interpretation pitfalls in online controlled experiments
A dirty dozen: twelve common metric interpretation pitfalls in online controlled experiments Dmitriev et al., KDD 2017 Pure Gold! Here we have twelve wonderful lessons in how to avoid expensive mistakes in companies that are trying their best to be data-driven. A huge thank you to the team from Microsoft for sharing their hard-won experiences …