PlanAlyzer: assessing threats to the validity of online experiments

Tosch et al., OOPSLA'19. It’s easy to make experimental design mistakes that invalidate your online controlled experiments. At an organisation like Facebook (who kindly supplied the corpus of experiments used in this study), the state of the art is to have a pool of experts carefully review ...

Futzing and moseying: interviews with professional data analysts on exploration practices

Alspaugh et al., VAST'18. What do people actually do when they do ‘exploratory data analysis’ (EDA)? This 2018 paper reports on the findings from interviews with 30 professional data analysts to see what they get up to in practice. The only caveat to the ...

CORALS: who are my potential new customers? Tapping into the wisdom of customers’ decisions

Li et al., WSDM'19. The authors of this paper won round 9 of the Yelp dataset challenge for their work. The goal is to find new target customers for local businesses by mining location-based checkins of users, user preferences, and online ...

Protecting user privacy: an approach for untraceable web browsing history and unambiguous user profiles

Beigi et al., WSDM'19. Maybe you’re reading this post online at The Morning Paper, and you came here by clicking a link in your Twitter feed because you follow my paper write-up announcements there. It might even be that you fairly ...

The why and how of nonnegative matrix factorization

Gillis, arXiv 2014, from ‘Regularization, Optimization, Kernels, and Support Vector Machines.’ Last week we looked at the paper ‘Beyond news content,’ which made heavy use of nonnegative matrix factorisation. Today we’ll be looking at that technique in a little more detail. As the name suggests, ‘The Why ...