PlanAlyzer: assessing threats to the validity of online experiments
Tosch et al., OOPSLA'19 It’s easy to make experimental design mistakes that invalidate your online controlled experiments. At an organisation like Facebook (who kindly supplied the corpus of experiments used in this study), the state of the art is to have a pool of experts carefully review …
Tag: Data Science
Futzing and moseying: interviews with professional data analysts on exploration practices
Alspaugh et al., VAST'18 What do people actually do when they do ‘exploratory data analysis’ (EDA)? This 2018 paper reports on the findings from interviews with 30 professional data analysts to see what they get up to in practice. The only caveat to the …
Vega-Lite: a grammar of interactive graphics
Satyanarayan et al., IEEE Transactions on Visualization and Computer Graphics, 2016 From time to time I receive a request for more HCI (human-computer interaction) related papers in The Morning Paper. If you’ve been a follower of The Morning Paper for any time at all you can probably tell that …
Robust learning from untrusted sources
Konstantinov & Lampert, ICML'19 Welcome back to a new term of The Morning Paper! Just before the break we were looking at selected papers from ICML’19, including “Data Shapley.” I’m going to pick things up pretty much where we left off with a few more ICML papers... Data Shapley provides …
CORALS: who are my potential new customers? Tapping into the wisdom of customers’ decisions
Li et al., WSDM'19 The authors of this paper won round 9 of the Yelp dataset challenge for their work. The goal is to find new target customers for local businesses by mining location-based checkins of users, user preferences, and online …
Protecting user privacy: an approach for untraceable web browsing history and unambiguous user profiles
Beigi et al., WSDM'19 Maybe you’re reading this post online at The Morning Paper, and you came here by clicking a link in your Twitter feed because you follow my paper write-up announcements there. It might even be that you fairly …
The why and how of nonnegative matrix factorization
Gillis, arXiv 2014 from: ‘Regularization, Optimization, Kernels, and Support Vector Machines.’ Last week we looked at the paper ‘Beyond news content,’ which made heavy use of nonnegative matrix factorisation. Today we’ll be looking at that technique in a little more detail. As the name suggests, ‘The Why …
A survey on dynamic and stochastic vehicle routing problems
Ritzinger et al., International Journal of Production Research It’s been a while since we last looked at an overview of dynamic vehicle routing problems: that was back in 2014 (see ‘Dynamic vehicle routing, pickup, and delivery problems’). That paper has fond memories for me, I looked …
Beyond news contents: the role of social context for fake news detection
Shu et al., WSDM'19 Today we’re looking at a more general fake news problem: detecting fake news that is being spread on a social network. Forgetting the computer science angle for a minute, it seems intuitive to me that some important factors here …
ScootR: scaling R dataframes on dataflow systems
Kunft et al., SoCC'18 The language of big data is Java (/ Scala). The languages of data science are Python and R. So what do you do when you want to run your data science analysis over large amounts of data? ...programming languages with rich support for …