Probabilistically Bounded Staleness for Practical Partial Quorums
Probabilistically Bounded Staleness for Practical Partial Quorums - Bailis et al. 2012, and Quantifying Eventual Consistency with PBS - Bailis et al. 2014
'Probabilistically Bounded Staleness...' was the original VLDB '12 paper; the authors were then invited to submit an extended version to the VLDB Journal ('Quantifying Eventual Consistency...'), which was published in …
Month: August 2015
Mining and Summarizing Customer Reviews
Mining and Summarizing Customer Reviews - Hu and Liu 2004
This is the third of the three 'test-of-time' award winners from KDD'15. From the awards page: The paper introduces the problem of summarizing customer reviews and decomposes the problem into the three steps of (1) mining product features (aspects), (2) identifying opinion sentences and their …
Optimizing Search Engines using Clickthrough Data
Optimizing Search Engines using Clickthrough Data - Joachims 2002
Today's choice is another KDD 'test-of-time' winner. The paper introduced the problem of ranking documents with respect to a query using not explicit user feedback but implicit user feedback in the form of clickthrough data. The author presented the Ranking SVM algorithm to solve the proposed ranking problem. …
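To make "implicit feedback" a little more concrete, here is a minimal Python sketch (not from the post) of turning one page of search results plus the user's clicks into pairwise training preferences of the kind a Ranking SVM consumes. It assumes the "clicked result is preferred over every unclicked result ranked above it" heuristic; the function name `extract_preferences` and the document IDs are hypothetical, chosen only for illustration.

```python
from typing import List, Set, Tuple


def extract_preferences(ranking: List[str], clicked: Set[str]) -> List[Tuple[str, str]]:
    """Return (preferred, less_preferred) pairs from one result page.

    Hypothetical helper. Assumed heuristic: a clicked document is taken to be
    more relevant than every unclicked document that was shown above it
    (the user saw it, skipped it, and clicked something lower instead).
    """
    prefs: List[Tuple[str, str]] = []
    for i, doc in enumerate(ranking):
        if doc not in clicked:
            continue
        # Every unclicked document ranked higher than this click was skipped.
        for above in ranking[:i]:
            if above not in clicked:
                prefs.append((doc, above))
    return prefs


if __name__ == "__main__":
    # Hypothetical result page for one query: the user clicked d1, d3 and d7.
    ranking = ["d1", "d2", "d3", "d4", "d5", "d6", "d7"]
    clicked = {"d1", "d3", "d7"}
    for better, worse in extract_preferences(ranking, clicked):
        print(f"{better} > {worse}")
```

Here the sketch yields d3 > d2 and d7 > d2, d4, d5, d6; nothing is inferred about d1, since no result above it was skipped. Each pair would then become one pairwise constraint for the Ranking SVM, which learns a weight vector that scores the preferred document above the other for that query. Note that the heuristic only yields relative judgements, never absolute relevance labels.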
Mining High-Speed Data Streams
Mining High-Speed Data Streams - Domingos & Hulten 2000
This paper won a 'test of time' award at KDD'15 as an 'outstanding paper from a past KDD Conference beyond the last decade that has had an important impact on the data mining community.' Here's what the test-of-time committee have to say about it: This paper …
Efficient Algorithms for Public-Private Social Networks
Efficient Algorithms for Public-Private Social Networks - Chierichetti et al. 2015
Today's choice won a best paper award at KDD'15. The authors examine a number of algorithms for computing graph (network) measures in the context of social networks that enable private groups and connections. These are characterised by a large public graph G=(V,E), and for …
A Machine Learning Framework to Identify Students at Risk of Adverse Academic Outcomes
A Machine Learning Framework to Identify Students at Risk of Adverse Academic Outcomes - Lakkaraju et al. 2015
This is the first of a series of papers from the Knowledge Discovery and Data Mining (KDD'15) conference that we'll look at this week. Today's paper is all about helping high school students in the US who …
MillWheel: Fault-Tolerant Stream Processing at Internet Scale
MillWheel: Fault-Tolerant Stream Processing at Internet Scale - Akidau et al. (Google) 2013
Earlier this week we looked at the Google Cloud Dataflow model, which is implemented on top of FlumeJava (for batch) and MillWheel (for streaming): "We have implemented this model internally in FlumeJava, with MillWheel used as the underlying execution engine for streaming …"