Mining High-Speed Data Streams - Domingos & Hulten 2000 This paper won a 'test of time' award at KDD'15 as an 'outstanding paper from a past KDD Conference beyond the last decade that has had an important impact on the data mining community.' Here's what the test-of-time committee have to say about it: This paper …
Category: Machine Learning
The machine learning subset of AI. Includes deep learning among other topics.
Efficient Algorithms for Public-Private Social Networks
Efficient Algorithms for Public-Private Social Networks - Chierichetti et al. 2015 Today's choice won a best paper award at KDD'15. The authors examine a number of algorithms for computing graph (network) measures in the context of social networks that enable private groups and connections. These are characterised by a large public graph G=(V,E), and for …
A Machine Learning Framework to Identify Students at Risk of Adverse Academic Outcomes
A Machine Learning Framework to Identify Students at Risk of Adverse Academic Outcomes - Lakkaraju et al. 2015 This is the first of a series of papers from the Knowledge Discovery and Data Mining (KDD'15) conference that we'll look at this week. Today's paper is all about helping high school students in the US who …
FlashGraph: Processing Billion Node Graphs on an Array of Commodity SSDs
FlashGraph: Processing Billion Node Graphs on an Array of Commodity SSDs - Zheng et al. The Web Data Commons project is the largest web corpus available to the public. Their hyperlink (page) graph dataset has 3.4B vertices and 129B edges, occupies over 1TB of data, and has a graph diameter of 650. To the best …
PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs
PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs - Gonzalez et al. 2012 A lot of the time, we want to perform computations on graphs that model the real world. As we saw in Exploring Complex Networks, such graphs often follow a power-law degree distribution (i.e., a few nodes are very highly connected, and many nodes …
Distributed GraphLab: A framework for machine learning and data mining in the cloud
Distributed GraphLab: A framework for machine learning and data mining in the cloud - Low et al. 2012 Two years on from the initial GraphLab paper we looked at yesterday comes this extension to support distributed graph processing for larger graphs, including data mining use cases. In this paper, we extend the GraphLab framework to …
GraphLab: A new framework for parallel machine learning
GraphLab: A new framework for parallel machine learning - Low et al. 2010 In this paper we propose GraphLab, a new parallel framework for ML which exploits the sparse structure and common computational patterns of ML algorithms. GraphLab enables ML experts to easily design and implement efficient scalable parallel algorithms by composing problem specific computation, …
Machine Learning Classification over Encrypted Data
Machine Learning Classification over Encrypted Data - Bost et al. 2015 This is the second of three papers we'll be looking at this week from the NDSS '15 conference held earlier this month in San Diego. When it comes to providing an informed critique of the security techniques applied in this paper, I'm out of …
WANalytics: Analytics for a geo-distributed, data intensive world
WANalytics: Analytics for a geo-distributed, data intensive world - Vulimiri et al. 2015 ...data is born distributed; we only control data replication and distributed execution strategies. This is true for so many sources of data. Combine this with Dave McCrory's observation that 'Data has Gravity' (i.e. it attracts applications and other data processing workloads to …
The Missing Piece in Complex Analytics
The Missing Piece in Complex Analytics: Low latency scalable model management and serving with Velox - Crankshaw et al. 2015 Analytics at scale can be used to create statistical models for making predictions about the world, but once the data scientists and analysts have done their initial work and a model has been built and …