Efficient Algorithms for Public-Private Social Networks

Efficient Algorithms for Public-Private Social Networks - Chierichetti et al. 2015. Today's choice won a best paper award at KDD'15. The authors examine a number of algorithms for computing graph (network) measures in the context of social networks that enable private groups and connections. These are characterised by a large public graph G=(V,E), and for …

A Machine Learning Framework to Identify Students at Risk of Adverse Academic Outcomes

A Machine Learning Framework to Identify Students at Risk of Adverse Academic Outcomes - Lakkaraju et al. 2015. This is the first of a series of papers from the Knowledge Discovery and Data Mining (KDD'15) conference that we'll look at this week. Today's paper is all about helping high school students in the US who …

FlashGraph: Processing Billion Node Graphs on an Array of Commodity SSDs

FlashGraph: Processing Billion Node Graphs on an Array of Commodity SSDs - Zheng et al. The Web Data Commons project is the largest web corpus available to the public. Its hyperlink (page) graph dataset contains 3.4B vertices and 129B edges in over 1TB of data, and has a graph diameter of 650. To the best …

PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs

PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs - Gonzalez et al. 2012. A lot of the time, we want to perform computations on graphs that model the real world. As we saw in Exploring Complex Networks, such graphs often follow a power-law degree distribution (i.e., a few nodes are very highly connected, and many nodes …

Distributed GraphLab: A framework for machine learning and data mining in the cloud

Distributed GraphLab: A framework for machine learning and data mining in the cloud - Low et al. 2012. Two years on from the initial GraphLab paper we looked at yesterday comes this extension to support distributed graph processing for larger graphs, including data mining use cases. In this paper, we extend the GraphLab framework to …

GraphLab: A new framework for parallel machine learning

GraphLab: A new framework for parallel machine learning - Low et al. 2010. In this paper we propose GraphLab, a new parallel framework for ML which exploits the sparse structure and common computational patterns of ML algorithms. GraphLab enables ML experts to easily design and implement efficient scalable parallel algorithms by composing problem specific computation, …

WANalytics: Analytics for a geo-distributed, data intensive world

WANalytics: analytics for a geo-distributed data intensive world - Vulimiri et al. 2015 ...data is born distributed; we only control data replication and distributed execution strategies. This is true for so many sources of data. Combine this with Dave McCrory's observation that 'Data has Gravity' (i.e. it attracts applications and other data processing workloads to …