Time evolving graph processing at scale - Iyer et al., GRADES 2016 Here's a new (June 2016) paper from the distinguished AMPLab group at Berkeley that really gave me cause to reflect. The work addresses the problem of performing graph computations on graphs that are constantly changing (because updates flow in, such as a new follower … Continue reading Time evolving graph processing at scale
Tag: Graph
Graph processing systems and algorithms
Arabesque: A System for Distributed Graph Mining
Arabesque: A System For Distributed Graph Mining - Teixeira et al. 2015 We've studied graph computation systems before in The Morning Paper: systems such as Pregel, Giraph and GraphLab that provide vertex-centric programming models ('think like a vertex') on top of a Bulk Synchronous Parallel compute model. We've also seen some of the limitations of … Continue reading Arabesque: A System for Distributed Graph Mining
A Bridging Model for Parallel Computation
A Bridging Model for Parallel Computation - Valiant 1990 We've seen a lot of references to the 'Bulk Synchronous Parallel' model over the last two weeks. When it was conceived by Valiant in 1990 though, it was intended as a much more general model than simply an abstraction to support graph processing. As the von … Continue reading A Bridging Model for Parallel Computation
Scalability! But at what COST?
Scalability! But at what COST? - McSherry et al. 2015 With thanks to Felix Cuadrado, @felixcuadrado, for pointing this paper out to me via Twitter. Scalability is highly prized, yet it can be a misleading metric when studied in isolation. McSherry et al. study the COST of distributed systems: the Configuration that Outperforms a Single … Continue reading Scalability! But at what COST?
Blogel: A Block-Centric Framework for Distributed Computation on Real-World Graphs
Blogel: A Block-Centric Framework for Distributed Computation on Real-World Graphs - Yan et al. 2014 We've looked at a lot of different graph-processing systems over the last couple of weeks (onto a new topic next week, I promise!), and despite a variety of different implementation and execution models, one thing they all have in common … Continue reading Blogel: A Block-Centric Framework for Distributed Computation on Real-World Graphs
Pregelix: Big(ger) Graph Analytics on a Dataflow Engine
Pregelix: Big(ger) Graph Analytics on a Dataflow Engine - Bu et al. 2015 FlashGraph shows us that it's possible to efficiently process graphs that aren't solely in-memory, and GraphX showed us that we can map graph abstractions on top of a dataflow engine. Put the two ideas together, and you get something that looks like … Continue reading Pregelix: Big(ger) Graph Analytics on a Dataflow Engine
FlashGraph: Processing Billion Node Graphs on an Array of Commodity SSDs
FlashGraph: Processing Billion Node Graphs on an Array of Commodity SSDs - Zheng et al. The Web Data Commons project is the largest web corpus available to the public. Their hyperlink (page) graph dataset has 3.4B vertices and 129B edges in over 1TB of data, with a graph diameter of 650. To the best … Continue reading FlashGraph: Processing Billion Node Graphs on an Array of Commodity SSDs
GraphX: Graph Processing in a Distributed Dataflow Framework
GraphX: Graph Processing in a Distributed Dataflow Framework - Gonzalez et al. 2014 This is the second of two weeks dedicated to graph processing. So far in this mini-series we've looked at what we know about networks of complex systems and graphs that model the real-world; Google's Pregel which led to a whole set of … Continue reading GraphX: Graph Processing in a Distributed Dataflow Framework
PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs
PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs - Gonzalez et al. 2012 A lot of the time, we want to perform computations on graphs that model the real world. As we saw in Exploring Complex Networks, such graphs often follow a power-law degree distribution (i.e., a few nodes are very highly connected, and many nodes … Continue reading PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs
Distributed GraphLab: A framework for machine learning and data mining in the cloud
Distributed GraphLab: A framework for machine learning and data mining in the cloud - Low et al. 2012 Two years on from the initial GraphLab paper we looked at yesterday comes this extension to support distributed graph processing for larger graphs, including data mining use cases. In this paper, we extend the GraphLab framework to … Continue reading Distributed GraphLab: A framework for machine learning and data mining in the cloud