Pregelix: Big(ger) Graph Analytics on a Dataflow Engine

June 3, 2015 ~ Adrian Colyer ~ Leave a comment

Pregelix: Big(ger) Graph Analytics on a Dataflow Engine - Bu et al. 2015 FlashGraph shows us that it's possible to efficiently process graphs that aren't solely in-memory, and GraphX showed us that we can map graph abstractions on top of a dataflow engine. Put the two ideas together, and you get something that looks like ... Continue Reading

FlashGraph: Processing Billion Node Graphs on an Array of Commodity SSDs

June 2, 2015 ~ Adrian Colyer ~ 1 Comment

FlashGraph: Processing Billion Node Graphs on an Array of Commodity SSDs - Zheng et al. The Web Data Commons project is the largest web corpus available to the public. Their hyperlink (page) graph dataset contains 3.4B vertices and 129B edges in over 1TB of data, and has a graph diameter of 650. To the best ... Continue Reading

GraphX: Graph Processing in a Distributed Dataflow Framework

June 1, 2015 ~ Adrian Colyer ~ 4 Comments

GraphX: Graph Processing in a Distributed Dataflow Framework - Gonzalez et al. 2014 This is the second of two weeks dedicated to graph processing. So far in this mini-series we've looked at what we know about networks of complex systems and graphs that model the real world; Google's Pregel which led to a whole set of ... Continue Reading

PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs

May 29, 2015 ~ Adrian Colyer ~ 7 Comments

PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs - Gonzalez et al. 2012 A lot of the time, we want to perform computations on graphs that model the real world. As we saw in Exploring Complex Networks, such graphs often follow a power-law degree distribution (i.e., a few nodes are very highly connected, and many nodes ... Continue Reading

Distributed GraphLab: A framework for machine learning and data mining in the cloud

May 28, 2015 ~ Adrian Colyer ~ 4 Comments

Distributed GraphLab: A framework for machine learning and data mining in the cloud - Low et al. 2012 Two years on from the initial GraphLab paper we looked at yesterday comes this extension to support distributed graph processing for larger graphs, including data mining use cases. In this paper, we extend the GraphLab framework to ... Continue Reading

GraphLab: A new framework for parallel machine learning

May 27, 2015 ~ Adrian Colyer ~ 6 Comments

GraphLab: A new framework for parallel machine learning - Low et al. 2010 In this paper we propose GraphLab, a new parallel framework for ML which exploits the sparse structure and common computational patterns of ML algorithms. GraphLab enables ML experts to easily design and implement efficient scalable parallel algorithms by composing problem specific computation, ... Continue Reading

Pregel: A System for Large-Scale Graph Processing

May 26, 2015 ~ Adrian Colyer ~ 13 Comments

Pregel: A System for Large-Scale Graph Processing - Malewicz et al. (Google) 2010 "Many practical computing problems concern large graphs." Yesterday we looked at some of the models for understanding networks and graphs. Today's paper focuses on processing of graphs, especially the efficient processing of large graphs where large can mean billions of vertices and ... Continue Reading

FAWN: A Fast Array of Wimpy Nodes

May 22, 2015 ~ Adrian Colyer ~ 1 Comment

FAWN: A Fast Array of Wimpy Nodes - Andersen et al. 2009 A few days ago we looked at FaRM (Fast Remote Memory), which used RDMA to match network speed with the speed of CPUs and got some very impressive results in terms of queries & transactions per second. But maybe there's another way of ... Continue Reading

Practical Byzantine Fault Tolerance

May 18, 2015 ~ Adrian Colyer ~ 49 Comments

Practical Byzantine Fault Tolerance - Castro & Liskov 1999 Oh Byzantine, you conflict me. On the one hand, we know that the old model of a security perimeter around an undefended centre is hopelessly broken (witness Google moves its Corporate Applications to the Internet) - so Byzantine models, which allow for any deviation from expected behaviour ... Continue Reading

the morning paper

a random walk through Computer Science research, by Adrian Colyer

Distributed Systems