Do we need specialized graph databases? Benchmarking real-time social networking applications Pacaci et al., GRADES'17 Today's paper comes from the GRADES workshop co-located with SIGMOD. The authors take an established graph data management system benchmark suite (LDBC) and run it across a variety of graph and relational stores. The findings make for very interesting reading, …
Mosaic: Processing a trillion-edge graph on a single machine Maass et al., EuroSys'17 Unless your graph is bigger than Facebook's, you can process it on a single machine. With the inception of the internet, large-scale graphs comprising web graphs or social networks have become common. For example, Facebook recently reported their largest social graph comprises …
Dependency-driven analytics: a compass for uncharted data oceans Mavlyutov et al. CIDR 2017 Like yesterday's paper, today's paper considers what to do when you simply have too much data to be able to process it all. Forget data lakes, we're in data ocean territory now. This is a problem Microsoft faced with their large clusters …
Time evolving graph processing at scale Iyer et al., GRADES 2016 Here's a new (June 2016) paper from the distinguished AMPLab group at Berkeley that really gave me cause to reflect. The work addresses the problem of performing graph computations on graphs that are constantly changing (because updates flow in, such as a new follower …
Arabesque: A System For Distributed Graph Mining - Teixeira et al. 2015 We've studied graph computation systems before in The Morning Paper: systems such as Pregel, Giraph and GraphLab that provide vertex-centric programming models ('think like a vertex') on top of a Bulk Synchronous Parallel compute model. We've also seen some of the limitations of …
A Bridging Model for Parallel Computation - Valiant 1990 We've seen a lot of references to the 'Bulk Synchronous Parallel' model over the last two weeks. When Valiant conceived it in 1990, though, it was intended as a much more general model than simply an abstraction to support graph processing. As the von …
Scalability! But at what COST? - McSherry et al. 2015 With thanks to Felix Cuadrado, @felixcuadrado, for pointing this paper out to me via Twitter. Scalability is highly prized, yet it can be a misleading metric when studied in isolation. McSherry et al. study the COST of distributed systems: the Configuration that Outperforms a Single …