Exploring Complex Networks - Strogatz 2001 Network anatomy is important to characterize because structure always affects function... Written in 2001, this article - recently recommended by Werner Vogels in his 'Back-to-Basics' series - explores the topic of complex networks. It turns out that the behaviour of individual nodes, and the way that we connect them … Continue reading Exploring Complex Networks
FAWN: A Fast Array of Wimpy Nodes
FAWN: A Fast Array of Wimpy Nodes - Andersen et al. 2009 A few days ago we looked at FaRM (Fast Remote Memory), which used RDMA to match network speed with the speed of CPUs and got some very impressive results in terms of queries & transactions per second. But maybe there's another way of … Continue reading FAWN: A Fast Array of Wimpy Nodes
Congestion Avoidance and Control
Congestion Avoidance and Control - Jacobson & Karels, 1988 (** corrected spelling of Jacobs_o_n **) It's October 1986 and there's trouble on the internet. A congestion collapse has reduced the bandwidth between LBL and UC Berkeley by a factor of a thousand. These two sites happened to be 400 yds apart. And that drop in … Continue reading Congestion Avoidance and Control
FaRM: Fast Remote Memory
FaRM: Fast Remote Memory - Dragojevic, et al. 2014 Yesterday we looked at Facebook's graph store, TAO, that can handle a billion reads/sec and millions of writes/sec. In today's choice a team from Microsoft Research reimplemented TAO, and beat those numbers by an order of magnitude! FaRM’s per-machine throughput of 6.3 million operations per second is … Continue reading FaRM: Fast Remote Memory
TAO: Facebook’s Distributed Data Store for the Social Graph
TAO: Facebook's Distributed Data Store for the Social Graph - Bronson et al. (Facebook) 2013 A single Facebook page may aggregate and filter hundreds of items from the social graph. We present each user with content tailored to them, and we filter every item with privacy checks that take into account the current viewer. This extreme … Continue reading TAO: Facebook’s Distributed Data Store for the Social Graph
Practical Byzantine Fault Tolerance
Practical Byzantine Fault Tolerance - Castro & Liskov 1999 Oh Byzantine, you conflict me. On the one hand, we know that the old model of a security perimeter around an undefended centre is hopelessly broken (witness Google moves its Corporate Applications to the Internet) - so Byzantine models, which allow for any deviation from expected behaviour … Continue reading Practical Byzantine Fault Tolerance
FastRoute: A scalable load-aware anycast routing architecture for modern CDNs
FastRoute: A scalable load-aware anycast routing architecture for modern CDNs - Flavel et al. 2015 This is the story of how a team at Microsoft redesigned their CDN that supports 'numerous popular online services.' It's also a great example of mature systems thinking: the team deliberately eschew designs that would give marginally better performance at … Continue reading FastRoute: A scalable load-aware anycast routing architecture for modern CDNs
Wormhole: Reliable pub-sub to support Geo-Replicated Internet Services
Wormhole: Reliable pub-sub to support Geo-Replicated Internet Services - Sharma et al. 2015 At Facebook, lots of applications are interested in data being written to Facebook's data stores. Having each of these applications poll the data stores of interest would be untenable, so Facebook built a pub-sub system to identify updates and transmit notifications to … Continue reading Wormhole: Reliable pub-sub to support Geo-Replicated Internet Services
The Design and Implementation of Open vSwitch
The Design and Implementation of Open vSwitch - Pfaff et al. 2015 Another selection from this month's NSDI 2015 programme, this time from the operational systems track. What inspired the creation of Open vSwitch? What has most influenced its design? And what's next? As virtualized (or containerized) workloads grew, physically provisioning networks to support them … Continue reading The Design and Implementation of Open vSwitch
Queues don’t matter when you can JUMP them
Queues don't matter when you can JUMP them - Grosvenor et al. 2015 The Cambridge Systems at Scale team are on a roll. Hot on the heels of the excellent Musketeer paper from Eurosys 2015 comes this paper on QJUMP which last week won a best paper award at NSDI'15. Distributed systems design involves trade-offs. … Continue reading Queues don’t matter when you can JUMP them