Slim: OS kernel support for a low-overhead container overlay network
Slim: OS kernel support for a low-overhead container overlay network Zhuo et al., NSDI'19 Container overlay networks rely on packet transformations, with each packet traversing the networking stack twice on its way from the sending container to the receiving container. There are CPU, throughput, and latency overheads associated with those traversals. In this paper, we …
Understanding lifecycle management complexity of datacenter topologies
Understanding lifecycle management complexity of datacenter topologies Zhang et al., NSDI'19 There has been plenty of interesting research on network topologies for datacenters, with Clos-like tree topologies and expander-based graph topologies both shown to scale using widely deployed hardware. This research tends to focus on performance properties such as throughput and latency, together with …
Datacenter RPCs can be general and fast
Datacenter RPCs can be general and fast Kalia et al., NSDI'19 We’ve seen a lot of exciting work exploiting combinations of RDMA, FPGAs, and programmable network switches in the quest for high performance distributed systems. I’m as guilty as anyone of getting excited about all of that. The wonderful thing about today’s paper, for which …
Exploiting commutativity for practical fast replication
Exploiting commutativity for practical fast replication Park & Ousterhout, NSDI'19 I’m really impressed with this work. The authors give us a practical-to-implement enhancement to replication schemes (e.g., as used in primary-backup systems) that offers a significant performance boost. I’m expecting to see this picked up and rolled out in real-world systems as word spreads. At a …
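The excerpt doesn’t spell out the mechanism, but the property being exploited is commutativity: operations whose results don’t depend on execution order can be applied by replicas without first agreeing on an order. Here is a minimal sketch of that property (my own illustration in Python, not the paper’s protocol; the key-value store and operation format are assumptions made up for the example):

```python
# Toy illustration of commutativity in a replicated key-value store.
# Writes to *different* keys commute: replicas that apply them in different
# orders still converge. Writes to the *same* key do not commute and would
# need to be ordered before execution.
from itertools import permutations

def apply_ops(ops):
    store = {}
    for key, value in ops:
        store[key] = value
    return store

commuting_ops = [("x", 1), ("y", 2), ("z", 3)]   # distinct keys
conflicting_ops = [("x", 1), ("x", 2)]           # same key

# Every ordering of the commuting ops yields the same final state.
final_states = {tuple(sorted(apply_ops(p).items())) for p in permutations(commuting_ops)}
assert len(final_states) == 1

# Orderings of the conflicting ops disagree, so these must be serialised.
final_states = {tuple(sorted(apply_ops(p).items())) for p in permutations(conflicting_ops)}
assert len(final_states) == 2
```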
Cloud programming simplified: a Berkeley view on serverless computing
Cloud programming simplified: a Berkeley view on serverless computing Jonas et al., arXiv 2019 With thanks to Eoin Brazil who first pointed this paper out to me via Twitter… Ten years ago Berkeley released the ‘Berkeley view of cloud computing’ paper, predicting that cloud use would accelerate. Today’s paper choice is billed as its logical …
Efficient synchronisation of state-based CRDTs
Efficient synchronisation of state-based CRDTs Enes et al., arXiv’18 CRDTs are a great example of consistency as logical monotonicity. They come in two main variations: operation-based CRDTs send operations to remote replicas using a reliable dissemination layer with exactly-once causal delivery (if operations are idempotent then at-least-once is ok too); state-based CRDTs exchange information about …
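As a rough illustration of the state-based flavour mentioned above, here is a minimal grow-only counter (G-Counter) sketch in Python. The class and method names are my own, and this naive version ships the entire state on every exchange — exactly the kind of overhead that efficient synchronisation schemes aim to reduce:

```python
# Minimal state-based CRDT sketch: a grow-only counter (G-Counter).
# Each replica tracks a per-replica count; merge takes the element-wise
# maximum, which is commutative, associative, and idempotent, so replicas
# converge regardless of how often or in what order states are exchanged.
class GCounter:
    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica_id -> count

    def increment(self, n=1):
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Join of two states: element-wise maximum.
        for rid, count in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), count)

a, b = GCounter("a"), GCounter("b")
a.increment(3)
b.increment(2)
a.merge(b)   # ship b's full state to a
b.merge(a)   # and vice versa
assert a.value() == b.value() == 5
```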
A generalised solution to distributed consensus
A generalised solution to distributed consensus Howard & Mortier, arXiv'19 This is a draft paper that Heidi Howard recently shared with the world via Twitter, and here’s the accompanying blog post. It caught my eye for promising a generalised solution to the consensus problem, and also for using reasoning over immutable state to get there. …
Keeping CALM: when distributed consistency is easy
Keeping CALM: when distributed consistency is easy Hellerstein & Alvaro, arXiv 2019 The CALM conjecture (and later theorem) was first introduced to the world in a 2010 keynote talk at PODS. Behind its simple formulation there’s a deep lesson to be learned with the power to create ripples through our industry akin to the influence …
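For a flavour of the monotonicity idea behind CALM (my own toy example in Python, not the paper’s formalism): a monotone query over a growing set only ever adds to its answer, so replicas that see updates in different orders agree once they have seen the same updates, whereas a non-monotone query can retract an earlier answer and is therefore order-sensitive.

```python
# Monotone: "which items have we seen?" Every delivery order of the same
# updates produces the same final answer -- no coordination needed.
from itertools import permutations

updates = ["a", "b", "c"]
finals = set()
for order in permutations(updates):
    seen = set()
    for u in order:
        seen.add(u)
    finals.add(frozenset(seen))
assert len(finals) == 1

# Non-monotone: "is the set exactly {a, b}?" The answer can flip from True
# back to False as more updates arrive, so a value observed mid-stream
# depends on delivery order -- this is where coordination comes in.
seen = set()
observations = []
for u in updates:
    seen.add(u)
    observations.append(seen == {"a", "b"})
assert observations == [False, True, False]  # the answer retracts
```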
Efficient large-scale fleet management via multi-agent deep reinforcement learning
Efficient large-scale fleet management via multi-agent deep reinforcement learning Lin et al., KDD'18 A couple of weeks ago we looked at a survey paper covering approaches to dynamic, stochastic, vehicle routing problems (DSVRPs). At the end of the write-up I mentioned that I couldn’t help wondering about an end-to-end deep learning based approach to learning …
Large scale GAN training for high fidelity natural image synthesis
Large scale GAN training for high fidelity natural image synthesis Brock et al., ICLR'19 Ian Goodfellow’s tweets showing x years of progress on GAN image generation really bring home how fast things are improving. For example, here’s 4.5 years worth of progress on face generation: And here we have just two years of progress on …