STTR: A system for tracking all vehicles all the time at the edge of the network Xu et al., DEBS'18 With apologies for only bringing you two paper write-ups this week: we moved house, which turns out to be not at all conducive to quiet study of research papers! Today’s smart camera surveillance systems are …
Tag: Distributed Systems
Core distributed systems topics, for example consistency, availability and so on.
ServiceFabric: a distributed platform for building microservices in the cloud
ServiceFabric: a distributed platform for building microservices in the cloud Kakivaya et al., EuroSys'18 (If you don’t have ACM Digital Library access, the paper can be accessed by following the link above directly from The Morning Paper blog site.) Microsoft’s Service Fabric powers many of Azure’s critical services. It’s been in development for around …
SmoothOperator: reducing power fragmentation and improving power utilization in large-scale datacenters
SmoothOperator: reducing power fragmentation and improving power utilization in large-scale datacenters Hsu et al., ASPLOS'18 What do you do when your theory of constraints analysis reveals that power has become your major limiting factor? That is, you can’t add more servers to your existing datacenter(s) without blowing your power budget, and you don’t want to …
Skyway: connecting managed heaps in distributed big data systems
Skyway: connecting managed heaps in distributed big data systems Nguyen et al., ASPLOS'18 Yesterday we saw how to make Java objects persistent using NVM-backed heaps with Espresso. One of the drawbacks of using that as a persistence mechanism is that the objects are only stored in the memory of a single node. If only there was some …
WSMeter: A performance evaluation methodology for Google’s production warehouse-scale computers
WSMeter: A performance evaluation methodology for Google’s production warehouse-scale computers Lee et al., ASPLOS'18 (The link above is to the ACM Digital Library; if you don’t have membership you should still be able to access the paper PDF by following the link from The Morning Paper blog post directly.) How do you know how well …
Anna: A KVS for any scale
Anna: A KVS for any scale Wu et al., ICDE'18 This work comes out of the RISE project at Berkeley, and regular readers of The Morning Paper will be familiar with much of the background. Here’s how Joe Hellerstein puts it in his blog post introducing the work: As researchers, we asked the counter-cultural question: …
Protocol aware recovery for consensus-based storage
Protocol aware recovery for consensus-based storage Alagappan et al., FAST’18 Following on from their excellent previous work on ‘All file systems are not created equal’ (well worth a read if you haven’t encountered it yet), in this paper the authors look at how well some of our most reliable protocols, those used in …
Fail-slow at scale: evidence of hardware performance faults in large production systems
Fail-slow at scale: evidence of hardware performance faults in large production systems Gunawi et al., FAST’18 The first thing that strikes you about this paper is the long list of authors from multiple different establishments. That’s because it’s actually a study of 101 different fail-slow hardware incidents collected across large-scale cluster deployments in 12 different …
SoK: Consensus in the age of blockchains
SoK: Consensus in the age of blockchains Bano et al., arXiv 2017 There are so many things to consider when evaluating a blockchain-based technology or system. For example (and in no particular order): What cryptographic building blocks does it depend on? Does it offer privacy of identity (anonymity)? Does it offer privacy of data? …
Why is random testing effective for partition tolerance bugs?
Why is random testing effective for partition tolerance bugs? Majumdar & Niksic, POPL'18 A little randomness is a powerful thing! It can make the impossible possible (FLP), balance systems remarkably well (the power of two random choices), and of course underpin much of cryptography. Today’s paper choice examines the unreasonable effectiveness of random …