STTR: A system for tracking all vehicles all the time at the edge of the network

Xu et al., DEBS'18
With apologies for only bringing you two paper write-ups this week: we moved house, which turns out to be not at all conducive to quiet study of research papers! Today’s smart camera surveillance systems are …

ServiceFabric: a distributed platform for building microservices in the cloud

Kakivaya et al., EuroSys'18
(If you don’t have ACM Digital Library access, the paper can be accessed by following the link above directly from The Morning Paper blog site.) Microsoft’s Service Fabric powers many of Azure’s critical services. It’s been in development for around …

SmoothOperator: reducing power fragmentation and improving power utilization in large-scale datacenters

Hsu et al., ASPLOS'18
What do you do when your theory of constraints analysis reveals that power has become your major limiting factor? That is, you can’t add more servers to your existing datacenter(s) without blowing your power budget, and you don’t want to …

Skyway: connecting managed heaps in distributed big data systems

Nguyen et al., ASPLOS'18
Yesterday we saw how to make Java objects persistent using NVM-backed heaps with Espresso. One of the drawbacks of using that as a persistence mechanism is that the objects are only stored in the memory of a single node. If only there was some …

WSMeter: A performance evaluation methodology for Google’s production warehouse-scale computers

Lee et al., ASPLOS'18
(The link above is to the ACM Digital Library; if you don’t have membership, you should still be able to access the paper PDF by following the link from The Morning Paper blog post directly.) How do you know how well …

Protocol aware recovery for consensus-based storage

Alagappan et al., FAST’18
Following on from their excellent previous work on ‘All file systems are not created equal’ (well worth a read if you haven’t encountered it yet), in this paper the authors look at how well some of our most reliable protocols, those used in …

Fail-slow at scale: evidence of hardware performance faults in large production systems

Gunawi et al., FAST’18
The first thing that strikes you about this paper is the long list of authors from multiple establishments. That’s because it’s actually a study of 101 different fail-slow hardware incidents collected across large-scale cluster deployments in 12 different …

Why is random testing effective for partition tolerance bugs?

Majumdar & Niksic, POPL'18
A little randomness is a powerful thing! It can make the impossible possible (FLP), balance systems remarkably well (the power of two random choices), and of course underpin much of cryptography. Today’s paper choice examines the unreasonable effectiveness of random …