Virtual consensus in Delos

November 9, 2020 / October 20, 2025 ~ adriancolyer

Virtual consensus in Delos, Balakrishnan et al. (Facebook, Inc.), OSDI’2020

Before we dive into this paper: if you click on the link above and then download and open the paper PDF, you might notice the familiar red/orange splash of USENIX, and appreciate the fully open access. USENIX is a nonprofit organisation committed to making content and …

Understanding, detecting and localizing partial failures in large system software

March 16, 2020 / March 15, 2020 ~ adriancolyer ~ 2 Comments

Understanding, detecting and localizing partial failures in large system software, Lou et al., NSDI'20

Partial failures (gray failures) occur when some but not all of the functionalities of a system are broken. On the surface everything can appear to be fine, but under the covers things may be going astray. When a partial failure occurs, …

Cloudburst: stateful functions-as-a-service

February 7, 2020 / February 2, 2020 ~ adriancolyer ~ 2 Comments

Cloudburst: stateful functions-as-a-service, Sreekanti et al., arXiv 2020

Today's paper choice is a fresh-from-the-arXivs take on serverless computing from the RISELab at Berkeley, addressing some of the limitations outlined in last year's 'Berkeley view on serverless computing.' Stateless is fine until you need state, at which point the coarse-grained solutions offered by current platforms limit …

Seamless offloading of web app computations from mobile device to edge clouds via HTML5 Web Worker migration

January 31, 2020 / January 24, 2020 ~ adriancolyer ~ 15 Comments

Seamless offloading of web app computations from mobile device to edge clouds via HTML5 web worker migration, Jeong et al., SoCC'19

This paper caught my eye for its combination of an intriguing idea (opportunistic offload of computation from mobile devices to the edge) and the elegance of the way the web worker interface supports …

Mergeable replicated data types – Part II

November 27, 2019 / November 23, 2019 ~ adriancolyer

Mergeable replicated data types - part II Kaki et al., OOPSLA'19

Last time out we saw how Mergeable Replicated Data Types (MRDTs) use a bijection between the natural domain of a data type and relational sets to define merge semantics between two concurrently modified versions, given their lowest common ancestor (LCA). Today we’re picking …
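To make the idea of an LCA-based merge concrete, here is a minimal sketch for a replicated set, assuming the set is already its own relational representation (so no bijection is needed). It is an illustration of three-way merge semantics, not the paper's implementation, and the name `merge_set` is hypothetical. An element survives if both branches kept it from the LCA, or if either branch added it.

```python
def merge_set(lca: frozenset, a: frozenset, b: frozenset) -> frozenset:
    """Three-way merge of two concurrently modified versions of a set.

    Hypothetical helper: keeps elements both branches retained from the
    LCA, plus anything either branch inserted relative to the LCA.
    """
    retained = lca & a & b  # in the LCA and deleted on neither branch
    added_a = a - lca       # insertions made on branch a
    added_b = b - lca       # insertions made on branch b
    return retained | added_a | added_b


if __name__ == "__main__":
    lca = frozenset({1, 2, 3})
    a = frozenset({1, 2, 4})  # branch a removed 3 and added 4
    b = frozenset({2, 3, 5})  # branch b removed 1 and added 5
    print(sorted(merge_set(lca, a, b)))  # → [2, 4, 5]
```

A deletion on either branch wins over the other branch's untouched copy, because only the LCA tells us the element was present before the branches diverged.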

Mergeable replicated data types – Part I

November 25, 2019 / November 23, 2019 ~ adriancolyer ~ 16 Comments

Mergeable replicated data types Kaki et al., OOPSLA'19

This paper was published at OOPSLA, but I expect it to be of greatest interest to the distributed systems community. Mergeable Replicated Data Types (MRDTs) are in the same spirit as CRDTs but with the very interesting property that they compose. Furthermore, a principled …
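What "they compose" buys you can be illustrated with a small sketch, under illustrative assumptions rather than the paper's formal development: a counter merges by applying each branch's delta from the LCA, and a pair of mergeable values merges component-wise by reusing each component's own merge. The names `merge_counter` and `merge_pair` are hypothetical.

```python
from typing import Callable, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def merge_counter(lca: int, a: int, b: int) -> int:
    """Hypothetical three-way merge for a counter: apply both branches' deltas to the LCA."""
    return lca + (a - lca) + (b - lca)


def merge_pair(
    merge_first: Callable[[A, A, A], A],
    merge_second: Callable[[B, B, B], B],
) -> Callable[[Tuple[A, B], Tuple[A, B], Tuple[A, B]], Tuple[A, B]]:
    """Hypothetical combinator: derive a merge for pairs from merges for each component."""

    def merge(lca: Tuple[A, B], a: Tuple[A, B], b: Tuple[A, B]) -> Tuple[A, B]:
        return (
            merge_first(lca[0], a[0], b[0]),
            merge_second(lca[1], a[1], b[1]),
        )

    return merge


if __name__ == "__main__":
    merge_two_counters = merge_pair(merge_counter, merge_counter)
    # Branch a adds 2 to the first counter; branch b adds 5 to the second.
    print(merge_two_counters((10, 0), (12, 0), (10, 5)))  # → (12, 5)
```

The combinator never inspects the component types, so any pair of mergeable types yields a mergeable pair without writing a new merge by hand.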

Local-first software: you own your data, in spite of the cloud

November 20, 2019 / November 17, 2019 ~ adriancolyer ~ 32 Comments

Local-first software: you own your data, in spite of the cloud Kleppmann et al., Onward! '19

Watch out! If you start reading this paper you could be lost for hours following all the interesting links and ideas, and end up even more dissatisfied than you already are with the state of software today. You might …

SLOG: serializable, low-latency, geo-replicated transactions

September 4, 2019 / September 1, 2019 ~ adriancolyer ~ 1 Comment

SLOG: serializable, low-latency, geo-replicated transactions Ren et al., VLDB'19

SLOG is another research system motivated by the needs of the application developer (aka, user!). Building correct applications is much easier when the system provides strict serializability guarantees. Strict serializability reduces application code complexity and bugs, since it behaves like a system that is running on …

Software-defined far memory in warehouse-scale computers

May 22, 2019 / May 16, 2019 ~ adriancolyer ~ 13 Comments

Software-defined far memory in warehouse-scale computers Lagar-Cavilla et al., ASPLOS'19

Memory (DRAM) remains comparatively expensive, while in-memory computing demands are growing rapidly. This makes memory a critical factor in the total cost of ownership (TCO) of large compute clusters, or as Google like to call them "Warehouse-scale computers (WSCs)." This paper describes a "far memory" …

Seer: leveraging big data to navigate the complexity of performance debugging in cloud microservices

May 15, 2019 / May 9, 2019 ~ adriancolyer ~ 3 Comments

Seer: leveraging big data to navigate the complexity of performance debugging in cloud microservices Gan et al., ASPLOS'19

Last time around we looked at the DeathStarBench suite of microservices-based benchmark applications and learned that microservices systems can be especially latency sensitive, and that hotspots can propagate through a microservices architecture in interesting ways. Seer is …