Fast key-value stores: an idea whose time has come and gone

Adya et al., HotOS'19

No controversy here! Adya et al. would like you to stop using Memcached and Redis, and start building 11-factor apps. Factor VI in the 12-factor app manifesto, "Execute the app as one or more stateless processes," to be dropped and …

Nines are not enough: meaningful metrics for clouds

Mogul & Wilkes, HotOS'19

It’s hard to define good SLOs, especially when outcomes aren’t fully under the control of any single party. The authors of today’s paper should know a thing or two about that: Jeffrey Mogul and John Wilkes at Google! John Wilkes was also one …

Towards federated learning at scale: system design

Bonawitz et al., SysML 2019

This is a high-level paper describing Google’s production system for federated learning. One of the most interesting things to me here is simply to know that Google are working on this, have a first version in production working with tens of millions …

Software-defined far memory in warehouse-scale computers

Lagar-Cavilla et al., ASPLOS'19

Memory (DRAM) remains comparatively expensive, while in-memory computing demands are growing rapidly. This makes memory a critical factor in the total cost of ownership (TCO) of large compute clusters, or as Google like to call them, "Warehouse-scale computers (WSCs)." This paper describes a "far memory" …

Dynamic control flow in large-scale machine learning

Yu et al., EuroSys'18

(If you don’t have ACM Digital Library access, the paper can be accessed by following the link above directly from The Morning Paper blog site.) In 2016 the Google Brain team published a paper giving an overview of TensorFlow, "TensorFlow: a system for …

Andromeda: performance, isolation, and velocity at scale in cloud network virtualization

Dalton et al., NSDI'18

Yesterday we took a look at the Microsoft Azure networking stack; today it’s the turn of the Google Cloud Platform. (It’s a very handy coincidence to have two such experience and system design report papers appearing side by side so …

WSMeter: A performance evaluation methodology for Google’s production warehouse-scale computers

Lee et al., ASPLOS'18

(The link above is to the ACM Digital Library; if you don’t have membership, you should still be able to access the paper PDF by following the link from The Morning Paper blog post directly.) How do you know how well …