SmoothOperator: reducing power fragmentation and improving power utilization in large-scale datacenters

April 27, 2018 ~ Adrian Colyer ~ Leave a comment

SmoothOperator: reducing power fragmentation and improving power utilization in large-scale datacenters Hsu et al., ASPLOS'18 What do you do when your theory of constraints analysis reveals that power has become your major limiting factor? That is, you can’t add more servers to your existing datacenter(s) without blowing your power budget, and you don’t want to ... Continue Reading

Popularity predictions of Facebook videos for higher quality streaming

November 28, 2017 ~ Adrian Colyer ~ Leave a comment

Popularity prediction of Facebook videos for higher quality streaming Tang et al., USENIX ATC’17 Suppose I could grant you access to a clairvoyance service, which could make one class of predictions about your business for you with perfect accuracy. What would you want to know, and what difference would knowing that make to your business? ... Continue Reading

SVE: Distributed video processing at Facebook scale

November 27, 2017 ~ Adrian Colyer ~ Leave a comment

SVE: Distributed video processing at Facebook scale Huang et al., SOSP’17 SVE (Streaming Video Engine) is the video processing pipeline that has been in production at Facebook for the past two years. This paper gives an overview of its design and rationale. And it certainly got me thinking: suppose I needed to build a video ... Continue Reading

Canopy: an end-to-end performance tracing and analysis system

November 22, 2017 ~ Adrian Colyer ~ 1 Comment

Canopy: an end-to-end performance tracing and analysis system Kaldor et al., SOSP’17 In 2014, Facebook published their work on ‘The Mystery Machine,’ describing an approach to end-to-end performance tracing and analysis when you can’t assume a perfectly instrumented homogeneous environment. Three years on, and a new system, Canopy, has risen to take its place. Whereas ... Continue Reading

DQBarge: Improving data-quality tradeoffs in large-scale internet services

December 9, 2016 (updated November 11, 2019) ~ Adrian Colyer ~ Leave a comment

DQBarge: Improving data-quality tradeoffs in large-scale Internet services Chow et al. OSDI 2016 I'm sure many of you recall the 2009 classic "The Datacenter as a Computer," which encouraged us to think of the datacenter as a warehouse-scale computer. From being glad simply to have such a computer, the bar keeps on moving. We don't ... Continue Reading

Kraken: Leveraging live traffic tests to identify and resolve resource utilization bottlenecks in large scale web services

November 28, 2016 (updated November 11, 2019) ~ Adrian Colyer ~ 4 Comments

Kraken: Leveraging live traffic tests to identify and resolve resource utilization bottlenecks in large scale web services Veeraraghavan et al. (Facebook) OSDI 2016 How do you know how well your systems can perform under stress? How can you identify resource utilization bottlenecks? And how do you know your tests match the conditions experienced with live ... Continue Reading

Realtime data processing at Facebook

July 11, 2016 ~ Adrian Colyer ~ 4 Comments

Realtime Data Processing at Facebook Chen et al. SIGMOD 2016 ‘Realtime Data Processing at Facebook’ provides us with a great high-level overview of the systems Facebook have built to support real-time workloads. At the heart of the paper is a set of five key design decisions for building such systems, together with an explanation of ... Continue Reading

Gorilla: A fast, scalable, in-memory time series database

May 3, 2016 ~ Adrian Colyer ~ 16 Comments

Gorilla: A fast, scalable, in-memory time series database - Pelkonen et al. 2015 Error rates across one of Facebook's sites were spiking. The problem had first shown up through an automated alert triggered by an in-memory time-series database called Gorilla a few minutes after the problem started. One set of engineers mitigated the immediate issue. ... Continue Reading

Memory Networks

March 10, 2016 ~ Adrian Colyer ~ 14 Comments

Memory Networks Weston et al. 2015 As with the Neural Turing Machine that we looked at yesterday, this paper looks at extending machine learning models with a memory component. The Neural Turing Machine work was developed at Google by the DeepMind team; today's paper on Memory Networks was developed by the Facebook AI Research group. ... Continue Reading

the morning paper

a random walk through Computer Science research, by Adrian Colyer

Facebook