SmoothOperator: reducing power fragmentation and improving power utilization in large-scale datacenters

SmoothOperator: reducing power fragmentation and improving power utilization in large-scale datacenters Hsu et al., ASPLOS'18 What do you do when your theory of constraints analysis reveals that power has become your major limiting factor? That is, you can’t add more servers to your existing datacenter(s) without blowing your power budget, and you don’t want to … Continue reading SmoothOperator: reducing power fragmentation and improving power utilization in large-scale datacenters

Popularity predictions of Facebook videos for higher quality streaming

Popularity prediction of Facebook videos for higher quality streaming Tang et al., USENIX ATC’17 Suppose I could grant you access to a clairvoyance service, which could make one class of predictions about your business for you with perfect accuracy. What would you want to know, and what difference would knowing that make to your business? … Continue reading Popularity predictions of Facebook videos for higher quality streaming

SVE: Distributed video processing at Facebook scale

SVE: Distributed video processing at Facebook scale Huang et al., SOSP’17 SVE (Streaming Video Engine) is the video processing pipeline that has been in production at Facebook for the past two years. This paper gives an overview of its design and rationale. And it certainly got me thinking: suppose I needed to build a video … Continue reading SVE: Distributed video processing at Facebook scale

Canopy: an end-to-end performance tracing and analysis system

Canopy: an end-to-end performance tracing and analysis system Kaldor et al., SOSP’17 In 2014, Facebook published their work on ‘The Mystery Machine,’ describing an approach to end-to-end performance tracing and analysis when you can’t assume a perfectly instrumented homogeneous environment. Three years on, and a new system, Canopy, has risen to take its place. Whereas … Continue reading Canopy: an end-to-end performance tracing and analysis system

DQBarge: Improving data-quality tradeoffs in large-scale internet services

DQBarge: Improving data-quality tradeoffs in large-scale Internet services Chow et al., OSDI 2016 I'm sure many of you recall the 2009 classic "The Datacenter as a Computer," which encouraged us to think of the datacenter as a warehouse-scale computer. From being glad simply to have such a computer, the bar keeps on moving. We don't … Continue reading DQBarge: Improving data-quality tradeoffs in large-scale internet services

Kraken: Leveraging live traffic tests to identify and resolve resource utilization bottlenecks in large scale web services

Kraken: Leveraging live traffic tests to identify and resolve resource utilization bottlenecks in large scale web services Veeraraghavan et al. (Facebook), OSDI 2016 How do you know how well your systems can perform under stress? How can you identify resource utilization bottlenecks? And how do you know your tests match the conditions experienced with live … Continue reading Kraken: Leveraging live traffic tests to identify and resolve resource utilization bottlenecks in large scale web services