Medea: scheduling of long running applications in shared production clusters

June 13, 2018 ~ Adrian Colyer

Medea: scheduling of long running applications in shared production clusters, Garefalakis et al., EuroSys '18. (If you don’t have ACM Digital Library access, the paper can be accessed by following the link above directly from The Morning Paper blog site.) We’re sticking with schedulers today, and a really interesting system called Medea, which is designed ...

Optimus: an efficient dynamic resource scheduler for deep learning clusters

June 12, 2018 ~ Adrian Colyer

Optimus: an efficient dynamic resource scheduler for deep learning clusters, Peng et al., EuroSys '18. (If you don’t have ACM Digital Library access, the paper can be accessed by following the link above directly from The Morning Paper blog site.) It’s another paper promising to reduce your deep learning training times today. But instead of ...

Apache Hadoop YARN: Yet another resource negotiator

January 9, 2017 (updated November 11, 2019) ~ Adrian Colyer

Apache Hadoop YARN: Yet Another Resource Negotiator, Vavilapalli et al., SoCC 2013. The opening section of Prof. Demirbas' reading list is concerned with programming the datacenter, aka 'the Datacenter Operating System', though I can't help but think of Mesosphere when I hear that latter phrase. There are four papers: in publication order these are ...

Morpheus: Towards automated SLOs for enterprise clusters

December 1, 2016 (updated November 11, 2019) ~ Adrian Colyer

Morpheus: Towards automated SLOs for enterprise clusters, Jyothi et al., OSDI 2016. I'm really impressed with this paper: it covers all the bases, from user studies to find out what's really important to end users, to data-driven engineering, a sprinkling of algorithms, a pragmatic implementation being made available in open source, and of course, ...

Firmament: Fast, centralized cluster scheduling at scale

November 30, 2016 (updated November 11, 2019) ~ Adrian Colyer

Firmament: Fast, centralized cluster scheduling at scale, Gog et al., OSDI '16. As this paper demonstrates very well, cluster scheduling is a tricky thing to get right at scale. It sounds so simple on the surface: "here are some new jobs/tasks, where should I run ...

HCloud: Resource-efficient provisioning in shared cloud systems

May 26, 2016 ~ Adrian Colyer

HCloud: Resource-efficient provisioning in shared cloud systems, Delimitrou & Kozyrakis, ASPLOS '16. Do you use the public cloud? If so, I'm pretty confident you're going to find today's paper really interesting. Delimitrou & Kozyrakis study the provisioning strategies that provide the best balance between performance and cost. The sweet spot, it turns out, is ...

The Linux Scheduler: a Decade of Wasted Cores

April 26, 2016 ~ Adrian Colyer

The Linux Scheduler: a Decade of Wasted Cores, Lozi et al., 2016. This is the first in a series of papers from EuroSys 2016. There are three strands here: first, there's some great background on how scheduling works in the Linux kernel; second, there's a story about Software Aging and how changing ...

Universal Packet Scheduling

March 22, 2016 ~ Adrian Colyer

Universal Packet Scheduling, Mittal et al., 2015 (presented at NSDI '16). Is there a universal scheduling algorithm such that, simply by changing its configuration parameters, we can produce any desired schedule? In Universal Packet Scheduling, Mittal et al. show us that in theory there can be no Universal Packet Scheduling (UPS) algorithm which achieves ...

Split-Level IO Scheduling

October 28, 2015 ~ Adrian Colyer

Split-Level IO Scheduling, Yang et al., 2015. The central idea in today's paper is pretty simple: block-level I/O schedulers (the most common kind) lack the higher-level information necessary to perform write-reordering and accurate accounting, whereas system-call-level schedulers have the appropriate context but lack the low-level knowledge needed to build efficient schedulers ...

Cloud Computing Resource Scheduling and a Survey of its Evolutionary Approaches

October 2, 2015 ~ Adrian Colyer

Cloud Computing Resource Scheduling and a Survey of its Evolutionary Approaches, Zhan et al., 2015. In both academia and industry, the problem of cloud resource scheduling is seen to be as hard as a Nondeterministic Polynomial (NP) optimization problem, that is, an NP-hard problem, whose intractability increases exponentially with the number of variables if ...

the morning paper

a random walk through Computer Science research, by Adrian Colyer

Scheduling