Ben Hindman


The Case for Multi-Level Scheduling

May 20, 2016 at 3:30pm
CSE 691 (Gates Commons)

Abstract

Existing research has shown the benefits of using multi-level schedulers, either for single-node parallel computation or multi-node distributed computation. However, there are some important practical considerations that must be addressed in order to use these multi-level scheduling architectures in multi-user production environments. In this presentation we’ll discuss these practical considerations through lessons learned deploying Apache Mesos, a 2-level distributed scheduling system that has been used in organizations such as Twitter, PayPal, and Apple. We’ll first highlight the multi-level scheduling systems that influenced Mesos and describe the 2-level Mesos architecture in detail. We’ll then focus on the first-level scheduler of Mesos and the efficient multi-resource fair-sharing algorithm that it employs. Finally, we’ll discuss the extensions, driven by practical needs, that have been added over the years (or are being added today), from weights, reservations, and quotas to optimistic allocations and deallocation, which together make Mesos a practical kernel for running distributed systems in production.

Despite what the abstract may suggest, the talk will be interesting to PL folks, in addition to systems folks.

Bio

Ben is the co-creator of Mesos (while a PhD student at Berkeley) and the co-founder of Mesosphere (after leading the use of Mesos at Twitter). Far more importantly :), Ben is our undergraduate alum and former active participant in PLSE under a previous group name – 590P participant, 505 graduate, star TA, and so forth.
Ben will be in town to receive the College of Engineering’s 2016 Early Career Diamond Award.
