Flux: Building a Framework for Resource Management
Large computer centers that house dozens of large-scale systems with unique capabilities need a way to schedule those resources efficiently. In the case of Livermore Computing (LC), those resources include extremely large Linux clusters, such as the 2,688-node, 3.2-petaflop Jade, as well as myriad smaller support systems for generating, visualizing, analyzing, and storing data that are critical to fulfilling LLNL’s national security missions. LC developers have a long history of developing state-of-the-art software, including SLURM and its predecessors, that allows users to run and manage their simulation codes across multiple clusters. However, current resource and job management approaches cannot keep up with growing system scale and the increasing interplay between subsystems, such as that between compute clusters and file systems.
Flux is a next-generation resource and job management framework that expands the scheduler’s view beyond the single dimension of “nodes.” Instead of simply developing a replacement for SLURM and Moab, Flux offers a framework that enables new resource types, schedulers, and framework services to be deployed as data centers continue to evolve.
A resource manager tracks and monitors the hardware deployed in the data center, and then arbitrates access as customers submit work they would like to run. The job-scheduling algorithms must not only determine when and where resources that meet the user-specified requirements will be available, but also implement an allocation policy. Job placement in both space and time is critical to achieving efficient execution and getting the most work done for the time, power, and money spent. Flux addresses this issue by making smarter placement decisions and by offering greater flexibility and more opportunity for adaptation than current resource management software. These solutions help scientific researchers and computing users more effectively harness the power of LC capabilities. For example, with a holistic view of the data center’s input/output (I/O) bandwidth capability and utilization, Flux avoids the “perfect storm” of I/O operations that can occur when a naïve scheduler places I/O-intensive work without regard to I/O availability.
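The I/O example above can be made concrete with a small sketch. This is not Flux code and does not use Flux's API: the `Job` class, the `place` function, the bandwidth figures, and the job names are all hypothetical, chosen only to show how adding a second resource dimension (file-system bandwidth) changes a placement decision compared with a scheduler that counts nodes alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """Hypothetical job request; not a Flux data structure."""
    name: str
    nodes: int       # compute nodes requested
    io_gbps: float   # expected file-system bandwidth (assumed unit)


def place(queue, free_nodes, free_io_gbps, io_aware=True):
    """Start jobs in queue order if they fit the free resources.

    With io_aware=False only the node count is checked, which models
    a scheduler whose view is limited to the "nodes" dimension.
    """
    started = []
    for job in queue:
        fits_nodes = job.nodes <= free_nodes
        fits_io = (not io_aware) or job.io_gbps <= free_io_gbps
        if fits_nodes and fits_io:
            started.append(job.name)
            free_nodes -= job.nodes
            free_io_gbps -= job.io_gbps
    return started


# Illustrative numbers: 128 free nodes, 100 GB/s of spare I/O bandwidth.
queue = [
    Job("checkpoint-heavy-a", nodes=64, io_gbps=80.0),
    Job("checkpoint-heavy-b", nodes=64, io_gbps=80.0),
    Job("compute-bound-c", nodes=64, io_gbps=5.0),
]

naive = place(queue, 128, 100.0, io_aware=False)
aware = place(queue, 128, 100.0, io_aware=True)

print("nodes only:", naive)
print("I/O aware: ", aware)
```

In this toy case the nodes-only policy starts both I/O-heavy jobs together and asks the file system for 160 GB/s when only 100 GB/s is available, while the I/O-aware policy pairs one I/O-heavy job with the compute-bound one. A production scheduler would also have to reason about time, priorities, and backfill, which this sketch leaves out.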
In Flux, each job is a complete instance of the framework, meaning the individual task can support parallel tools and monitoring and can even launch sub-jobs that are, like fractals, smaller images of the parent job. Because each job is a full Flux instance, users can customize Flux for use within their jobs. For example, a user desiring to launch many small, high-throughput jobs could submit a large, long-running parent job, and inside it load a specialized scheduler that is streamlined for high throughput. Panning outward in scale, schedulers operating at a larger granularity can move resources between child jobs as bottlenecks occur and employ pluggable schedulers for resource types that do not exist today.
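The nesting described above can be sketched as a simple recursive data structure. The `Instance` class, its methods, and the policy names below are hypothetical and exist only for illustration; they are not Flux's interfaces. The sketch assumes that a child instance is carved out of its parent's free nodes and that it inherits the parent's scheduling policy unless it loads its own.

```python
class Instance:
    """Hypothetical model of a job that is itself a full scheduler instance."""

    def __init__(self, name, nodes, policy="fcfs", parent=None):
        self.name = name
        self.free_nodes = nodes   # nodes not yet handed to a child
        self.policy = policy      # label for the pluggable scheduler in use
        self.parent = parent
        self.children = []

    def spawn(self, name, nodes, policy=None):
        """Create a child instance from this instance's free nodes."""
        if nodes > self.free_nodes:
            raise ValueError(f"{self.name} has only {self.free_nodes} free nodes")
        self.free_nodes -= nodes
        child = Instance(name, nodes, policy or self.policy, parent=self)
        self.children.append(child)
        return child

    def release(self, child):
        """Return a finished child's nodes so they can be reassigned."""
        self.children.remove(child)
        self.free_nodes += child.free_nodes

    def depth(self):
        """Number of levels between this instance and the top-level one."""
        return 0 if self.parent is None else 1 + self.parent.depth()


# Top-level instance sized like the 2,688-node cluster mentioned earlier.
center = Instance("center", nodes=2688)

# A long-running parent job that loads a throughput-oriented scheduler.
ensemble = center.spawn("ensemble", nodes=256, policy="high-throughput")

# Many small sub-jobs launched inside the parent job.
tasks = [ensemble.spawn(f"task-{i}", nodes=1) for i in range(4)]

print(center.free_nodes, ensemble.free_nodes, tasks[0].policy, tasks[0].depth())
```

Because every level has the same shape, moving resources between siblings reduces to a `release` on one child followed by a `spawn` for another, which is one way to picture a higher-level scheduler rebalancing child jobs as bottlenecks occur.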
“We are providing more capable resource management through hierarchical, multi-level management and scheduling schemes,” says Becky Springmeyer, LC division leader and Flux project leader. “Users benefit from pluggable schedulers with deeper knowledge of network, I/O, and power interconnections, and the ability to dynamically shape running work. One of the challenges we faced in designing Flux was making sure its framework was general and extensible enough to support resources and use cases that are only now emerging in research. Our team includes researchers in power-aware and I/O-aware scheduling.”
Flux was designed with input from system developers, computer science researchers, and end users, as well as external organizations that operate large computer centers. In addition, Livermore’s co-design efforts with code-development tools, such as STAT, Spindle, and TotalView, provide a highly scalable and composable code-development environment for LC users. Users can pick and choose the programming tools they need and seamlessly use them together under Flux’s framework. For example, users of the Kull code can launch the application at scale with Spindle and, if necessary, debug it using TotalView or STAT.
Flux is open-source software available to high performance computing centers around the world via the Flux collaboration space on GitHub. Flux developers have worked with the University of Delaware to develop the I/O-aware scheduling component of Flux, and the team is open to expanding research collaborations with other academic institutions for elements such as elastic resource and job management.