Large-scale parallel computing systems generate massive amounts of data every second, and file systems are crucial for handling data transfer at LLNL’s HPC center. Livermore Computing’s (LC’s) Scalable Storage group helps manage the hardware and software necessary to keep file systems—and therefore supercomputers—running smoothly and efficiently. One key tool in the group’s portfolio is the ZFS (Zettabyte File System) project, which controls input/output (I/O) operations and optimizes storage volume capacity.
The ZFS origin story and the project’s evolution over time are testaments to the benefits of open-source software. Developed at Sun Microsystems, Inc., in the early 2000s for the Solaris operating system, ZFS caught the Lab’s attention because of its advanced storage features and scalable design. LC developers set about adapting the software for the Linux-based Sequoia supercomputer, and a new version—ZFS on Linux—was born in 2011. The public release generated significant interest within the Linux community, and many external contributors completed the porting work and tested ZFS on various types of Linux hardware.
With expanded capabilities for failover protection, data integrity, and Lustre compatibility, ZFS on Linux has been used on most of LLNL’s commodity clusters and on El Capitan’s early access systems. The software converts the concurrent, random writes arriving at an object storage target into a stream of faster, less resource-intensive sequential writes, which is especially important on large systems with high data storage rates. As a backend for Lustre, ZFS on Linux also uses advanced caching technology and solid-state disks to improve read performance and incorporates robust data-integrity checking for increased system availability and reliability. As one example, ZFS on Linux sustained 1 terabyte per second of data transfer on the Grove file system throughout Sequoia’s 8-year run, at less than half the cost of a file system built with standard Lustre components.
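The idea behind this write aggregation can be sketched in a few lines of code. The Python below is not ZFS source, and every name in it is hypothetical; it simply models how a copy-on-write file system can buffer random-offset writes into a transaction group and commit them to consecutive physical blocks, replacing many disk seeks with one streaming write.

```python
# Conceptual sketch only -- not ZFS code. It models collecting random-offset
# writes into a transaction group and committing them as one sequential batch,
# the way a copy-on-write file system turns scattered incoming I/O into
# streaming writes.

from dataclasses import dataclass, field

@dataclass
class Write:
    logical_offset: int   # where the application thinks the data lives
    data: bytes

@dataclass
class TransactionGroup:
    """Buffers random writes, then commits them to contiguous disk blocks."""
    next_free_block: int = 0
    pending: list[Write] = field(default_factory=list)
    block_map: dict[int, int] = field(default_factory=dict)  # logical -> physical

    def add(self, write: Write) -> None:
        self.pending.append(write)

    def commit(self) -> list[tuple[int, bytes]]:
        """Assign each buffered write to the next sequential physical block.

        Returns (physical_block, data) pairs in strictly ascending order, so
        the underlying device sees one streaming write instead of many seeks.
        """
        batch = []
        for w in self.pending:
            physical = self.next_free_block
            self.next_free_block += 1
            self.block_map[w.logical_offset] = physical
            batch.append((physical, w.data))
        self.pending.clear()
        return batch

# Example: three writes arriving at scattered logical offsets land in
# consecutive physical blocks 0, 1, and 2 when the group commits.
txg = TransactionGroup()
for offset in (9_000, 5, 42_000):
    txg.add(Write(offset, b"payload"))
print(txg.commit())
```

The real file system works at far larger granularity and layers in the checksumming and redundancy described above, but the seek-avoiding principle is the same.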
As computing systems grow, so too must file systems. In 2013, the original ZFS project and ZFS on Linux evolved into what is now known as OpenZFS, which is maintained by a global developer community that includes LC staff. The Scalable Storage group has adapted OpenZFS for the Lab’s needs, especially as new generations of Lustre-based HPC systems—including the upcoming El Capitan exascale supercomputer—are designed and installed.
One of the project’s adaptations improves upon traditional RAID (Redundant Array of Independent Disks) technology, which provides resilience against individual disk failures. Nicknamed dRAID, the distributed storage upgrade speeds up recovery from disk failure—a task that traditionally involves replacing the disk and rebuilding its missing data—by spreading a virtual spare disk across all of a system’s hard drives. This process significantly decreases the time to rebuild and, therefore, the time the system operates with reduced redundancy.
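Why the distributed spare helps becomes clear with a rough back-of-the-envelope model: a conventional rebuild funnels every reconstructed block onto a single replacement disk, whereas a distributed spare lets all surviving drives absorb a share of the work in parallel. The Python sketch below is a simplification, not OpenZFS’s actual resilver logic, and its drive count, capacity, and throughput figures are illustrative assumptions only.

```python
# Back-of-the-envelope model only -- not OpenZFS resilver code. All numbers
# (drive count, capacity, per-drive throughput) are illustrative assumptions.

def traditional_rebuild_hours(disk_tb: float, write_mb_s: float) -> float:
    """Rebuild funnels all reconstructed data onto one replacement disk,
    so that single drive's write speed is the bottleneck."""
    seconds = (disk_tb * 1e6) / write_mb_s   # TB -> MB
    return seconds / 3600

def draid_rebuild_hours(disk_tb: float, write_mb_s: float, drives: int) -> float:
    """With a distributed spare, the failed disk's data is rebuilt into spare
    capacity spread across the surviving drives, so writes proceed in
    parallel across roughly (drives - 1) devices."""
    parallel_mb_s = write_mb_s * (drives - 1)
    seconds = (disk_tb * 1e6) / parallel_mb_s
    return seconds / 3600

if __name__ == "__main__":
    disk_tb, write_mb_s, drives = 16.0, 150.0, 90   # assumed values
    print(f"dedicated spare:   {traditional_rebuild_hours(disk_tb, write_mb_s):.1f} h")
    print(f"distributed spare: {draid_rebuild_hours(disk_tb, write_mb_s, drives):.1f} h")
```

Real resilver times also depend on redundancy-group layout and competing I/O, but this parallelism is what shrinks the window in which the pool runs with reduced redundancy.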
OpenZFS’s dRAID layer is already implemented on LC’s Adaptable Storage Platform (ASP), which began production deployment in late 2021. ASP’s novel, multipurpose architecture includes Lustre file system clusters, HPSS disk caches, network-attached storage appliances, and a flexible number of other storage solutions. To support ASP, OpenZFS must remain highly configurable and capable of efficiently supporting a wide range of workloads. As more ASP systems come online, and as LC prepares for El Capitan, the Scalable Storage group continues to develop OpenZFS and integrate LC’s use cases into the open-source software repository.