Computer scientist Chris Morrone, a lifelong computer and electronics enthusiast, fondly recalls the fun he had as a kid typing BASIC commands into his first computer, a classic Commodore 64. Chris has since moved from programming a 1980s home computer that stored 200 kilobytes on an audio cassette tape to creating software for the 23,040-hard-drive, 55-petabyte storage system that supports Sequoia, LLNL’s newest supercomputer. What hasn’t changed is his fascination with understanding how computers work—and how they can work better.
These days, Chris leads Livermore Computing’s (LC’s) Lustre Development Team, a group responsible for developing and supporting the high-performance Lustre parallel file systems used to store application data sets and checkpoint files for many of LC’s large-scale systems, including Sequoia. Lustre delivers global data access and can be mounted across multiple compute clusters simultaneously. Chris notes that users generally take these complex file systems for granted—until something unexpected happens. “When something does go wrong, error handling and failure recovery are very important,” he adds.
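To illustrate how applications typically use such a file system, the following is a minimal sketch, assuming the mpi4py and NumPy packages, of a per-rank checkpointing pattern that a parallel simulation might use on a Lustre scratch area. The mount point, directory, and file-naming scheme are illustrative assumptions, not LC’s actual configuration or code.

import os
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Hypothetical Lustre mount point; because the file system is global, every
# compute node on every cluster that mounts it sees the same namespace.
CHECKPOINT_DIR = "/p/lustre/myapp/checkpoints"

def write_checkpoint(step, state):
    # Each MPI rank writes its own piece of the simulation state.
    os.makedirs(CHECKPOINT_DIR, exist_ok=True)
    path = os.path.join(CHECKPOINT_DIR, f"step{step:06d}.rank{rank:05d}.npy")
    np.save(path, state)
    comm.Barrier()  # continue only after every rank has finished writing

if __name__ == "__main__":
    local_state = np.random.rand(1024)  # stand-in for this rank's data
    write_checkpoint(step=0, state=local_state)

A batch job would typically launch such a script across many nodes with an MPI launcher such as mpirun or srun; Lustre’s global namespace is what allows every rank, on any cluster that mounts the file system, to write into the same directory.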
Sequoia’s file system and its Lustre software, collectively called Grove, serve as the first stop for the flood of simulation data coming from the 20-petaFLOP/s Sequoia platform. Grove’s size and bandwidth (at least 750 gigabytes per second) are unprecedented. To scale file system capabilities to meet Sequoia’s requirements, Chris and his team of software developers, in conjunction with other LC personnel and third-party developers, engaged in a multiyear project to replace much of Grove’s Lustre underpinnings with the Zettabyte File System (ZFS).
With the project completed in late 2012, Chris and his colleagues continue to discover new benefits from the effort as they test and tune the revamped Grove back end. In addition to improved data-moving efficiency, ZFS increases uptime and enables more efficient data storage. The availability, accessibility, and reliability of Grove’s data will be critical to the success of the missions that rely on this world-class computer.
Chris also leads a working group within the Open Scalable File Systems organization, a nonprofit group cofounded by LLNL that promotes collaboration among organizations deploying Lustre file systems on leading high-performance computing (HPC) systems, communicates future requirements to the Lustre developers, and funds Lustre projects.
Chris speaks regularly at conferences and Lustre user meetings, too. What he enjoys most about his job is working extensively with open-source software such as Lustre and engaging with the broader open-source community; it is also one of the features that first attracted him to LLNL.
Looking ahead, Chris is keenly interested in the exascale effort and in developing solutions for scaling file systems to meet future computing needs. Says Chris, “We’re currently hobbled by interfaces—many of them work well with a single computer, but with a large parallel machine, the situation becomes more challenging. We also need to know how to handle failures in an exascale environment. The path forward is still up in the air. We’re working with many other labs and industry partners to find a common solution.”
—Rose Hansen