To accurately represent the HPSS capabilities delivered to ASCI users today, the HPSS systems used in the demonstrations were in full production and were serving production users concurrently with these Milepost tests. The ASCI Snow and Frost platforms were used to demonstrate basic functionality, and the White platform was used to demonstrate this functionality at scale and under load.
A total of five demonstrations were developed. In each demonstration, a series of like-sized files was transferred from the application platform to an HPSS production archive using the PFTP interface. The tests varied along the following dimensions:
- File size: Files were either large (2GB) or small (20MB).
- Concurrency: From two to six files were transferred at a time (concurrently).
- Platform: The ASCI application platforms used were Snow, Frost and White.
- Target HPSS system: Depending on the application platform, production SCF (White) or OCF (Frost and Snow) HPSS systems were used concurrently with regular production user loads.
- Duration: The number of files to be transferred was calculated to yield a test duration of two hours. A simple miscalculation doubled this length for one test (four hours), but we let the test run anyway to demonstrate extended stability.
- GPFS load: One test was intentionally run concurrently with an introduced heavy platform GPFS load.
- Networks: The number and type of network connections to HPSS varied depending on the application platform used. The Snow demonstration used a single 100Mb Ethernet connection. The Frost demonstration used two Jumbo Gigabit Ethernet links. The White demonstrations used eight Jumbo Gigabit Ethernet links.
- Application Nodes: The number of "login nodes" used to run PFTP varied by platform from one node on Snow and Frost, to two on White.
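The duration sizing described above is simple arithmetic: queue enough files that, at the expected throughput, the transfers fill the target window. A minimal sketch (the 170 MB/sec expected-throughput figure is an illustrative assumption, not a value from the test plan); note that overestimating the expected throughput by 2x is exactly the kind of slip that doubles a run's length, as happened in the first test:

```python
# Estimate how many files to queue so a transfer test runs for a
# target wall-clock duration. Throughput inputs here are illustrative
# assumptions, not figures from the Milepost test plan.

def files_for_duration(target_hours, throughput_mb_s, file_size_mb):
    """Number of files that keeps the test running for target_hours
    at the expected aggregate throughput."""
    total_mb = target_hours * 3600 * throughput_mb_s
    return round(total_mb / file_size_mb)

# e.g. a 2-hour run at an assumed 170 MB/sec aggregate with
# 2 GB (2,000 MB) files:
n_files = files_for_duration(2, 170, 2000)
print(n_files)
```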
The following table details the demonstrations:
| Demo | File Size | Application Host | Concurrent Transfers | GPFS Load | Duration |
|------|-----------|------------------|----------------------|-----------|----------|
| 1 | 2 GB | Snow | 2 | No | 4 hrs |
| 2 | 2 GB | Frost | 3 | No | 2 hrs |
| 3 | 2 GB | White | 6 | No | 2 hrs |
| 4 | 20 MB | White | 6 | No | 2 hrs |
| 5 | 2 GB | White | 6 | Yes | 2 hrs |
All runs were made with the enhanced parallel ftp client in parallel mode. The default pwidth was 4 and the default pblocksize was 1,048,576 bytes. All files resided in GPFS on the corresponding systems.
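To illustrate what those two parameters mean, the sketch below carves a file into pblocksize-sized blocks dealt across pwidth parallel streams. The round-robin assignment is an illustrative assumption about the striping, not a description of the PFTP wire protocol:

```python
# Sketch of how a file divides under the parallel-FTP defaults used in
# these runs: pwidth = 4 data streams, pblocksize = 1,048,576 bytes.
# Round-robin block assignment is an assumption for illustration only.

PWIDTH = 4
PBLOCKSIZE = 1_048_576  # bytes

def blocks_per_stream(file_size, pwidth=PWIDTH, pblocksize=PBLOCKSIZE):
    """Blocks carried by each stream if blocks are dealt round-robin
    across the pwidth streams."""
    n_blocks = -(-file_size // pblocksize)   # ceiling division
    counts = [n_blocks // pwidth] * pwidth
    for i in range(n_blocks % pwidth):       # distribute the remainder
        counts[i] += 1
    return counts

# A 2 GB (2,000,000,000-byte) file yields 1,908 blocks: 477 per stream.
print(blocks_per_stream(2_000_000_000))
```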
Run 1
- Machines: Snow to OCF HPSS Storage
- Network: 100 megabit Ethernet
- Date/time: November 23, 12:06:38 pm to 3:59:33 pm
- Duration: 3 hours 52 minutes 55 seconds
- Number of files: 80
- File size: 2 gigabytes
- Login nodes: 1
- Concurrent sessions: 2
- Total data moved: 0.16 terabytes
- Average throughput: 11.5 megabytes per second
- Comment: Limited by network bandwidth at 92% utilization

Run 2
- Machines: Frost to OCF HPSS Storage
- Network: 1 gigabit jumbo frame Ethernet, 2 links/layers
- Date/time: November 26, 1:04:29 pm to 2:57:50 pm
- Duration: 1 hour 53 minutes 21 seconds
- Number of files: 216
- File size: 2 gigabytes
- Login nodes: 1
- Concurrent sessions: 3
- Total data moved: 0.432 terabytes
- Average throughput: 63.5 megabytes per second
- Comment: Limited by OCF HPSS Storage disk bandwidth

Run 3
- Machines: White to SCF HPSS Storage
- Network: 1 gigabit jumbo frame Ethernet, 4 links/layers
- Date/time: December 4, 8:56:12 am to 10:42:41 am
- Duration: 1 hour 46 minutes 29 seconds
- Number of files: 540
- File size: 2 gigabytes
- Login nodes: 2
- Concurrent sessions: 6
- Total data moved: 1.08 terabytes
- Average throughput: 169.0 megabytes per second
- Comment: Projected to a full 2 hours, 1.2 terabytes would have moved

Run 4
- Machines: White to SCF HPSS Storage
- Network: 1 gigabit jumbo frame Ethernet, 4 links/layers
- Date/time: December 8, 12:00:39 pm to 1:58:56 pm
- Duration: 1 hour 58 minutes 17 seconds
- Number of files: 13,947
- File size: 20 megabytes
- Login nodes: 2
- Concurrent sessions: 6
- Total data moved: 0.279 terabytes
- Average throughput: 39.3 megabytes per second
- Comment: Less than 1/4 of the Run 3 throughput because of small files

Run 5
- Machines: White to SCF HPSS Storage
- Network: 1 gigabit jumbo frame Ethernet, 4 links/layers
- Date/time: December 8, 2:00:46 pm to 3:59:36 pm
- Duration: 1 hour 58 minutes 50 seconds
- Number of files: 498
- File size: 2 gigabytes
- Login nodes: 2
- Concurrent sessions: 6
- Total data moved: 0.996 terabytes
- Average throughput: 140 megabytes per second
- Comment: Introduced heavy load on GPFS yielding a 17% drop in performance compared to Run 3
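As a sanity check, each run's average throughput can be recomputed from its reported totals and wall-clock duration. The short script below uses only figures from the run summaries above, with decimal units (1 terabyte = 10^12 bytes) as in the summaries, and reproduces the quoted averages to within rounding:

```python
# Recompute each run's average throughput from the reported total data
# moved and wall-clock duration. All inputs come from the run summaries;
# units are decimal (1 TB = 1e12 bytes, 1 MB = 1e6 bytes).

runs = {
    # run: (total terabytes moved, duration as (hours, minutes, seconds))
    1: (0.160, (3, 52, 55)),
    2: (0.432, (1, 53, 21)),
    3: (1.080, (1, 46, 29)),
    4: (0.279, (1, 58, 17)),
    5: (0.996, (1, 58, 50)),
}

def avg_mb_per_sec(total_tb, hms):
    """Average throughput in MB/sec for a run."""
    h, m, s = hms
    seconds = h * 3600 + m * 60 + s
    return total_tb * 1e12 / 1e6 / seconds

for run, (tb, hms) in runs.items():
    print(f"Run {run}: {avg_mb_per_sec(tb, hms):.1f} MB/sec")
```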
The results of the HPSS Milepost runs closely mirror performance and stability delivered to ASCI platform users on a daily basis. Following are comparisons of four Milepost runs to typical HPSS production:
Comparison 1: Milepost Run 4 vs. White small file load
On October 19, 2000, a single user transferred 1,600 small files from White to SCF storage, mimicking Milepost Run 4. The user sustained an average of 41 MB/sec across the transfers, exceeding the 39 MB/sec measured in Run 4.
Comparison 2: Milepost Run 3 vs. White large file performance measurements
In mid-December 2000, a series of performance tests between White and the production SCF HPSS system were run. The graph below shows the results of these runs. At a 2GB file size and six concurrent sessions, the production performance test achieved 222MB/sec, exceeding the 169MB/sec achieved in Milepost Run 3 which used the same file size and number of sessions.
Comparison 3: Milepost Run 2 vs. Frost user offload
During the last two weeks of January 2001, users moved 8.8 terabytes of data into the OCF HPSS system. One LANL user on Frost transferred 1.3 terabytes in 91 files (file sizes ranged from 1.9 to 35.7 GB) to HPSS in 7 hours and 12 minutes, a sustained rate of 50.5 MB/sec. While this is slightly less than the Run 2 performance (64 MB/sec), the transfer ran concurrently with an extremely heavy offload of Frost data and used the NFT user interface rather than PFTP.
Comparison 4: Milepost Run 5 vs. White large file offload
During the first two days of February 2001, a number of very large files were stored from White to the SCF HPSS system. While the figures presented below are single-file rates, not aggregates, they show the per-file transfer rates provided to users storing large files from White.
| User | File Size (gigabytes) | Per-File Transfer Rate (MB/sec) |
|------|-----------------------|---------------------------------|
| User1 | 239.7 | 67.4 |
| User2 | 221.0 | 74.5 |
| User3 | 21.5 | 83.7 |
| User4 | 8.8 | 84.2 |
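Assuming the decimal units used elsewhere in this report, the per-file rates above imply the approximate wall-clock transfer times computed below. This derivation is ours, for illustration; the times were not reported:

```python
# Derive approximate transfer times from the per-user file sizes and
# rates in the table above. Units are decimal (1 GB = 1000 MB); the
# derived times are illustrative, not reported measurements.

transfers = {
    # user: (file size in GB, per-file rate in MB/sec)
    "User1": (239.7, 67.4),
    "User2": (221.0, 74.5),
    "User3": (21.5, 83.7),
    "User4": (8.8, 84.2),
}

def transfer_minutes(size_gb, rate_mb_s):
    """Wall-clock minutes to move size_gb at rate_mb_s."""
    return size_gb * 1000 / rate_mb_s / 60

for user, (gb, rate) in transfers.items():
    print(f"{user}: {transfer_minutes(gb, rate):.0f} min")
```

So even the largest single file (239.7 GB) moved in roughly an hour at the observed per-file rate.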
We believe the comparisons above show that the HPSS Milepost runs are representative of the service HPSS provides to ASCI platform users.
The High Performance Storage System (HPSS) is a large collaborative software development project, begun in 1993 as a Cooperative Research and Development Agreement (CRADA) between government and industry. The HPSS collaboration is based on the premise that no single organization has the experience and resources to meet all the extreme challenges represented by the growing storage system I/O, capacity and functionality imbalances present in high-performance computing environments such as ASCI.
For the past five years, the ASCI PSE Archival Storage Project has been the primary funding agent for HPSS and has focused the requirements, design, development and Tri-Lab deployment of HPSS. Tri-Lab PSE developers lead the design and development of most critical HPSS software components and their efforts ensure that ASCI user priorities are accurately represented in new HPSS releases. Testbed and production HPSS systems are deployed and supported in open and secure computing environments at all three Tri-Lab sites.
While the PSE provides major support, over 20 organizations have contributed to the success of HPSS, and IBM markets the HPSS system. This large, successful collaboration has garnered an R&D 100 Award, ISO 9001 certification, and SEI Capability Maturity Model (CMM) Level 3 status. It is, however, the ASCI program's requirements for file, capacity, and performance scalability, as well as security, that continue to drive the HPSS Project. As ASCI code development teams and their applications continue to raise the bar for HPSS, PSE developers continue to respond to these challenges.
The most obvious and direct benefits of PSE's involvement in, and funding of, the HPSS Project are made evident by the PSE Milepost runs. These runs demonstrated HPSS providing ASCI codes and code developers with a scalable, stable, high performance archive in which the results, images and work products generated by ASCI multi-TeraOp machines could be stored. This archive not only provides permanent and safe storage of the fruits of the ASCI computational investment, but also frees up platform disk resources allowing codes and users to continue generating stockpile simulations while moving data to more cost effective storage media.
As noted above, the ASCI Program's extreme storage requirements demand the performance, security and scalability provided by HPSS. The value of the ASCI program's ability to directly and immediately influence HPSS requirements, based on unique ASCI user and security needs, cannot be overstated. At the same time, the ASCI Program directly benefits by leveraging the substantial investment and expertise that IBM, Sun, StorageTek and other HPSS development partners bring to the HPSS product and its functionality. This collaboration and its product directly serve ASCI users around the clock, 365 days a year, at all three ASCI laboratories.