Stream of Alumina Particles Impacting an Aluminum Target (pressure plot at 110 microseconds)
Run time: 300,000 CPU hours
Technical contact: Vlad Georgevich
In the latter half of the 1990s, the maturation of parallel computing technology made it possible for the nation to contemplate the development of production-level 3D scientific applications requiring super-teraflop computational capability. The Stockpile Stewardship Program (SSP) led in identifying this potential and proposed the Accelerated Strategic Computing Initiative (ASCI) as its spearhead to enable certification in the absence of underground testing, in conjunction with theory and with subcritical and other experiments.
LLNL, as an institution, recognized that if one of its major programs was embarking on an adventure with the potential to revolutionize scientific methods in the next century, the health of the institution depended on a science and technology (S&T) base that also had access to powerful ASCI-class computing environments. From this notion was born Multiprogrammatic and Institutional Computing (M&IC). This strategic move kept LLNL's scientific disciplines at the forefront and positioned LLNL as the pre-eminent simulation site it is today.
M&IC is truly institutional: many directorates invest, as does the institution itself. The growth of M&IC since 1997 has been significant, as shown in Figure 1 and Table 1; the total capacity currently available to M&IC scientists is about 550 TF/s.
Figure 1. Growth of M&IC computing power (in GF/s) from 1997-2009.
|Year|1997|1998|1999|2000|2001|2002|2003|2004|2005|2006|2007|2008|2009|
|Total Peak GF|72|98|279|972|1384|12594|12665|35647|35009|35009|81325|81325|410482|
Table 1. Growth of M&IC computing power (in GF/s) from 1997-2009.
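To put the growth in Table 1 in perspective, the short Python sketch below computes the overall growth factor and the implied average annual growth rate. It assumes the thirteen values in the table correspond to the consecutive years 1997 through 2009, as the caption suggests; the variable names are illustrative only.

```python
# Total peak GF/s values copied from Table 1.
# Assumption: one value per year, 1997 through 2009.
peak_gf = [72, 98, 279, 972, 1384, 12594, 12665,
           35647, 35009, 35009, 81325, 81325, 410482]
years = list(range(1997, 1997 + len(peak_gf)))

# Ratio of the last value to the first.
growth_factor = peak_gf[-1] / peak_gf[0]

# Geometric-mean growth per year over the intervals between first and last year.
n_intervals = years[-1] - years[0]
annual_growth = growth_factor ** (1 / n_intervals)

print(f"{years[0]}-{years[-1]}: {growth_factor:,.0f}x overall")
print(f"average annual growth: {annual_growth:.2f}x per year")
```

Under that assumption, peak capacity grew by more than three orders of magnitude over the period, which works out to roughly a doubling each year on average.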
The M&IC governance model is both grass roots and hierarchical. At the grass-roots level, the "board of directors" (the Institutional Computing Executive Group, or ICEG) consists of well-known LLNL scientists who are qualified to identify deficiencies and request improvements. Typically, ICEG members are appointed by ADs in the various directorates. Hierarchically, M&IC management reports to the Director's Office, namely to the Deputy Director for S&T, who provides guidance relative to the institution's overall S&T goals and manages allocations at the highest level. Generally, it is not difficult to satisfy both the scientists' requests and the institution's goals, and M&IC facilitates that balance. Lest the investment levels highlighted in Figure 2 be viewed as excessive, we note that the M&IC environment is comparable to the best unclassified environments anywhere in the country, yet the total investment at LLNL is only about $11 million per year. Such is the power of leverage and momentum from partnering with the Advanced Simulation and Computing (ASC) Program.
Figure 2. M&IC cost history (in $K), FY03-FY10.
The institution covers all the operational costs and also invests in the high performance computing (HPC) hardware. The programs and directorates invest only in the hardware. Each investor receives a share of the computing resource (called a bank) whose size is proportional to its level of investment. Access to the institution's banks is managed through an HPC request process whose path depends on the size of the request: smaller requests are awarded by the M&IC program office, while large requests must compete under the Grand Challenge process.
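The proportional relationship between investment and bank size can be illustrated with a minimal Python sketch. This is not M&IC's actual allocation tooling: the function name `bank_shares`, the investor names, and the dollar figures are all hypothetical, chosen only to show the arithmetic.

```python
def bank_shares(investments):
    """Return each investor's fraction of the resource, proportional to investment.

    `investments` maps an investor name to its hardware investment
    (any consistent unit, e.g. $K). Hypothetical helper for illustration.
    """
    total = sum(investments.values())
    if total <= 0:
        raise ValueError("total investment must be positive")
    return {name: amount / total for name, amount in investments.items()}


# Hypothetical investment levels in $K (not taken from the source).
example = {"Directorate A": 2000, "Directorate B": 1000, "Institution": 5000}
shares = bank_shares(example)

for name, share in shares.items():
    print(f"{name}: {share:.1%} of the machine")
```

In this sketch an investor contributing a quarter of the total hardware investment holds a bank worth a quarter of the resource; how those fractions map onto node-hours or scheduling priority is outside what the text describes.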
Because of strong and consistent investments, LLNL benefits from one of the most experienced and well-staffed scientific computing centers in the world. An investment in hardware is leveraged by attention from experienced integrators, operators, and services staff, and by a well-engineered foundation in networks and storage. All of this considerably mitigates the risks inherent in investing in the newest technologies with the best cost performance.
Our platform strategy has been to straddle three complementary technology curves, shown in Figure 3, to appropriately balance risk and benefit. The first supports today's stockpile needs, the second delivers an affordable path to a future petaflop system, and the third provides a low-cost transition to the next generation of platforms. M&IC investments have favored curve #2, open source commodity clusters. We believe that for the next 2–3-year cycle, clusters are the best solution for M&IC.
Figure 3. Platform strategy technology curves.