LLNL

EZSTORAGE

Introduction

This manual introduces tools for effectively storing and archiving files from Livermore Computing (LC) computers by using the High Performance Storage System (HPSS), also referred to herein as "storage." Individual reference manuals provide detailed technical instructions on the tools and techniques introduced in EZSTORAGE: FTP, NFT, HPSS, HTAR, HSI, and Hopper. Additionally, the EZFILES document is a basic guide to using local directories and general file-handling software at LC.

For help, contact the LC Hotline at 925-422-4531 or via e-mail (OCF: lc-hotline@llnl.gov, SCF: lc-hotline@pop.llnl.gov).


Overview

Reliable, massive, archival data storage is a crucial part of any effective high-performance computing environment. Although the actual disk and tape resources for storing files at LC are large and elaborate, the user interface is constrained to one of two protocols. The first is the FTP daemon, which serves FTP clients and local alternatives to FTP clients, such as NFT and parallel FTP (PFTP). The second is the Hierarchical Storage Interface Gateway Daemon (hsigwd), which serves HSI and HTAR. Hopper can use either protocol, depending on user settings.

Moving files to and from LC production machines, open or secure, is a mainstream storage mission, easy to perform and very reliable. Using file storage in this context avoids quotas on user home directories, avoids purges of files on temporary work disks, and provides virtually unlimited capacity for managing data or computational output. Transfer rates are fast, and FTP connections are very reliable. Customized FTP-daemon interfaces to handle special storage needs (such as NFT for persistent storage transfers or HTAR for very efficiently making large archives directly in storage) are available, too.

Moving files to and from other LLNL machines is more complex. Features of special FTP clients, together with the need to protect unusual file formats during transfer to or from storage, may call for taking extra steps.

Finally, moving files to and from non-llnl.gov machines, such as computers at other sites or the workstations of distant ASC collaborators, is the most complicated of the three situations. It requires either using a two-stage process or running extra enabling software such as VPN (Virtual Private Network). This may involve running FTP twice, or using non-FTP transfers to an LC production machine before actually storing the files with FTP (run on an LC machine).

Storage Summary

This section briefly summarizes the chief storage system constraints and tells how to perform the most important file-storage tasks at LC.

Storage System Constraints

HPSS has the storage system constraints noted below.

Constraint Type                    Constraint Parameters
Largest allowed file size          100 TB (using FTP, NFT, HSI, or Hopper interface)
                                   68 GB/member; 100 TB archive (using HTAR interface)
Longest file name                  1023 characters (with HTAR, longest entry name or soft link is 100 characters)
Problem characters in file names
   Treated as file filters         ? * {a,b}
   Forbidden first characters      - ! ~
   Forbidden in any position       * ? [ {
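
The name constraints above lend themselves to a quick check before files are stored. The following is a minimal Python sketch, not an LC-provided tool. The function name check_storage_name is hypothetical, and the sketch assumes that the limits apply to the name exactly as passed in, that "- ! ~" are the forbidden first characters, and that "* ? [ {" are forbidden in any position.

```python
MAX_NAME_LEN = 1023        # longest file name (from the constraints table)
MAX_HTAR_ENTRY_LEN = 100   # longest HTAR entry name or soft link
FORBIDDEN_FIRST = set("-!~")      # assumed: forbidden as the first character
FORBIDDEN_ANYWHERE = set("*?[{")  # assumed: forbidden in any position


def check_storage_name(name, for_htar=False):
    """Return a list of problems found in `name`; an empty list means none."""
    problems = []
    limit = MAX_HTAR_ENTRY_LEN if for_htar else MAX_NAME_LEN
    if len(name) > limit:
        problems.append(f"longer than {limit} characters")
    if name and name[0] in FORBIDDEN_FIRST:
        problems.append(f"starts with forbidden character {name[0]!r}")
    bad = sorted(set(name) & FORBIDDEN_ANYWHERE)
    if bad:
        problems.append("contains forbidden characters: " + " ".join(bad))
    return problems
```

Running such a check over a directory listing before a large transfer can flag names that are likely to cause trouble; it does not replace the storage system's own validation.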

Common File Storage Commands

The following commands are used for common file storage tasks. These commands are also available graphically by using Hopper. To efficiently transfer a very large number of (related) files as a manageable archive or library, use HTAR.

Task                                    FTP            NFT            HSI
Connect to storage                      ftp storage    nft            hsi
Make storage directory                  mkdir dir      mkdir dir      mkdir dir
Change storage directories              cd dir         cd dir         cd dir
Store a file                            put file       put file       put file
Retrieve a stored file                  get file       get file       get file
Retrieve from within a stored archive   See HTAR       See HTAR       See HTAR
Delete a stored file                    delete file    delete file    delete file
List stored files                       dir            dir            ls
Change file permissions                 chmod          chmod          chmod
Change "class of service" (COS)         site setcos    setcos         chcos
Start migration of stored file from tape site stage file
Control file overwriting
   Prevent overwriting
   Allow overwriting

[default]

noclobber
clobber

[default]

Storage Home Directories

Regardless of their access software (FTP, NFT, etc.), LC users arrive at HPSS in their storage home directory. This always has a path name of the form

/users/u[00-54]/username

where username is your LC login name (for example: /users/u34/jsmith). This basic directory structure supports customized division into subdirectories (e.g., by using the mkdir command) as well as access control of stored files.
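
A script that handles storage paths can recognize this form with a simple pattern. The following Python sketch is illustrative only. The function name parse_storage_home is hypothetical, and the sketch assumes that the bucket is always written with two digits (00 through 54) and that a login name contains no slash.

```python
import re

# "/users/u" + two-digit bucket + "/" + login name (assumed to contain no slash).
_HOME_RE = re.compile(r"/users/u([0-9]{2})/([^/]+)")


def parse_storage_home(path):
    """Return (bucket, username) for a storage home path, or None if it does not fit."""
    match = _HOME_RE.fullmatch(path)
    if match is None:
        return None
    bucket = int(match.group(1))
    if bucket > 54:          # buckets run from u00 through u54
        return None
    return bucket, match.group(2)
```

For the example path above, parse_storage_home("/users/u34/jsmith") yields the bucket number 34 and the login name jsmith.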

Accessing Storage

Accessing storage is most easily done from an LC production machine but can be done from non-LC machines and from offsite in some circumstances.

When onsite, NFT, FTP, HSI, and Hopper can be used to transfer between an LC production host and storage. If onsite but not on an LC production host, use FTP or Hopper to transfer files to or from storage.

Offsite users may access storage only if connected to the LC network via a VPN client or if their network has a trust relationship with LC's network. See https://access.llnl.gov/ for information regarding VPN. See Access Information for access prerequisites and additional information about accessing LC systems. Offsite users are limited to FTP and Hopper for accessing storage.

Copies in Storage

Some files may be so important to your project that you want to store separate, duplicate copies on independent storage media. LC's Open Computing Facility (OCF) and Secure Computing Facility (SCF) storage systems offer such dual-copy storage using the "class of service" (COS) concept.

The storage server(s) assign a COS to every incoming file based on the file's size and the client that writes it:

For more COS technical details, consult the SETCOS section of LC's NFT Reference Manual.

Storage Interfaces

FTP

FTP is the standard interface to HPSS. When you run FTP (on an OTP or Kerberos-passworded LC machine) with storage as the target host, access is "preauthenticated" and you are not prompted for your password. Also, on all LC production machines (but not necessarily on other LC machines), a parallel FTP client (equivalent to PFTP) is the default. All files that are 4 MB or larger automatically move to or from storage using parallel FTP.

For more information about the FTP file-transfer utility, consult the FTP Usage Guide.

Note: The storage FTP daemon (based on the HPSS version) behaves differently from the other LC FTP daemons (based on the WU FTP daemon), so "m" commands (mdelete, mget, mput, etc.) may produce unintended results. These "m" commands process multiple files by taking as their argument either an explicit file list or a file filter (an implicit file list specified with one or more UNIX wildcards or metacharacters). To check the behavior in advance, type ls pattern, where pattern is what you intend to use with the "m" command. If ls pattern returns something unexpected, reformulate the pattern.

NFT

NFT is a locally developed file transfer tool. Although NFT uses standard FTP daemons to carry out its file transfers, it offers enhanced features.

For a complete analysis of NFT syntax and special features, along with a thorough alphabetical command dictionary, consult the NFT Reference Manual.

HSI

HSI provides a UNIX shell-style interface to HPSS and supports several of the commonly used FTP commands with the following differences:

For more detailed information about HSI usage, including a command dictionary, consult the HSI reference manual.

Hopper

Hopper offers a graphical interface to storage and other LC resources, including support for simple drag-and-drop file-transfer services using FTP, NFT, HSI, HTAR, etc. By invoking Hopper, you can do many file-management tasks graphically, including:

More general background information on Hopper is available at the Hopper Web site. See "Getting Started" for instructions on how to download Hopper to your local desktop machine.

HTAR

On LC production machines (but not at other ASC sites), HTAR is a separate, locally developed utility program that serves as a special-purpose front end to the parallel FTP daemons for storage access. HTAR combines a flexible file bundling tool (like TAR) with fast parallel access (it acts as an alternative to the PFTP client) to storage that lets you store and selectively retrieve even very large sets of files very efficiently.

HTAR's enhanced features include the following:

Complete details about using HTAR are available in the HTAR Reference Manual.

Sharing Stored Files

Sharing some stored files with one or several other users is one of the most common storage goals. You may also want to consider using other file-sharing techniques available on LC production machines. Consult the File-Sharing Alternatives section of EZFILES for a comparison of several choices.

All sharing of stored files on LC's HPSS happens by means of storage groups. You and those with whom you want to share stored files must first find or create an LDAP (Lightweight Directory Access Protocol) storage group to which you all belong, assign the files to be shared and every parent directory of them to that common storage group, and open the file and directory permissions (of the whole tree) to allow group reads (and executes or writes, as needed). Please note that users on the Restricted Zone (RZ) may not share stored files in their home directory.
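
Because every level of the tree must belong to the shared group, it can help to list the paths involved before changing anything. The following Python sketch is a hypothetical helper (paths_to_share is not an LC tool). It assumes the chain starts at your storage home directory, as the step-by-step procedure later in this section does, rather than at the system directories above it.

```python
import posixpath


def paths_to_share(home, target):
    """List `home`, each directory between it and `target`, and `target` itself."""
    home = posixpath.normpath(home)
    target = posixpath.normpath(target)
    if target != home and not target.startswith(home + "/"):
        raise ValueError(f"{target} is not inside {home}")
    paths = [home]
    relative = target[len(home):].strip("/")
    # Walk down one path component at a time, extending the previous entry.
    for part in relative.split("/") if relative else []:
        paths.append(posixpath.join(paths[-1], part))
    return paths
```

Each path in the returned list needs both the common storage group and suitable group permissions for the sharing to work.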

Using Storage Groups

A group is just a named set of users who agree among themselves to optionally allow (some of their) files to be readable, or even writable, by all group members. At LC, online groups (e.g., on either an AIX or Linux cluster) are obtained from LDAP. For most LC users, your online and storage groups will have the same name, and those groups will have the same sets of members online and in HPSS. However, a file loses its group status at the time you store it, so you must arrange the sharing of stored files by working exclusively with storage groups. For basic information about using groups, see the Using Groups section of EZFILES.

Setting Stored-File Permissions by Group

Once you have the files you want to share and the name of a group to whom all sharing users belong (see the previous subsection), you can follow these steps, all involving (somewhat unusual) FTP commands, to enable the sharing of stored files.

  1. Open an FTP session to storage.

         ftp storage

  2. Create a storage directory to hold the shared files. In this example, the shared-files directory is called "share" and the shared file is called "share.code." In your FTP session type

         mkdir share

  3. Assign your storage home directory to the share group. For example, if your default arrival directory in storage is /users/u34/jfk and if the storage group containing all the file-sharing users is "sgroup," then use this FTP command

         chgrp sgroup /users/u34/jfk

    to associate the two. One side effect is that you cannot share with two different groups at once. (You can also change storage groups for any of these steps by using the special CHGRPSTG tool.)

  4. Assign your file-sharing directory to the share group. Because you made the share directory as a child of /users/u34/jfk in step 2, you can now associate it, too, with the file-sharing storage group sgroup:

         chgrp sgroup share

  5. Assign group permissions to the file-sharing directory. To allow other members of storage group sgroup to read, write, and execute (list) the file(s) in the share directory, use this FTP command

         chmod 775 share

    to expand its default group permissions. (You can also change storage permissions for any of these steps by using the special CHMODSTG tool.)

  6. Store the files to be shared. If you move (cd) to the file-sharing directory and put the file(s) to be shared, they will lose their online permissions but they will arrive associated with the share group sgroup, which they inherit from the file-sharing directory:

         cd share
         put share.code
         put ... [if there are more files to share]

  7. Assign group permissions to the file(s) to be shared. Even if their online permissions allowed sharing by group, storing the file(s) erased those decisions. So as with step 5 above, you need to declare the availability of each file to the members of sgroup:

         quote site chmod 775 share.code
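
The sequence in steps 2 through 7 can be generated for any home directory, group, and file list. The following Python sketch only builds the command strings shown above. It does not open an FTP session, and the function name sharing_commands and its parameters are hypothetical.

```python
def sharing_commands(home, group, share_dir, files, mode="775"):
    """Return the FTP commands from steps 2-7, in order, as a list of strings."""
    commands = [
        f"mkdir {share_dir}",           # step 2: create the shared directory
        f"chgrp {group} {home}",        # step 3: home directory joins the group
        f"chgrp {group} {share_dir}",   # step 4: shared directory joins the group
        f"chmod {mode} {share_dir}",    # step 5: open the directory's group permissions
        f"cd {share_dir}",              # step 6: move into the shared directory
    ]
    commands += [f"put {name}" for name in files]                      # step 6
    commands += [f"quote site chmod {mode} {name}" for name in files]  # step 7
    return commands
```

Printing the result of sharing_commands("/users/u34/jfk", "sgroup", "share", ["share.code"]) gives a checklist of lines to type after ftp storage (step 1).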

Reading Shared Stored Files

After you have used the previous two subsections to enable others in storage group sgroup to share the file(s) in the share directory, they can follow these steps to retrieve those file(s):

     ftp storage
     cd /users/u34/jfk/share
     get share.code

Note that attempts to directly get the file /users/u34/jfk/share/share.code (while in another storage directory) may misleadingly fail with the message "no such file or directory."

Storage Assistance Tools

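LC's production machines offer three public user-developed programs to more conveniently handle three common storage tasks. These special storage tools are lstorage (lists stored files), chmodstg (changes storage permissions), and chgrpstg (changes storage groups). Each is described in its own subsection below.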

All are located in /usr/local/bin on the machines where they have been installed, so most users can run them just by typing their names. Note: Because all three storage-assistance tools are really Perl scripts, they yield very verbose and confusing error messages if you happen to run them when the LC storage system (either open or secure) is offline.

List Stored Files (lstorage)

lstorage lists your storage directories and the files that they contain. To run lstorage on the LC production machines where it is installed, type:

lstorage [options] [dirnames]

By choice of lstorage options you can specify output format (single or multiple columns), output scope (local or recursive), and level of detail (names only or other information, too). Without a specified directory, lstorage reports on your top-level ("home") storage directory. Without options, lstorage lists (only) the names of files and directories contained in the specified storage directory, in multiple columns. If you specify a space-delimited list of several target storage directories (all names relative to your home storage directory), lstorage reports on each one in the order in which you listed them on the execute line.

lstorage Options

Scope Options

-a
lists all directories and files, including those whose names begin with a dot (.). Note: Listing stored files such as .cshrc is default behavior for lstorage even without invoking -a; with -a invoked the list still omits the single and double dot (. and ..) entries.
-l
lists in long format, with details on the permissions and groups for every storage directory or stored file covered in the report.
-R
recursively includes all the subdirectories and stored files of the directory specified on the execute line (compare with -j).

Format Options

-C
(default) lists storage directories and stored files in multiple columns with entries sorted down the columns.
-j
lists storage directories and stored files recursively (entails -R) in a single column with nesting revealed by extra indenting (names only).
-h
displays the lstorage help package (a brief list of options). Help cannot be combined with any other options.
-t sss
sets the lstorage timeout to sss seconds (default timeout is 300 seconds).
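
When lstorage is called from a script, the options above can be assembled programmatically. The following Python sketch only builds an argument list and runs nothing. The function name lstorage_argv is hypothetical. Because this manual does not say whether -l and -j can be combined (-j lists names only), the sketch conservatively rejects that pair, which is an assumption rather than documented behavior.

```python
def lstorage_argv(dirnames=(), long_format=False, recursive=False,
                  indented=False, timeout=None):
    """Build an argument list for lstorage from the options described above."""
    if long_format and indented:
        # Assumption: -j is names-only, so pairing it with -l is refused here.
        raise ValueError("-l and -j are not combined in this sketch")
    argv = ["lstorage"]
    if long_format:
        argv.append("-l")
    if indented:
        argv.append("-j")    # -j already entails -R
    elif recursive:
        argv.append("-R")
    if timeout is not None:
        argv += ["-t", str(int(timeout))]   # timeout in seconds
    argv += list(dirnames)   # names relative to the home storage directory
    return argv
```

For example, lstorage_argv(["dir1", "dir2"], long_format=True, recursive=True, timeout=600) corresponds to the command line lstorage -l -R -t 600 dir1 dir2.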

Change Storage Permissions (chmodstg)

chmodstg changes the permissions on your storage directories or your stored files. To run chmodstg on the LC production machines where it is installed, type:

chmodstg [options] [dirname]

By choice of chmodstg options you can specify the desired permissions for a specific storage directory, a specific stored file, all files in a directory, or (recursively) all children of a specific directory to all levels. You can also specify uninterrupted, noninteractive changes or instead request interactive prompting for your desired permissions and files (with optional report on each change made). Without a specified directory, chmodstg acts on your top-level ("home") storage directory. Without permission-related options (e.g., chmodstg -R dir1), chmodstg prompts for your desired directory and file permissions and then changes both with no confirmation.

chmodstg accepts permissions as either three-digit octal numbers (exactly three digits, no spaces) or as a comma-delimited list of symbolic triples (e.g., u+x,g-w) built up from the UNIX components [augo], [+-], and [rwx].
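
The two accepted permission formats can be checked before chmodstg is invoked. The following Python sketch is an approximation, not chmodstg's own parser, and the function name permission_format is hypothetical. It assumes that each symbolic entry may hold one or more letters from [augo], exactly one of + or -, and one or more letters from [rwx]; a stricter reading of "triples" would allow only one character of each kind.

```python
import re

# Exactly three octal digits, no spaces.
_OCTAL_RE = re.compile(r"[0-7]{3}")
# Comma-delimited entries such as u+x,g-w (assumed: letters may repeat per entry).
_SYMBOLIC_RE = re.compile(r"[augo]+[+-][rwx]+(,[augo]+[+-][rwx]+)*")


def permission_format(perm):
    """Classify `perm` as 'octal', 'symbolic', or None if it fits neither form."""
    if _OCTAL_RE.fullmatch(perm):
        return "octal"
    if _SYMBOLIC_RE.fullmatch(perm):
        return "symbolic"
    return None
```

Strings such as 0775 (four digits) or u+x, g-w (embedded space) fit neither form and return None.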

chmodstg Options

Permission Options

-Dperm
specifies (in either octal or symbolic format) the UNIX permissions perm to assign to every storage directory (but not stored files) that chmodstg treats during this run, as selected by other options. This disarms the directory-permissions prompt.
-Fperm
specifies (in either octal or symbolic format) the UNIX permissions perm to assign to every stored file (but not directories) that chmodstg treats during this run, as selected by other options. This disarms the file-permissions prompt.

Scope Options

-d
changes directory permissions only (omits files). chmodstg prompts you for the desired permissions. The default without -f or -d is to change both.
-f
changes file permissions only (omits directories). chmodstg prompts you for the desired permissions. The default without -f or -d is to change both.
-R
recursively includes all the subdirectories and stored files of the directory specified on the execute line. You can combine -R with other options (except -s) to further control chmodstg's scope of action.
-s pathname
changes permissions only for the one directory or file specified by its pathname (relative to your home storage directory). Using -s disables all other chmodstg options except -v, so chmodstg always prompts for your desired permissions even if you include -F or -D on the execute line.

Interaction Options

-h
displays the chmodstg help package (a brief list of options). Help cannot be combined with any other options.
-i
prompts for your yes/no confirmation for every directory or stored file that chmodstg tries to change (regardless of whether you also want prompting for desired permissions). Any response except YES is treated as NO; you cannot supply different permissions for different files by using -i.
-s
is a scope option (see above) but always behaves interactively, even if you try to disable its prompts.
-v
interactively reports the permission change made for every directory or stored file that chmodstg changes (e.g., "changed from 650 to 700"). You can combine -v with chmodstg's various prompting options, or use it for confirmations even without prompts.
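
The option rules above can be captured in a small command builder. The following Python sketch only assembles argument lists and runs nothing. The function names are hypothetical, and the ordering of options on the command line is an assumption. The sketch separates the general form from the -s form because -s disables every other option except -v.

```python
def chmodstg_argv(dirname=None, dir_perm=None, file_perm=None,
                  recursive=False, verbose=False):
    """General form: chmodstg [options] [dirname]."""
    argv = ["chmodstg"]
    if recursive:
        argv.append("-R")
    if dir_perm:
        argv.append(f"-D{dir_perm}")    # permission follows -D with no space
    if file_perm:
        argv.append(f"-F{file_perm}")   # permission follows -F with no space
    if verbose:
        argv.append("-v")
    if dirname:
        argv.append(dirname)
    return argv


def chmodstg_single_argv(pathname, verbose=False):
    """-s form: one directory or file, always interactive; only -v may accompany it."""
    argv = ["chmodstg"]
    if verbose:
        argv.append("-v")
    argv += ["-s", pathname]
    return argv
```

Supplying both dir_perm and file_perm corresponds to a noninteractive run such as chmodstg -R -D750 -F640 dir1, since -D and -F each disarm their prompt.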

Change Storage Groups (chgrpstg)

chgrpstg changes the group for your storage directories or your stored files. To run chgrpstg on the LC production machines where it is installed, type:

chgrpstg [options] groupname [dirname]

There is no prompt or default for the desired groupname, which you must specify on every chgrpstg execute line. To discover your current groups, type the groups command on LC production machines. Most chgrpstg invocations run noninteractively, but you can request prompting or confirmatory reports, alone or together with recursive execution. Without a specified directory, chgrpstg acts on your top-level ("home") storage directory. Without options, chgrpstg changes the group for one "layer" in your storage hierarchy (for every member of a specified directory but not the directory itself nor the children of its subdirectories).

chgrpstg Options

Scope Options

-d
changes directory groups only (omits files). The default without -f or -d is to change both.
-f
changes file groups only (omits directories). The default without -f or -d is to change both.
-R
recursively includes all the subdirectories and stored files of the directory specified on the execute line. You can combine -R with other options (except -s) to further control chgrpstg's scope of action.
-s groupname pathname
changes groups only for the one directory or file specified by its path name (relative to your home storage directory). Using -s disables all other chgrpstg options except -v. Note the syntax difference from chmodstg: here, the group name precedes the path name immediately after -s.

Interaction Options

-h
displays the chgrpstg help package (a brief list of options). Help cannot be combined with any other options.
-i
prompts for your yes/no confirmation for every directory or stored file that chgrpstg tries to change. Any response except YES is treated as NO; you cannot supply different groups for different files by using -i.
-s
is a scope option (see above) but always behaves interactively, even if you try to disable its prompts.
-v
interactively reports the group change made for every directory or stored file that chgrpstg changes (e.g., "changed from oldgrp to newgrp"). You can combine -v with chgrpstg's -i prompting option or use it for confirmations even without prompts.
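
The syntax difference between the general form and the -s form is easy to get wrong, so a script may want to build the two separately. The following Python sketch only assembles argument lists and runs nothing. The function names are hypothetical, and the ordering of options is an assumption.

```python
def chgrpstg_argv(groupname, dirname=None, recursive=False,
                  only=None, confirm=False, verbose=False):
    """General form: chgrpstg [options] groupname [dirname]."""
    if not groupname:
        raise ValueError("chgrpstg has no default group; groupname is required")
    if only not in (None, "dirs", "files"):
        raise ValueError("only must be None, 'dirs', or 'files'")
    argv = ["chgrpstg"]
    if recursive:
        argv.append("-R")
    if only == "dirs":
        argv.append("-d")
    elif only == "files":
        argv.append("-f")
    if confirm:
        argv.append("-i")
    if verbose:
        argv.append("-v")
    argv.append(groupname)
    if dirname:
        argv.append(dirname)
    return argv


def chgrpstg_single_argv(groupname, pathname, verbose=False):
    """-s form: the group name comes right after -s, then the path name."""
    argv = ["chgrpstg"]
    if verbose:
        argv.append("-v")
    argv += ["-s", groupname, pathname]
    return argv
```

For example, chgrpstg_argv("sgroup", "share", recursive=True) corresponds to chgrpstg -R sgroup share, while chgrpstg_single_argv("sgroup", "share/share.code") corresponds to chgrpstg -s sgroup share/share.code.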