The Green Data Oasis (GDO) is a large data store (620 TB) on the unrestricted LLNL network.
It is intended to facilitate the sharing of scientific data with external collaborators.
Support of these capabilities requires the following activities:
Managing Change
Program Leaders, the GDO project leader, LC management, and the Laboratory
Science and Technology Office (LSTO) will manage policies for usage and access
to the GDO system. The baseline set of procedures, tools, and features will be
proposed, discussed, defined, agreed upon, and implemented as part of the GDO
GA Release process.
Access Policy
We expect the GDO system to be heavily used, given the large number of projects
with data sharing needs and the large amounts of data to be shared. To ensure
equitable access across the LLNL community, GDO accounts and disk allocations will
be granted through an informal Lab-wide process, with decisions made by the M&IC
Program Leaders on behalf of the LSTO.
Configuring and Accessing the GDO
Once a project has been granted access to the GDO, a ZFS disk partition of
the size specified by its allocation will be created. One or more Solaris zones
will also be allocated, and user accounts will be created.
Because of the collaborative purpose of the system and the non-sensitive nature
of its data, GDO uses the ucllnl.org domain on the green network. Host names
will be assigned to zones to indicate the association with the project. For example,
project ABC's virtual hosts might be called "gdo-abc.ucllnl.org"
and, if the project requires an upload capability, "gdo-abc-upload.ucllnl.org".
These project host names are aliases for corresponding numeric host names, such
as gdo1, gdo2, and so on.
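The naming pattern above can be sketched in a few lines of Python. This is only an illustration of the convention as described in this section; the function name project_hosts and the rule that project names are lowercased alphanumerics are assumptions, not part of the GDO tooling.

```python
def project_hosts(project: str, with_upload: bool = False) -> list[str]:
    """Return alias host names following the gdo-<project>[-upload] pattern.

    Assumption: project names are short alphanumeric strings and are
    lowercased in host names, as in the "ABC" -> "gdo-abc" example.
    """
    name = project.strip().lower()
    if not name.isalnum():
        raise ValueError(f"unexpected project name: {project!r}")
    hosts = [f"gdo-{name}.ucllnl.org"]
    if with_upload:
        # The optional upload zone gets its own alias.
        hosts.append(f"gdo-{name}-upload.ucllnl.org")
    return hosts


print(project_hosts("ABC", with_upload=True))
```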
Only U.S.-citizen LLNL project members will be granted login access to the
project's zone(s). Two-factor authentication using RSA tokens will be required
for login accounts. Logins must be done with SSH; e.g., ssh
Note that SSH connections are only accepted from within the llnl.gov or ucllnl.org
domains. If you want to connect from another (non-LLNL) host, you must first
SSH to an LLNL host and then SSH from there to the GDO.
External collaborators will not have login accounts. They will be able to
retrieve data using FTP, HTTP, or other approved protocols, and will be able
to upload data in certain circumstances if the project has enabled this capability.
Local project members without login accounts can use the system in the same way.
Interacting with LC and Other Yellow Network Resources
Because the GDO is on the green network, all interactions between it and resources
on the yellow network must be initiated from within the yellow network. Data
transfers involving LC systems must therefore originate on the LC system.
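The direction rule can be summarized in a small Python sketch. The function name plan_transfer and the "push"/"pull" labels are illustrative assumptions; the only rule taken from the text is that the yellow side must always initiate.

```python
def plan_transfer(source_net: str, dest_net: str) -> tuple[str, str]:
    """Return (initiating network, mode) for a transfer between networks.

    Networks are "green" (GDO) or "yellow" (LC). When the two differ,
    the yellow side must start the connection.
    """
    for net in (source_net, dest_net):
        if net not in ("green", "yellow"):
            raise ValueError(f"unknown network: {net!r}")
    if source_net == "yellow" and dest_net == "green":
        return ("yellow", "push")   # LC host sends data to the GDO
    if source_net == "green" and dest_net == "yellow":
        return ("yellow", "pull")   # LC host fetches data from the GDO
    # Same network: this rule imposes no restriction; default to a push.
    return (source_net, "push")
```

For example, copying results from the GDO back to an LC system is a pull started on the LC system, never a push started on the GDO.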
Key Directories
The following table shows some of the key directories in each GDO zone:
||ZFS area visible to anonymous FTP users. This is where the majority of a
project's data will reside.
||An optional ZFS area into which external collaborators will be able to upload
data. This will only be available via a project's optional upload zone (e.g.,
||Project directory for software and information to be shared amongst all of
a project's zones.
||LC-provided shared applications.
||Location of many OpenSource (e.g., GNU) UNIX utilities, excluding gcc.
||Location of additional external utilities, including gcc.
||Location of Sun Studio compilers and documentation.
Building and Installing Project Software
Projects will be able to install project-specific software in the /usr/apps
directory, which will be visible from all of a project's zones. Note that software
that provides connection or data transfer services must first be approved by
the GDO project leader. Client software does not require approval but may require
modifications to the GDO firewall if communication is done on a non-standard
port.
Both GNU and Sun Studio compiler environments will be provided. Note that
the system is an AMD64 machine running Solaris 10. The GNU tools are installed in /opt/sfw/bin
and /usr/sfw/bin; GNU versions of utilities typically start with a "g"
(e.g., gmake, gmd5sum, gfind, gtar). The default environment on the GDO
is set up to include access to Sun-supplied tools and documentation.
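Build scripts that need the GNU variant of a utility can look for the "g"-prefixed name in the two directories mentioned above. The following Python sketch shows one way to do that; the function name find_gnu_tool is hypothetical, and the directory list simply mirrors the paths given in this section.

```python
import os

# Directories named in this section as holding the GNU tools.
GNU_DIRS = ("/opt/sfw/bin", "/usr/sfw/bin")


def find_gnu_tool(name: str, search_dirs=GNU_DIRS):
    """Return the path of the g-prefixed tool (e.g. "tar" -> ".../gtar"), or None."""
    for directory in search_dirs:
        candidate = os.path.join(directory, "g" + name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


# Fall back to the plain name if no GNU variant is installed.
tar = find_gnu_tool("tar") or "tar"
```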
Moving Data To and From the GDO
There are three major data paths involving the GDO, each of which involves
moving data both to and from the GDO: (1) LC production hosts on the yellow network,
including Yana, Atlas, and Zeus, (2) project hosts on LLNL's unrestricted
network, and (3) offsite collaborator hosts.
The GDO is on its own segment of the green network—a direct offshoot
from the Lab's ESNet router. This is a 10 Gb/s
network connection. The GDO also has access to ESNet's circuit-oriented Science
Data Network (SDN) for massive dataset transfers to or from other ESNet sites.
Data movement is a challenging problem, in large part because of the amount
of data the GDO will be handling. Datasets are now routinely measured
in terabytes. There are two significant issues in moving large amounts of data:
(1) bandwidth, and (2) tools for achieving high end-to-end throughput. Funds
and network engineering can mostly solve the bandwidth issue. To solve the throughput
issue it is critical to provide tools that utilize parallel data streams when
moving data. This is the only approach that can work around limits imposed by
network links, firewalls, and various I/O devices. Parallel transfer options
are being investigated for all major data paths involving the GDO so that large
aggregate throughput can be achieved for transfers of large files.
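To make the idea of parallel data streams concrete, the Python sketch below splits one large file into contiguous byte ranges, one per stream. It is a generic illustration only and does not describe how GridFTP, BBFTP, or any other specific tool partitions its work; the function name split_ranges is an assumption.

```python
def split_ranges(total_bytes: int, streams: int) -> list[tuple[int, int]]:
    """Split [0, total_bytes) into contiguous (start, end_inclusive) ranges.

    Each stream would move one range; sizes differ by at most one byte.
    """
    if total_bytes <= 0 or streams <= 0:
        raise ValueError("total_bytes and streams must be positive")
    streams = min(streams, total_bytes)       # never create empty ranges
    base, extra = divmod(total_bytes, streams)
    ranges = []
    start = 0
    for i in range(streams):
        length = base + (1 if i < extra else 0)
        ranges.append((start, start + length - 1))
        start += length
    return ranges


print(split_ranges(10, 3))  # [(0, 3), (4, 6), (7, 9)]
```

Each range can then be handed to a separate connection, so that no single stream's limit caps the whole transfer.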
A project's large disk space is available in the directory /export/ftp/pub.
Project members with login accounts will be able to populate this space with
project data to be shared.
GDO's network link to the outside world is via a 10 Gb/s connection to ESNet.
Currently, the GDO firewall bottlenecks individual transfer streams at 1.7 Gb/s.
The proper utilization of parallel streams when moving data can improve aggregate
transfer rates significantly.
Transfers between GDO and other LLNL networks will be limited by the other
networks' firewalls, routers, etc. Currently, the unrestricted and yellow networks
run at 1 Gb/s, so that rate will be the limiting factor in transfers involving
hosts on those networks.
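A back-of-the-envelope calculation shows why these limits matter. The sketch below uses the figures quoted above (1.7 Gb/s per stream through the firewall, a 10 Gb/s ESNet link, 1 Gb/s on the other LLNL networks). It assumes ideal scaling with no protocol overhead and 1 TB = 10^12 bytes, so real transfers will be slower; the function names are illustrative.

```python
def aggregate_gbps(streams: int, per_stream_cap: float = 1.7, link_cap: float = 10.0) -> float:
    """Best-case aggregate rate: streams add up until the link is full."""
    return min(streams * per_stream_cap, link_cap)


def transfer_hours(terabytes: float, gbps: float) -> float:
    """Hours needed to move the given volume at a sustained rate."""
    bits = terabytes * 1e12 * 8          # assumes decimal terabytes
    return bits / (gbps * 1e9) / 3600


for n in (1, 3, 6):
    rate = aggregate_gbps(n)
    print(f"{n} stream(s): {rate:.1f} Gb/s, 1 TB in {transfer_hours(1, rate):.2f} h")

# A host on a 1 Gb/s network is capped by that network regardless of stream count.
print(f"1 Gb/s network: 1 TB in {transfer_hours(1, 1.0):.2f} h")
```

Under these assumptions, six streams are enough to fill the 10 Gb/s link, while a single stream leaves most of it idle.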
Tools for Transferring Data
For moving data to the GDO, all major data paths will support some form of
FTP for transfers. For users with login accounts, SFTP and SCP are also options.
For retrieving data from the GDO, several protocols such as variants of FTP,
HTTP, and NFS are supported. When transferring entire files locally or remotely,
the user can expect the best performance from FTP for single-thread transfers.
For hosts that have enabled GridFTP access, remote clients can improve parallelism
by using a utility like Bulk Data Mover (BDM). Read-only file access is also
available using NFS to other hosts on the unrestricted network; expected throughput
is likely to be 25 MB/s or less initially. Coordination with the GDO project
leader is required if a project wishes to NFS export its disk space to another
host on the unrestricted network.
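For collaborators who prefer scripting to an interactive client, a single-stream anonymous FTP download can be sketched with Python's standard ftplib module. The host name and remote path in the usage note are hypothetical, and the ftp_factory parameter exists only so the function can be exercised without a network connection.

```python
from ftplib import FTP


def fetch_anonymous(host, remote_path, local_path, ftp_factory=FTP, timeout=60):
    """Download one file over anonymous FTP in binary mode.

    ftp_factory defaults to ftplib.FTP; passing a stand-in class makes
    the function testable offline.
    """
    ftp = ftp_factory(host, timeout=timeout)   # connects when host is given
    try:
        ftp.login()                            # no arguments: anonymous login
        with open(local_path, "wb") as out:
            ftp.retrbinary(f"RETR {remote_path}", out.write)
    finally:
        ftp.close()
    return local_path
```

A call might look like fetch_anonymous("gdo-abc.ucllnl.org", "pub/dataset.tar", "dataset.tar"), where the remote path depends on how the project has laid out its data.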
Projects may request additional or alternative protocols (GridFTP, SRB, etc.)
for providing data access to external collaborators. These requests should be
made to the GDO project leader.
For LC production hosts on the yellow network, the standard FTP and PFTP clients
can be used for moving data to the GDO, but note that all such transfers are
done serially—PFTP cannot do parallel transfers through a firewall. Other
tools for doing parallel transfers in this environment are being investigated;
GridFTP and BBFTP are two candidates.
Data Storage
The GDO has approximately 620 TB of RAID storage, which is split amongst the
various projects with GDO accounts. Due to size constraints, this data is not
backed up. Users are strongly urged to back up data to other devices (such as
HPSS archival storage). Although measures have been taken to prevent data loss,
unexpected system failures, power outages, or other problems will eventually
cause a loss of data.
A small amount of local disk space is also available and is used for home
directories and system data and information. Local home and system directories
will be backed up regularly.
Controlling External Access to Data
Although all data on the GDO is required to be approved for release for general
distribution, some projects may wish to control access to their data. Options
for controlling access include:
|FTP Access Control
||Description of External Access
||Project data is world readable via anonymous FTP.
||Project data is accessible via anonymous FTP only to those hosts explicitly
|Host + Time
||This option is mandatory when using the upload capability. It is not used
for controlling download access. This restricts uploads to just hosts explicitly
listed and just for the window of time specified by the project.
||Project data is accessible via FTP only to virtual users explicitly defined
by the project. The information regarding these virtual users is managed by the
project itself and generally consists of a username/password pair stored in
a UNIX DBM file. These virtual users log in as normal users but are restricted
in much the same way as are anonymous users.
Other protocols will have their own access control options, but for the most
part they will be similar to these FTP options.
In addition, projects can request that access be disallowed from unregistered
IP addresses and networks. Contact the LC
Hotline or GDO project leader to have this restriction enabled.
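The logic of the Host + Time option can be illustrated with a short Python check. This is a conceptual sketch, not the GDO access control tool: the function name, the way allowed hosts are stored, and the example values are all assumptions.

```python
from datetime import datetime


def upload_allowed(client_host, when, allowed_hosts, window_start, window_end):
    """True only if the host is explicitly listed AND the time is inside the window."""
    host_ok = client_host.lower() in {h.lower() for h in allowed_hosts}
    time_ok = window_start <= when <= window_end
    return host_ok and time_ok


# Hypothetical collaborator host and a one-day upload window.
hosts = ["data.collab.example.edu"]
start = datetime(2009, 3, 1, 0, 0)
end = datetime(2009, 3, 2, 0, 0)
print(upload_allowed("data.collab.example.edu", datetime(2009, 3, 1, 12, 0), hosts, start, end))
```

Both conditions must hold: a listed host outside the window is refused, as is an unlisted host inside it.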
Receiving External Data and Making It Available
The GDO supports the ability for collaborators and project members without
login accounts to upload data to the GDO. Projects with the need for this capability
should express their needs to the GDO project leader. A separate zone will be
created for this purpose, with a name such as "gdo-abc-upload.ucllnl.org".
The disk layout will be very similar to the layout of the project's main zone,
but there will also be an /export/ftp/incoming directory into which externally
uploaded data will go.
Assuming the upload zone has been created, the following is the sequence of
steps to perform to get external data and make it available to others:
- LLNL project member (data custodian) uses the
access control tool on the GDO to specify the host from which, and the time window
during which, the upload will occur.
- Remote collaborator uses anonymous FTP to connect to (for example) gdo-abc-upload.ucllnl.org
from the specified host and within the specified time window.
- Remote collaborator uploads data into the /incoming directory. Note that
this directory has write access but not read access; therefore, neither the collaborator
nor other remote users will be able to see the contents of this directory.
- LLNL data custodian reviews the contents of /export/ftp/incoming and confirms
that the data are as expected. See the rules in the "Nature
of Data Allowed on the GDO" section
for the type of external data that is allowed on the GDO.
- If the data are confirmed as valid, the LLNL data custodian can then move them
into the externally accessible project data space (/export/ftp/pub) if so desired.
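The last two steps above (review, then move) could be scripted along the following lines. This Python sketch is an assumption about how a data custodian might automate the move; the is_valid callback stands in for whatever review the project actually performs, and the function name promote_uploads is hypothetical.

```python
import shutil
from pathlib import Path


def promote_uploads(incoming, public, is_valid):
    """Move reviewed files from the incoming area to the public area.

    is_valid is a caller-supplied function taking a Path and returning
    True for files that passed review. Existing public files are never
    overwritten. Returns the list of new public paths.
    """
    incoming, public = Path(incoming), Path(public)
    moved = []
    for item in sorted(incoming.iterdir()):
        if not item.is_file() or not is_valid(item):
            continue                      # leave unreviewed or rejected data in place
        target = public / item.name
        if target.exists():
            continue                      # do not clobber data already published
        shutil.move(str(item), str(target))
        moved.append(target)
    return moved
```

On the GDO, the two directories would be /export/ftp/incoming and /export/ftp/pub, and anything the callback rejects stays in the incoming area for follow-up.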
Operational support for this system will include integrated hotline support,
timely dissemination of operational information (e.g., scheduled and unscheduled
machine or network downtime), and training and documentation that meets the needs
of local and remote users.
The LC Hotline is fully staffed from
8:00 a.m.–noon and 1:00–4:45 p.m. Pacific Time and can be accessed by telephone
or e-mail. Outside of these hours, callers will have the option of leaving a
message or being forwarded to the LC Operations staff, which is present 24x7,
365 days a year. E-mail received outside of business hours will be processed
the next business day. In urgent off-hour situations, send e-mail to firstname.lastname@example.org if
phone service is unavailable.
Remote collaborators should contact LLNL project collaborators first with
questions and requests for assistance.