The Livermore Computing Center recognizes that our users' work can be seriously impacted by the unavailability of systems and services. Unfortunately, some amount of interruption in service is unavoidable. The Center strives to minimize user disruption caused by scheduled interruptions for repairs, updates, installation of new equipment, troubleshooting, or preventive maintenance. We therefore observe the following guidelines in scheduling these events:
Please note that sometimes an urgent problem or the threat of a failure requires that we take action quickly. This may preclude the scheduling options and advance notice described above.
Why is there so much going on with LC systems and services?
We have a large number of systems. Besides the ASC IBMs, Linux clusters, and storage, there are many machines supporting services for those systems. These include NFS servers, LCRM and Moab control hosts, and LDAP servers. Each system has numerous hardware and software components. Many downtimes are part of our continuous effort to integrate new equipment to increase capacity and improve performance.
All hardware is subject to failure. All software is subject to bugs. We work to mitigate these sad but true facts by keeping up with preventive maintenance, applying software patches and updates, and responding proactively to problems. Some of our scheduled downtimes are part of the effort to prevent unscheduled downtimes, which carry the risk of lost work or data.
Our systems and services are complex and closely entwined. An interruption to a central service, such as NFS-provided home directories or the Kerberos authentication service, may have far-reaching effects.
Why is it going on while I'm trying to work?
Some disruptive events are scheduled off-hours, in the early morning, or occasionally on weekends. However, much work needs to be done when LC, network, and vendor staff are available to provide expertise and to help deal with any problems that arise. When full local and vendor support is available, we are better able to keep downtime to a minimum. We try to compromise by scheduling work in the early morning and at lunchtime.