Disaster Recovery Procedures (ATNF - Narrabri)

This document contains procedures to be followed in the case of a disaster affecting computer/network components crucial to the operation of the telescope (offline and online). Routine administration procedures vital to maintaining a reliable system, such as configuring disks and maintaining well-defined tape backups, are also outlined here.

Screen Room destroyed

kaputar
RAID disks
ningadhun
leon

Control Room destroyed

noel
vladimir
desk2
nemesis

System Backup
VMS Cluster
Fileservers and Critical Workstations
SUN Workstations
Contact names
Service contract

A copy of this document is to be kept in the system administrator's room in case this information is inaccessible because the web server is unavailable.

There are a number of possible outcomes that are discussed below. In each case the appropriate staff and affected users are to be notified.

Screen Room destroyed

If this was ever to occur we would surely be up the creek without a paddle. Critical systems in the system administration domain are indicated below:

The following hardware devices currently exist in the screen room

kaputar
ningadhun
leon
prometheus (Cisco, Catalyst 5505 Ethernet Switch)

Assess the damage and set priorities as to which hardware devices need immediate attention
Call the relevant hardware vendors to arrange for replacement parts (i.e. if none are currently available)

In the case of kaputar place a service call with COMPAQ (1300 788 990) quoting the serial number (AY45700625) and customer number (3632500)

Contact relevant technical staff to organise installation of communication, ethernet cables, power outlets and other devices
Connect relevant hardware devices to the ethernet (SCSI disks and tape drives)
Reinstall the relevant operating systems on kaputar, ningadhun and leon
Restore all data on kaputar, ningadhun and leon
Check that all the relevant services and applications are successfully installed

Control Room Destroyed

The following hardware devices currently exist in the control room

noel
desk2
nemesis
vladimir
Kodak CD Rom Archiving Unit
Exabyte EXB8500 8mm tape drive unit

Call the relevant hardware vendors to arrange for replacement parts/hardware
In the case of Digital hardware check the DEC maintenance contract to see if the hardware is covered under the maintenance agreement. Place a service call quoting the serial numbers of the damaged hardware components.
Contact relevant technical staff to check ethernet cables and power outlets.
Connect relevant hardware devices to the ethernet (SCSI disks and tape drives)
Restore operating systems and data on the relevant host machines
Ensure that the functionality of the replaced host machines has been restored

kaputar

Current Specifications/Configuration

DEC Alpha Server 1000A 5/400
Runs a number of services: web, mail, offline computing applications for observer data processing
640Mb RAM, ~54Gb disk space (see details on RAID disks)
Attached devices:

DLT 7000 Tape drive unit
External 2Gb SCSI swap disk
Exabyte EXB-8700 Tape drive
Paralan SCSI extender

The following manuals are located in the system administrators office. The documentation provided in these manuals can be most useful during a disaster recovery:

Digital UNIX system Administration v4.0B+
DEC Unix Network Administration
Networker DEC UNIX v4.2A
UNIX System Administration Handbook, 2nd Ed., Nemeth, Snyder, Seebass & Hein

IMPORTANT NOTE: If kaputar requires a full emergency (cold) shutdown it is critical that the external RAID disk rack is powered down AFTER the server is powered down. The reverse order applies when cold booting. Otherwise the RAID set may be toggled to a NOT READY/OFFLINE status, which will prevent kaputar from booting up next time. To rectify this one needs to get an ARC console and run the RAID Array 200 Software for AXP Systems (see the section on Configuring RAID groups below).

Determine which hardware components have been damaged and which can be recovered
In the case of damaged disks (in the RAID set), replace with new disks (can be hot swapped on the fly)
In the case of other damaged components such as the motherboard or power supply, place a service call with COMPAQ (DEC) providing details of the problem. Arrangements should be made to replace the damaged part(s)
Once the hardware has been restored boot kaputar and check the configuration settings at the console prompt. For more information see AlphaServer 1000A Owner's Guide.

>>> show config

Digital Equipment Corporation
Alpha Server 1000A 5/400

Firmware
SRM Console:    V5.0-101
ARC Console:    v5.34
PAL Code:       VMS PALCode V1.19-4, OSF PALcode V1.21-6
Serial Rom:     v1.4
Processor
DECchip (tm)    21164A-2   400Mhz

Memory
                640 Meg of System Memory
                Bank 0 = 256 Mbytes (64MB Per SIM) Starting at 0x00000000
                Bank 1 = 256 Mbytes (64MB Per SIM) Starting at 0x10000000
                Bank 2 = 128 Mbytes (32MB Per SIM) Starting at 0x20000000 
                Bank 3 = No memory detected 

Slot    Option          Hose 0,   Bus 0,  PCI
 7      Intel 82375                             Bridge to Bus 1, EISA
 8      DECchip 21050-AA                        Bridge to Bus 2, PCI
11      DECchip 21050-AA                        Bridge to Bus 3, PCI
13      Mylex DAC960       dra.0.0.13.0
                           dra.0.0.13.0         2 Member RAID 1
                           dra.1.0.13.0         4 Member RAID 5
                           dra.2.0.13.0         4 Member RAID 5
                           dra.3.0.13.0         3 Member RAID 5

 0      QLogic ISP 1020    pka0.7.0.2000.0      SCSI Bus ID 7
                           dka.400.4.0.2000     RRD45
                           dka.500.5.0.2000     RRD43
                           dka.600.6.0.2000     QUANTUM DLT7000 

Slot    Option          Hose 0,   Bus 0,  PCI
 0      DECchip 21050-AA   ewa0.0.3000.0        00-00-F8-20-6B-04
 1      QLogic ISP1020     pkb0.7.3001.0        SCSI Bus ID 7
                           dkb0.0.0.0.3001      RRD43
                           dkb100.1.0.3001      RRD43

                           mkb400.4.0.3001.0
                           mkb401.4.0.3001.0
                           mkb402.4.0.3001.0
                           mkb403.4.0.3001.0
                           mkb404.4.0.3001.0
                           mkb405.4.0.3001.0
                           mkb406.4.0.3001.0
                           mkb407.4.0.3001.0

bus 2, slot 0   -- pka -- QLogic ISP 1020
bus 3, slot 0   -- ewa -- DECchip 21040-AA 
bus 3, slot 1   -- pkb -- QLogic ISP 1020
bus 0, slot 13  -- dra -- Mylex DAC960

Restore the operating system and check the integrity of the filesystems. The steps are briefly outlined below:

At the console prompt boot into single user mode using the command

boot -fl s

and check the integrity of the root filesystem, i.e. use the fsck command.

Restore the root filesystem using a tape containing a full backup labelled vdump. Such a backup tape would have been produced in single user mode.
IMPORTANT: When logging into kaputar in single user mode, the root filesystem is mounted read-only. Use the following command to change it to a read/write filesystem:

mount -u /

Mount the filesystem containing the networker server (backup/restore) i.e. /usr/bin
Restore all filesystems from the last full backup and all subsequent incremental backups until the date of the crash
Check the integrity of the restored filesystems using the fsck command.
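The restore order described in the steps above (the last full backup first, then every later incremental up to the date of the crash) can be sketched as a small selection routine. This is an illustration only: the Backup record, its field names and the sample dates are hypothetical and do not reflect how Networker stores its own index.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Backup:
    """One backup run (hypothetical record, not a Networker structure)."""
    day: date
    level: str  # "full" or "incr"


def restore_chain(backups, crash_day):
    """Return the backups to restore, oldest first: the last full backup
    taken on or before crash_day, followed by every later incremental."""
    usable = sorted((b for b in backups if b.day <= crash_day),
                    key=lambda b: b.day)
    full_positions = [i for i, b in enumerate(usable) if b.level == "full"]
    if not full_positions:
        raise ValueError("no full backup available before the crash date")
    return usable[full_positions[-1]:]


# Made-up history used only to exercise the function.
history = [
    Backup(date(1999, 11, 1), "full"),
    Backup(date(1999, 11, 2), "incr"),
    Backup(date(1999, 11, 3), "incr"),
    Backup(date(1999, 11, 15), "full"),
    Backup(date(1999, 11, 16), "incr"),
]
chain = restore_chain(history, date(1999, 11, 3))
```

The returned list is in the order in which the tapes would be read, so the full backup is always restored before any incremental that depends on it.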

kaputar's RAID disks

Specifications on kaputar's RAID disks can be obtained by running the StorageWorks RAID Array 200 Management Utility v1.1.1 (/usr/bin/swxcrmgr).

Specifications on RAID disks (on kaputar)

Channel         Vendor          Model           Rev:        Size (Mb)
----------------------------------------------------------------------------
  A-0           DEC             RZ29B           0016            4091
  A-1           DEC             RZ29B           0014            4091
  B-0           DEC             RZ29B           0014            4091
  B-1           DEC             RZ29B           0016            4091
  B-2           DEC             RZ29B           0016            4091
  B-3           DEC             RZ29B           0016            4091
  C-0           Quantum         XP34300W        L915            4101
  C-1           Quantum         XP34300W        L915            4101
  C-2           Quantum         XP34300W        L915            4101
  C-3           Quantum         XP34300W        L915            4101
  D-0           DEC             RZ1DB-CA        LYJ0            8678
  D-1           DEC             RZ1DF-CB        0372            8678
  D-2           DEC             RZ1DB-CA        LYJ0            8678
  D-3           DEC             RZ1DB-CA        LYJ0            8678
----------------------------------------------------------------------------

Logical Drive Table (RAID array)

-----------------------------------------------------------------------
Drive           RAID            Size            Status
Group           Level           (Mb)
-----------------------------------------------------------------------
  A              1               4091           Optimal
  B              5              12273           Optimal
  C              5              12303           Optimal
  D              5              26034           Optimal
-----------------------------------------------------------------------

RAID Levels and redundancy

RAID Level 1: 50% of the raw disk capacity is available for unique data (i.e. one disk is a mirror image of the other).
RAID Level 5: 75% of the raw disk capacity is available for unique data in a set of 4 disks (i.e. the equivalent of one disk is used for redundancy).
In either case only one disk is permitted to fail without the drive group completely failing.
Upon failure one may add an identical disk to replace the faulty one on the fly. One would then need to rebuild the drive group using the StorageWorks RAID Array 200 Management Utility v1.1.1 (/usr/bin/swxcrmgr).
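The sizes in the Logical Drive Table follow from the disk sizes and the RAID level of each drive group. The sketch below reproduces that arithmetic, assuming that every member of a group contributes only as much space as the smallest disk in the group; the function name usable_mb is hypothetical and is not part of swxcrmgr.

```python
def usable_mb(raid_level, member_sizes_mb):
    """Capacity available for unique data in one drive group, in Mb."""
    smallest = min(member_sizes_mb)
    members = len(member_sizes_mb)
    if raid_level == 1:
        if members != 2:
            raise ValueError("a RAID 1 mirror is assumed to have 2 members")
        return smallest                  # one disk mirrors the other
    if raid_level == 5:
        if members < 3:
            raise ValueError("RAID 5 needs at least 3 members")
        return (members - 1) * smallest  # one disk's worth holds parity
    raise ValueError("only RAID levels 1 and 5 are handled here")


# Drive groups as listed in the disk specification table.
groups = {
    "A": (1, [4091, 4091]),
    "B": (5, [4091, 4091, 4091, 4091]),
    "C": (5, [4101, 4101, 4101, 4101]),
    "D": (5, [8678, 8678, 8678, 8678]),
}
sizes = {name: usable_mb(level, disks) for name, (level, disks) in groups.items()}
```

With the disks listed above this gives 4091, 12273, 12303 and 26034 Mb for groups A to D, matching the Logical Drive Table.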

Configuring RAID groups

In order to increase the disk capacity of a current RAID group one needs to do the following:

Shut down kaputar
Remove the RAID set and replace with new disks
At the halt prompt start an ARC console by typing arc.
Insert the floppy RAID Array 200 Software for AXP Systems and type A:swxcrmgr
Ensure that the new RAID set can be detected and then proceed to format the drive group (see the RAID Array 200 User's Guide)
Boot kaputar into multi-user mode and run disklabel in order to label the new RAID set
Create the appropriate file domains and filesets and then restore data onto the new disks.

ningadhun

Current Specifications/Configuration

DELL PowerEdge 2300 Server: Pentium III-450
256Mb ECC SDRAM
18Gb (7200rpm) internal Western Digital Ultra-2/LVD SCSI disk
Video Card: Integrated ATI-Rage Pro 3D, 2Mb SDRAM, 1024x768x256 resolution
Manuals (located in the screen room next to Monitor):

DELL PowerEdge 2300 - Installation and TroubleShooting Guide
DELL PowerEdge 2300 - User's Guide
DELL OpenManage Server Assistant v4.1.1 (useful tool for setting up diagnostic diskettes)

Start of 3 year lease: June 1999
S/N: TK2Q7

Determine and identify damaged components. If the server has been completely destroyed (e.g. by a fire) then go to section 1; otherwise proceed to section 2. A special case is section 3, where one is expecting a certain failure and has time to move the data to another server.

Establishing a new PC server
ningadhun is partially damaged
Promoting a BDC to a PDC

Establishing a new PC server

This section is relevant for the scenario where the data on ningadhun is not accessible due to a major hardware failure.

Make arrangements to obtain a replacement server. Place a service call with COMPAQ Customer Services quoting the serial number and other details as required.
If a server cannot be obtained in reasonable time (i.e. 24 to 48 hours), sacrifice a high end PC with the following minimum requirements

Pentium II 300
128 Mb RAM, at least 8Gb hard disk
Note that the current capacity of the server hard disk is 18Gb. However, due to the nature of the emergency it should be possible to run with at least 8Gb, in order to run critical services, until an appropriate disk or disks can be obtained.

Installing Windows NT Server v4 and user data

Boot off the Windows NT Server 4 CD
Install the relevant SCSI device drivers
Install as a PDC (Primary Domain Controller)
Licensing Mode - per seat
Install the appropriate video drivers from a floppy disk.
Enter all relevant network information, such as hostname, IP address, etc.
Upon rebooting, install service pack 5 from the PCAPPS CD in the system administrator's office.
From this point onwards, the backup procedure for user data cannot proceed if kaputar is not available.
Install the following critical applications. These will enable communication and authorisation for ningadhun to access the DLT tape drive so that the user data can be restored from backup tape.

DiskShare for Windows NT (v4). The Intergraph DiskShare Quick-Start Guide is located in the cupboard next to ningadhun.
Networker v4.4
eXcursion (X server)

Ensure that the following corresponding services are running on ningadhun

DsSvc
Networker Remote Exec Service
Remote Procedure Call (RPC) Locator
Remote Procedure Call (RPC) Service
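One way to check the list above is to compare it with the output of the net start command, which prints the services that are currently started. The sketch below only parses text that has already been captured; it assumes that the names listed above appear in that output exactly as written, which should be confirmed on ningadhun itself. The function name and the sample output are hypothetical.

```python
REQUIRED_SERVICES = (
    "DsSvc",
    "Networker Remote Exec Service",
    "Remote Procedure Call (RPC) Locator",
    "Remote Procedure Call (RPC) Service",
)


def missing_services(net_start_output, required=REQUIRED_SERVICES):
    """Return the required services that are not listed as started."""
    started = {line.strip().lower()
               for line in net_start_output.splitlines()
               if line.strip()}
    return [name for name in required if name.lower() not in started]


# Made-up capture used only to exercise the function.
sample_output = """These Windows NT services are started:

   Networker Remote Exec Service
   Remote Procedure Call (RPC) Locator
   Remote Procedure Call (RPC) Service

The command completed successfully.
"""
not_running = missing_services(sample_output)
```

An empty result means every required service was found in the captured output.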

Load kaputar networker via eXcursion:

D:\pcapps\XCURSION\X86\LAUNCH.EXE -Xapp -alias 'kaputar-root' /usr/bin/networker

The location of LAUNCH.EXE is not critical so long as it can be executed with the command arguments indicated above.

Load the networker recover application with the following arguments:

C:\win32app\nsr\bin\winworkr.exe -s kaputar.atnf.csiro.au

Go to Options/Recover, then View/select browse time. The browse time allows one to select files to recover from tape for a specified backup date. Click the Start icon. Note that the C: drive contains system files and local applications. The D: drive contains user data and other shared resources.

Checkpoints after data recovery

Ensure that the following shares have been restored correctly: profiles, pcapps, users and cad.
A trust relationship exists between ningadhun and the following domains: VORTEX and MOPRA
The directory D:\profiles has been successfully restored. This contains user specific profile information to enable successful roaming profile logins into the CULGOORA domain.
The directory D:\users has been successfully restored which contains centrally stored user data.
The SAM (Security Account Manager) database contains all the expected user account and security information. This can be checked by running User Manager for Domains.
The following critical printer drivers have been recovered

Admin: HP LaserJet 5/5M PostScript
Laser: HP LaserJet 5Si/5Si MX PS
CACOLOUR: HP C LaserJet 4500-HP

Perform a test login (from another workstation) with at least one user account and ensure that the relevant network resources are available, i.e. user data, profiles and printers.

ningadhun is partially damaged

This section refers to some component of ningadhun (DELL PowerEdge 2300) being damaged to such an extent that the continuation of normal user services is prevented, thus requiring urgent attention.

Faulty motherboard or dead power supply

Place a service call with DELL Customer Service quoting the serial number and the problem. A job number will be given and should be noted for reference when chasing up the status of a call. A serviceman will be scheduled to come out to site with the relevant replacement parts.

System Disk crash

If the internal (18Gb) SCSI disk has been damaged in a way that prevents the system from booting, a complete recovery may be required.
If the disk cannot be detected in the bootup console output (with SCSI ID set to 0), then a service call is most likely required in order to arrange for a replacement disk.

                        Screen Output:
                        Adaptec AIC-7890 SCSI BIOS v2.01 Dell 001
                        (c) 1998 Adaptec, Inc. All Rights Reserved.

                        Press <Ctrl><A> for SCSISelect(TM) Utility

                        SCSI ID: 0 WDIGTL   WDE18300 ULTRA2   ULTRA2-LVD   - Hard Disk 0
                        SCSI ID: 6 DELL     1x6 U2W SCSI BP   ULTRA2-LVD  

                        SCSI BIOS Installed Successfully !

                        Adaptec AIC-7860 SCSI BIOS v2.01 Dell 001
                        (c) 1998 Adaptec, Inc. All Rights Reserved.

                        Press <Ctrl><A> for SCSISelect(TM) Utility

                        SCSI ID: 5  NEC         CD-ROM DRIVE:465

The second device indicated is the SCSI BackPlane board allowing up to 5 additional SCSI disks to be mounted into the front disk rack.

Re-installing Windows NT4 onto the DELL PowerEdge 2300

One cannot install NT Server unless the SCSI controller card is installed correctly.
Boot the server with the Windows NT Server 4 CD.
During the initial boot phase (i.e. when the blue screen appears) press F6 to interrupt and specify other SCSI devices to be installed.
Select: Adaptec AHA-294xU2/295xU2/AIC-789x PCI Ultra2 SCSI Controller(NT 4.0)
The SCSI driver is located on a DELL floppy disk labelled NT 4 Video and SCSI controller drivers.
Other options/considerations:

Install as a PDC (Primary Domain Controller)
Network drivers are for: Intel Pro 100+
Licensing Mode - per seat
Video drivers: ATI Technologies Inc. 3D RAGE PRO
Install service pack 5 (PCAPPS CD in system administrators office). Do not replace the file EB100.SYS during installation.

Promoting a BDC to a PDC

This procedure is relevant if one suspects an imminent failure of ningadhun and there is still time to move the data to another server.

BDC - Backup Domain Controller
PDC - Primary Domain Controller

Make arrangements for a backup server which satisfies the minimum requirements (see section 1).
Boot off the Windows NT Server 4 CD.
Install as a BDC.
Licensing Mode - per seat.
Enter all relevant network information. Use a hostname and IP address not currently in use. These will be switched to the correct name and IP address at the end of this section.
Install service pack 5 from the PCAPPS CD in the system administrator's office.
Locate the scopy utility, which is to be used with the /o /s options. This utility is available on the Windows NT Server Resource Kit CD located in the system administrator's office. It is particularly useful for copying security information, which is not possible with the standard copy command in Windows NT.
Use the scopy utility to copy the following directories (and sub-directories) in order of importance:

D:\profiles
D:\users
D:\pcapps
D:\dist
D:\Cad

Synchronise the BDC with the PDC, i.e. go to Start/Administrative Tools/Server Manager/Computer/Synchronise Entire Domain.
Promote the BDC to a PDC - in the same menu as the previous step go to Promote to Primary Domain Controller. Note that this will also automatically demote the current PDC to a BDC.
Shut down the (faulty) ningadhun.
Rename the new PDC to ningadhun and update the IP address in the TCP/IP properties section.
Reboot the new ningadhun
Check user logins and other relevant tests as in section 1.
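The copy step in the procedure above can be prepared in advance by generating one scopy command per directory, in the stated order of importance. The sketch below is an assumption-laden illustration: it supposes that the commands are run on the new BDC, that the old server's D: drive is reachable through its administrative share, and that scopy takes its arguments as source, destination, then switches. All three should be checked against the Resource Kit documentation before use.

```python
# Directories in order of importance, as listed in the procedure.
DIRECTORIES = [
    "D:\\profiles",
    "D:\\users",
    "D:\\pcapps",
    "D:\\dist",
    "D:\\Cad",
]


def scopy_commands(source_host, directories=DIRECTORIES):
    """Build one scopy command line per directory, keeping the given order.

    The source is addressed as \\\\host\\D$\\dir (assumed administrative
    share); the destination is the same path on the local machine.
    """
    commands = []
    for path in directories:
        drive, rest = path[0], path[2:]  # e.g. "D" and "\\profiles"
        source = "\\\\{}\\{}${}".format(source_host, drive, rest)
        commands.append("scopy {} {} /o /s".format(source, path))
    return commands


for command in scopy_commands("ningadhun"):
    print(command)
```

Generating the commands from a single list keeps the order of importance fixed, so the most critical directories are copied first if the failing server does not survive the whole transfer.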

noel

A number of outcomes are discussed below.

System disk damaged (VMSSYS0)

Find a replacement SCSI disk and ensure that a non-conflicting SCSI address has been selected
Make sure that all disks have been properly dismounted
Shut down the VMS cluster and attach the replacement disk.
Attach the MKA600 drive and check that both the replacement disk and tape drive have been successfully detected.
Restore the files from the tape.
Restore the VMS Cluster by booting the VMS members in the correct sequence.
Record the event in the observer's logbook.

Data disk damaged ($DISK3)

Find a replacement SCSI disk and ensure that a non-conflicting SCSI address has been selected
Shut down the VMS cluster and attach the replacement disk.
Attach the MKA600 drive and check that both the replacement disk and tape drive have been successfully detected.
Locate and restore files from the last full backup including all subsequent incremental backups to the date of the disk crash.
Reboot the cluster and check for a successful login.
Record the event in the observer's logbook.

Faulty monitor(s)

If a fuse has blown in either of the monitors, try to find an appropriate fuse and replace it accordingly
If the monitor cannot be repaired easily, contact COMPAQ (DEC) quoting the serial number of the faulty monitor (from the Service contract).
Locate and install a temporary replacement monitor from another VMS cluster member.

Does not power up

First check whether the fuse of the power supply has blown. Get a qualified electronics technician to check quickly, but ensure that they do not attempt a more complex operation since it would violate warranty conditions. If the fuse has blown, ensure that the replacement fuse has been correctly chosen!
If there is still no power, place a service call with COMPAQ Customer Services quoting the serial number and other details if required. The response time is 24 hours. If this is unreasonable, another VMS cluster member such as leon would have to be sacrificed temporarily.

leon

A number of outcomes are discussed below.

System disk damaged (VMSSYS1)

Find a replacement SCSI disk and ensure that a non-conflicting SCSI address has been selected
Make sure that all disks have been properly dismounted
Shut down the VMS cluster and attach the replacement disk.
Attach the MKA600 drive and check that both the replacement disk and tape drive have been successfully detected.
Restore the files from the most recent full backup and subsequent incremental backups.
Restore the VMS Cluster by booting the VMS members in the correct sequence.
Record the event in the observer's logbook.

Data disk damaged ($DISK0)

Find a replacement SCSI disk and ensure that a non-conflicting SCSI address has been selected
Shut down the VMS cluster and attach the replacement disk.
Attach the MKA600 drive and check that both the replacement disk and tape drive have been successfully detected.
Locate and restore files from the last full backup including all subsequent incremental backups to the date of the disk crash.
Reboot the cluster and check for a successful login.
Record the event in the observer's logbook.

Faulty VT320 terminal

Simply replace with a spare terminal.

Does not power up

First check whether the fuse of the power supply has blown. Get a qualified electronics technician to check quickly, but ensure that they do not attempt a more complex operation since it would violate warranty conditions. If the fuse has blown, ensure that the replacement fuse has been correctly chosen!
If there is still no power, place a service call with COMPAQ Customer Services quoting the serial number and other details if required. The response time is 24 hours. If this is unreasonable, another VMS cluster member such as delphi would have to be sacrificed temporarily.

System Backup

Unix and WinNT machines

Tape backups are performed daily using a DLT 7000 Tape Drive connected to kaputar.
Networker v4.4 (/usr/bin/networker) is the software used to drive the backups. For more information see the manual Networker DEC Unix v4.2A in the system administrator's office.
The process daemons for the Networker server on kaputar can be stopped and restarted via /sbin/init.d/nsrd.
The backup for kaputar commences daily at 23:30
A snapshot of the current filesystem disk usage (Nov-1999) is shown in the following table

-------------------------------------------------------------------------
Filesystem            Total     Used   Capacity 
                       (Mb)      (Mb)            Mount point
-------------------------------------------------------------------------
/dev/re0a               129       105   91%     /
/dev/re0g               705       464   73%     /usr
/dev/re0h              3114      2662   95%     /x
/dev/rz5c               629       529   84%     /syscdrom1
kaputar#opt           12567       453   88%     /opt
kaputar#usrlocal      12567       668   91%     /usr/local
kaputar#atapplic      12567      1172   95%     /atapplic
kaputar#applic        12567       491   89%     /applic
kaputar#AIPS          12567      1011   94%     /AIPS
kaputar#narusers      12567      2510   98%     /narusers
kaputar#source        12567       892   93%     /source
kaputar#SOLARIS2local 12567      1629   96%     /export/SOLARIS2local
kaputar#SOLARIS2opt   12567      1782   97%     /export/SOLARIS2opt
kaputar#aips++        12567        64   51%     /aips++
kaputar#ATOMS         12567      1411   96%     /ATOMS
kaputar#www           12567       138   69%     /www
data#visitors         39257     14591   64%     /data/KAPUTAR_1
data#students         39257       565    6%     /data/KAPUTAR_2
data#localdata        39257     15666   65%     /data/KAPUTAR_3
-------------------------------------------------------------------------

The backup for ningadhun commences daily at 02:00

The following directories are backed up

Drive	Capacity (Gb)	Description
C:	2.5	Operating System, System files, Registry
D:	14.5	User Data files, User profiles

A full backup is performed every 14 days.

Level 9 incremental backups are performed on all other days.
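The schedule above can be expressed as a simple rule on the number of days since a full backup. The sketch below assumes that a level 9 incremental runs on every day that is not a full-backup day, and it needs the date of some known full backup as a reference; the function name and the reference date used in the example are hypothetical.

```python
from datetime import date

FULL_INTERVAL_DAYS = 14  # a full backup is performed every 14 days


def backup_level(day, known_full_day):
    """Return "full" or "9" for the backup that runs on the given day."""
    elapsed = (day - known_full_day).days
    if elapsed < 0:
        raise ValueError("day is earlier than the reference full backup")
    return "full" if elapsed % FULL_INTERVAL_DAYS == 0 else "9"


# Made-up reference date used only to exercise the function.
reference = date(1999, 11, 1)
levels = [backup_level(date(1999, 11, d), reference) for d in (1, 2, 14, 15)]
```

In the example the 1st and the 15th are full-backup days and the days in between are level 9 incrementals.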

The naming convention for DLT tape backups is CULGOORA_###, where ### denotes a 3 digit number
All DLT tapes are currently stored in the computer room. The tape just before the currently loaded tape (in the DLT tape drive) is to be stored offsite (at the system admin's home). This is to ensure that:

The latest (mounted) tape is on site to provide quick recovery for the latest files. This however is at the expense of possibly losing approximately one month's backup of data (assuming that the screen room is completely destroyed) since currently mounted tapes are not duplicated.
The offsite tape provides security and data integrity. The probability of both the computer room and the offsite location being destroyed is extremely small.
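The naming convention and the offsite rule above can be checked mechanically. The sketch below assumes that tapes are numbered consecutively, so that the tape just before the mounted one carries the previous number; the document does not state this explicitly, and the function names are hypothetical.

```python
import re

LABEL_PATTERN = re.compile(r"CULGOORA_(\d{3})")


def tape_label(number):
    """Format a tape number as a label, e.g. 7 -> CULGOORA_007."""
    if not 0 <= number <= 999:
        raise ValueError("tape number must fit in 3 digits")
    return "CULGOORA_{:03d}".format(number)


def tape_number(label):
    """Extract the number from a label, rejecting malformed labels."""
    match = LABEL_PATTERN.fullmatch(label)
    if match is None:
        raise ValueError("not a valid DLT tape label: " + repr(label))
    return int(match.group(1))


def offsite_label(mounted_label):
    """Label of the tape that should currently be stored offsite."""
    number = tape_number(mounted_label)
    if number == 0:
        raise ValueError("there is no tape before the first one")
    return tape_label(number - 1)
```

For example, with CULGOORA_042 in the drive, offsite_label returns CULGOORA_041.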

Contents of the file /etc/fstab.

The following details are shown from left to right respectively: Name of filesystem, mount point on kaputar, type of filesystem and read/write details. All the filesystems indicated below are backed up (except for swap1, swap2, /syscdrom0, /syscdrom1, /data/KAPUTAR_2 and /data/KAPUTAR_3).

=====================================================================================
Filesystem              Mount point             Type    Options, dump, pass
=====================================================================================
/dev/re0b               swap1                   ufs     sw 0 2
/dev/rz1b               swap2                   ufs     sw 0 2
/dev/re0a               /                       ufs     rw 1 1
/dev/re0g               /usr                    ufs     rw 1 2
/dev/re0h               /x                      ufs     rw 1 2
/proc                   /proc                   procfs  rw 0 0
/dev/rz4c               /syscdrom0              ufs     ro 0 0
/dev/rz5c               /syscdrom1              ufs     ro 0 0
#/dev/fd0c              /fd                     ufs     rw 0 0
kaputar#opt             /opt                    advfs   rw,userquota,groupquota 0 2
kaputar#usrlocal        /usr/local              advfs   rw,userquota,groupquota 0 2
kaputar#atapplic        /atapplic               advfs   rw,userquota,groupquota 0 2
kaputar#applic          /applic                 advfs   rw,userquota,groupquota 0 2
kaputar#AIPS            /AIPS                   advfs   rw,userquota,groupquota 0 2
kaputar#narusers        /narusers               advfs   rw,userquota,groupquota 0 2
kaputar#source          /source                 advfs   rw,userquota,groupquota 0 2
kaputar#SOLARIS2local   /export/SOLARIS2local   advfs   rw,userquota,groupquota 0 2
kaputar#SOLARIS2opt     /export/SOLARIS2opt     advfs   rw,userquota,groupquota 0 2
kaputar#aips++          /aips++                 advfs   rw,userquota,groupquota 0 2
kaputar#ATOMS           /ATOMS                  advfs   rw,userquota,groupquota 0 2
kaputar#www             /www                    advfs   rw,userquota,groupquota 0 2
data#visitors           /data/KAPUTAR_1         advfs   rw,userquota,groupquota 0 2
data#students           /data/KAPUTAR_2         advfs   rw,userquota,groupquota 0 2
data#localdata          /data/KAPUTAR_3         advfs   rw,userquota,groupquota 0 2
#ningadhun:/T/users     /nt/users               nfs     rw,nfsv2
#ningadhun:/T/cad       /nt/cad                 nfs     rw,nfsv2
#ningadhun:/S/pcapps    /nt/apps                nfs     ro,nfsv2
#%aips2.nrao.edu:/export/aips++/master /aips++_master nfs       rw,grpid,hard,intr,retrans=20,timeo=60
=====================================================================================
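The listing above can be turned into the set of mount points that are backed up by reading the first four fields of each active line and dropping the exceptions named earlier. The sketch below works on the text of the file only; the function names are hypothetical and the sample is a short extract of the listing.

```python
# Mount points stated above as not being backed up.
NOT_BACKED_UP = {
    "swap1", "swap2", "/syscdrom0", "/syscdrom1",
    "/data/KAPUTAR_2", "/data/KAPUTAR_3",
}


def parse_fstab(text):
    """Return one dict per active fstab line (comments and rules skipped)."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith("="):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete entry
        entries.append({
            "filesystem": fields[0],
            "mount": fields[1],
            "type": fields[2],
            "options": fields[3],
        })
    return entries


def backed_up_mounts(entries, excluded=NOT_BACKED_UP):
    """Mount points that are included in the backup, in file order."""
    return [e["mount"] for e in entries if e["mount"] not in excluded]


sample = """
/dev/re0b               swap1                   ufs     sw 0 2
/dev/re0a               /                       ufs     rw 1 1
/dev/rz4c               /syscdrom0              ufs     ro 0 0
#/dev/fd0c              /fd                     ufs     rw 0 0
kaputar#opt             /opt                    advfs   rw,userquota,groupquota 0 2
data#students           /data/KAPUTAR_2         advfs   rw,userquota,groupquota 0 2
"""
mounts = backed_up_mounts(parse_fstab(sample))
```

Note that AdvFS entries such as kaputar#opt contain a # but do not start with one, so they are not mistaken for comments.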

VMS Cluster

The strategy followed at present involves a full backup of the cluster every 8 weeks, with an incremental backup being performed on all other weeks. A full backup takes approximately 2 days to complete, whilst an incremental may take from half a day to a day and a half, depending on the proximity of a full backup. The current trend is to start a backup on Thursday morning, so that if it takes a second day to complete it will not run into the weekend. See VMS Cluster Backup.

Current status of cluster mounted devices:

$ show dev/m

Device                  Device           Error    Volume         Free  Trans Mnt
 Name                   Status           Count     Label        Blocks Count Cnt
 $1$DKA0:        (NOEL)  Mounted              0  VMSSYS0         793821     5   4
 $1$DKA100:      (NOEL)  Mounted              0  $DISK3          106104    58   4
 $2$DKA0:        (LEON)  Mounted              0  VMSSYS1         639485   514   4
 $2$DKA100:      (LEON)  Mounted              0  $DISK0          222690    26   4
 $2$DKA400:      (LEON)  Mounted              0  PAGEDISK          9699     2   1
 $12$DKA0:     (DELPHI)  Mounted wrtlck       0  VAXDOCMAR951    369165     1   4
 $12$DKA200:   (DELPHI)  Mounted              0  $DISK1         1267092     7   4
 $12$DKA300:   (DELPHI)  Mounted              0  DELPHI_1037      47295     1   4
 $15$DKA0:      (DESK2)  Mounted              0  KOALA          1344708     1   4
 $15$DKA300:    (DESK2)  Mounted              0  DESK2_1255       41412     1   4

 Device                  Device           Error
  Name                   Status           Count
  LTA0:                   Offline mounted      0
  RTA1:                   Mounted              0
  RTA2:                   Mounted              0

Shutting down the VMS Cluster

The VMS cluster needs to be shut down in the following order.

desk2
delphi
leon
noel

To re-establish the cluster boot the machines in the reverse order.
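Since the boot order is simply the shutdown order reversed, both can be derived from one list, which avoids the two drifting apart. The sketch below also includes a small check that hosts are being handled in sequence; the function names are hypothetical.

```python
SHUTDOWN_ORDER = ("desk2", "delphi", "leon", "noel")


def boot_order(shutdown_order=SHUTDOWN_ORDER):
    """The cluster is booted in the reverse of the shutdown order."""
    return tuple(reversed(shutdown_order))


def next_host(order, already_done):
    """Return the next host to handle, or None when all are done.

    Raises ValueError if the hosts handled so far do not match the
    start of the required order.
    """
    done = tuple(already_done)
    if done != tuple(order[:len(done)]):
        raise ValueError("hosts were handled out of order")
    return order[len(done)] if len(done) < len(order) else None
```

For example, after desk2 has been shut down the next host is delphi, and on restart the first host to boot is noel.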

Fileservers and critical workstations

Host name	Description	Operating System	Location	Purpose	Serial Number
kaputar	DEC Alpha 1000A 5/400	Digital Unix v 4.0B	Computer Room	UNIX, Web, Mail server,etc...	AY45700625
ningadhun	DELL PowerEdge 2300	WinNT Server v4.0 SP5	Computer Room	PC server - user data/applications	TK2Q7
leon	MicroVax 3100-80	Open VMS v 6.1	Computer Room	VAX server	KA224R6060
noel	Vaxstation 4000-60	Open VMS v 6.1	Control Room	Compact Array control computer	AB22202U7T
atria	Digital PC	Red Hat Linux v5.2	Correlator Room	Correlator data acquisition computer	-

SUN workstations

All the currently used SUN workstations are listed below. In the case of hardware problems one may place a service call with SUN Service Centre quoting the serial number and describing the problem to the customer service representative. The serial numbers of the last two workstations are not indicated since they are no longer covered under service warranty.

Host name	Description	Op. System	Location	Purpose	Serial Number
achilles	SUN Ultra 10	Solaris 2.5.6	Computer Room	Solaris Server/Workstation	HW82004508
medea	SUN Ultra 10	Solaris 2.5.6	Computer Room	SUN workstation	HW82004506
ambrosia	SUN Ultra 10	Solaris 2.5.6	Computer Room	SUN workstation	HW82004503
orpheus	SUN Ultra 10	Solaris 2.5.6	Control Room	Online imaging	FW84750472
poseidon	SUN Ultra 10	Solaris 2.5.6	Mark Wieringa's office	SUN workstation	FW84750488
argos	SUN Ultra 10	Solaris 2.5.6	Dave McConnell's office	SUN workstation	FW84750478
molen	SparcStation 5	Solaris 2.5.6	Observer's Area	SUN workstation	425F5931
corvus	Sparc ULTRA 10	Solaris 2.5.6	Observer's Area	SUN workstation	FW93230158
vladimir	SparcStation 5	SUN OS v4.1.3	Control Room	SUN workstation	-

Contact Names

Company Name	Phone	Fax	Contact Name(s) and details
Arrow Direct P/L	(03) 9763 8433	(03) 9763 8823	Geoff Bull or John Taylor
ComNet Solutions	(02) 9899 5700	(02) 9634 1432	Brian Denley
Connections P/L	(02) 9552 3088	(02) 9552 3258	David Simmons
COMPAQ Customer Service	1300 368 369	-	-
COMPAQ Service Calls	1300 788 990	-	Customer ID: 363250
DELL Sales Representative	1800 803 385	1800 818 341	Abe Khamis or Kevin Keheo
DELL Credit Manager	(02) 9930 3355	-	Kylee Mace
DELL Technical Support	1800 808 378	-	-
DELL Delivery Enquiries	1800 819 339	-	-
Epson - Technical Support	(02) 9903-9040	-	-
HP Direct	131 047	-	-
Hunter Digital	(02) 4968 4455	-	Marty Wilson
Software Spectrum	(02) 9418 3811	-	Michael van Zoggel
Sun Microsystems Aust. P/L	(02) 9466 9466	(02) 9466 9410	Robert Drake
Sun Service Centre	1800 555 786	-	-

Service Contract

Location	Licenses Box file (System Admin's Office)
Account Number	0195340
Agreement Number	5908200600B
Administrator	Paul Cruz

Created: John Giovannis (6-Aug-1998)
Modified: John Giovannis (9-Nov-1999)
