Linux Labs – Beowulf Distribution: Codename Nimbus
Cluster Management Overview
I. Architecture Overview
The Beostat daemon has been replaced with the supermon utilities from LANL. Supermon is a very lightweight /proc-based system that uses virtually no system resources, unlike its rather onerous predecessor.
Wakinyan Monitor: A graphical monitor that saves screen space and reports ambient temperature.
2.4.19 Linux kernel for cutting-edge stability and the latest feature set (e.g. hyperthreading capability for Xeon-based clusters).
bproc has been updated to the advanced LANL version with the following features:
rarpcatcher replaces beosetup
The status info is put into /etc/beowulf/config
This process runs on startup
It always runs as a daemon, so when nodes are added on the fly, the process sends a HUP to (restarts) the Beowulf system and adds the new nodes.
NEW: Node data is lost only if your filesystem is completely trashed, thanks to the ext3 filesystem. We have experienced ZERO corruption in extensive testing. Older versions of the software, by contrast, took a rather cavalier attitude towards node filesystem data.
ALSO: Regarding I. above, this makes booting much cleaner.
NOTE: If one or more of your nodes holds important data, issue a sync command before you power cycle it.
REMEMBER: the only non-persistent data stored on nodes are the libraries and system files that are copied to the node at boot time.
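The NOTE above about issuing sync before a power cycle can be sketched as a small wrapper. This is only an illustration: sync_node is a hypothetical helper, the node numbers are examples, and it assumes nodes are addressed by number as in "bpsh <nodespec> command". With DRY_RUN=1 it prints the commands instead of running them.

```shell
#!/bin/sh
# sync_node is a hypothetical wrapper, not part of bproc.
# DRY_RUN=1 (the default here) only prints the command, so the sketch
# can be run on a machine that has no bpsh installed.
DRY_RUN=${DRY_RUN:-1}

sync_node() {
    node=$1
    if [ "$DRY_RUN" = "1" ]; then
        echo "bpsh $node sync"
    else
        # Flush the node's filesystem buffers to disk.
        bpsh "$node" sync
    fi
}

# Example node numbers (assumed); list the nodes you are about to power cycle.
for node in 0 1 2; do
    sync_node "$node"
done
```

Set DRY_RUN=0 on the master to actually run sync on each listed node.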
II. Important Cluster utilities
bpsh is the primary user interface into bproc. This is a sort of shell program similar to bash or tcsh that allows you to issue commands across all nodes on the network, or to selected nodes as described herein:
bpsh <nodespec> command
bpsh -h (help)
bpsh -n : redirect stdin from /dev/null, like rsh -n.
bpsh accepts all rsh syntax, so an rsh-based script (here called inscript) can be converted to bpsh with a simple substitution: sed -e "s/rsh/bpsh/g" < inscript > outscript
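As a minimal sketch of that conversion, the substitution can be wrapped in a function and tried on a sample script first. The name rsh_to_bpsh and the sample script contents are made up for illustration; only the sed expression comes from the text above.

```shell
#!/bin/sh
# rsh_to_bpsh is a hypothetical helper: it reads a script on stdin and
# writes the converted script to stdout.
# Note the plain ASCII quotes; typographic quotes would break the command.
rsh_to_bpsh() {
    sed -e 's/rsh/bpsh/g'
}

# Illustrative input only (not taken from a real script).
rsh_to_bpsh <<'EOF'
#!/bin/sh
NODE=$1
rsh "$NODE" uptime
rsh -n "$NODE" df
EOF
```

For a real file the call would be rsh_to_bpsh < inscript > outscript. The substitution rewrites every occurrence of the letters "rsh", including inside longer words, so it is worth reading outscript before running it.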
bpcp is the bproc equivalent of rcp.
Slave Configuration utilities
Master boots like a typical RedHat system
Wakinyan monitor
Net console
Batch Scheduling
Advice on booting behavior
Are all the ports flashing on your GIG-E switch (if you have one)? This is GOOD! It means the arps are working.
(NOTE: the most common error is a failed attempt to mount an unavailable NFS share.)
IMPORTANT NOTE: A cluster always seems to take longer to boot than it actually does. Don’t despair! Just stand by a bit. Wait a minute. Get a cup of coffee. All is well, 99% of the time!
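If you would rather not watch the clock, the waiting can be scripted. The following is a rough sketch, not a tool shipped with the distribution: wait_until is a hypothetical helper that re-runs any check command until it succeeds or a time budget runs out. The suggested check, "bpsh 0 true", is an assumption about what a suitable readiness test for node 0 might be.

```shell
#!/bin/sh
# wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Hypothetical helper: returns 0 as soon as COMMAND succeeds, or 1 once
# at least TIMEOUT_SECONDS have been spent sleeping between attempts.
wait_until() {
    timeout_s=$1
    interval_s=$2
    shift 2
    elapsed=0
    until "$@" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout_s" ]; then
            return 1
        fi
        sleep "$interval_s"
        elapsed=$((elapsed + interval_s))
    done
    return 0
}

# Stand-in check so the sketch runs anywhere. On the master, the check
# might instead be (assumption): wait_until 300 10 bpsh 0 true
if wait_until 5 1 test -d /; then
    echo "ready"
else
    echo "timed out"
fi
```

Choose a generous timeout; as noted above, a slow boot is usually not a failed boot.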
Would you like to watch a node boot? This is also good for debugging nodes. You are going to monitor the Serial Console!
Minicom on master is ready to go. Run it.
The settings should already be ttyS0, 115200, 8N1, vt100.
Find your null modem cable. A null modem cable is shipped with every cluster.
The leftmost serial port on the master plugs into the leftmost serial port on the target slave.
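Should the saved Minicom settings ever be lost, the device and speed listed above can also be passed on the command line. This sketch assumes a minicom build that supports the -D (device) and -b (baud rate) options; console_cmd is a hypothetical helper that only prints the command so it can be checked before it is run.

```shell
#!/bin/sh
# Serial console settings taken from the steps above.
SERIAL_DEV=/dev/ttyS0
BAUD=115200

# console_cmd is a hypothetical helper: it prints the minicom invocation
# rather than running it, so nothing touches the serial port yet.
console_cmd() {
    printf 'minicom -D %s -b %s\n' "$SERIAL_DEV" "$BAUD"
}

console_cmd
```

Data bits, parity, stop bits (8N1) and the vt100 terminal type are not set by these two options and still come from the Minicom configuration.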
Other important Config files
/etc/beowulf/config.boot: the file of last resort. It lists the PCI IDs and driver names.
Command "beoboot -p": this program grabs the kernel from /etc/beowulf/config/ and creates new images in the /tftpboot/slave boot directory.
If you have problems unsolvable with reboot or halt, toggle the power on/off manually.
Other info
bpsh resolves the execution path of a command strictly by canonical directory name, as follows:
Are you in /home/sysadmin on the master?
Does this directory exist on the slave?
Then the process you are executing runs in that same current working directory (cwd).
Are you in /home? /home always exists on the slave as an NFS mount.
The same holds for directories that are not NFS mounts. For example, if you are in /scratch on the master, you will execute in /scratch on the slave (/scratch exists on all machines).
If your current directory has no counterpart with the same canonical name on the slave, your path on the slave will be /.
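The rule above can be modelled in a few lines. This is a sketch of the behaviour as described here, not bproc code: resolve_slave_cwd is a hypothetical function, and the list of slave directories is an example standing in for whatever actually exists on your nodes.

```shell
#!/bin/sh
# resolve_slave_cwd MASTER_CWD "DIR1 DIR2 ..."
# Hypothetical model of the rule: use the same path on the slave if it
# exists there, otherwise fall back to /.
resolve_slave_cwd() {
    master_cwd=$1
    slave_dirs=$2   # whitespace-separated paths present on the slave
    for dir in $slave_dirs; do
        if [ "$dir" = "$master_cwd" ]; then
            printf '%s\n' "$master_cwd"
            return 0
        fi
    done
    printf '/\n'
}

# Example directory list (assumed for illustration).
slave_dirs="/home /home/sysadmin /scratch"

resolve_slave_cwd /home/sysadmin "$slave_dirs"   # same path on the slave
resolve_slave_cwd /scratch "$slave_dirs"         # same path on the slave
resolve_slave_cwd /root/build "$slave_dirs"      # no counterpart, so /
```

In practice this means a command that relies on relative paths should be started from a directory known to exist on every node, such as /home or /scratch.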
Mirroring the Master to the secondary master.
Failover procedure for secondary masters:
Connect any RAID devices to the secondary master.
Connect the external net connection of the master to eth0 on the secondary.
Connect eth1 to the booting switch network (plus monitor, keyboard).
Reboot.
If necessary, hit the spacebar to skip PXE boot errors in this procedure.
Some BIOSes require hitting F2 to turn off PXE in the BIOS (boot menu) and to make the HD the primary boot method.
Want to run PVM?
Want to run MPI?
Want to run the Lahey FORTRAN compiler?
Lahey resides in /usr/local/lf95
PGI is in /usr/pgi (PGI C, PGI F90, etc.). FLEXlm and the environment variables are set by default to just work. See the docs for more info.
Partitioning of nodes:
nodes: /dev/hda1 is a single filesystem
/dev/hda2 is swap
primary and secondary masters: /dev/hda3 (/) ; /dev/hda5 (/var) ; /dev/hda6 (/usr)
Depending on your cluster specification, the primary is pre-setup to also be a slave node.
Rebuilding from source RPMs
Supplemental materials:
External resources