 +===== Linux Labs – Beowulf Distribution:​ Codename Nimbus =====
 +
 +=== Cluster Management Overview ===
 +
 +
 +=== I. Architecture Overview ===
 +
 +   * The **Beostat** daemon has been replaced with the **supermon** utilities from LANL. Supermon is a very lightweight, __/proc__-based system that uses virtually no system resources, unlike its rather onerous predecessor.
 +   * **Wakinyan** Monitor: A graphical monitor that saves screen space and also reports ambient temperature.
 +   * **2.4.19 Linux kernel** for stability and the latest feature set (e.g. hyperthreading capability for Xeon-based clusters).
 +   * **bproc** has been updated to the advanced LANL version with the following features:
 +    * The unified **__P__**rocess **__ID__**entification (PID) space is more complete. Bproc system daemons are fully hidden once a node boots.
 +          - __OLD__: when a process spawns on a slave node, it initializes,​ then a new PID is issued
 +          - __NEW__: all system processes disappear, and PIDs are global on all nodes 
 +    * Access control is now available on a node-by-node basis:
 +          - User / Group / Other permissions (as with chmod u, g, o) are set on the slave nodes themselves.
 +          - A node's permissions are checked to decide whether a user is eligible to run jobs on it.
 +          - This is useful in a shared cluster where not everyone can use all the nodes.
 +   * **rarpcatcher** replaces **beosetup**
 +     * The status info is put into __/​etc/​beowulf/​config__
 +     * This process runs on startup
 +     * It runs continuously as a daemon, so when nodes are added on the fly it sends a HUP to (restarts) the beowulf system and adds the new nodes.
 +   * **NEW**: node data is lost only if the filesystem is completely destroyed, thanks to the ext3 filesystem. We have experienced ZERO corruption in extensive testing. Older versions of the software, by contrast, took a rather cavalier attitude towards node filesystem data.
 +   * **ALSO**: Regarding I. above, this makes booting much cleaner.
 +   * **NOTE**: If one or more of your nodes hold important data, issue a **sync** command before you power-cycle them.
 +   * **REMEMBER**:​ the only non-persistent data stored on nodes are the libraries and system files that are copied to the node at boot time. 
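
The node-by-node access control described above can be pictured with a short Python sketch. It assumes each node carries an owner, a group, and Unix-style user/group/other execute bits; the class name, field names, and the use of execute bits are illustrative assumptions, not the actual bproc data structures.

```python
from dataclasses import dataclass


@dataclass
class NodePerms:
    """Hypothetical per-node permission record (not a real bproc type)."""
    owner: str
    group: str
    mode: int  # assumed octal execute bits, e.g. 0o110 = owner and group may run jobs


def may_run(user: str, groups: set, node: NodePerms) -> bool:
    """Return True if the user is eligible to run jobs on the node."""
    if user == node.owner:
        return bool(node.mode & 0o100)   # "user" class
    if node.group in groups:
        return bool(node.mode & 0o010)   # "group" class
    return bool(node.mode & 0o001)       # "other" class


# A node reserved for alice and the (made-up) physics group:
node = NodePerms(owner="alice", group="physics", mode=0o110)
print(may_run("alice", set(), node))
print(may_run("carol", {"chem"}, node))
```

As with file permissions, only the first matching class (user, then group, then other) is consulted in this sketch.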
 +
 +
 +=== II. Important Cluster utilities ===
 +
 +  - All commands accept a [[nod|Node specification syntax]]
 +  - **[[bpsh|bpsh]]** is the primary user interface to bproc. It is a shell-like program, similar to bash or tcsh, that lets you issue commands across all nodes on the network or on selected nodes, as described here:
 +    * **bpsh** <​nodespec>​ command
 +    * **bpsh** -h (help)
 +    * **bpsh** -n : no redirect, like __rsh__.
 +    * **bpsh** accepts all rsh syntax, so you can convert an rsh-based script (inscript) to bpsh with the following command and expect everything to work: //sed -e "s/rsh/bpsh/g" < inscript > outscript//
 +    * [[bpsh|man page]]
 +    * [[bprun|bpsh run environment]]
 +  - [[bpcp|bpcp]] is the bproc equivalent of **rcp**.
 +  - [[bpstat|bpstat]] Display node status.
 +  - [[bpctl|bpctl]] Change node status ​
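
The sed one-liner above replaces every occurrence of the string rsh, including inside longer words. As a slightly more cautious variant, the following Python sketch rewrites only the standalone word rsh. The function name and the sample script lines are made up for illustration.

```python
import re


def rsh_to_bpsh(script_text: str) -> str:
    """Replace the standalone word 'rsh' with 'bpsh'.

    Unlike a plain s/rsh/bpsh/g, the word boundaries leave identifiers
    that merely contain 'rsh' (for example 'marshal') untouched.
    """
    return re.sub(r"\brsh\b", "bpsh", script_text)


# Hypothetical script fragment:
sample = "rsh 3 uname -a\nmarshal_results\n"
print(rsh_to_bpsh(sample))
```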
 +
 +
 +=== Slave Configuration utilities ===
 +
 +   * [[flash|flash_tool]]
 +   * [[cmos|cmos_util]] ​
 +
 +=== Master boots like a typical Red Hat system ===
 +
 +[[slave|Slave Booting procedure and sequence]] ​
 +
 +[[http://​supermon.sourceforge.net/​|Supermon System]] ​
 +
 +   * Communicates over TCP/IP
 +   * mon daemon
 +   * supermon daemon
 +   * Lightweight
 +   * Data format
 +         * Lisp-like
 +         * Human readable
 +         * Extensible
 +   * Kernel modules
 +         * supermon_proc
 +         * sensors ​
 +   * mon embedded in beoboot
 +   * libsexpr ​
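
To make the "Lisp-like, human readable" data format concrete, here is a minimal s-expression reader in Python. It is only a sketch: the sample string is invented, and the real Supermon protocol and the libsexpr API are not reproduced here.

```python
def parse_sexpr(text: str):
    """Parse one s-expression into nested lists of strings and numbers."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        token = tokens[pos]
        pos += 1
        if token == "(":
            items = []
            while pos < len(tokens) and tokens[pos] != ")":
                items.append(read())
            if pos >= len(tokens):
                raise ValueError("missing closing parenthesis")
            pos += 1  # consume ")"
            return items
        if token == ")":
            raise ValueError("unexpected closing parenthesis")
        # Atoms: try integer, then float, else keep the raw string.
        try:
            return int(token)
        except ValueError:
            try:
                return float(token)
            except ValueError:
                return token

    result = read()
    if pos != len(tokens):
        raise ValueError("trailing tokens after expression")
    return result


# Invented sample record, not actual Supermon output:
print(parse_sexpr("(node 3 (load 0.25) (temp 41))"))
```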
 +
 +=== Wakinyan monitor ===
 +
 +   * This program lives in __/​usr/​bin/​wakinyanmon__
 +   * Part of [[http://​supermon.sourceforge.net/​|Supermon]] system.
 +   * A [[http://​www.gtk.org/​|GTK]] application.
 +   * Node Status Display
 +         * A horizontal yellow line means the node is down
 +         * A diagonal yellow line means the node is booting
 +         * A green check means the node is up
 +         * A red X means the node has an error condition
 +   * CPU load
 +   * Disk load
 +   * Memory used
 +   * Swap
 +   ​* ​ Net
 +   * Temperatures
 +         * CPU 0
 +         * CPU 1
 +         * Northbridge ​
 +
 +=== Net console === 
 +   * [[netcons|netconsole]] ​  
 +=== Batch Scheduling ===
 +
 +   * [[raccoon|raccoon-Maui]]
 +   * BJS 
 +
 +=== Advice on booting behavior ===
 +
 +  - Are all the ports on your GigE switch (if you have one) flashing? This is GOOD! It means the **arps** are working.
 +  - (NOTE: the most common error is a failed attempt to mount an unavailable NFS share.)
 +  - IMPORTANT NOTE: Booting a cluster always seems to take longer than it actually does. Don't despair! Just stand by a bit. Wait a minute. Get a cup of coffee. All is well, 99% of the time!
 +  - Would you like to watch a node boot? This is also good for debugging nodes. You are going to monitor the __Serial Console__!
 +         * **Minicom** on master is ready to go. Run it.
 +         * The settings should already be __ttyS0, 115200, 8N1, vt100__
 +         * Find your //null modem cable//. A null modem cable is shipped with every cluster.
 +         * The leftmost serial port on the __master__ plugs into the leftmost serial port on the target __slave__.
 +
 +=== Other important Config files ===
 +
 +   * __/etc/beowulf/config.boot__ is the file of last resort; it lists the PCI IDs and driver names.
 +   * The command "**beoboot -p**" grabs the kernel from __/etc/beowulf/config/__ and creates new images in the **/tftpboot/slave** boot directory.
 +   * If you have problems unsolvable with reboot or halt, toggle the power on/off manually. ​
 +
 +=== Other info ===
 +
 + **bpsh** resolves command execution paths strictly by canonical directory name, as follows:
 +         - Are you in __/home/sysadmin__ on the master?
 +         - Does this directory exist on the slave?
 +         - __Then the process you are executing runs in that same current working directory (**cwd**)__.
 +         - Are you in __/home__? __/home__ always exists, as an NFS mount.
 +         - The directory does not have to be an NFS mount. For example, if you are in /scratch on the master, you will execute in /scratch on the slave (/scratch exists on all machines).
 +         - **If your current directory has no counterpart with the same canonical name on the slave, your working directory on the slave will be /**.
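
The working-directory rule above can be summarised in a few lines of Python. This is a sketch of the rule as described on this page, not of the bpsh implementation; the function name and the set of slave directories are assumptions for illustration.

```python
import posixpath


def slave_cwd(master_cwd: str, slave_dirs: set) -> str:
    """Return the directory a command would run in on the slave.

    master_cwd: current working directory on the master.
    slave_dirs: directories assumed to exist on the slave (NFS or local).
    """
    canonical = posixpath.normpath(master_cwd)  # e.g. strip a trailing slash
    return canonical if canonical in slave_dirs else "/"


# Directories named in the text; /opt/build is a made-up counterexample.
dirs = {"/home", "/home/sysadmin", "/scratch"}
print(slave_cwd("/scratch", dirs))
print(slave_cwd("/opt/build", dirs))
```

Note that `normpath` only tidies the path string; it does not resolve symlinks, which a real canonicalisation step might also do.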
 +
 +=== Mirroring the Master to the secondary master. === 
 +  * [[mirr|Imaging the master to a secondary master]]
 +===  Failover procedure for secondary masters: === 
 +
 +
 +   - Connect any RAID devices to the secondary __master__.
 +   - Connect the external net connection of the __master__ to eth0 on the __secondary__.
 +   - Connect eth1 to the booting switch network (plus monitor, keyboard).
 +   - Reboot.
 +   - If necessary, hit the spacebar to skip PXE boot errors in this procedure.
 +   - Some BIOSes require hitting F2 to turn off PXE in the BIOS (boot menu) and to make the HD the primary boot device.
 +
 +===  Want to run PVM ? ===  ​
 +    * Simply run **start-pvm**; this launches PVM on all nodes for legacy apps.
 +===  Want to run MPI ? === 
 +
 +   * The newer **[[mpi|MPI]]** (1.5) uses all_cpus=1 rather than "​MPI="​ for using all CPUs.
 +   * Example: **all_cpus=1 programname params** (where programname is linked against MPI)
 +   * Note that the __master__ is node -1 
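
If the example above is read as shell syntax (an assumption), all_cpus=1 is an environment variable set for a single command. A launcher script could prepare the same environment as in the Python sketch below; the helper name and the program path in the usage note are hypothetical.

```python
import os


def mpi_launch_env(base_env=None):
    """Return a copy of the environment with all_cpus=1 set.

    base_env defaults to the current process environment; the original
    mapping is never modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["all_cpus"] = "1"
    return env


print(mpi_launch_env({"PATH": "/bin"}))
```

A program linked against MPI could then be started with `subprocess.run(["./programname", "params"], env=mpi_launch_env())`.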
 +
 +===  Want to run the Lahey FORTRAN compiler? ===
 +
 +   * Lahey resides in __/​usr/​local/​lf95__
 +   * PGI is in __/usr/pgi__ (pgi c, pgi f90, etc.). **flexlm** and the environment variables are set by default to just work. See the docs for more info.
 +
 +=== Partitioning of nodes: ===
 +
 +   * On nodes, /dev/hda1 holds a single filesystem
 +   * /dev/hda2 is swap
 +   * Primary and secondary masters use /dev/hda3 (/), /dev/hda5 (/var), and /dev/hda6 (/usr)
 +   * Depending on your cluster specification,​ the primary is pre-setup to also be a slave node. 
 +
 +=== Rebuilding from source RPMs ===
 +  * [[rebuild|Summary of RPM build from SRPM]]
 +=== Supplemental materials: ===
 +
 +    * [[http://​www.rpm.org/​max-rpm/​|Maximum RPM]]
 +    * [[http://​rfc.net/​rfc1350.html|RFC 1350: TFTP]]
 +    * [[http://​rfc.net/​rfc2131.html|RFC 2131: DHCP]] ​
 +
 +=== External resources ===
 +
 +    * [[http://​www.clustermatic.org|Clustermatic]]
 +    * [[http://​supermon.sourceforge.net/​|Supermon]]
 +
  
nimbus.txt · Last modified: 2010/04/15 21:18 (external edit)