Raccoon's function is to translate Supermon's s-expression data format into the XML required by Ganglia. In addition, Raccoon provides a Wiki for the Maui scheduler. (Note: "Wiki" can mean either a user-editable web page system or, as here, a service provided to Maui so that it may gather resource data and launch jobs on a particular system. "Wiki" is Hawaiian for "quick," so the name has been applied to many different things.) These similar but separate tasks are performed by a single package to avoid a great deal of duplicated calculation (and thus added system load). This is accomplished through a single multi-threaded daemon (supermon_cm_server) written in Python. The primary threads are:
The translate thread performs the Supermon-to-XML translation. In many cases, this process requires calculating a delta between two consecutive samples from Supermon in order to convert Supermon's absolute values into more useful rate values. Ganglia's gmetad in turn reads these values from the daemon and stores them in its database for use by the web frontend.
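The delta step can be sketched as follows. This is a minimal illustration, not Raccoon's actual internals: the function name and the (timestamp, metrics) sample shape are assumptions.

```python
def counter_to_rate(prev_sample, curr_sample):
    """Convert absolute counter values to per-second rates.

    Each sample is a (timestamp, {metric: value}) pair, mirroring the
    delta computed between two consecutive Supermon samples. Names and
    shapes here are illustrative, not Raccoon's API.
    """
    prev_t, prev_vals = prev_sample
    curr_t, curr_vals = curr_sample
    elapsed = curr_t - prev_t
    if elapsed <= 0:
        return {}  # clock went backwards or duplicate sample; skip
    rates = {}
    for name, value in curr_vals.items():
        if name in prev_vals:
            rates[name] = (value - prev_vals[name]) / elapsed
    return rates
```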
In addition to providing XML data to Ganglia, translate stores the data in a persistent object store. The wiki thread accepts several standard requests from Maui: node info requests, job info requests, and job control requests. The wiki maintains its own persistent object store of jobs submitted for scheduling, and provides that data, along with the node data, to Maui on demand. Maui makes its scheduling decisions and commands the wiki to start or kill jobs as needed to meet system policy. The wiki interfaces with bproc and the standard OS job control calls to actually start or kill jobs. It also detects when jobs terminate and informs the user by email.
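The three request families the wiki handles can be sketched as a simple dispatcher. This is a hypothetical simplification: the real Wiki protocol is text-based and Raccoon's internals differ; the command names and record fields below are assumptions.

```python
def handle_wiki_request(request, node_store, job_store):
    """Dispatch one Maui-style request against the persistent stores.

    Illustrative only: command names and job/node record layouts are
    assumptions, not the actual Wiki wire protocol.
    """
    cmd = request.get("cmd")
    if cmd == "getnodes":
        return node_store                 # node info for scheduling decisions
    if cmd == "getjobs":
        return job_store                  # jobs submitted for scheduling
    if cmd == "startjob":
        job = job_store[request["jobid"]]
        job["state"] = "Running"          # real code would launch via bproc
        return {"status": "ok"}
    if cmd == "canceljob":
        job = job_store[request["jobid"]]
        job["state"] = "Cancelled"        # real code would kill the job
        return {"status": "ok"}
    return {"status": "error", "reason": "unknown command"}
```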
The xmlrpc thread accepts commands from user-invoked utilities to submit or cancel jobs. Its job is simply to insert or modify job objects in the job store used by the wiki thread.
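Using Python's standard library, that thread could look roughly like this. The method names and job-record fields are illustrative assumptions, not Raccoon's actual API; only the shape (an XML-RPC endpoint that mutates a shared job store) reflects the description above.

```python
from xmlrpc.server import SimpleXMLRPCServer

# Hypothetical sketch of the xmlrpc thread: it exposes submit/cancel
# calls that do nothing but insert or modify entries in the job store
# shared with the wiki thread.
job_store = {}
_next_id = [1]

def submit_job(procs, maxtime, command):
    """Insert a new job object; return its job id."""
    jobid = str(_next_id[0])
    _next_id[0] += 1
    job_store[jobid] = {"procs": procs, "maxtime": maxtime,
                        "command": command, "state": "Idle"}
    return jobid

def cancel_job(jobid):
    """Mark a job cancelled; the wiki thread acts on the state change."""
    if jobid not in job_store:
        return False
    job_store[jobid]["state"] = "Cancelled"
    return True

def make_server(host="localhost", port=8000):
    """Build the XML-RPC endpoint; run server.serve_forever() in a thread."""
    server = SimpleXMLRPCServer((host, port), allow_none=True,
                                logRequests=False)
    server.register_function(submit_job)
    server.register_function(cancel_job)
    return server
```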
The user submits a job using the cmsubmit command:
cmsubmit [options] -p <#processors> -t <time> program [args]
cmsubmit allows users to submit jobs to the Clubmask system.
-c <queue> : use a special queue
-d <dir> : directory to place stdout file
-g <group_name> : use alternate group for execution
-h : help
-I : get the nodes interactively
-q : do not print any information
-w <dir> : directory to use as the working directory (default: the current working directory from which cmsubmit was run)
-y : do not ask for confirmation
You must also specify both of the following:
-p <num> : number of processors
-t <maxtime> : time needed in minutes
It is important to note that the program will be run ON THE MASTER. Raccoon will set the environment variables BEOWULF_JOB_MAP and NP to specify where the job should run. MPI programs understand these variables and will do the right thing. Standalone batch jobs or other parallelism systems will need to use bpsh to end up on the right node(s), for example:
cmsubmit -p 1 -t 10 bpsh myprog myargs
The remainder of job control is actually part of the Maui system. In brief:
To see the current job queue and status, use showq. To cancel a request (even if the job has started), use canceljob <id>, where <id> is the job id shown by showq. diagnose -n will show Maui's view of the system status.
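A batch job that needs to place its own processes can read the variables Raccoon sets before it calls bpsh. This sketch assumes BEOWULF_JOB_MAP is a colon-separated list of node numbers, one entry per process rank (the usual BProc/Scyld convention); verify the format on your installation before relying on it.

```python
import os

def job_layout():
    """Read the node mapping exported to a batch job.

    Returns (NP, [node numbers]) based on the NP and BEOWULF_JOB_MAP
    environment variables. The colon-separated format is an assumption.
    """
    np = int(os.environ.get("NP", "1"))
    job_map = os.environ.get("BEOWULF_JOB_MAP", "")
    nodes = [int(n) for n in job_map.split(":")] if job_map else []
    return np, nodes
```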