====Raccoon====

**Raccoon**'s function is to translate Supermon's s-expression data format into the XML required by Ganglia. In addition, Raccoon provides a Wiki for the Maui scheduler (__NOTE__: "Wiki" is a term for either a user-editable web page system OR a service provided to Maui so that it can gather resource data and launch jobs on a particular system. //Wiki// is Hawaiian for "quick", so many people have used the name for many different things!). These similar but separate tasks are performed by a single package to avoid duplicating a great deal of calculation (and the system load that would add).
This is accomplished through a single multi-threaded daemon (supermon_cm_server) written in //Python//. The primary threads are:

  * //translate//

This thread performs the **Supermon** -> //XML// translation. In many cases this requires calculating a delta between two consecutive samples from **Supermon** in order to convert Supermon's absolute values into more useful rate values. Ganglia's gmetad in turn reads these values from the daemon and stores them in its database for use by the web frontend.
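
A minimal sketch of the translate step, written in Python 3 to match the daemon's language. It assumes a Supermon sample looks like (cpuinfo (user 1000) (system 200)). This is a hypothetical, simplified shape; real Supermon output is richer. The sketch parses two consecutive samples, converts the absolute values into per-second rates, and renders them as a simplified Ganglia-style HOST/METRIC fragment. The helper names and the reduced set of XML attributes are illustrative assumptions, not Raccoon's actual code.

<code python>
import re
import xml.etree.ElementTree as ET


def tokenize(text):
    """Split an s-expression into "(", ")" and bare atoms."""
    return re.findall(r"\(|\)|[^\s()]+", text)


def parse_sexpr(text):
    """Parse a single s-expression into nested lists of strings."""
    tokens = tokenize(text)
    pos = 0

    def parse():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == ")":
            raise ValueError("unexpected ')'")
        if tok != "(":
            return tok  # an atom
        items = []
        while tokens[pos] != ")":
            items.append(parse())
        pos += 1  # consume the closing ")"
        return items

    tree = parse()
    if pos != len(tokens):
        raise ValueError("trailing data after s-expression")
    return tree


def counters(tree):
    """Flatten (group (name value) ...) into {"group_name": value}."""
    group, *fields = tree
    return {f"{group}_{name}": float(value) for name, value in fields}


def rates(prev, curr, dt):
    """Convert two absolute samples taken dt seconds apart into per-second rates."""
    if dt <= 0:
        raise ValueError("dt must be positive")
    return {k: (curr[k] - prev[k]) / dt for k in curr if k in prev}


def to_ganglia_xml(host, metrics):
    """Render metrics as a simplified Ganglia-style HOST/METRIC fragment."""
    host_el = ET.Element("HOST", NAME=host)
    for name in sorted(metrics):
        ET.SubElement(host_el, "METRIC", NAME=name,
                      VAL=f"{metrics[name]:.2f}", TYPE="float")
    return ET.tostring(host_el, encoding="unicode")


if __name__ == "__main__":
    prev = counters(parse_sexpr("(cpuinfo (user 1000) (system 200))"))
    curr = counters(parse_sexpr("(cpuinfo (user 1500) (system 260))"))
    print(to_ganglia_xml("node1", rates(prev, curr, dt=5.0)))
</code>

The delta is why the translate thread has to remember the previous sample for every node. A counter that wraps or resets between samples would produce a negative rate, and a real implementation would need to detect that case.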

  * //maui wiki//

In addition to providing //XML// data to Ganglia, the translate thread stores the data in a persistent object store. The wiki accepts several standard requests from Maui: a node info request, a job info request, and job control requests. The wiki maintains its own persistent object store of jobs submitted for scheduling, and provides that job data, as well as the node data, to Maui on demand. //Maui// makes its scheduling decisions and commands the wiki to start or kill jobs as needed to meet system policy. The wiki interfaces with bproc and the standard OS job control calls to actually start or kill jobs. It also detects when jobs terminate and informs the user by email.
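
The request handling described above can be sketched as a small dispatcher. This is a hypothetical Python 3 illustration. The request strings are loosely modeled on Maui's Wiki interface, which uses space-separated KEY=VALUE fields such as CMD=GETNODES. The sketch does not reproduce the real protocol's message framing, checksums or exact response layout, and the node and job state names are illustrative. Starting and killing jobs are stubs at the points where the real daemon would call bproc and the OS job control functions.

<code python>
def parse_request(line):
    """Split e.g. 'CMD=STARTJOB ARG=42 TASKLIST=n1:n2' into a dict."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"malformed field: {token!r}")
        fields[key] = value
    return fields


class Wiki:
    """Hypothetical wiki handler backed by in-memory node and job stores."""

    def __init__(self, nodes, jobs):
        self.nodes = nodes  # node name -> state, e.g. {"n1": "Idle"}
        self.jobs = jobs    # job id -> dict with at least a "state" key

    def handle(self, line):
        try:
            req = parse_request(line)
            handler = {
                "GETNODES": self.get_nodes,
                "GETJOBS": self.get_jobs,
                "STARTJOB": self.start_job,
                "CANCELJOB": self.cancel_job,
            }.get(req.get("CMD"))
            if handler is None:
                return "SC=-1 RESPONSE=unknown command"
            return handler(req)
        except (KeyError, ValueError) as exc:
            return f"SC=-1 RESPONSE=bad request: {exc}"

    def get_nodes(self, req):
        body = "#".join(f"{n}:STATE={self.nodes[n]}" for n in sorted(self.nodes))
        return f"SC=0 ARG={len(self.nodes)}#{body}"

    def get_jobs(self, req):
        body = "#".join(f"{j}:STATE={self.jobs[j]['state']}" for j in sorted(self.jobs))
        return f"SC=0 ARG={len(self.jobs)}#{body}"

    def start_job(self, req):
        job = self.jobs[req["ARG"]]          # KeyError for an unknown job
        nodes = req["TASKLIST"].split(":")
        unknown = [n for n in nodes if n not in self.nodes]
        if unknown:
            raise KeyError(f"unknown nodes {unknown}")
        job.update(state="Running", nodes=nodes)
        for n in nodes:
            self.nodes[n] = "Busy"
        # Stub: the real daemon would launch the job via bproc here.
        return "SC=0 RESPONSE=started"

    def cancel_job(self, req):
        job = self.jobs[req["ARG"]]
        for n in job.get("nodes", []):
            self.nodes[n] = "Idle"
        job["state"] = "Removed"
        # Stub: the real daemon would signal the job's processes and
        # email the user about the termination here.
        return "SC=0 RESPONSE=cancelled"
</code>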

  * //xmlrpc//

The **xmlrpc** thread accepts commands from user-invoked utilities to submit or cancel jobs. Its job is simply to insert or modify job objects in the job store used by the wiki thread.
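
Here is a minimal sketch of the xmlrpc thread. It uses Python 3's standard //xmlrpc.server// and //shelve// modules as stand-ins for the daemon's actual RPC and persistence code. The method names submit_job and cancel_job, the job record fields, the database path and the port are all hypothetical. The lock matters because the wiki thread works on the same job store from another thread of the same daemon.

<code python>
import shelve
import threading
from xmlrpc.server import SimpleXMLRPCServer


class JobStore:
    """Hypothetical persistent job store shared by the xmlrpc and wiki threads."""

    def __init__(self, path):
        self._db = shelve.open(path)  # keys must be strings
        self._lock = threading.Lock()

    def submit(self, user, nprocs, minutes, argv):
        """Insert a new idle job and return its id."""
        with self._lock:
            job_id = str(max((int(k) for k in self._db), default=0) + 1)
            self._db[job_id] = {"user": user, "nprocs": nprocs,
                                "minutes": minutes, "argv": list(argv),
                                "state": "Idle"}
            self._db.sync()
            return job_id

    def cancel(self, job_id):
        """Mark a job cancelled; the wiki thread acts on the new state."""
        with self._lock:
            if job_id not in self._db:
                return False
            job = self._db[job_id]
            job["state"] = "Cancelled"
            self._db[job_id] = job  # re-assign: shelve only persists on assignment
            self._db.sync()
            return True

    def get(self, job_id):
        with self._lock:
            return self._db[job_id]

    def close(self):
        with self._lock:
            self._db.close()


def serve(store, host="localhost", port=8000):
    """Expose the store to user utilities such as cmsubmit over XML-RPC."""
    server = SimpleXMLRPCServer((host, port), allow_none=True, logRequests=False)
    server.register_function(store.submit, "submit_job")
    server.register_function(store.cancel, "cancel_job")
    server.serve_forever()


if __name__ == "__main__":
    serve(JobStore("/tmp/raccoon-jobs"))
</code>

The xmlrpc thread never starts or kills anything itself. It only changes job records, which keeps the user-facing interface decoupled from the scheduling decisions that Maui makes through the wiki.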

The user submits a job using the **cmsubmit** command:

**cmsubmit** //[options] -p <#processors> -t <time> program [args]//

**cmsubmit** allows users to submit jobs to the Clubmask system.

**-c** <**//queue//**> :
use a special queue

**-d** <**//dir//**> :
directory in which to place the stdout file

**-g** <**//group_name//**> :
use an alternate group for execution

**-h** :
help

**-I** :
get the nodes interactively

**-q** :
do not print any information

**-w** <**//dir//**> :
directory to use as the working directory (default: the current working directory from which cmsubmit was run)

**-y** :
do not ask for confirmation

You must specify all of the following:

**-p** <**//num//**> :
number of processors

**-t** <**//maxtime//**> :
time needed, in minutes
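
The synopsis and options above map directly onto a command-line parser. The following Python 3 //argparse// sketch mirrors the documented options. It is not the real cmsubmit source. The XML-RPC URL and the submit_job method it calls are hypothetical placeholders for whatever interface the xmlrpc thread actually exposes.

<code python>
import argparse
import getpass
import os
import xmlrpc.client


def build_parser():
    """Mirror the documented cmsubmit options (argparse supplies -h)."""
    p = argparse.ArgumentParser(
        prog="cmsubmit", description="Submit jobs to the Clubmask system.")
    p.add_argument("-c", metavar="queue", help="use a special queue")
    p.add_argument("-d", metavar="dir", help="directory to place stdout file")
    p.add_argument("-g", metavar="group_name",
                   help="use alternate group for execution")
    p.add_argument("-I", action="store_true",
                   help="get the nodes interactively")
    p.add_argument("-q", action="store_true",
                   help="do not print any information")
    p.add_argument("-w", metavar="dir", default=os.getcwd(),
                   help="working directory (default: current directory)")
    p.add_argument("-y", action="store_true",
                   help="do not ask for confirmation")
    p.add_argument("-p", metavar="num", type=int, required=True,
                   help="number of processors")
    p.add_argument("-t", metavar="maxtime", type=int, required=True,
                   help="time needed in minutes")
    p.add_argument("program", help="program to run (on the master node)")
    p.add_argument("args", nargs=argparse.REMAINDER,
                   help="arguments passed to the program")
    return p


def submit(opts, url="http://localhost:8000/"):
    """Send the job to the daemon (hypothetical endpoint and method name)."""
    proxy = xmlrpc.client.ServerProxy(url)
    return proxy.submit_job(getpass.getuser(), opts.p, opts.t,
                            [opts.program] + opts.args)


if __name__ == "__main__":
    opts = build_parser().parse_args()
    job_id = submit(opts)
    if not opts.q:
        print(f"submitted job {job_id}")
</code>

Only -p, -t and the program's command line are forwarded in this sketch. The remaining options (queue, group, directories, confirmation) would need matching fields in the daemon's job record.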

Note that the program itself is run **ON THE MASTER** node.
Raccoon sets the environment variables **BEOWULF_JOB_MAP** and **NP** to specify where the job should run. MPI programs understand these variables and will run on the assigned nodes. Standalone batch jobs and other parallel systems need to use **bpsh** to end up on the right node(s), for example:

**cmsubmit -p 1 -t 10 bpsh** //myprog myargs//
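
A standalone program does not read the node assignment by itself. A small wrapper running on the master can read it and hand the work to bpsh. Here is a minimal Python 3 sketch. It assumes that BEOWULF_JOB_MAP is a colon-separated list of node numbers with one entry per processor (verify this on your installation) and that bpsh is invoked as bpsh <node> <command> [args]. The file name spread.py and the helper names are hypothetical.

<code python>
import os
import subprocess
import sys


def job_nodes(env=os.environ):
    """Node numbers assigned to this job, one entry per processor slot.

    ASSUMPTION: BEOWULF_JOB_MAP is colon-separated, e.g. "3:3:4".
    """
    nodes = [n for n in env.get("BEOWULF_JOB_MAP", "").split(":") if n]
    np = int(env.get("NP", len(nodes)))
    return nodes[:np]


def bpsh_commands(nodes, argv):
    """Build one 'bpsh <node> program args...' command per processor slot."""
    return [["bpsh", node, *argv] for node in nodes]


def main(argv):
    if not argv:
        sys.exit("usage: spread.py program [args...]")
    nodes = job_nodes()
    if not nodes:
        sys.exit("BEOWULF_JOB_MAP is empty; was this started by cmsubmit?")
    procs = [subprocess.Popen(cmd) for cmd in bpsh_commands(nodes, argv)]
    codes = [p.wait() for p in procs]  # wait for every copy, not just the first
    return 0 if all(c == 0 for c in codes) else 1


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
</code>

Such a wrapper would be submitted in the same way as the bpsh example above, e.g. **cmsubmit -p 2 -t 10 python3 spread.py** //myprog myargs//.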

The remainder of job control is part of the Maui system. In brief:

  * To see the current job queue and status, use **showq**.
  * To cancel a request (even if the job has started), use **canceljob** <//id//>, where //id// is the job ID shown by **showq**.
  * **diagnose -n** shows Maui's view of the system status.
raccoon.txt · Last modified: 2010/04/15 21:18 (external edit)