Batch System - Monitoring the current farm and job status


The following commands are the most useful for querying the current farm status:

Command              Provided information
-------------------  --------------------------------------------------------------------
qhost                Print the execution host configuration and load
qstat -g c           Print the current queue utilization
qstat -u <user>      Show only the jobs of the specified user
qstat -j <job id>    Print detailed information about the job with the specified job id
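
As a rough sketch of how these queries could be scripted, the Python snippet below maps each row of the table to an argument list and runs it with subprocess. It assumes Python 3.7 or later on a host where qhost and qstat are on the PATH. The names QUERIES, build_query and run_query are hypothetical and not part of the batch system.

```python
import subprocess
from typing import List, Optional

# Hypothetical mapping of the table rows to command argument lists.
QUERIES = {
    "hosts": ["qhost"],              # execution host configuration and load
    "queues": ["qstat", "-g", "c"],  # current queue utilization
    "user": ["qstat", "-u"],         # jobs of one user (needs a user name)
    "job": ["qstat", "-j"],          # details of one job (needs a job id)
}


def build_query(kind: str, arg: Optional[str] = None) -> List[str]:
    """Return the argument list for a query; raises KeyError for unknown kinds."""
    cmd = list(QUERIES[kind])  # copy so the table entry is never modified
    if kind in ("user", "job"):
        if arg is None:
            raise ValueError(f"query '{kind}' needs an argument")
        cmd.append(arg)
    return cmd


def run_query(kind: str, arg: Optional[str] = None) -> str:
    """Run the query and return its standard output (raises on a non-zero exit)."""
    result = subprocess.run(build_query(kind, arg),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, run_query("user", "alice") would return the qstat listing for the placeholder user alice. Passing the arguments as a list rather than as a single shell string avoids quoting problems with user-supplied values.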


Jobs listed by qstat can be in the following states:

Status    Explanation
-------   -----------------------------------------------------------------
qw        Job is waiting for execution.
r         Job is currently running.
Eqw       Job has failed. Run sge-job-error <job id> to determine why. Then
          either delete the job with qdel <job id> (if the error is
          permanent) or clear the error state with qmod -cj <job id> (if
          the cause was temporary).
Rq / Rr   Job has been requeued / restarted because the node it was
          running on crashed.
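
To get a quick overview of job states, the output of qstat can be tallied by state. The sketch below assumes the default qstat column layout (job-ID, prior, name, user, state, ...), with the state in the fifth whitespace-separated column and job names that contain no spaces. The sample text and the functions parse_states, summarize and failed_job_hints are hypothetical. For every Eqw job it suggests the sge-job-error command from the table as the next step.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical sample in the assumed default qstat layout (not real farm output).
SAMPLE = """\
job-ID  prior   name   user   state submit/start at     queue          slots ja-task-ID
--------------------------------------------------------------------------------------
   1234 0.55500 sim01  alice  r     05/02/2024 10:00:00 all.q@node01   1
   1235 0.00000 sim02  alice  qw    05/02/2024 10:01:00                1
   1236 0.00000 sim03  alice  Eqw   05/02/2024 10:02:00                1
"""


def parse_states(qstat_output: str) -> List[Tuple[str, str]]:
    """Return (job id, state) pairs, assuming state is the 5th column."""
    jobs = []
    for line in qstat_output.splitlines():
        fields = line.split()
        # Skip blank lines, the header row and the dashed separator line.
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        jobs.append((fields[0], fields[4]))
    return jobs


def summarize(qstat_output: str) -> Counter:
    """Count jobs per state string, e.g. qw, r, Eqw."""
    return Counter(state for _, state in parse_states(qstat_output))


def failed_job_hints(qstat_output: str) -> List[str]:
    """Suggest the diagnosis command for every job in the Eqw state."""
    return [f"sge-job-error {job_id}"
            for job_id, state in parse_states(qstat_output) if state == "Eqw"]


if __name__ == "__main__":
    print(summarize(SAMPLE))
    for hint in failed_job_hints(SAMPLE):
        print(hint)
```

On the farm, SAMPLE would be replaced by the captured output of qstat -u <user>. Deciding between qdel and qmod -cj still requires reading the sge-job-error output, so the sketch only points to that step.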

The farm status can also be viewed in a web browser. On the MACBAT overview page, click the link for a farm to retrieve more detailed information. See also the chapter on retrieving Job Status Information.