Batch System - Monitoring the current farm and job status

The following commands are the most useful for querying the current farm status:


Provided information


Print out execution host configuration and load

qstat -g c

Print out the current queue utilization

qstat -u <user>

Show only the jobs of the specified user

qstat -j <job id>

Print out detailed information about the job with the specified job id
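
For scripted monitoring, the qstat queries listed above can be wrapped in a small helper. The following is a minimal Python sketch, not part of the batch system itself: the function names qstat_command and run_qstat are hypothetical, and it assumes that qstat is available on the PATH of the host where the script runs.

```python
import subprocess


def qstat_command(user=None, job_id=None, queue_summary=False):
    """Build the argument list for one of the qstat queries listed above.

    Hypothetical helper: exactly one of the three options must be given.
    """
    chosen = [user is not None, job_id is not None, queue_summary]
    if sum(chosen) != 1:
        raise ValueError("specify exactly one of user, job_id, queue_summary")
    if queue_summary:
        return ["qstat", "-g", "c"]        # current queue utilization
    if user is not None:
        return ["qstat", "-u", str(user)]  # jobs of one user
    return ["qstat", "-j", str(job_id)]    # details for one job


def run_qstat(**kwargs):
    """Run the query and return its text output (assumes qstat is on PATH)."""
    result = subprocess.run(
        qstat_command(**kwargs), capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling run_qstat(queue_summary=True) on a submit host would return the same text as typing qstat -g c in a shell. Building the argument list separately keeps the command construction easy to inspect without contacting the farm.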

Jobs queried by qstat can be in different states:




job is waiting for execution


job is currently running


job has failed; use the command sge-job-error <job id> to determine why. Then either delete the job with qdel <job id> (if the error is permanent) or clear the error state with qmod -cj <job id> (if the cause of the error was temporary)

Rq / Rr

job has been requeued / restarted because the node it was running on crashed
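
The handling of the job states above can be sketched as a small lookup in Python. This is an illustrative sketch only: the names STATE_MEANING and follow_up are hypothetical, the codes Rq and Rr are taken from the list above, and the codes qw, r and Eqw are the usual Grid Engine codes and are an assumption for this farm.

```python
# "Rq" and "Rr" come from the state list above; "qw", "r" and "Eqw" are the
# usual Grid Engine codes and are an assumption for this installation.
STATE_MEANING = {
    "qw": "waiting for execution",
    "r": "currently running",
    "Eqw": "failed",
    "Rq": "requeued after a node crash",
    "Rr": "restarted after a node crash",
}


def follow_up(state, job_id, permanent_error=False):
    """Return the commands suggested above for a job in the given state.

    Only a failed job needs manual action: first inspect the error, then
    delete the job (permanent error) or clear the error state (temporary).
    """
    if state != "Eqw":
        return []
    diagnose = f"sge-job-error {job_id}"
    if permanent_error:
        resolve = f"qdel {job_id}"
    else:
        resolve = f"qmod -cj {job_id}"
    return [diagnose, resolve]
```

The function only returns the command strings and does not run them, since deciding whether an error is permanent requires reading the output of sge-job-error first.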

The farm status can also be viewed in a web browser. From the MACBAT overview page, more detailed information can be retrieved by clicking the link for a farm. See also the chapter on retrieving Job Status Information.