Computer Center


Batch System - Monitoring the current farm and job status


Here are some of the most useful commands for querying the current farm status:

Command              Provided information
                     Print out execution host configuration and load
qstat -g c           Print out the current queue utilization
qstat -u <user>      Show only the jobs of the specified user
qstat -j <job id>    Print out detailed information about the job with the specified job id
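As a rough illustration of processing qstat output in a script, the sketch below counts your jobs per state. It assumes qstat prints its default table (a title line, a dashed separator line, then one row per job with the state in the fifth column). The function name job_state_counts is hypothetical and not part of the batch system.

```shell
# Hypothetical helper: summarize how many jobs are in each state.
# Assumption: default qstat table layout, i.e. two header lines
# (column titles and a dashed separator) followed by one row per job,
# with the job state in the 5th whitespace-separated column.

# Reads qstat output on stdin and prints "<state> <count>" per state.
job_state_counts() {
    awk 'NR > 2 { count[$5]++ } END { for (s in count) print s, count[s] }'
}

# Example: summarize the jobs of the current user.
# qstat -u "$USER" | job_state_counts
```

The awk array is unordered, so pipe the result through sort if you need a stable order.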

Jobs listed by qstat can be in the following states:

job is waiting for execution
job is being transferred to the execution host
job is currently running
job has failed. Use the command sge-job-error <job id> to determine why, then either delete the job with qdel <job id> (if the error is permanent) or clear the error state with qmod -cj <job id> (if the cause was temporary).
Rq / Rr
job has been requeued / restarted because it was running on a node that crashed
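The error-handling steps above can be combined into a small helper. This is a sketch under two assumptions: qstat uses its default table layout with the state in the fifth column, and failed jobs show the letter E in their state string, as in standard Grid Engine. The function names failed_job_ids and show_job_errors are hypothetical.

```shell
# Hypothetical helpers for inspecting failed jobs.
# Assumptions: default qstat table layout (two header lines, state in
# column 5) and error states containing the letter "E".

# Reads qstat output on stdin and prints each failed job ID once
# (array jobs may appear on several rows).
failed_job_ids() {
    awk 'NR > 2 && $5 ~ /E/ && !seen[$1]++ { print $1 }'
}

# Runs sge-job-error for every failed job of the current user.
show_job_errors() {
    qstat -u "$USER" | failed_job_ids | while read -r id; do
        echo "== job $id =="
        sge-job-error "$id"
    done
}
```

After reviewing the output, delete permanently failed jobs with qdel <job id> or clear temporary errors with qmod -cj <job id>.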

The farm status can also be viewed in a web browser: on the MACBAT overview page, click the link for a farm to retrieve more detailed information. See also the chapter on retrieving Job Status Information.

Additionally, a Grafana-based dashboard visualizes some runtime details of your job. Its URL can be retrieved with the following command:

sge-job-url <job-id> [task-id]
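To look up the dashboards of all your running jobs at once, a loop like the following could be used. It is a sketch assuming the default qstat table, in which running jobs show the state r and the task ID of an array job appears in the tenth column (after queue and slots). The names running_jobs and running_job_urls are hypothetical helpers.

```shell
# Hypothetical helpers for fetching dashboard URLs of running jobs.
# Assumptions: default qstat table layout, state "r" for running jobs,
# and the array task ID (if any) in the 10th column.

# Reads qstat output on stdin and prints "<job id>" or "<job id> <task id>".
running_jobs() {
    awk 'NR > 2 && $5 == "r" { out = $1; if ($10 != "") out = out " " $10; print out }'
}

# Calls sge-job-url for each running job, passing the task ID only if present.
running_job_urls() {
    qstat -u "$USER" | running_jobs | while read -r job task; do
        sge-job-url "$job" ${task:+"$task"}
    done
}
```

The ${task:+"$task"} expansion omits the optional task-id argument entirely for non-array jobs instead of passing an empty string.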