DV-Zeuthen

Computer Center

Job submission


All WGS nodes in Zeuthen can be used as HTCondor remote submit nodes. If you prefer "local" job submission, log in to htc-submit:

[wgs] ~ ssh htc-submit

(or alternatively to the backup system htc-submit2)

1. Simple job submit file

Use a submit file containing the relevant job information with Executable pointing to your job script.

Executable  = /path/to/your/jobscript
Log         = /path/to/log/log_$(Cluster)_$(Process).txt
Output      = /path/to/output/out_$(Cluster)_$(Process).txt
Error       = /path/to/output/err_$(Cluster)_$(Process).txt

# as long as jobs run on the DESY Zeuthen batch farm, please disable the file transfer feature
should_transfer_files = no
# request 2GB RAM
request_memory = 2048

Queue 1

Variables like $(Cluster) and $(Process) are expanded by HTCondor as documented in the condor_submit manual.
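The Executable itself is an ordinary script. A minimal sketch of such a job script; the workload and file names below are placeholders, not site requirements:

```shell
#!/bin/bash
# Minimal sketch of a job script (the workload below is a placeholder).
set -e

# HTCondor provides a node-local scratch directory via $TMPDIR
cd "${TMPDIR:-/tmp}"
echo "Job running on $(hostname)"

# placeholder workload: generate an input file and count its lines
seq 1 100 > input.txt
echo "input.txt has $(wc -l < input.txt) lines"
```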

Then submit the job with:

[wgs] ~ condor_submit <submit_file>

2. Common submit file options

A short overview of the most commonly used options in HTCondor submit files. For a complete reference, consult the official documentation.

executable = /path/to/jobscript
    Job script. No default; required option!

output = /path/to/file
    Job's STDOUT goes into this file. No default; required option!

error = /path/to/file
    Job's STDERR goes into this file. No default; required option!

log = /path/to/file
    HTCondor's job log goes into this file. No default; required option!

request_memory = 2G
    Job's maximum RAM (RSS) usage. Default: 1G. Contact an admin if you need more than 64GB.

request_disk = 2G
    Job's scratch space quota in $TMPDIR. Default: 1G. Contact an admin if you need more than 50GB.

+RequestRuntime = 10 * $(HOUR)
    Job's (wallclock) runtime. Default: 48 * $(HOUR). Stay below 2 days; runtimes of up to 7 days will work but are unsupported and discouraged. Jobs requesting longer runtimes will not start.

request_cpus = 4
    Multicore/multithreaded job consuming more than 1 CPU core. Default: 1. Contact an admin if you need more than 16 CPU cores.

request_gpus = 1
    GPU job. Default: 0 (no GPU).

universe = container
container_image = /path/to/container
    Run the job inside a container. Default: empty. Centrally provided Apptainer images are available under /project/apptainer/images/*.sif.

notification = <Always|Complete|Error>
    Send mail on job events. Not recommended for mass jobs.

queue 100
    Submit multiple jobs at once. Default: 1; a queue statement is required! Also consider using max_materialize.
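Combining several of these options, a submit file for a batch of 100 jobs could look like the following sketch (all paths are placeholders):

```
Executable      = /path/to/your/jobscript
Log             = /path/to/log/log_$(Cluster)_$(Process).txt
Output          = /path/to/output/out_$(Cluster)_$(Process).txt
Error           = /path/to/output/err_$(Cluster)_$(Process).txt

should_transfer_files = no
request_memory  = 4G
request_cpus    = 2
+RequestRuntime = 10 * $(HOUR)
# keep at most 20 jobs materialized in the queue at any time
max_materialize = 20

Queue 100
```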

3. GPU jobs

The HTCondor batch farm provides access to several NVIDIA GPU devices. A current overview of the available resources can be obtained via:

[wgs] ~ condor_status -compact -constraint 'TotalGPUs>0' -af:h Machine TotalGPUs GPUs_DeviceName
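To run on one of these devices, request a GPU in the submit file. A minimal sketch; the requirements line is optional, and the "V100" pattern is only an illustration of matching against the GPUs_DeviceName machine attribute shown above:

```
request_gpus = 1
# optionally pin the job to a specific GPU model
requirements = regexp("V100", GPUs_DeviceName)
```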


4. Interactive job submissions

Interactive job submissions to a farm node are supported:

[wgs] ~ condor_submit -i
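Additional resource requests can be passed along with the interactive flag; a sketch assuming condor_submit's key=value command-line syntax:

```
[wgs] ~ condor_submit -i request_memory=4G request_cpus=2
```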


5. DAG jobs

Because a DAG job keeps a watchdog process running on the submit node, it can suffer from expired Kerberos tickets during its runtime. Therefore, do not use Kerberos authentication for long-lasting DAGs; switch to IDTOKEN-based authentication for such jobs instead.

The following example script can be used instead of calling condor_submit_dag directly; it accepts the same arguments. It has to be run on host htc-submit.zeuthen.desy.de or htc-submit2.zeuthen.desy.de. It will generate an IDTOKEN valid for 7 days (adapt this to your demands!) and use it to submit the DAG:

#!/bin/bash

# keep the fetched IDTOKEN in a private temporary directory
export _condor_SEC_TOKEN_DIRECTORY=$(mktemp -d)
# fetch an IDTOKEN named "dag", valid for 7 days
condor_token_fetch -lifetime $((7*24*60*60)) -token dag

# submit the DAG while the token-based authentication is in effect
condor_submit_dag "$@"
exit $?