DV-Zeuthen

Computer Center

Job submission


All WGS nodes in Zeuthen can be used as HTCondor remote submit nodes. If you prefer "local" job submission, log in to htc-submit:

[wgs] ~ ssh htc-submit

(or alternatively to the backup system htc-submit2)

1. Simple job submit file

Use a submit file containing the relevant job information with Executable pointing to your job script.

Executable  = /path/to/your/jobscript
Log         = /path/to/log/log_$(Cluster)_$(Process).txt
Output      = /path/to/output/out_$(Cluster)_$(Process).txt
Error       = /path/to/output/err_$(Cluster)_$(Process).txt

# as long as jobs run on the DESY Zeuthen batch farm, please disable the file transfer feature
should_transfer_files = no
# request 2GB RAM
request_memory = 2048

Queue 1

Variables like $(Cluster) and $(Process) are expanded by HTCondor as documented in the condor_submit manual.
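The Executable itself is an ordinary script. A minimal sketch of such a job script; the workload and file names below are placeholders, not site requirements:

```shell
#!/bin/bash
# Minimal sketch of a job script (the workload below is a placeholder).
set -e

# HTCondor provides a node-local scratch directory via $TMPDIR
cd "${TMPDIR:-/tmp}"
echo "Job running on $(hostname)"

# placeholder workload: generate an input file and count its lines
seq 1 100 > input.txt
echo "input.txt has $(wc -l < input.txt) lines"
```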

Then submit the job with:

[wgs] ~ condor_submit <submit_file>

2. Common submit file options

A short overview of the most commonly used options in HTCondor submit files. For a complete reference, consult the official documentation.

executable = /path/to/jobscript
    Job script. No default; required option!

output = /path/to/file
    Job's STDOUT goes into this file. No default; required option!

error = /path/to/file
    Job's STDERR goes into this file. No default; required option!

log = /path/to/file
    HTCondor's job log goes into this file. No default; required option!

request_memory = 2G
    Job's maximum RAM (RSS) usage. Default: 1G. Contact an admin if you need more than 64GB.

request_disk = 2G
    Job's scratch space quota in $TMPDIR. Default: 1G. Contact an admin if you need more than 50GB.

+RequestRuntime = 10 * $(HOUR)
    Job's (wallclock) runtime. Default: 48 * $(HOUR). Stay below 2 days; runtimes of up to 7 days will work but are unsupported and discouraged. Jobs requesting longer runtimes will not start.

request_cpus = 4
    Multicore/multithreaded job consuming more than 1 CPU core. Default: 1. Contact an admin if you need more than 16 CPU cores.

request_gpus = 1
    GPU job. Default: 0 (no GPU).

universe = container
container_image = /path/to/container
    Run the job inside a container. Default: empty. Centrally provided Apptainer images are available under /project/apptainer/images/*.sif.

notification = <Always|Complete|Error>
    Send mail on job events. Not recommended for mass jobs.

queue 100
    Submit multiple jobs at once. Default: 1; a queue statement is required! Also consider using max_materialize.
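Combining several of these options, a submit file for a batch of 100 jobs could look like the following sketch (all paths are placeholders):

```
Executable      = /path/to/your/jobscript
Log             = /path/to/log/log_$(Cluster)_$(Process).txt
Output          = /path/to/output/out_$(Cluster)_$(Process).txt
Error           = /path/to/output/err_$(Cluster)_$(Process).txt

should_transfer_files = no
request_memory  = 4G
request_cpus    = 2
+RequestRuntime = 10 * $(HOUR)
# keep at most 20 jobs materialized in the queue at any time
max_materialize = 20

Queue 100
```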

3. GPU jobs

The HTCondor batch farm provides access to several NVIDIA GPU devices. A current overview of the available resources can be obtained via:

[wgs] ~ condor_status -compact -constraint 'TotalGPUs>0' -af:h Machine TotalGPUs GPUs_DeviceName
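To run on one of these devices, request a GPU in the submit file. A minimal sketch; the requirements line is optional, and the "V100" pattern is only an illustration of matching against the GPUs_DeviceName machine attribute shown above:

```
request_gpus = 1
# optionally pin the job to a specific GPU model
requirements = regexp("V100", GPUs_DeviceName)
```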


4. Interactive job submissions

Interactive job submissions to a farm node are supported:

[wgs] ~ condor_submit -i
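Additional resource requests can be passed along with the interactive flag; a sketch assuming condor_submit's key=value command-line syntax:

```
[wgs] ~ condor_submit -i request_memory=4G request_cpus=2
```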


5. DAG jobs

Because a DAG job keeps a watchdog process running on the submit node, it can suffer from expired Kerberos tickets during its runtime. Therefore, do not use Kerberos authentication for long-lasting DAGs; switch to IDTOKEN-based authentication for such jobs instead.

The following example script can be used instead of calling condor_submit_dag directly; it accepts the same arguments. It has to be run on host htc-submit.zeuthen.desy.de or htc-submit2.zeuthen.desy.de. It will generate an IDTOKEN valid for 7 days (adapt this to your demands!) and use it to submit the DAG:

#!/bin/bash

# keep the fetched IDTOKEN in a private temporary directory
export _condor_SEC_TOKEN_DIRECTORY=$(mktemp -d)
# fetch an IDTOKEN named "dag", valid for 7 days
condor_token_fetch -lifetime $((7*24*60*60)) -token dag

# submit the DAG while the token-based authentication is in effect
condor_submit_dag "$@"
exit $?