Job submission
All WGS nodes in Zeuthen can be used as HTCondor remote submit nodes. If you prefer "local" job submission, log in to htc-submit (or alternatively to the backup system htc-submit2):
[wgs] ~ ssh htc-submit
1. Simple job submit file
Use a submit file containing the relevant job information, with Executable pointing to your job script.
Executable = /path/to/your/jobscript
Log = /path/to/log/log_$(Cluster)_$(Process).txt
Output = /path/to/output/out_$(Cluster)_$(Process).txt
Error = /path/to/output/err_$(Cluster)_$(Process).txt
# as long as jobs run at DESY Zeuthen batch, please disable file transfer feature
should_transfer_files = no
# request 2GB RAM
request_memory = 2048
Queue 1
Variables like $(Cluster) and $(Process) are expanded as described in the official HTCondor documentation.
Then submit the job with:
[wgs] ~ condor_submit <submit_file>
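The executable can be any script or binary. A minimal job script (the path and payload here are placeholders, not a centrally provided script) might look like:

```shell
#!/bin/bash
# Hypothetical payload for /path/to/your/jobscript -- replace with your workload.
set -euo pipefail                          # abort on errors and unset variables
echo "Running on $(hostname)"
echo "Scratch directory: ${TMPDIR:-/tmp}"  # subject to the request_disk quota
```

Make sure the script is executable (chmod +x) before submitting.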
2. Common submit file options
Short overview of the most commonly used options in HTCondor submit files. For a complete overview, consult the official documentation.
Option | Description | Notes | Best practice
---|---|---|---
executable = /path/to/jobscript | Job script | No default available, required option! |
output = /path/to/file | Job's STDOUT goes into this file | No default available, required option! |
error = /path/to/file | Job's STDERR goes into this file | No default available, required option! |
log = /path/to/file | HTCondor's job logs go into this file | No default available, required option! |
request_memory = 2G | Job's maximum RAM (RSS) usage | Default: 1G | Contact an admin in case you need more than 64GB
request_disk = 2G | Job's scratch space quota in $TMPDIR | Default: 1G | Contact an admin in case you need more than 50GB
| Job's (wallclock) runtime | | Stay below 2 days. Runtimes of up to 7 days will work but are unsupported and discouraged. Jobs requesting longer runtimes will not start.
request_cpus = 4 | Multicore/multithreaded job consuming more than 1 CPU core | Default: 1 | Contact an admin in case you need more than 16 CPU cores
request_gpus = 1 | GPU job | Default: 0 (no GPU) |
universe = container, container_image = /path/to/container | Run job inside a container | Default: empty | Find centrally provided Apptainer images here: /project/apptainer/images/*.sif
notification = <Always\|Complete\|Error> | Send mail on job events | Not recommended for mass jobs |
queue <N> | Submit multiple jobs at once | Default: 1; a queue statement is required! | Also consider using max_materialize!
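As a sketch of submitting many instances of one job at once (all paths and counts below are placeholder values): the queue statement sets the total number of jobs, while max_materialize limits how many of them are materialized in the schedd at any one time, which keeps very large submissions lightweight.

```
executable      = /path/to/your/jobscript
log             = /path/to/log/log_$(Cluster)_$(Process).txt
output          = /path/to/output/out_$(Cluster)_$(Process).txt
error           = /path/to/output/err_$(Cluster)_$(Process).txt
request_memory  = 2G
# keep at most 100 jobs materialized in the queue at any time
max_materialize = 100
# submit 1000 jobs; $(Process) runs from 0 to 999
queue 1000
```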
3. GPU jobs
The HTCondor batch farm provides access to several NVIDIA GPU devices. A current overview of the available resources can be obtained via:
[wgs] ~ condor_status -compact -gpus
More information in the official documentation.
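A submit file requesting a single GPU might look like the following sketch (paths and resource values are placeholders, to be adapted to your job):

```
executable     = /path/to/your/jobscript
log            = /path/to/log/log_$(Cluster)_$(Process).txt
output         = /path/to/output/out_$(Cluster)_$(Process).txt
error          = /path/to/output/err_$(Cluster)_$(Process).txt
request_gpus   = 1
request_memory = 4G
queue 1
```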
4. Interactive job submissions
Interactive job submissions to a farm node are supported:
[wgs] ~ condor_submit -i
5. DAG jobs
As DAG jobs keep a watchdog process running on the submit node, they can suffer from expired Kerberos tickets during their runtime. Therefore, do not use Kerberos authentication for long-lasting DAGs; switch to IDTOKEN-based authentication for such jobs instead.
Below is an example script to be used instead of calling condor_submit_dag
directly. It has to be run on host htc-submit.zeuthen.desy.de or htc-submit2.zeuthen.desy.de. It generates an IDTOKEN valid for 7 days (adapt to your demands!) and uses it to submit the DAG:
#!/bin/bash
# Store the fetched token in a private temporary directory
export _condor_SEC_TOKEN_DIRECTORY="$(mktemp -d)"
# Fetch an IDTOKEN valid for 7 days and store it under the name "dag"
condor_token_fetch -lifetime $((7*24*60*60)) -token dag
# Submit the DAG; it authenticates with the token instead of Kerberos
condor_submit_dag "$@"
exit $?
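For reference, a minimal DAG input file to pass to the script above could look like this sketch of a diamond-shaped dependency (all submit-file paths are hypothetical):

```
# A runs first; B and C run after A; D runs after both B and C
JOB A /path/to/a.sub
JOB B /path/to/b.sub
JOB C /path/to/c.sub
JOB D /path/to/d.sub
PARENT A CHILD B C
PARENT B C CHILD D
```

Passing this file as the argument to the wrapper script submits the whole DAG with IDTOKEN authentication.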