Batch System - Overview

1. Available farm nodes
2. Submission hosts
3. Job runtime

1. Available farm nodes

node name         number of systems  CPU                    clock frequency  cores  memory  scratch space in $TMPDIR  comment
bladeb*           16                 Intel Xeon X5660       2.8GHz           12     48GB    480GB
bladec* / tcx17*  32                 Intel Xeon X5675       3.06GHz          12     48GB    1.2TB
bladed*           8                  Intel Xeon X5650       2.67GHz          12     48GB    480GB                     1x nVidia Tesla M2090 GPGPU per node
blade{e,f}*       32                 Intel Xeon E5-2660     2.2GHz           16     64GB    1.2TB
kepler{00..15}    16                 Intel Xeon E5-2660     2.2GHz           16     64GB    480GB                     2x nVidia Kepler K20 GPGPU per node
kepler{16..26}    11                 Intel Xeon E5-2640 v3  2.6GHz           16     64GB    480GB                     2x nVidia Kepler K80 GPGPU per node
blade{g,h}*       32                 Intel Xeon E5-2640 v3  2.6GHz           16     64GB    1.2TB
bladei*           16                 Intel Xeon E5-2640 v4  2.4GHz           20     64GB    1.2TB
pascal*           6                  Intel Xeon Gold 6130   2.1GHz           32     384GB   1.2TB                     6x nVidia Tesla P4 GPGPU per node

For an up-to-date overview, see the output of the qhost command; details can be found under job monitoring.
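
As a rough cross-check of the table above, the per-group figures can be summed to estimate the overall farm size. The sketch below assumes that the "cores" column gives cores per node; the names FARM and totals are illustrative only and not part of the batch system. The live numbers reported by qhost may differ.

```python
# Node groups from the table above: name -> (number of systems, cores).
# Assumption: the "cores" column is the core count of a single node.
FARM = {
    "bladeb*": (16, 12),
    "bladec* / tcx17*": (32, 12),
    "bladed*": (8, 12),
    "blade{e,f}*": (32, 16),
    "kepler{00..15}": (16, 16),
    "kepler{16..26}": (11, 16),
    "blade{g,h}*": (32, 16),
    "bladei*": (16, 20),
    "pascal*": (6, 32),
}


def totals(farm):
    """Return (total number of nodes, total number of cores) for the farm."""
    nodes = sum(systems for systems, _ in farm.values())
    cores = sum(systems * cores_per_node for systems, cores_per_node in farm.values())
    return nodes, cores


nodes, cores = totals(FARM)
print(f"{nodes} nodes, {cores} cores")  # prints "169 nodes, 2640 cores"
```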

2. Submission hosts 

  • Public login machines (pub[1..6])
  • Workgroup server (Linux)
  • Linux desktops
 

3. Job runtime

The batch farm is configured to optimize job throughput while still providing a degree of interactive availability. Therefore, we prefer job runtimes of 1-12 hours. Jobs running for more than 12 hours can only fill the compute farm up to a certain percentage.

The maximum job runtime is currently limited to 48 hours. Jobs requesting a longer runtime will never start!

job runtime            description
0-30 minutes           Allows a slight CPU oversubscription: a job can start even though all
                       available slots are currently filled. Expect slightly worse CPU
                       performance! It should therefore be used for test purposes only.
30 minutes - 12 hours  The preferred job runtime. Allows maximum farm usage while keeping good
                       overall "interactivity", i.e. fast job turnaround.
12-24 hours            The number of simultaneously running jobs is limited to 75% of the
                       available slots.
24-48 hours            The number of simultaneously running jobs is limited to 66% of the
                       available slots.

A job runtime of less than 10 minutes should be avoided to keep a good ratio between overhead at job start/end and the actual job payload.
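
The runtime classes above can be expressed as a small lookup, which may help when deciding what runtime to request. This is only a sketch of the rules as stated on this page, not the scheduler's actual configuration. The function names runtime_class and is_too_short are hypothetical, and it is an assumption that each range includes its upper bound (the table does not say which class a job of exactly 30 minutes, 12 hours or 24 hours falls into).

```python
def runtime_class(minutes):
    """Map a requested runtime in minutes to (class label, slot share).

    The slot share is the fraction of available slots that jobs of this
    class may occupy at the same time; 0.0 means the job will never start.
    Assumption: upper bounds are inclusive.
    """
    if minutes <= 0:
        raise ValueError("runtime must be positive")
    if minutes > 48 * 60:
        return ("more than 48 hours (never starts)", 0.0)
    if minutes > 24 * 60:
        return ("24-48 hours", 0.66)
    if minutes > 12 * 60:
        return ("12-24 hours", 0.75)
    if minutes > 30:
        return ("30 minutes - 12 hours", 1.0)
    # Short class: may start even when all slots are filled (oversubscription).
    return ("0-30 minutes", 1.0)


def is_too_short(minutes):
    """True if the runtime is below the recommended 10 minute minimum."""
    return minutes < 10


for requested in (5, 20, 6 * 60, 18 * 60, 36 * 60, 72 * 60):
    label, share = runtime_class(requested)
    hint = " (shorter than recommended)" if is_too_short(requested) else ""
    print(f"{requested:5d} min -> {label}, slot share {share:.0%}{hint}")
```

For example, an 18 hour job falls into the 12-24 hours class and competes for at most 75% of the slots, while a 72 hour request is flagged as one that will never start.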