# Torque batch system
## Commands for Torque
The command to submit jobs is called `qsub`. To submit a batch job use

```shell
qsub <further options> [<job script>]
```

The job script may be omitted for interactive jobs (see below). After submission, `qsub` will output the job ID of your job. It can later be used for identification purposes and is also available as the environment variable `$PBS_JOBID` in job scripts (see below). These are the most important options for the `qsub` command:
| Option | Meaning |
|---|---|
| `-N <job name>` | Specifies the name shown by `qstat`. If the option is omitted, the name of the batch script file is used. |
| `-l nodes=<# of nodes>:ppn=<nn>` | Specifies the number of nodes requested. All current clusters (except the SandyBridge partition within Woody) require you to always request full nodes. Thus, for Emmy you always need to specify `:ppn=40`, and for Woody (usually) `:ppn=4`. For other clusters, see the documentation of the respective clusters for the correct `ppn` values. |
| `-l walltime=HH:MM:SS` | Specifies the required wall clock time (runtime). When the job reaches the walltime given here it will be sent a TERM signal. A few seconds later, if the job has not ended yet, it will be sent KILL. If you omit the walltime option, a very short default time will be used. Please specify a reasonable runtime, since the scheduler also bases its decisions on this value (short jobs are preferred). |
| `-M x@y -m abe` | Sends e-mail to `x@y` when the job is aborted (`a`), starting (`b`), and ending (`e`). You can choose any subset of `abe` for the `-m` option. If you omit the `-M` option, the default mail address assigned to your RRZE account will be used. |
| `-o <standard output file>` | File name for the standard output stream. If this option is omitted, a name is compiled from the job name (see `-N`) and the job ID. |
| `-e <error output file>` | File name for the standard error stream. If this option is omitted, a name is compiled from the job name (see `-N`) and the job ID. |
| `-I` | Interactive job. A job script may still be specified, but it will be ignored except for the PBS options it might contain; no code from it will be executed. Instead, the user gets an interactive shell on one of the allocated nodes and can execute any command there. In particular, you can start a parallel program with `mpirun`. |
| `-X` | Enables X11 forwarding. If the `$DISPLAY` environment variable is set when submitting the job, an X program running on the compute node(s) will be displayed on the user's screen. This only makes sense for interactive jobs (see the `-I` option). |
| `-W depend=<dependency list>` | Makes the job depend on certain conditions. E.g., with `-W depend=afterok:12345` the job will only run after job 12345 has ended successfully, i.e. with an exit code of zero. Please consult the `qsub` man page for more information. |
| `-q <queue>` | Specifies the Torque queue (see above); the default queue is `route`. Usually this parameter is not required, as the `route` queue automatically forwards the job to an appropriate execution queue. |
There are several Torque commands for job inspection and control. The following table gives a short summary:
| Command | Purpose | Options |
|---|---|---|
| `qstat [<options>] [<JobID>\|<queue>]` | Displays information on jobs. Only the user's own jobs are displayed. For information on the overall queue status see the section on job priorities. | `-a` display "all" jobs in user-friendly format; `-f` extended job info; `-r` display only running jobs |
| `qdel <JobID> ...` | Removes a job from the queue | – |
| `qalter <qsub-options>` | Changes job parameters previously set by `qsub`. Only certain parameters may be changed after the job has started. | see `qsub` and the `qalter` man page |
| `qcat [<options>] <JobID>` | Displays stdout/stderr from a running job | `-o` display stdout (default); `-e` display stderr; `-f` output appended data as the job is running (like `tail -f`) |
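A typical inspection workflow might look like this (the job ID `12345` is a made-up placeholder; use the ID that `qsub` printed for your job):

```shell
qstat -a        # list your jobs in user-friendly format
qcat -f 12345   # follow the running job's stdout, like tail -f
qdel 12345      # remove the job from the queue
```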
The scheduler sets environment variables to tell the job which resources were allocated to it. These can also be used in batch scripts. The most useful are given below:

| Information | How to access |
|---|---|
| Job ID | `$PBS_JOBID` |
| Directory from which the job was submitted | `$PBS_O_WORKDIR` |
| List of nodes on which the job runs (file name) | `cat $PBS_NODEFILE` |
| Number of nodes allocated to the job | `$PBS_NUM_NODES` |
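The node file contains one line per allocated core, so batch scripts can derive rank and node counts from it. A minimal sketch; since no real allocation exists outside a job, a hand-written stand-in file with made-up hostnames is used here:

```shell
# Stand-in for $PBS_NODEFILE: Torque writes one line per allocated core,
# so 2 nodes with ppn=2 yield four lines (hostnames are made up).
PBS_NODEFILE=$(mktemp)
printf 'e0101\ne0101\ne0102\ne0102\n' > "$PBS_NODEFILE"

# Total number of MPI ranks = number of lines in the file
NUM_RANKS=$(wc -l < "$PBS_NODEFILE")

# Number of distinct nodes = number of unique hostnames
NUM_NODES=$(sort -u "$PBS_NODEFILE" | wc -l)

echo "ranks: $NUM_RANKS, nodes: $NUM_NODES"
rm -f "$PBS_NODEFILE"
```

Inside a real job you would use the `$PBS_NODEFILE` that Torque provides, e.g. `mpirun -n $(wc -l < $PBS_NODEFILE) a.out`.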
## Batch scripts for Torque
To submit a batch job you have to write a shell script that contains all the commands to be executed. Job parameters like estimated runtime and required number of nodes/CPUs can also be specified there (instead of on the command line):
An MPI job (here with `ppn=40`, i.e. for Emmy):

```shell
#!/bin/bash -l
#
# allocate 4 nodes (80 cores / 160 SMT threads) for 6 hours
#PBS -l nodes=4:ppn=40,walltime=06:00:00
#
# job name
#PBS -N Sparsejob_33
#
# first non-empty non-comment line ends PBS options

# load required modules (compiler, MPI, ...)
module load example1

# jobs always start in $HOME -
# change to work directory
cd ${PBS_O_WORKDIR}

# uncomment the following lines to use $FASTTMP
# mkdir ${FASTTMP}/$PBS_JOBID
# cd ${FASTTMP}/$PBS_JOBID

# copy input file from location where job was submitted
# cp ${PBS_O_WORKDIR}/inputfile .

# run, using only physical cores
mpirun -n 80 a.out -i inputfile -o outputfile
```
A shared-memory (OpenMP) job on a single node:

```shell
#!/bin/bash -l
#
# allocate 1 node (4 cores) for 6 hours
#PBS -l nodes=1:ppn=4,walltime=06:00:00
#
# job name
#PBS -N Sparsejob_33
#
# first non-empty non-comment line ends PBS options

# load required modules (compiler, ...)
module load intel64

# jobs always start in $HOME -
# change to work directory
cd ${PBS_O_WORKDIR}

export OMP_NUM_THREADS=4

# run
./a.out
```
The comment lines starting with `#PBS` are ignored by the shell but interpreted by Torque as options for job submission (see above for an options summary). These options can all be given on the `qsub` command line as well. The example also shows the use of the `$FASTTMP` and `$HOME` variables. `$PBS_O_WORKDIR` contains the directory from which the job was submitted. All batch scripts start executing in the user's `$HOME`, so some sort of directory change is always in order.
If you need to load modules from inside a batch script, you can do so. The only requirement is that you use either a `csh`-based shell or `bash` with the `-l` switch, as in the examples above.
## Interactive Jobs with Torque

For testing purposes, or when running applications that require manual intervention (like GUIs), Torque offers interactive access to the compute nodes that have been assigned to a job. To use it, specify the `-I` option to the `qsub` command and omit the batch script. When the job is scheduled, you will get a shell on the master node (the first node in the assigned job node list). Any command can be used there, including `mpirun`. If you need X forwarding, use the `-X` option in addition to `-I`.
Note that the starting time of an interactive batch job cannot be determined reliably; you have to wait for it to be scheduled. We therefore recommend running such jobs with wall clock time limits of less than one hour, so that the job is routed to the `devel` queue, for which a number of nodes is reserved during working hours.
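An interactive session might be started like this (the walltime and `ppn` value are placeholders; adjust them for your cluster):

```shell
# Request one full node interactively with X forwarding for 30 minutes.
qsub -I -X -l nodes=1:ppn=40,walltime=00:30:00
# ... wait until the job is scheduled; a shell opens on the master node ...
```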
Interactive batch jobs do not produce `stdout` and `stderr` files. If you want a record of what happened, use e.g. the UNIX `script` command.
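The `script` utility records everything printed to the terminal into a file (the file name `session.log` is a placeholder):

```shell
script session.log   # start recording the terminal session
# ... run the interactive job, e.g. qsub -I ..., and work normally ...
exit                 # end recording; session.log now holds the transcript
```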