Running Jobs on Titan
Table of Contents
- Titan's Job Scheduler - SLURM
  - Documentation
  - Translating to SLURM commands from other workload managers
  - Basic SLURM Commands
    - squeue
    - sinfo
    - scontrol
    - sbatch
    - scancel
- Titan's Environment Module System - LMOD
  - Listing all available modules on Titan
  - Loading a module into your environment
  - Listing all modules in your environment
  - Removing modules from your environment
- Sample Jobs
  - Non-Parallel Job
  - Parallel Job
Titan's Job Scheduler - SLURM
Titan uses SLURM to manage jobs on the cluster. The Simple Linux Utility for Resource Management (SLURM) is an open-source, scalable cluster management and job scheduling system. It is used on about 60% of the largest compute clusters in the world.
Learning SLURM
Online: Official SLURM documentation
On Titan: Use the man command to learn more about the commands.
Example: Typing man sbatch will give you the manual page for the sbatch command.
Got scripts for other workload managers?
If you have scripts written for other workload managers like PBS/Torque, LSF, etc., please refer to this conversion guide for the most common SLURM commands, environment variables, and job specification options.
Information on jobs (squeue)
The squeue command provides information about jobs running on Titan in the following format: Job ID, Partition, Name, User, Job State, Time, Nodes, Reason.
Typing squeue lists all current (and pending) jobs in the queue.
Typing squeue -u <username> lists all jobs in the queue for the specified user.
Typing squeue -u <username> -t PENDING lists all the pending jobs for the specified user.
Typing squeue -u <username> -t RUNNING lists all the running jobs for the specified user.
More information about the squeue command can be found on the SLURM online documentation or by typing man squeue on Titan.
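If you prefer a more compact listing, squeue also accepts a user-defined output format via the -o flag; the %i, %T and %M fields below select the job ID, job state and elapsed time (alice is a hypothetical username used only for illustration):
squeue -u alice -o "%.10i %.8T %.10M"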
Partition Information (sinfo)
The sinfo command provides information about the partitions (or queues) you have access to on Titan. The information is displayed in the following format: Partition Name, Availability, Time Limit, Nodes, State, Node List.
Typing sinfo provides information on all the queues you are assigned access to.
Typing sinfo -p <partition name> provides information for the specified queue.
More information about the sinfo command can be found on the SLURM online documentation or by typing man sinfo on Titan.
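For example, since most jobs run in the normal partition (used throughout the Sample Jobs section below), you can check its node states and time limit with:
sinfo -p normal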
Debugging Jobs (scontrol)
The scontrol command can be used for getting configuration details about a job, node or partition, and is especially useful when debugging jobs.
scontrol show job <job number> gives details about the particular job.
scontrol show partition provides configuration details (example: priority, allowed accounts, allowed memory per node, etc.) of all available partitions.
More information about the scontrol command can be found on the SLURM online documentation or by typing man scontrol on Titan.
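For example, to inspect one of your jobs and the normal partition (123456 is just a placeholder job number; substitute an ID from squeue):
scontrol show job 123456
scontrol show partition normal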
Submitting Batch Jobs (sbatch)
The sbatch command is used to submit jobs to SLURM. #SBATCH directives within the script file specify resource parameters such as the job name, output file, run time, etc. The Sample Jobs section below goes over some basic sbatch commands and #SBATCH directives.
More information about the sbatch command can be found on the SLURM online documentation or by typing man sbatch on Titan.
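Options passed on the sbatch command line take precedence over the matching #SBATCH directives inside the script, which is useful for one-off changes. For example, to submit the sample script from the Sample Jobs section below with a shorter time limit and a different job name (both values here are only placeholders):
sbatch --time=01:00:00 --job-name=shorttest single-test.sbatch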
Cancelling Jobs (scancel)
Use the scancel command to cancel pending and running jobs.
scancel <job number> cancels the specified job
scancel -u <username> cancels all jobs for the specified user
scancel -t PENDING -u <username> cancels all pending jobs for the specified user
More information about the scancel command can be found on the SLURM online documentation or by typing man scancel on Titan.
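You can also cancel jobs by name instead of by number; for example, to cancel all of your jobs that were submitted with the job name jobname (the name used in the Sample Jobs section below):
scancel -u <username> --name=jobname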
Titan's Environment Module System - LMOD
Titan hosts a large number of software packages, compilers and libraries to meet the needs of our users. Oftentimes, we have to deploy multiple versions of the same software or compiler. Lmod helps manage such installations by setting up modules. Users can customize their environment by loading only the modules they need.
LMOD modules are displayed in the following form: <application name>/<application version>-<compiler type>
Example: NAMD/2.11-intel-2016a-mpi
Software that was not built with a particular compiler toolchain will not have the <compiler type> suffix. Example: MATLAB/2015a
Listing all available modules on Titan
Typing module avail gives you a list of all the modules installed on Titan. The list will be in the following format: <application name>/<application version>-<compiler type>.
Default Modules
You will notice that some modules have a (D) next to them. This appears for modules that have multiple versions installed. The (D) indicates that the particular version has been designated as the default module, and is the version that gets loaded into your environment if you do not specify the application version.
Example:
Autoconf/2.69-intel-2016a has been designated as the default module for Autoconf on Titan:
Autoconf/2.69-GCC-4.9.2
Autoconf/2.69-GCC-4.9.3-2.25
Autoconf/2.69-GNU-4.9.3-2.25
Autoconf/2.69-goolf-1.4.10
Autoconf/2.69-intel-2016a (D)
Loading a module
Typing module load <application name> will load the default version of the module AND its dependencies.
Typing module load <application name>/<application version>-<compiler type> will load that specific module AND its dependencies.
Example:
The following versions of Autoconf have been installed on Titan:
Autoconf/2.69-GCC-4.9.2
Autoconf/2.69-GCC-4.9.3-2.25
Autoconf/2.69-GNU-4.9.3-2.25
Autoconf/2.69-goolf-1.4.10
Autoconf/2.69-intel-2016a (D)
Typing module load Autoconf will load Autoconf/2.69-intel-2016a.
If you wish to load Autoconf/2.69-GCC-4.9.2, you will have to type module load Autoconf/2.69-GCC-4.9.2.
Listing all loaded modules in your environment
Typing module list displays all the modules currently loaded in your environment
Removing modules from your environment
To remove a specific module, type module unload <application name>.
To remove ALL modules, type module purge.
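A common workflow is to start from a clean environment, load only what you need, and then confirm what is loaded. For example, using the MATLAB module mentioned above:
module purge
module load MATLAB/2015a
module list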
Sample Jobs
A typical job script has two parts: requesting resources and job steps. Requesting resources includes specifying the number of CPUs needed, the time to run, memory, etc. This is done within your script using #SBATCH directives. Job steps describe the tasks that must be performed.
Non-Parallel Job
The following code is a simple non-parallel job that uses the hostname command to get the name of the node that executed this job. You can create or edit this file with your favorite editor. If you don't have a favorite editor, we suggest nano. The filename of the submit script can be anything you like, but we suggest the extension .sbatch to distinguish it from other shell scripts. In this example, let's call it single-test.sbatch.
#!/bin/bash
#
#SBATCH --partition=normal
#SBATCH --ntasks=1
#SBATCH --mem=1024
#SBATCH --output=jobname_%J_stdout.txt
#SBATCH --error=jobname_%J_stderr.txt
#SBATCH --time=12:00:00
#SBATCH --job-name=jobname
#SBATCH --mail-user=youremailaddress@yourinstitution.edu
#SBATCH --mail-type=ALL
#SBATCH --chdir=/home/yourusername/directory_to_run_in
#
#################################################
hostname
After you have saved this file -- here called single-test.sbatch -- you will need to make it executable with the command
chmod +x single-test.sbatch
And then you can submit your job with
sbatch single-test.sbatch
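If the submission succeeds, sbatch replies with the ID assigned to your job, for example "Submitted batch job 123456" (the number here is just a placeholder). You can then watch that specific job with:
squeue -j 123456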
Code Walkthrough
The SBATCH directive below says the name of the partition to be used. In most cases, you should use the queue named normal.
#SBATCH --partition=normal
The SBATCH directive below says to use 1 CPU core of 1 CPU chip on 1 compute node, meaning that this batch job is non-parallel (serial).
#SBATCH --ntasks=1
The SBATCH directive below indicates to the scheduler the amount of memory your job will use in megabytes. This is critical information for scheduling of nonexclusive jobs, since it prevents the scheduler from assigning more jobs to a given compute node than that node has memory for.
#SBATCH --mem=1024
The default unit is MB, but you can also specify GB, for example --mem=8G
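If you prefer to reason about memory per core rather than per node (handy if you later change --ntasks), SLURM also accepts a per-CPU request; note that --mem and --mem-per-cpu are alternatives and should not be combined:
#SBATCH --mem-per-cpu=1024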
The SBATCH directives below tell SLURM to send output and error messages to the filenames listed below. Note that, in these filenames, %J will be replaced by the batch job ID number.
#SBATCH --output=jobname_%J_stdout.txt
#SBATCH --error=jobname_%J_stderr.txt
The SBATCH directive below says to run for up to 12 hours (and zero minutes and zero seconds)
#SBATCH --time=12:00:00
The maximum time limit for most partitions is 48h, which can be specified as 48:00:00 or 2-00:00:00
The SBATCH directive below says the name of the batch job. This name is what appears for your job when you run the squeue command. You can rename jobname to any name you like.
#SBATCH --job-name=jobname
The SBATCH directive below says the e-mail address to send notifications to, which should be changed to your e-mail address.
#SBATCH --mail-user=youremailaddress@yourinstitution.edu
The SBATCH directive below says to e-mail a notification when the batch job either completes or fails. If you do not include this SBATCH directive, you will only get an e-mail if the batch job fails.
#SBATCH --mail-type=ALL
Change to the directory that you want to run in.
#SBATCH --chdir=/home/yourusername/directory_to_run_in
This directory needs to exist before the job is submitted.
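If you are not sure the directory exists yet, create it before submitting the job, for example:
mkdir -p /home/yourusername/directory_to_run_in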
This command gets the name of the compute node that runs the job. This is just a very simple example. You would put your actual executable there, or your job loop, or whatever payload you need to run.
hostname
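As a sketch of what a real payload might look like, you could replace the hostname line with a command such as the following, where my_program and input.dat are purely hypothetical placeholders for your own executable and input file:
./my_program input.dat > results.txt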
Parallel Job
The following code is a simple parallel job that runs 80 MPI processes at 40 MPI processes per node (2 nodes in total).
Download the sample code from Wikipedia and save it as mpi_example.c in your working directory
Compile it using the following commands:
module load OpenMPI
mpicc mpi_example.c -o hello.mpi
Now run the following code (see the non-parallel job example above for more information about batch script naming and submission):
#!/bin/bash
#
#SBATCH --partition=normal
#SBATCH --exclusive
#SBATCH --nodes=2
#SBATCH --ntasks=80
#SBATCH --ntasks-per-node=40
#SBATCH --output=jobname_%J_stdout.txt
#SBATCH --error=jobname_%J_stderr.txt
#SBATCH --time=10:00
#SBATCH --job-name=jobname
#SBATCH --mail-user=youremailaddress@yourinstitution.edu
#SBATCH --mail-type=ALL
#SBATCH --chdir=/home/yourusername/directory_to_run_in
#
#################################################
module load OpenMPI
mpirun hello.mpi
Code Walkthrough
The SBATCH directive below says the name of the partition to be used. In most cases, you should use the queue named normal.
#SBATCH --partition=normal
The SBATCH directive below says to request exclusive access on the participating compute nodes, so that other batch jobs (for example, those submitted by other users) don't run on the same compute nodes as this batch job, and therefore don't interfere with it.
#SBATCH --exclusive
Use 80 MPI processes at 40 MPI processes per node, which is to say 2 nodes in the case of the normal partition.
Please use the following pattern for nodes in the normal partition:
For ntasks <= 40, please use ntasks-per-node equal to ntasks unless you have a very good reason to do otherwise.
For ntasks >= 40, please use ntasks-per-node equal to 40 unless you have a very good reason to do otherwise.
This is because each compute node has 2 chips and each chip has 20 cores, for a total of 40 cores per node. We recommend using the same number of MPI processes per node as cores, unless you've benchmarked your code's performance and found that you take fewer node hours by using fewer than 40 per node. A smaller example that follows these rules is shown after the directives below.
#SBATCH --nodes=2
#SBATCH --ntasks=80
#SBATCH --ntasks-per-node=40
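For instance, a hypothetical smaller run of 16 MPI processes would follow the first rule above and fit on a single node:
#SBATCH --nodes=1
#SBATCH --ntasks=16
#SBATCH --ntasks-per-node=16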
The SBATCH directives below tell SLURM to send output and error messages to the filenames listed below. Note that, in these filenames, %J will be replaced by the batch job ID number.
#SBATCH --output=jobname_%J_stdout.txt
#SBATCH --error=jobname_%J_stderr.txt
The SBATCH directive below says to run for up to 10 minutes.
#SBATCH --time=10:00
The SBATCH directive below says the name of the batch job. This name is what appears for your job when you run the squeue command. You can rename jobname to any name you like.
#SBATCH --job-name=jobname
The SBATCH directive below says the e-mail address to send notifications to, which should be changed to your e-mail address.
#SBATCH --mail-user=youremailaddress@yourinstitution.edu
The SBATCH directive below says to e-mail a notification when the batch job either completes or fails. If you do not include this SBATCH directive, you will only get an e-mail if the batch job fails.
#SBATCH --mail-type=ALL
Change to the directory that you want to run in.
#SBATCH --chdir=/home/yourusername/directory_to_run_in
This command loads the modules needed to execute the program. In this case, we are using OpenMPI.
module load OpenMPI
This command executes the hello.mpi file we compiled earlier.
mpirun hello.mpi
Upon successful completion of your job, your output file should look like this:
We have 80 processes.
Process 1 reporting for duty.
Process 2 reporting for duty.
Process 3 reporting for duty.
Process 4 reporting for duty.
Process 5 reporting for duty.
Process 6 reporting for duty.
Process 7 reporting for duty.
Process 8 reporting for duty.
Process 9 reporting for duty.
Process 10 reporting for duty.
and so on