The job scheduler for the Kestrel cluster is SLURM (Simple Linux Utility for Resource Management). If you are porting jobs from a cluster that uses a different job scheduler, you will have to revise your submission scripts to comply with the standard SLURM requirements.
A job is a list of commands that SLURM executes on behalf of the user. These commands go into a job script, which is submitted to a job queue. A program cannot be sent directly to SLURM for execution; instead, it must be called from within a job script. SLURM supports bash, csh, and Python scripts as job scripts.
The sbatch command is used to submit jobs. The SLURM developers cover sbatch in depth in a YouTube video, which is located here.
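As a minimal sketch of submitting from the shell, the helper below wraps sbatch with its --parsable option, which prints only the job ID (optionally followed by ";cluster") rather than the usual "Submitted batch job" message. The function name submit_job and the script name job.sh are hypothetical and are not part of the Kestrel documentation.

```shell
#!/bin/bash
# Hypothetical helper: submit a job script and print only the numeric job ID.
submit_job() {
    local script="$1" out
    # --parsable makes sbatch print "jobid" or "jobid;cluster".
    out=$(sbatch --parsable "$script") || return 1
    # Drop the optional ";cluster" suffix.
    echo "${out%%;*}"
}

# Usage on a login node (job.sh is an assumed script name):
#   jobid=$(submit_job job.sh) && echo "Submitted job $jobid"
```

Capturing the job ID at submission time makes it easier to refer to the job later, for example when checking its status or cancelling it.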
A job script has a header section, which specifies the resources required to run the job, followed by the commands to be executed. Here’s an example of a basic job script.
Job submission priority is set up so that jobs with shorter wall times are queued first.
- The user submits a job script specifying the resources needed.
- When the resources become available on the cluster, the job is allocated to a group of nodes.
- The system runs the job script on the first node of the group.
- The STDOUT and STDERR of the job script are saved to the user’s directory.
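The steps above can be followed from the shell once a job has been submitted. The sketch below is one possible way to wait for a job to leave the queue before reading its output. The function name wait_for_job, the 30-second default interval, and the job ID 12345 are assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical helper: block until a job has left the queue.
# Arguments: job ID, and an optional polling interval in seconds (default 30).
wait_for_job() {
    local jobid="$1" interval="${2:-30}"
    # -h suppresses the header and -j selects one job, so empty output
    # means the job is no longer pending or running.
    while [ -n "$(squeue -h -j "$jobid" 2>/dev/null)" ]; do
        sleep "$interval"
    done
}

# Usage with the output file pattern from the example script
# (the job ID is illustrative):
#   wait_for_job 12345 && cat sim_1_slurm.o12345
```

Polling squeue too often puts load on the scheduler, so a generous interval is usually the safer choice.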
#!/bin/bash
# Account and email information
##SBATCH -A quick
#SBATCH --mail-type=end
#SBATCH --mail-user=email@example.com
# Specify partition (queue)
#SBATCH --partition=batch
# Write STDOUT and STDERR to separate files
#SBATCH -o sim_1_slurm.o%j
#SBATCH -e sim_1_slurm.e%j
# Specify job not to be rerunnable
#SBATCH --no-requeue
# Job name
#SBATCH --job-name="job_name"
# Specify walltime
# Specify number of requested nodes:
#SBATCH -N 1
# Specify the total number of requested procs:
#SBATCH -n 8
# Specify the procs per node:
#SBATCH --ntasks-per-node=8
# Exclusively check out a node
#SBATCH --exclusive
module load gcc/4.8.1
module load openmpi/cuda75/gcc4.8.1/2.0.1
cd $SLURM_SUBMIT_DIR
srun --mpi=pmi2 $your_executable
#SBATCH --mail-type=end: Specifies when SLURM sends an email to the user (here, when the job ends).
#SBATCH --mail-user=email@example.com: Specifies the address that SLURM sends the email to.
#SBATCH --partition=batch: Specifies the partition (queue) for the user’s job. There are several queues with different priorities.
#SBATCH -o filename.o%j: Specifies the file that receives the job’s STDOUT. SLURM replaces %j with the job ID.
#SBATCH -e filename.e%j: Specifies the file that receives the job’s STDERR.
#SBATCH --no-requeue: Specifies that the job is not rerunnable, so SLURM will not requeue it.
#SBATCH --job-name="job_name": Sets the name of the user’s job.
#SBATCH -N 1: Specifies the number of nodes to reserve for the user’s job.
#SBATCH -n 8: Specifies the total number of tasks (processors) requested for the user’s job.
#SBATCH --ntasks-per-node=8: Specifies the number of tasks (processors) per node.
#SBATCH --exclusive: Requests exclusive use of the node, which prevents other jobs from sharing it.
module load: Loads the software environment needed for the job. For a list of available applications, run module avail from the terminal prompt.
cd $SLURM_SUBMIT_DIR: Changes the working directory to the directory from which the job was submitted.
srun --mpi=pmi2 $your_executable: Runs your code.
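To illustrate the %j placeholder used in the -o and -e file names, the sketch below mimics the substitution that SLURM performs, replacing %j with the job ID. The function name expand_pattern and the job ID 12345 are hypothetical. SLURM does this expansion itself, so the helper is only useful for predicting where the output will land.

```shell
#!/bin/bash
# Hypothetical helper: show the file name that results from a -o/-e
# pattern once %j has been replaced by the job ID.
expand_pattern() {
    local pattern="$1" jobid="$2"
    printf '%s\n' "$pattern" | sed "s/%j/$jobid/g"
}

expand_pattern "sim_1_slurm.o%j" 12345   # prints sim_1_slurm.o12345
expand_pattern "sim_1_slurm.e%j" 12345   # prints sim_1_slurm.e12345
```

Because the job ID is unique for each submission, including %j in the file name keeps repeated runs of the same script from overwriting each other’s output.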
squeue: Displays a one-line status listing for every job.
squeue -u username: Displays only the specified user’s jobs.
The scancel command is used to delete a job from the queue; it takes the ID of the job to delete as its argument.
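A small sketch of how these monitoring commands might be wrapped for everyday use follows. The helper names run, list_my_jobs, and cancel_job, and the DRY_RUN variable, are hypothetical. With DRY_RUN=1 the helpers print the command instead of running it, which allows a check before anything is cancelled.

```shell
#!/bin/bash
# Hypothetical wrapper: run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# List the jobs of the given user (defaults to the current user).
list_my_jobs() { run squeue -u "${1:-$USER}"; }

# Delete one job from the queue by its job ID.
cancel_job() { run scancel "$1"; }

# Usage (job ID is illustrative):
#   list_my_jobs
#   DRY_RUN=1 cancel_job 12345   # prints: scancel 12345
```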