User Documentation


Preparation

Create a slurm user on all nodes of the cluster

groupadd slurm
useradd -m slurm -g slurm

for i in `seq -f "pp%02g" 5 16`;do
  scp /etc/passwd /etc/shadow /etc/group /etc/gshadow $i:/etc/.
done
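
To confirm that the slurm account is in place on every node (xdsh is also used further down; a plain ssh loop works just as well):

xdsh head2,pp05-pp16 "id slurm"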

Synchronize clock on all nodes of the cluster

xdsh head2,pp05-pp16 ntpdate 172.30.205.1
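
munge (installed below) rejects credentials when the clock skew is too large, so it is worth spot-checking that the nodes now agree:

xdsh head2,pp05-pp16 date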

Build munge

Build the packages on a machine running the same OS version as the cluster nodes

wget -c https://munge.googlecode.com/files/munge-0.5.11.tar.bz2
yum install -y bzip2-devel openssl-devel rpm-build
rpmbuild -tb --clean munge-0.5.11.tar.bz2 

The RPMs are placed under rpmbuild/RPMS/x86_64 in the build user's home directory (~/rpmbuild/RPMS/x86_64)
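
Listing the directory should show the munge, munge-libs and munge-devel packages, all of which are used later on this page:

ls ~/rpmbuild/RPMS/x86_64/munge-*.rpm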

Build slurm

Extra features to enable:
  • MySQL accounting support
  • lua plugin support
  • GUI support (sview)

yum install -y mysql-devel lua lua-devel gtk2-devel rpm-build readline-devel perl-ExtUtils-MakeMaker pam-devel freeipmi-devel hwloc-devel

For Red Hat 6.1 use:

yum localinstall ftp://ftp-stud.fht-esslingen.de/pub/Mirrors/centos/6.5/os/x86_64/Packages/lua-devel-5.1.4-4.1.el6.x86_64.rpm

Create a .rpmmacros file in /root that enables the lua plugin, and install the munge RPMs built above so that slurm is built with munge support

echo '%_with_lua     "--with-lua"' > ~/.rpmmacros
yum localinstall ~/rpmbuild/RPMS/x86_64/munge-*.rpm -y

export VER=14.03.7
wget -c http://www.schedmd.com/download/latest/slurm-$VER.tar.bz2
rpmbuild -tb --clean slurm-$VER.tar.bz2
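
Among the resulting packages, slurm-munge, slurm-slurmdbd and slurm-pam_slurm are needed further down this page; a quick listing:

ls ~/rpmbuild/RPMS/x86_64/slurm-*.rpm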

Copy the RPMs to the deployment server

 
scp ~/rpmbuild/RPMS/x86_64/{munge,slurm}-*rpm dep1:/install/post/otherpkgs/centos6.4/x86_64/

Install munge on all nodes
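
The munge RPMs built earlier must be installed on every node before the key is distributed (installing the RPM also creates /etc/munge). A minimal sketch, assuming the RPMs are still under ~/rpmbuild/RPMS/x86_64 on the headnode:

for i in `seq -f "pp%02g" 5 16`;do
  scp ~/rpmbuild/RPMS/x86_64/munge-*.rpm $i:/tmp/
  ssh $i "yum localinstall -y /tmp/munge-*.rpm"
done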

Create a munge.key on the headnode and propagate it to all nodes. Make sure the file has mode 0400 and is owned by the user that the munged daemon runs as.

dd if=/dev/urandom bs=1 count=1024 > /etc/munge/munge.key

for i in `seq -f "pp%02g" 5 16`;do
  scp /etc/munge/munge.key $i:/etc/munge/
done

Start munged on all nodes

xdsh head2,pp05-pp16 service munge start
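
To verify that a credential created on the headnode is accepted on a compute node (i.e. the key and clocks match), encode one locally and decode it remotely; unmunge should report a Success status:

munge -n | ssh pp05 unmunge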

Install slurm on all nodes
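
The slurm RPMs copied to the deployment server have to be installed on the headnode and on every compute node. A sketch, assuming /install on dep1 is NFS-exported to the nodes (a typical xCAT layout); otherwise copy the RPMs over with scp as was done for munge:

xdsh head2,pp05-pp16 "yum localinstall -y /install/post/otherpkgs/centos6.4/x86_64/slurm-*.rpm"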

Create a slurm configuration file using http://slurm.schedmd.com/configurator.html

Make sure the configurator matches the installed slurm version; a local copy ships with the RPMs at /usr/share/doc/slurm-*/html/configurator.html

Sample configuration file
# slurm.conf file generated by configurator.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
ControlMachine=head2
#ControlAddr=
#BackupController=
#BackupAddr=
# 
AuthType=auth/munge
CacheGroups=0
#CheckpointType=checkpoint/none 
CryptoType=crypto/munge
#DisableRootJobs=NO 
#EnforcePartLimits=NO 
#Epilog=
#EpilogSlurmctld= 
#FirstJobId=1 
#MaxJobId=999999 
#GresTypes= 
#GroupUpdateForce=0 
#GroupUpdateTime=600 
#JobCheckpointDir=/var/slurm/checkpoint 
#JobCredentialPrivateKey=
#JobCredentialPublicCertificate=
#JobFileAppend=0 
#JobRequeue=1 
#JobSubmitPlugins=1 
#KillOnBadExit=0 
#Licenses=foo*4,bar 
#MailProg=/bin/mail 
#MaxJobCount=5000 
#MaxStepCount=40000 
#MaxTasksPerNode=128 
MpiDefault=none
#MpiParams=ports=#-# 
#PluginDir= 
#PlugStackConfig= 
#PrivateData=jobs 
ProctrackType=proctrack/pgid
#Prolog=
#PrologSlurmctld= 
#PropagatePrioProcess=0 
#PropagateResourceLimits= 
#PropagateResourceLimitsExcept= 
ReturnToService=1
#SallocDefaultCommand= 
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/tmp/slurmd
SlurmUser=slurm
#SlurmdUser=root 
#SrunEpilog=
#SrunProlog=
StateSaveLocation=/home/slurm
SwitchType=switch/none
#TaskEpilog=
TaskPlugin=task/none
#TaskPluginParam=
#TaskProlog=
#TopologyPlugin=topology/tree 
#TmpFs=/tmp 
#TrackWCKey=no 
#TreeWidth= 
#UnkillableStepProgram= 
#UsePAM=0 
# 
# 
# TIMERS 
#BatchStartTimeout=10 
#CompleteWait=0 
#EpilogMsgTime=2000 
#GetEnvTimeout=2 
#HealthCheckInterval=0 
#HealthCheckProgram= 
InactiveLimit=0
KillWait=30
#MessageTimeout=10 
#ResvOverRun=0 
MinJobAge=300
#OverTimeLimit=0 
SlurmctldTimeout=120
SlurmdTimeout=300
#UnkillableStepTimeout=60 
#VSizeFactor=0 
Waittime=0
# 
# 
# SCHEDULING 
#DefMemPerCPU=0 
FastSchedule=1
#MaxMemPerCPU=0 
#SchedulerRootFilter=1 
#SchedulerTimeSlice=30 
SchedulerType=sched/backfill
SchedulerPort=7321
SelectType=select/linear
#SelectTypeParameters=
# 
# 
# JOB PRIORITY 
#PriorityType=priority/basic 
#PriorityDecayHalfLife= 
#PriorityCalcPeriod= 
#PriorityFavorSmall= 
#PriorityMaxAge= 
#PriorityUsageResetPeriod= 
#PriorityWeightAge= 
#PriorityWeightFairshare= 
#PriorityWeightJobSize= 
#PriorityWeightPartition= 
#PriorityWeightQOS= 
# 
# 
# LOGGING AND ACCOUNTING 
#AccountingStorageEnforce=0 
#AccountingStorageHost=
#AccountingStorageLoc=
#AccountingStoragePass=
#AccountingStoragePort=
AccountingStorageType=accounting_storage/none
#AccountingStorageUser=
AccountingStoreJobComment=YES
ClusterName=cluster
#DebugFlags= 
#JobCompHost=
#JobCompLoc=
#JobCompPass=
#JobCompPort=
JobCompType=jobcomp/none
#JobCompUser=
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
SlurmctldDebug=3
#SlurmctldLogFile=
SlurmdDebug=3
#SlurmdLogFile=
#SlurmSchedLogFile= 
#SlurmSchedLogLevel= 
# 
# 
# POWER SAVE SUPPORT FOR IDLE NODES (optional) 
#SuspendProgram= 
#ResumeProgram= 
#SuspendTimeout= 
#ResumeTimeout= 
#ResumeRate= 
#SuspendExcNodes= 
#SuspendExcParts= 
#SuspendRate= 
#SuspendTime= 
# 
# 
# COMPUTE NODES 
NodeName=pp[05-16] CPUs=8 RealMemory=32107 Sockets=2 CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN 
PartitionName=debug Nodes=pp[05-16] Default=YES MaxTime=INFINITE State=UP
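
The CPUs/Sockets/RealMemory values in the NodeName line should match what the nodes really have. Running slurmd -C on a compute node prints the detected hardware in slurm.conf format, which makes the comparison easy:

xdsh pp05 slurmd -C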

Propagate the configuration file to all nodes

for i in `seq -f "pp%02g" 5 16`;do
  scp /etc/slurm/slurm.conf $i:/etc/slurm/
done

Start slurmctld and slurmd on all nodes

xdsh head2,pp05-pp16 service slurm start
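
If everything came up, sinfo on the headnode should list all twelve nodes as idle in the debug partition, and a trivial job should run:

sinfo
srun -N2 hostname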

Configure PAM

  1. Enable SLURM's use of PAM by setting UsePAM=1 in slurm.conf.
  2. Establish a PAM configuration file for slurm in /etc/pam.d/slurm. A basic configuration you might use is:
    auth     required  pam_localuser.so
    account  required  pam_unix.so
    session  required  pam_limits.so
    
Copy the /etc/pam.d/slurm file to all nodes

Configure MEMLOCK limits to unlimited (needed by MPI)

  1. Set the desired limits in /etc/security/limits.conf. For example, to set the locked memory limit to unlimited for all users:
    *   hard   memlock   unlimited
    *   soft   memlock   unlimited
    
Copy the /etc/security/limits.conf file to all nodes
  2. Disable SLURM's forwarding of resource limits from the session in which the srun that launches the job runs. By default all resource limits are propagated from that session. For example, adding the following line to slurm.conf prevents the locked memory limit from being propagated: PropagateResourceLimitsExcept=MEMLOCK
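
After restarting slurm with these settings, a job launched through SLURM should report an unlimited locked-memory limit:

srun -N1 bash -c 'ulimit -l'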

Configure pam_slurm plugin

Add the following line to /etc/pam.d/sshd on all compute nodes

account    required     pam_slurm.so

The resulting /etc/pam.d/sshd then looks like this:

#%PAM-1.0
auth       required     pam_sepermit.so
auth       include      password-auth
account    required     pam_slurm.so
account    required     pam_nologin.so
account    include      password-auth
password   include      password-auth

If you always want to allow access for an administrative group (e.g. wheel), stack the pam_access module ahead of pam_slurm:

account    sufficient   pam_access.so
account    required     pam_slurm.so

Then edit the pam_access configuration file (/etc/security/access.conf):

+:wheel:ALL
-:ALL:ALL
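
With this in place, ssh to a compute node is only accepted for users who have a job running on that node (and for members of wheel). A quick check as a regular user from the headnode:

ssh pp05 hostname                # denied: no active job on pp05
srun -N1 -w pp05 sleep 300 &
ssh pp05 hostname                # accepted while the job is running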

Accounting

Accounting using text files

Put the following in slurm.conf

AccountingStorageLoc=/var/log/slurm/accounting
AccountingStorageType=accounting_storage/filetxt
AccountingStoreJobComment=YES
ClusterName=cluster
JobCompLoc=/var/log/slurm/job_completions
JobCompType=jobcomp/filetxt
JobAcctGatherType=jobacct_gather/linux

Create the directory /var/log/slurm/ and chown it to the slurm user

mkdir /var/log/slurm
chown slurm: /var/log/slurm
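
After restarting slurm with these settings, completed jobs can be queried with sacct, which reads the text file configured above, e.g.:

sacct --starttime=2014-05-01 --format=JobID,JobName,User,State,Elapsed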

Accounting using MySQL Database

Install mysql server

yum install -y mysql
yum install -y mysql-server

Verify the installation

mysqladmin --version

Add the following line to /etc/my.cnf under the [mysqld] section

innodb_buffer_pool_size=64M
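
In context, the stanza then looks like this (only the buffer pool line is new; leave the distribution defaults in place):

[mysqld]
innodb_buffer_pool_size=64M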

Start the service and enable it at boot

service mysqld start
chkconfig mysqld on

Set the MySQL root password (it is blank by default)

mysqladmin -u root password "password"

Slurm accounting configuration

Add the following entries in slurm.conf

# LOGGING AND ACCOUNTING 
#AccountingStorageEnforce=associations,limits,qos,wckeys 
AccountingStorageHost=head2
#AccountingStorageLoc=/var/log/slurm/accounting
#AccountingStoragePass=
AccountingStoragePort=6819
AccountingStorageType=accounting_storage/slurmdbd
#AccountingStorageUser=
AccountingStoreJobComment=YES
ClusterName=testcluster
#DebugFlags= 
#JobCompHost=
JobCompLoc=/var/log/slurm/job_completions
#JobCompPass=
#JobCompPort=
JobCompType=jobcomp/filetxt
#JobCompUser=
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/linux
SlurmctldDebug=3
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdDebug=3
#SlurmdLogFile=/var/log/slurm/slurmd
SlurmSchedLogFile=/var/log/slurm/slurmSched.log
SlurmSchedLogLevel=3

SlurmDBD Configuration

Add the following entries to slurmdbd.conf (kept in the same directory as slurm.conf)

#
# Example slurmdbd.conf file.
#
# See the slurmdbd.conf man page for more information.
#
# Archive info
#ArchiveJobs=yes
#ArchiveDir="/tmp"
#ArchiveSteps=yes
#ArchiveScript=
#JobPurge=12
#StepPurge=1
#
# Authentication info
AuthType=auth/munge
#AuthInfo=/var/run/munge/munge.socket.2
#
# slurmDBD info
DbdAddr=localhost
DbdHost=head2
DbdPort=6819
SlurmUser=slurm
#MessageTimeout=300
DebugLevel=4
#DefaultQOS=normal,standby
LogFile=/var/log/slurm/slurmdbd.log
PidFile=/var/run/slurmdbd.pid
#PluginDir=/usr/lib/slurm
#PrivateData=accounts,users,usage,jobs
#TrackWCKey=yes
#
# Database info
StorageType=accounting_storage/mysql
StorageHost=head2
StoragePort=3306
StoragePass=password
StorageUser=slurm
StorageLoc=slurm_acct_db
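
Since slurmdbd.conf contains the database password, it should be readable only by the SlurmUser (assuming it lives in /etc/slurm next to slurm.conf):

chown slurm: /etc/slurm/slurmdbd.conf
chmod 600 /etc/slurm/slurmdbd.conf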

MySQL Configuration

In MySQL, grant privileges to the slurm user

grant all on slurm_acct_db.* TO 'slurm'@'localhost' identified by 'password' with grant option;
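
If slurmdbd connects using the host name rather than the local socket (StorageHost=head2 above), a second grant for that host may also be needed:

grant all on slurm_acct_db.* TO 'slurm'@'head2' identified by 'password' with grant option;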

Start slurmdbd

service slurmdbd start
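
To confirm slurmdbd is up and can reach MySQL, query it with sacctmgr (the list is empty until the cluster is added below) or check its log file:

sacctmgr show cluster
tail /var/log/slurm/slurmdbd.log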

Database configuration

  • Add cluster
    sacctmgr add cluster testcluster
    
  • Add accounts
    sacctmgr add account test Cluster=testcluster
    
  • Add users
    sacctmgr add user andreas DefaultAccount=test
    
  • Add CPU allocation to account
    sacctmgr modify account test set GrpCPUMins=60000
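
The resulting associations can be reviewed with sacctmgr. Note that GrpCPUMins is only enforced when AccountingStorageEnforce includes limits in slurm.conf (it is commented out in the sample above):

sacctmgr show associations cluster=testcluster format=Cluster,Account,User,GrpCPUMins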
    

Report utilization

  • Show account utilization by user
    sreport cluster AccountUtilizationByUser start=05/01/14 end=05/28/14
    
  • Show top users
    sreport user topusage start=05/01/14 end=05/28/14
    

Job Scheduling

Job Submit Plugin

To use the lua job_submit plugin the following steps are required:
1. Modify the job_submit.lua script, which can be found under /root/slurm-14.03.3-2/contribs/lua, as shown below:

--      LOGIC FOR CYI STARTS HERE
        local new_partition = nil

        if job_desc.max_nodes >= 1 and job_desc.max_nodes <= 4 then
                log_info("Partition: CPUs")
                new_partition = "cpus"
        elseif job_desc.max_nodes >= 5 and job_desc.max_nodes <= 8 then
                log_info("Partition: CPUm")
                new_partition = "cpum"
        elseif job_desc.max_nodes >= 9 and job_desc.max_nodes <= 12 then
                log_info("Partition: CPUl")
                new_partition = "cpul"
        else
                log_user("Error: Number of requested nodes is not available.")
                log_info("slurm_job_submit: error, invalid number of nodes requested.")
                -- reject the job here so that job_desc.partition is never set to nil below
                return slurm.ERROR
        end

        log_info("slurm_job_submit: job from uid %d, setting partition value: %s", job_desc.user_id, new_partition)
        job_desc.partition = new_partition
--      LOGIC FOR CYI ENDS HERE

2. Copy the job_submit.lua plugin into the /etc/slurm directory (i.e. the directory where slurm.conf resides)
3. Set the following option in slurm.conf so that the lua plugin is used during job submission:

JobSubmitPlugins=lua

4. Restart slurm service on the head node
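
Using the same init script as earlier, on head2:

service slurm restart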

Partitions

The partitions are defined in slurm.conf as seen below:

#Partitions
PartitionName=batch Nodes=pp[05-16] Default=YES MaxTime=INFINITE State=UP
PartitionName=cpus Nodes=pp[05-16] MinNodes=1 MaxNodes=4 MaxTime=24:00:00 State=UP
PartitionName=cpum Nodes=pp[05-16] MinNodes=5 MaxNodes=8 MaxTime=14:00:00 State=UP
PartitionName=cpul Nodes=pp[05-16] MinNodes=9 MaxNodes=12 MaxTime=08:00:00 State=UP

The following option needs to be added to slurm.conf to enforce the partition limits:

EnforcePartLimits=YES
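
After propagating the changed slurm.conf to the nodes and restarting slurm (both as above), sinfo should show the new partitions:

sinfo -s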

Reservations

Reservation creation in SLURM:

scontrol create reservation=test1 duration=INFINITE start=NOW user=andreas nodes=all

Reservation deletion:

scontrol delete reservation=test1

For standing reservations, a flag should be used to define the frequency of the reservation:

scontrol create reservation=test1 duration=INFINITE start=NOW user=andreas nodes=all flags=daily
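
Existing reservations can be listed with scontrol, and a job is placed in one by name at submit time (job.sh below is just a placeholder batch script):

scontrol show reservation
sbatch --reservation=test1 job.sh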

QoS

QoS creation:

sacctmgr create qos name=low 

Change QoS priority:

sacctmgr modify qos name=low set Priority=0

Assign a QoS to a user:

sacctmgr modify user name=thekla set QoSLevel=low
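
Jobs are then submitted against the QoS explicitly, and the defined QoS values can be reviewed with sacctmgr; note that QoS limits are only honoured when AccountingStorageEnforce includes qos in slurm.conf (job.sh is again a placeholder script):

sacctmgr show qos format=Name,Priority
sbatch --qos=low job.sh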