对于我这个初学者而言,SLURM 学习还是有一定难度。本文参考:slurm入门,在此谢谢作者!
1. SLURM 的安装和使用
# 安装slurm及其依赖
sudo apt-get install slurm-llnl
2. 配置slurm
可使用
在线
是slurm 配置器,由它为我生成基于表单数据的配置文件。注意:需根据自己超算的情况进行修改。
面向单节点集群的SLURM 配置文件,这个还没弄,到时候再看!
# slurm.conf file generated by configurator.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
ControlMachine=mtj-VirtualBox
#
AuthType=auth/none
CacheGroups=0
CryptoType=crypto/openssl
MpiDefault=none
ProctrackType=proctrack/pgid
ReturnToService=1
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/tmp/slurmd
SlurmUser=slurm
StateSaveLocation=/tmp
SwitchType=switch/none
TaskPlugin=task/none
#
# TIMERS
InactiveLimit=0
KillWait=30
MinJobAge=300
SlurmctldTimeout=120
SlurmdTimeout=300
Waittime=0
#
# SCHEDULING
FastSchedule=1
SchedulerType=sched/backfill
SchedulerPort=7321
SelectType=select/linear
#
# LOGGING AND ACCOUNTING
AccountingStorageType=accounting_storage/none
ClusterName=cluster
JobCompType=jobcomp/none
JobCredentialPrivateKey = /usr/local/etc/slurm.key
JobCredentialPublicCertificate = /usr/local/etc/slurm.cert
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
SlurmctldDebug=3
SlurmdDebug=3
#
# COMPUTE NODES
NodeName=mtj-VirtualBox State=UNKNOWN
PartitionName=debug Nodes=mtj-VirtualBox default=YES MaxTime=INFINITE State=UP
最后一步说是生成一组作业凭证秘钥,且使用openssl 作为其凭证秘钥。
清单2. 为slurm 创建凭证
$ sudo openssl genrsa -out /usr/local/etc/slurm.key 1024
Generating RSA private key, 1024 bit long modulus
.................++++++
............................................++++++
e is 65537 (0x10001)
$ sudo openssl rsa -in /usr/local/etc/slurm.key -pubout -out /usr/local/etc/slurm.cert
writing RSA key
完成凭证后,可以启动slurm 并与其交互。
3. 启动slurm
$ sudo /etc/init.d/slurm-llnl start
清单3. 使用sinfo命令来查看集群
$ info
================================================================
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
debug* up infinite 1 idle mtj-VirtualBox
================================================================
4. 更多的slurm命令
scontrol
清单4 . 用scontrol 了解集群详细信息
网友评论