The Omnivector Slurm Distribution is built on a suite of automations called "charms". Charms are the operational components that describe the lifecycle of a Slurm cluster. A full Slurm deployment comes in the form of multiple charms, one for each component of Slurm. A "bundle" is a YAML file where multiple charms can be defined. We use bundles to describe the interconnectivity and configuration of groups of charms.
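For illustration, the interconnectivity that a bundle describes can be modelled as plain data. The following is a minimal Python sketch that assumes a structure mirroring the ``applications`` and ``relations`` sections of a Juju bundle. The application names, unit counts, relations, and the helper functions ``related_applications`` and ``total_units`` are all hypothetical and are not taken from an actual OSD bundle.

```python
# Hypothetical bundle modelled as a Python dict. The top-level "applications"
# and "relations" keys mirror the Juju bundle layout; every name and count
# below is illustrative only.
bundle = {
    "applications": {
        "slurmctld": {"charm": "slurmctld", "num_units": 1},
        "slurmd": {"charm": "slurmd", "num_units": 2},
        "slurmdbd": {"charm": "slurmdbd", "num_units": 1},
        "slurmrestd": {"charm": "slurmrestd", "num_units": 1},
    },
    # Each relation connects two applications (optionally "app:endpoint").
    "relations": [
        ["slurmctld", "slurmd"],
        ["slurmctld", "slurmdbd"],
        ["slurmctld", "slurmrestd"],
    ],
}


def related_applications(bundle: dict, app: str) -> list[str]:
    """Return the applications that `app` is related to, sorted by name."""
    peers = set()
    for left, right in bundle["relations"]:
        # Strip an optional ":endpoint" suffix before comparing names.
        left_app, right_app = left.split(":")[0], right.split(":")[0]
        if left_app == app:
            peers.add(right_app)
        elif right_app == app:
            peers.add(left_app)
    return sorted(peers)


def total_units(bundle: dict) -> int:
    """Count the units requested across all applications in the bundle."""
    return sum(spec.get("num_units", 0) for spec in bundle["applications"].values())


if __name__ == "__main__":
    print(related_applications(bundle, "slurmctld"))
    print(total_units(bundle))
```

Keeping relations as a separate list, rather than nesting them under each application, matches how a bundle states the links between charms independently of the charms themselves.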
OSD provisions Slurm to operate in configless mode. In this mode, the ``slurmctld`` process does the work of distributing the ``slurm.conf`` file to the nodes running ``slurmd``.
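The following Python sketch shows the generic Slurm settings behind configless mode, assuming a standard Slurm installation: ``slurmctld`` serves its configuration files when ``slurm.conf`` contains ``SlurmctldParameters=enable_configless``, and each ``slurmd`` can be pointed at the controller with ``--conf-server`` (a DNS SRV record is the other supported discovery method). The helper names and the host ``ctld-0`` are hypothetical, and the sketch does not claim to reproduce how the charms apply these settings.

```python
import shlex

DEFAULT_SLURMCTLD_PORT = 6817  # Slurm's default SlurmctldPort


def configless_ctld_parameter() -> str:
    """slurm.conf line that lets slurmctld serve its configuration to slurmd."""
    # If other SlurmctldParameters are already set, they are comma-separated
    # on the same line; this sketch only emits the configless option.
    return "SlurmctldParameters=enable_configless"


def slurmd_command(ctld_host: str, port: int = DEFAULT_SLURMCTLD_PORT) -> list[str]:
    """Build a slurmd command line that fetches slurm.conf from slurmctld."""
    return ["slurmd", "--conf-server", f"{ctld_host}:{port}"]


if __name__ == "__main__":
    print(configless_ctld_parameter())
    print(shlex.join(slurmd_command("ctld-0")))
```

Because the compute nodes only need to know where the controller is, changing ``slurm.conf`` on the control node is enough for the new configuration to reach every ``slurmd``.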
The slurm-charms are the components that encapsulate the operational know-how and automation needed to facilitate the lifecycle of a Slurm cluster.
The slurm-bundles define the base Slurm deployment configurations for different clouds and operating systems.
The Omnivector Slurm Distribution supports the following charm components as part of the Slurm-core offering:

- Compute and login nodes (running ``slurmd``)
- Slurm database node (running ``slurmdbd``)
- Slurm control node (running ``slurmctld``)
- Slurm REST service (running ``slurmrestd``)

Additionally, we require the Node Health Check (NHC) with a minimal configuration whose checks
ensure that the ``slurm`` and ``munge`` processes are active. The cluster
administrator must provide the ``tar.gz`` archive for ``nhc``. It is possible, and
recommended, for the cluster administrator to extend these checks. See the
:ref:`nhc` section for details on how to configure it.
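As an illustration of what such a process check verifies, the following minimal Python sketch reports which required daemons are not running by reading ``/proc/<pid>/comm`` on Linux. It is not NHC's own check syntax or implementation. The process names ``slurmd`` and ``munged`` and the helper names are assumptions for a compute node; a control node would check ``slurmctld`` instead.

```python
from pathlib import Path


def running_process_names(proc_root: str = "/proc") -> set[str]:
    """Collect the command names of all processes visible under proc_root (Linux)."""
    names = set()
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # only numeric directories correspond to PIDs
        try:
            names.add((entry / "comm").read_text().strip())
        except OSError:
            continue  # the process exited or its comm file is unreadable
    return names


def missing_processes(required, proc_root: str = "/proc") -> list[str]:
    """Return the required process names that are not currently running."""
    running = running_process_names(proc_root)
    return sorted(name for name in required if name not in running)


if __name__ == "__main__":
    # Hypothetical required daemons for a compute node.
    missing = missing_processes(["slurmd", "munged"])
    if missing:
        print("unhealthy, missing:", ", ".join(missing))
    else:
        print("healthy")
```

Note that ``/proc/<pid>/comm`` holds at most 15 characters, which is sufficient for these daemon names but would require matching on the full command line for longer ones.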