OSD Architecture

The Omnivector Slurm Distribution (OSD) is built on a suite of automations called "charms". Charms are the operational components that describe the lifecycle of a Slurm cluster. A full Slurm deployment consists of multiple charms, one for each component of Slurm. A "bundle" is a YAML file that defines multiple charms. We use bundles to describe how groups of charms are configured and connected to each other.
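As a rough illustration of what a bundle expresses, the sketch below models one as a Python dictionary shaped like a Juju-style bundle (applications plus relations) and checks that every relation refers to an application the bundle defines. Real bundles are YAML files. The application names follow the Slurm daemons, but the unit counts and relation endpoint names are assumptions made for illustration, not the contents of an actual slurm-bundle.

```python
# Hypothetical bundle, written as a dict instead of YAML.
# Unit counts and relation endpoint names are illustrative assumptions.
bundle = {
    "applications": {
        "slurmctld": {"charm": "slurmctld", "num_units": 1},
        "slurmd": {"charm": "slurmd", "num_units": 2},
        "slurmdbd": {"charm": "slurmdbd", "num_units": 1},
        "slurmrestd": {"charm": "slurmrestd", "num_units": 1},
    },
    "relations": [
        ["slurmctld:slurmd", "slurmd:slurmd"],
        ["slurmctld:slurmdbd", "slurmdbd:slurmdbd"],
        ["slurmctld:slurmrestd", "slurmrestd:slurmrestd"],
    ],
}


def undefined_applications(bundle):
    """Return application names used in relations but not defined in the bundle."""
    defined = set(bundle["applications"])
    used = {
        endpoint.split(":", 1)[0]  # "app:endpoint" -> "app"
        for relation in bundle["relations"]
        for endpoint in relation
    }
    return sorted(used - defined)


print(undefined_applications(bundle))  # prints [] when every relation is resolvable
```

A check like this is only a convenience for reasoning about the structure; the deployment tooling performs its own validation when a bundle is deployed.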

OSD provisions Slurm to operate in configless mode. In this mode, the slurmctld process does the work of distributing the slurm.conf file to the nodes running slurmd.
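To make configless mode concrete, the sketch below shows the two pieces Slurm itself needs for it: the ``enable_configless`` controller setting in slurm.conf, and a slurmd started with ``--conf-server`` so that it fetches its configuration from slurmctld. How the charms wire this up internally is not described here; the helper name and the ``slurmctld-0`` hostname are hypothetical, and 6817 is Slurm's default slurmctld port.

```python
# slurm.conf line on the controller that enables configless mode.
CONTROLLER_SETTING = "SlurmctldParameters=enable_configless"


def slurmd_configless_command(controller_host, port=6817):
    """Build a slurmd command line that pulls its config from slurmctld.

    Hypothetical helper for illustration; not part of the slurm-charms.
    """
    return ["slurmd", "--conf-server", f"{controller_host}:{port}"]


print(CONTROLLER_SETTING)
# "slurmctld-0" is a placeholder hostname.
print(" ".join(slurmd_configless_command("slurmctld-0")))
```

With this arrangement, compute nodes do not need their own copy of slurm.conf; they only need to know where the controller is.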

Slurm Charms

The slurm-charms encapsulate the operational know-how and automation needed to manage the lifecycle of a Slurm cluster.

Slurm Bundles

The slurm-bundles define the base Slurm deployment configurations for different clouds and operating systems.

OSD Components

The Omnivector Slurm Distribution supports the following charm components as part of the Slurm-core offering:

  • slurmd: Compute and login nodes (running slurmd)
  • slurmdbd: Slurm database node (running slurmdbd)
  • slurmctld: Slurm control node (running slurmctld)
  • slurmrestd: Slurm REST service (running slurmrestd)

In addition, OSD requires Node Health Check (NHC), set up with a minimal configuration whose checks ensure that the slurm and munge processes are active. The cluster administrator must provide the tar.gz for NHC. Administrators can, and are encouraged to, extend these checks. See the :ref:`nhc` section for details on how to configure it.