chore(merge): slurm-ops-manager into main #2

Merged
merged 6 commits into from
Jun 28, 2024
Changes from 5 commits
52 changes: 52 additions & 0 deletions .github/workflows/ci.yaml
@@ -0,0 +1,52 @@
# Copyright 2024 Canonical Ltd.
# See LICENSE file for licensing details.

name: hpc-libs tests
on:
  workflow_call:
  pull_request:

jobs:
  inclusive-naming-check:
    name: Inclusive naming check
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Run tests
        uses: get-woke/woke-action@v0
        with:
          fail-on-error: true

  lint:
    name: Lint
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Install dependencies
        run: python3 -m pip install tox
      - name: Run linters
        run: tox -e lint

  integration-test:
    name: Integration tests
    runs-on: ubuntu-latest
    needs:
      - inclusive-naming-check
      - lint
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Set up LXD
        uses: canonical/[email protected]
        with:
          channel: 5.21/stable
      - name: Set up gambol
        run: |
          wget https://github.com/NucciTheBoss/gambol/releases/download/v0.1.0-rc2/gambol_0.1.0_amd64-rc2.snap
          sudo snap install ./gambol_*.snap --dangerous
          sudo snap connect gambol:lxd lxd:lxd
          sudo snap connect gambol:dot-gambol
      - name: Run tests
        run: tox -e integration
NucciTheBoss marked this conversation as resolved.
83 changes: 8 additions & 75 deletions charmcraft.yaml
@@ -1,85 +1,18 @@
-# This file configures Charmcraft.
-# See https://juju.is/docs/sdk/charmcraft-config for guidance.
+# Copyright 2024 Canonical Ltd.
+# See LICENSE file for licensing details.
 
-# (Required)
-# The charm package name, no spaces
-# See https://juju.is/docs/sdk/naming#heading--naming-charms for guidance.
 name: hpc-libs
-
-
-# (Required)
-# The charm type, either 'charm' or 'bundle'.
-type: charm
-
-
-# (Recommended)
-title: Charm Template
-
-
-# (Required)
-summary: A very short one-line summary of the charm.
-
-
-# (Required)
+title: HPC Libs
+summary: Collection of Charm libraries to manage HPC related services.
 description: |
-  A single sentence that says what the charm is, concisely and memorably.
-
-  A paragraph of one to three short sentences, that describe what the charm does.
-
-  A third paragraph that explains what need the charm meets.
-
-  Finally, a paragraph that describes whom the charm is useful for.
-
-
-# (Required for 'charm' type)
-# A list of environments (OS version and architecture) where charms must be
-# built on and run on.
+  A placeholder charm that contains helpful charm libraries curated by the
+  HPC team for use when authoring charms that need to manage HPC related services;
+  Slurm, Munge, etc.
+type: charm
 bases:
   - build-on:
     - name: ubuntu
       channel: "22.04"
     run-on:
     - name: ubuntu
       channel: "22.04"
-
-
-# (Optional) Configuration options for the charm
-# This config section defines charm config options, and populates the Configure
-# tab on Charmhub.
-# More information on this section at https://juju.is/docs/sdk/charmcraft-yaml#heading--config
-# General configuration documentation: https://juju.is/docs/sdk/config
-config:
-  options:
-    # An example config option to customise the log level of the workload
-    log-level:
-      description: |
-        Configures the log level of gunicorn.
-
-        Acceptable values are: "info", "debug", "warning", "error" and "critical"
-      default: "info"
-      type: string
-
-
-# The containers and resources metadata apply to Kubernetes charms only.
-# See https://juju.is/docs/sdk/metadata-reference for a checklist and guidance.
-# Remove them if not required.
-
-
-# Your workload’s containers.
-containers:
-  httpbin:
-    resource: httpbin-image
-
-
-# This field populates the Resources tab on Charmhub.
-resources:
-  # An OCI image resource for each container listed above.
-  # You may remove this if your charm will run without a workload sidecar container.
-  httpbin-image:
-    type: oci-image
-    description: OCI image for httpbin
-    # The upstream-source field is ignored by Juju. It is included here as a
-    # reference so the integration testing suite knows which image to deploy
-    # during testing. This field is also used by the 'canonical/charming-actions'
-    # Github action for automated releasing.
-    upstream-source: kennethreitz/httpbin
2 changes: 2 additions & 0 deletions dev-requirements.txt
@@ -0,0 +1,2 @@
pytest ~= 7.2
pytest-order ~= 1.1
209 changes: 209 additions & 0 deletions lib/charms/hpc_libs/v0/slurm_ops.py
@@ -0,0 +1,209 @@
# Copyright 2024 Canonical Ltd.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


"""Library to manage the Slurm snap.

This library contains the `SlurmManager` class, which offers interfaces to use and manage
the Slurm snap inside charms.

### General usage

For starters, the `SlurmManager` constructor receives a `Service` enum as a parameter, which
helps the manager determine things like the correct service to enable, or the correct settings
key to mutate.

```
from charms.hpc_libs.v0.slurm_ops import (
    Service,
    SlurmManager,
)
from ops.charm import CharmBase

class ApplicationCharm(CharmBase):
    # Application charm that needs to use the Slurm snap.

    def __init__(self, *args, **kwargs) -> None:
        super().__init__(*args, **kwargs)

        # Manager for the Slurm snap, scoped to the slurmctld service.
        self._slurm_manager = SlurmManager(Service.SLURMCTLD)
        self.framework.observe(
            self.on.install,
            self._on_install,
        )

    def _on_install(self, _) -> None:
        self._slurm_manager.install()
        self.unit.set_workload_version(self._slurm_manager.version())
        self._slurm_manager.set_config("cluster-name", "cluster")
```
"""

import base64
import enum
import functools
import logging
import os
import subprocess
import tempfile

import yaml

_logger = logging.getLogger(__name__)

# The unique Charmhub library identifier, never change it
LIBID = "541fd767f90b40539cf7cd6e7db8fabf"

# Increment this major API version when introducing breaking changes
LIBAPI = 0

# Increment this PATCH version before using `charmcraft publish-lib` or reset
# to 0 if you are raising the major API version
LIBPATCH = 1


PYDEPS = ["pyyaml>=6.0.1"]


def _call(cmd: str, *args: str) -> bytes:
    """Call a command with logging.

    Raises:
        subprocess.CalledProcessError: Raised if the command fails.
    """
    cmd = [cmd, *args]
    _logger.debug(f"Executing command {cmd}")
    try:
        return subprocess.check_output(cmd, stderr=subprocess.PIPE, text=False)
    except subprocess.CalledProcessError as e:
        _logger.error(f"`{' '.join(cmd)}` failed")
        _logger.error(f"stderr: {e.stderr.decode()}")
        raise


def _snap(*args) -> str:
    """Control snap via executed `snap ...` commands.

    Raises:
        subprocess.CalledProcessError: Raised if snap command fails.
    """
    return _call("snap", *args).decode()


_get_config = functools.partial(_snap, "get", "slurm")
_set_config = functools.partial(_snap, "set", "slurm")


class Service(enum.Enum):
    """Type of Slurm service that will be managed by `SlurmManager`."""

    SLURMD = "slurmd"
    SLURMCTLD = "slurmctld"
    SLURMDBD = "slurmdbd"
    SLURMRESTD = "slurmrestd"

    @property
    def config_name(self) -> str:
        """Configuration name on the slurm snap for this service type."""
        if self is Service.SLURMCTLD:
            return "slurm"
        return self.value


class SlurmManager:
    """Slurm snap manager.

    This class offers methods to manage the Slurm snap for a certain service type.
    The list of available services is specified by the `Service` enum.
    """

    def __init__(self, service: Service):
        self._service = service

    def install(self):
        """Install the slurm snap in this system."""
        # TODO: Pin slurm to the stable channel
        _snap("install", "slurm", "--channel", "latest/candidate", "--classic")

    def start(self):
        """Start and enable the managed slurm service and the munged service."""
        _snap("start", "--enable", "slurm.munged")
        _snap("start", "--enable", f"slurm.{self._service.value}")

    def restart(self):
        """Restart the managed slurm service."""
        _snap("restart", f"slurm.{self._service.value}")

    def restart_munged(self):
        """Restart the munged service."""
        _snap("restart", "slurm.munged")

    def disable(self):
        """Disable the managed slurm service and the munged service."""
        _snap("stop", "--disable", "slurm.munged")
        _snap("stop", "--disable", f"slurm.{self._service.value}")

    def set_config(self, key: str, value: str):
        """Set a snap config for the managed slurm service.

        See the configuration section from the [Slurm readme](https://github.com/charmed-hpc/slurm-snap#configuration)
        for a list of all the available configurations.

        Note that this will only allow configuring the settings that are exclusive to
        the specific managed service. (the slurmctld service uses the slurm parent key)
        """
        _set_config(f"{self._service.config_name}.{key}={value}")

    def get_config(self, key: str) -> str:
        """Get a snap config for the managed slurm service.

        See the configuration section from the [Slurm readme](https://github.com/charmed-hpc/slurm-snap#configuration)
        for a list of all the available configurations.

        Note that this will only allow fetching the settings that are exclusive to
        the specific managed service. (the slurmctld service uses the slurm parent key)
        """
        # Snap returns the config value with an additional newline at the end.
        return _get_config(f"{self._service.config_name}.{key}").strip()

    def generate_munge_key(self) -> bytes:
        """Generate a new cryptographically secure munged key."""
        handle, path = tempfile.mkstemp()
        # Close the descriptor right away so it is not leaked if `mungekey` fails;
        # `mungekey -f` overwrites the file at `path` on its own.
        os.close(handle)
        try:
            _call("mungekey", "-f", "-k", path)
            with open(path, "rb") as f:
                return f.read()
        finally:
            os.remove(path)

    def set_munge_key(self, key: bytes):
        """Set the current munged key."""
        # TODO: use `slurm.setmungekey` when implemented
        # subprocess.run(["slurm.setmungekey"], stdin=key)
        key = base64.b64encode(key).decode()
        _set_config(f"munge.key={key}")

    def get_munge_key(self) -> bytes:
        """Get the current munged key."""
        # TODO: use `slurm.getmungekey` when implemented
        # key = subprocess.run(["slurm.getmungekey"])
        key = _get_config("munge.key")
        return base64.b64decode(key)

    def version(self) -> str:
        """Get the installed Slurm version of the snap."""
        info = yaml.safe_load(_snap("info", "slurm"))
        version: str = info["installed"]
        return version.split(maxsplit=1)[0]
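The munge key is binary, while snap configuration values are strings, which is why `set_munge_key` and `get_munge_key` go through base64. The sketch below illustrates that round trip in isolation. `FakeSnapConfig` is a hypothetical in-memory stand-in for `snap set slurm ...` / `snap get slurm ...` and is not part of this library; the two free functions only mirror the encoding logic of the methods above.

```python
import base64
from typing import Dict


class FakeSnapConfig:
    """Hypothetical in-memory stand-in for `snap set` / `snap get` (assumption)."""

    def __init__(self) -> None:
        self._values: Dict[str, str] = {}

    def set(self, assignment: str) -> None:
        # Mirrors the `key=value` form passed to `snap set slurm ...`.
        # Splitting on the first "=" keeps any base64 padding in the value.
        key, _, value = assignment.partition("=")
        self._values[key] = value

    def get(self, key: str) -> str:
        # `snap get` prints the value followed by a newline.
        return self._values[key] + "\n"


def set_munge_key(config: FakeSnapConfig, key: bytes) -> None:
    """Store a binary key as base64 text, like `SlurmManager.set_munge_key`."""
    encoded = base64.b64encode(key).decode()
    config.set(f"munge.key={encoded}")


def get_munge_key(config: FakeSnapConfig) -> bytes:
    """Read the key back, like `SlurmManager.get_munge_key`."""
    # In its default (non-validating) mode, b64decode discards characters
    # outside the base64 alphabet, so the trailing newline is harmless.
    return base64.b64decode(config.get("munge.key"))


config = FakeSnapConfig()
raw_key = bytes(range(256))  # arbitrary binary data, including NUL and newline bytes
set_munge_key(config, raw_key)
assert get_munge_key(config) == raw_key
```

This also shows why `get_munge_key` does not need to call `.strip()` on the value returned by `snap get`, unlike `get_config`.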
8 changes: 4 additions & 4 deletions pyproject.toml
@@ -17,6 +17,9 @@ target-version = ["py38"]
 # Linting tools configuration
 [tool.ruff]
 line-length = 99
+extend-exclude = ["__pycache__", "*.egg_info"]
+
+[tool.ruff.lint]
 select = ["E", "W", "F", "C", "N", "D", "I001"]
 extend-ignore = [
     "D203",
@@ -32,11 +35,8 @@ extend-ignore = [
     "D413",
 ]
 ignore = ["E501", "D107"]
-extend-exclude = ["__pycache__", "*.egg_info"]
 per-file-ignores = {"tests/*" = ["D100","D101","D102","D103","D104"]}
-
-[tool.ruff.mccabe]
-max-complexity = 10
+mccabe = { "max-complexity" = 10}
 
 [tool.codespell]
 skip = "build,lib,venv,icon.svg,.tox,.git,.mypy_cache,.ruff_cache,.coverage"