Our current UX for the Slurm charms is not as good as we'd hope: some actions have names that are opaque about what they do, some have multiple bugs, and others are clunky to use correctly.
We should look into improving the user experience of all the actions and config options of the charms.
node-configured redesign ideas
@nuccitheboss
Action list:
set-state - modify the current state of the nodes.
  - Have an enum with the valid states that can be set.
set-node-config - override the default node configuration.
  - Put a big ole’ warning over it that this option is only meant for experienced Slurm administrators, and that 99.9% of the time Charmed HPC will get the correct configuration value.
  - Validate with slurmutils.
  - Don’t allow users to change NodeName or NodeAddr.
show-nhc-config - show the current NHC configuration file.
show-slurm-config - show the current slurm.conf configuration file.
  - Will need to look under /run/conf/slurm/slurm.conf.
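To make the two validation ideas above a bit more concrete, here is a minimal Python sketch. Everything in it is hypothetical: the `NodeState` members beyond IDLE and DOWN are only an illustrative subset, and `parse_state` and `check_node_overrides` are made-up helper names. It does not call slurmutils; the real validation of node configuration values would still be delegated there.

```python
from enum import Enum


class NodeState(str, Enum):
    """States a user may request through set-state (illustrative subset, an assumption)."""

    IDLE = "IDLE"
    DOWN = "DOWN"
    DRAIN = "DRAIN"
    RESUME = "RESUME"


# Keys that set-node-config should never let a user override.
PROTECTED_KEYS = frozenset({"nodename", "nodeaddr"})


def parse_state(value: str) -> NodeState:
    """Return the matching NodeState, or raise ValueError listing the valid choices."""
    try:
        return NodeState(value.strip().upper())
    except ValueError:
        valid = ", ".join(state.value for state in NodeState)
        raise ValueError(f"invalid state {value!r}; valid states: {valid}") from None


def check_node_overrides(overrides: dict[str, str]) -> dict[str, str]:
    """Reject overrides that touch protected keys (compared case-insensitively)."""
    blocked = sorted(key for key in overrides if key.lower() in PROTECTED_KEYS)
    if blocked:
        raise ValueError(f"cannot override: {', '.join(blocked)}")
    return dict(overrides)
```

With an enum like this, the action can fail early with a message that lists the accepted states, instead of passing an arbitrary string through to Slurm.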
Configuration options list:
default-state - default state to bring nodes up with after being deployed.
default-nhc-config - default NHC configuration for the partition.
default-partition-config - default partition configuration.
set-state action for modifying the current state of compute nodes in bulk:
Leader context.
$ juju run slurmd/leader set-state state=IDLE nodes=all
$ juju run slurmd/leader set-state state=DOWN nodes="juju-abc123-[1-10]"
Non-leader context.
$ juju run slurmd/3 set-state state=IDLE
$ juju run slurmd/3 set-state state=DOWN nodes=all # Raises an error.
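As a rough idea of how the `nodes=` argument in the leader examples could be interpreted, the sketch below expands `all` or a single bracketed range into node names. `expand_nodes` and `known_nodes` are hypothetical names, and only one `[start-end]` range is handled; Slurm's real hostlist syntax is richer (comma lists, zero padding), so a real implementation would more likely lean on `scontrol show hostnames` or an existing hostlist library.

```python
import re

# Matches "<prefix>[<start>-<end>]", e.g. "juju-abc123-[1-10]".
_RANGE = re.compile(r"^(?P<prefix>[^\[\]]*)\[(?P<start>\d+)-(?P<end>\d+)\]$")


def expand_nodes(expr: str, known_nodes: list[str]) -> list[str]:
    """Expand a nodes= argument into concrete node names."""
    if expr == "all":
        return list(known_nodes)
    match = _RANGE.match(expr)
    if match is None:
        return [expr]  # Treat anything else as a single node name.
    start, end = int(match["start"]), int(match["end"])
    if start > end:
        raise ValueError(f"empty range in {expr!r}")
    return [f"{match['prefix']}{index}" for index in range(start, end + 1)]
```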
default-state configuration option for setting the state that nodes will start in:
# Bring up new nodes in the IDLE state.
$ juju deploy slurmd --base [email protected] --config default-state=IDLE
# Scale partition, but bring up nodes in `DOWN` state for post-processing
$ juju config slurmd default-state=DOWN
$ juju add-unit slurmd -n 10
# After completing provisioning operations, activate nodes.
$ juju run slurmd/leader set-state state=IDLE nodes="juju-abc123-[11-20]"
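Both `default-state` and `set-state` would presumably end up as an `scontrol update NodeName=... State=...` call on the Slurm side. The sketch below only builds that argument list; that the charm shells out to `scontrol` at all is an assumption, and `scontrol_update_command` is a hypothetical helper. The reason check reflects that Slurm expects a `Reason` when nodes are set to DOWN or DRAIN.

```python
import shlex


def scontrol_update_command(nodes: str, state: str, reason: str = "") -> list[str]:
    """Build an argv list for `scontrol update` that sets the state of `nodes`."""
    state = state.upper()
    if state in {"DOWN", "DRAIN"} and not reason:
        raise ValueError(f"a reason is required when setting nodes to {state}")
    argv = ["scontrol", "update", f"NodeName={nodes}", f"State={state}"]
    if reason:
        argv.append(f"Reason={reason}")
    return argv


# Activate the freshly provisioned nodes from the example above.
command = scontrol_update_command("juju-abc123-[11-20]", "IDLE")
print(shlex.join(command))
```

Building an argument list rather than a shell string means the brackets in the node range are never subject to shell globbing.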
@jedel1043
Move the action to slurmctld, then access it by application name instead:
$ juju run slurmctld/leader set-state IDLE application=slurmd
If we need more granularity, we can add a way to reference a unit:
$ juju run slurmctld/leader set-state IDLE unit=slurmd/0
Even ranges should probably work:
$ juju run slurmctld/leader set-state IDLE unit=slurmd/[0-50]
Maybe we even skip the prefix and just deduce the type by the input?
$ juju run slurmctld/leader set-state IDLE slurmd # set state of all units in application
$ juju run slurmctld/leader set-state IDLE slurmd/0 # set state of single unit
$ juju run slurmctld/leader set-state IDLE slurmd/[0-50] # set state of range of units
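For illustration, deducing the target type from the input alone could look roughly like this. `Target` and `parse_target` are hypothetical names, and the accepted grammar (application, application/N, application/[A-B]) is just what the three examples above suggest.

```python
import re
from dataclasses import dataclass
from typing import Optional

# "slurmd", "slurmd/0", or "slurmd/[0-50]".
_TARGET = re.compile(
    r"^(?P<app>[a-z][a-z0-9-]*)"
    r"(?:/(?:(?P<unit>\d+)|\[(?P<start>\d+)-(?P<end>\d+)\]))?$"
)


@dataclass(frozen=True)
class Target:
    application: str
    units: Optional[range]  # None means every unit in the application.


def parse_target(spec: str) -> Target:
    """Classify a target as a whole application, a single unit, or a unit range."""
    match = _TARGET.match(spec)
    if match is None:
        raise ValueError(f"unrecognized target: {spec!r}")
    if match["unit"] is not None:
        number = int(match["unit"])
        return Target(match["app"], range(number, number + 1))
    if match["start"] is not None:
        start, end = int(match["start"]), int(match["end"])
        if start > end:
            raise ValueError(f"empty unit range in {spec!r}")
        return Target(match["app"], range(start, end + 1))
    return Target(match["app"], None)
```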
EDIT: Not an option because cross-model relations bork the application name. :(
In that case, maybe simplify some things in the arguments:
Leader context.
$ juju run slurmd/leader set-state state=IDLE # No node range implies all
$ juju run slurmd/leader set-state state=DOWN range=0-5 # set the state of nodes 0 to 5
Non-leader context.
$ juju run slurmd/3 set-state state=IDLE
$ juju run slurmd/3 set-state state=DOWN nodes=all # Raises an error.
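The leader and non-leader rules in this simplified form could be sketched as below. `resolve_units` and its parameters are hypothetical, and it assumes the range refers to Juju unit numbers and is inclusive on both ends, which the examples suggest but do not state outright.

```python
from typing import Optional


def resolve_units(
    is_leader: bool,
    own_unit: int,
    all_units: list[int],
    unit_range: Optional[str] = None,
) -> list[int]:
    """Return the unit numbers that a set-state call should affect."""
    if not is_leader:
        # A non-leader may only change its own node; any explicit selection is an error.
        if unit_range is not None:
            raise PermissionError("only the leader may target other nodes")
        return [own_unit]
    if unit_range is None:
        return sorted(all_units)  # No range implies all nodes.
    start_text, separator, end_text = unit_range.partition("-")
    if not separator:
        raise ValueError(f"expected START-END, got {unit_range!r}")
    start, end = int(start_text), int(end_text)
    return [unit for unit in sorted(all_units) if start <= unit <= end]
```

Filtering against `all_units` means a range that extends past the deployed units is narrowed to the units that exist, rather than raising an error.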