Summary
I've shared the output of the command sinfo -N -r --format="%19N %23P %66G %5a %5c %8z %65f %50g" in a .txt. Add a new command (independent of slurm_sync) that processes the node information available in that .txt.
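As a rough illustration of the input handling, the sketch below collapses the one-line-per-node/partition output into one record per node. It assumes the .txt keeps the eight columns in the order given by the format string, that no field contains whitespace, and that the header row starts with NODELIST; the names `COLUMNS` and `parse_sinfo` are hypothetical and not part of ColdFront.

```python
# Column order taken from the --format string: %N %P %G %a %c %z %f %g
COLUMNS = ["node", "partition", "gres", "avail", "cpus", "sct", "features", "groups"]


def parse_sinfo(text):
    """Return {node_name: record}, merging the per-partition lines of each node."""
    nodes = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and anything that is not 8 columns.
        if len(fields) != len(COLUMNS) or fields[0] == "NODELIST":
            continue
        row = dict(zip(COLUMNS, fields))
        # sinfo marks the default partition with a trailing '*'; drop it.
        partition = row["partition"].rstrip("*")
        record = nodes.setdefault(row["node"], {
            "gres": row["gres"],
            "sct": row["sct"],
            "features": row["features"],
            "partitions": {},
        })
        # Keep the groups per partition so serial_requeue can be ignored later.
        record["partitions"][partition] = row["groups"].split(",")
    return nodes
```

Keeping the groups keyed by partition, rather than merging them up front, is a deliberate choice here so that the Owner logic can later discard the groups contributed by serial_requeue partitions.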
Details
The output should be:
New Resources of ResourceType 'Compute Node' are created that are named after the nodes in the NODELIST column. Each node should be unique - one entry per node name. Note that there is one line per node/partition combination in the list.
'Compute Node' Resources should by default be marked as is_allocatable=False.
Partitions in the PARTITION column are listed as linked_resources values for the node objects.
The GRES column contains information about GPUs, structured as e.g. gpu:nvidia_a100-sxm4-80gb:3(S:0-1),gpu:nvidia_a100_1g.10gb:7(S:0). The number after the second colon for each comma-separated entry is the count. Those should be tallied up and stored as a 'GPU Count' ResourceAttribute.
The S:C:T column contains socket/core/thread counts for the node. Save the core count as a 'Core Count' ResourceAttribute.
Try to intuit the value for an 'Owner' ResourceAttribute from the GROUPS column. The 'Owner' value should be either FASRC or the title of an existing Project. Ignore the GROUPS values a node gets from its serial_requeue* partition association and see what remaining groups have access. If the groups list is cluster_users,cluster_users_2,slurm-admin, FASRC is the owner. If the list is group_name,slurm-admin, group_name is the owner, provided that it matches a Project in ColdFront. If it doesn't (e.g., if the group name is seas), an Owner ResourceAttribute with an empty value should be saved for the object. This will be used as a cue to prompt manual entry of this value for the Resource later on.
IMPORTANT: Ensure that in the latter case, new Owner ResourceAttributes for nodes that require manual Owner assignments don't get continuously created when the existing Owner ResourceAttribute is updated.
AVAIL_FEATURES column values should be saved as a 'Features' ResourceAttribute
If the nodes in the output are updated or removed, the corresponding objects should be updated accordingly. If the node is removed, the corresponding Resource's is_available field should be assigned a value of False and a 'ServiceEnd' ResourceAttribute should be created with the current date as its value.
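The column rules above could be sketched roughly as follows. This is only an illustration under stated assumptions: GRES entries are separated by commas with the count right after the second colon, the 'Core Count' is taken literally as the C field of S:C:T, and Project titles are passed in as a plain set instead of being queried from ColdFront. The helper names `gpu_count`, `core_count`, and `infer_owner` are hypothetical.

```python
import re

# Group list that marks a FASRC-owned node, as given in the issue.
FASRC_GROUPS = {"cluster_users", "cluster_users_2", "slurm-admin"}


def gpu_count(gres):
    """Sum the counts of all gpu entries, e.g. 'gpu:model:3(S:0-1)' -> 3."""
    total = 0
    for entry in gres.split(","):
        parts = entry.split(":")
        if len(parts) >= 3 and parts[0] == "gpu":
            match = re.match(r"\d+", parts[2])  # leading digits before '(S:...)'
            if match:
                total += int(match.group())
    return total


def core_count(sct):
    """Return the middle (core) field of an 'S:C:T' value."""
    return int(sct.split(":")[1])


def infer_owner(partitions, project_titles):
    """partitions maps partition name -> list of groups; returns '' if unknown."""
    groups = set()
    for name, partition_groups in partitions.items():
        if name.startswith("serial_requeue"):
            continue  # groups inherited from serial_requeue* are ignored
        groups.update(partition_groups)
    if groups == FASRC_GROUPS:
        return "FASRC"
    candidates = groups - {"slurm-admin"}
    if len(candidates) == 1:
        group_name = candidates.pop()
        if group_name in project_titles:
            return group_name
    return ""  # empty value flags the node for manual Owner entry
```

For the IMPORTANT note on Owner, one possible approach is to look up the existing 'Owner' ResourceAttribute by Resource and attribute type only, never by value, and to leave a non-empty value untouched when the inferred value is empty. That way a manually entered owner is neither duplicated nor overwritten on later runs.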