Write a pipeline to CRUD ColdFront Resource objects from sinfo output #2

Open
claire-peters opened this issue Feb 25, 2025 · 0 comments

Collaborator

claire-peters commented Feb 25, 2025

Summary
I've shared the output of the command sinfo -N -r --format="%19N %23P %66G %5a %5c %8z %65f %50g" in a .txt file. Add a new command (independent of slurm_sync) that processes the node information in that file.
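As a starting point, here is a minimal sketch of reading that file into one record per node. It assumes the eight columns are whitespace-separated and that no field contains a space; the function name, the dictionary keys, and the sample rows are all hypothetical and not taken from the real file. sinfo's %P marks the default partition with a trailing "*", which the sketch strips.

```python
# Hypothetical sample in the shape of the sinfo output; values are made up.
SAMPLE = """\
NODELIST PARTITION GRES AVAIL CPUS S:C:T AVAIL_FEATURES GROUPS
node01 gpu gpu:nvidia_a100-sxm4-80gb:4(S:0-1) up 64 2:32:1 amd,a100 cluster_users,slurm-admin
node01 serial_requeue gpu:nvidia_a100-sxm4-80gb:4(S:0-1) up 64 2:32:1 amd,a100 cluster_users,slurm-admin
node02 lab_a* (null) up 48 2:24:1 intel lab_a,slurm-admin
"""

COLUMNS = ("node", "partition", "gres", "avail", "cpus", "sct", "features", "groups")


def parse_sinfo(text):
    """Collapse one-line-per-node/partition output into one dict per node."""
    nodes = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, the header row, and anything without 8 fields.
        if len(parts) != len(COLUMNS) or parts[0] == "NODELIST":
            continue
        row = dict(zip(COLUMNS, parts))
        partition = row["partition"].rstrip("*")  # drop default-partition marker
        node = nodes.setdefault(row["node"], {
            "partitions": [],
            "gres": row["gres"],
            "sct": row["sct"],
            "features": row["features"],
            "groups_by_partition": {},
        })
        node["partitions"].append(partition)
        node["groups_by_partition"][partition] = row["groups"].split(",")
    return nodes


nodes = parse_sinfo(SAMPLE)
```

Keeping the groups per partition, rather than merging them, makes it possible to ignore the serial_requeue groups later when working out the Owner.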

Details
The command should behave as follows:

  • Create a new Resource of ResourceType 'Compute Node' for each node in the NODELIST column, named after the node. Each node should be unique (one Resource per node name), even though the list contains one line per node/partition combination.
    • 'Compute Node' Resources should by default be marked as is_allocatable=False.
  • Partitions in the PARTITION column are listed as linked_resources values for the node objects.
  • The GRES column contains information about GPUs, structured as e.g. gpu:nvidia_a100-sxm4-80gb:3(S:0-1),gpu:nvidia_a100_1g.10gb:7(S:0). The number after the second colon for each comma-separated entry is the count. Those should be tallied up and stored as a 'GPU Count' ResourceAttribute (for the example above, 3 + 7 = 10).
  • The S:C:T column contains socket/core/thread counts for the node. Save the core count as a 'Core Count' ResourceAttribute.
  • Try to intuit the value for an 'Owner' ResourceAttribute from the GROUPS column. The 'Owner' value should be either FASRC or the title of an existing Project. Ignore the GROUPS values a node gets from its serial_requeue* partition association and see what remaining groups have access. If the groups list is cluster_users,cluster_users_2,slurm-admin, FASRC is the owner. If the list is group_name,slurm-admin, group_name is the owner, provided that it matches a Project in ColdFront. If it doesn't (e.g., if the group name is seas), an Owner ResourceAttribute with an empty value should be saved for the object. This will be used as a cue to prompt manual entry of this value for the Resource later on.
    • IMPORTANT: In the latter case, ensure that nodes requiring manual Owner assignment don't get a new Owner ResourceAttribute created on every run once the existing Owner ResourceAttribute has been updated.
  • AVAIL_FEATURES column values should be saved as a 'Features' ResourceAttribute.
  • If the nodes in the output are updated or removed, the corresponding objects should be updated accordingly. If the node is removed, the corresponding Resource's is_available field should be assigned a value of False and a 'ServiceEnd' ResourceAttribute should be created with the current date as its value.
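The attribute rules above could be sketched as plain Python helpers, kept separate from the ColdFront models so they are easy to test. All function and constant names here are hypothetical. Three points are assumptions rather than requirements from this issue: a GRES entry without a type (gpu:4) is also counted; 'Core Count' is taken as the C field of S:C:T as printed; and any non-empty subset of the two cluster-wide groups is treated as FASRC.

```python
import re

GENERAL_GROUPS = {"cluster_users", "cluster_users_2"}
ADMIN_GROUP = "slurm-admin"
# Matches "gpu:<type>:<count>"; the type part is optional (assumption).
GPU_ENTRY = re.compile(r"gpu:(?:[^:,()]+:)?(\d+)")


def gpu_count(gres):
    """Sum the per-entry GPU counts in a GRES string; '(null)' gives 0."""
    return sum(int(n) for n in GPU_ENTRY.findall(gres))


def core_count(sct):
    """Return the C field of an S:C:T string such as '2:32:1'."""
    return int(sct.split(":")[1])


def infer_owner(groups_by_partition, project_titles):
    """Return 'FASRC', a matching Project title, or '' for manual entry."""
    groups = set()
    for partition, names in groups_by_partition.items():
        if partition.startswith("serial_requeue"):
            continue  # groups gained via serial_requeue* are ignored
        groups.update(names)
    groups.discard(ADMIN_GROUP)
    if groups and groups <= GENERAL_GROUPS:
        return "FASRC"
    if len(groups) == 1:
        (name,) = groups
        if name in project_titles:
            return name
    return ""  # empty value is the cue for manual assignment


def resolve_owner(existing, inferred):
    """Keep a manually entered Owner when inference yields nothing."""
    if existing and not inferred:
        return existing
    return inferred


example_gres = "gpu:nvidia_a100-sxm4-80gb:3(S:0-1),gpu:nvidia_a100_1g.10gb:7(S:0)"
print(gpu_count(example_gres))  # 10
```

For the IMPORTANT note, one reading is that the Owner ResourceAttribute should be looked up by Resource and attribute type only, never by value, so that a manually edited value is found and reused instead of a second empty attribute being created. resolve_owner reflects that reading and is only one possible policy.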