This is a little speculative, but it's something we probably want for integrating sonar data properly with Slurm data. With `sonar slurm` we extract information about jobs that have completed in the last hour. But we probably want the dashboard, which has profiling data, to have some Slurm data while the job is still running. So we probably want a lightweight-ish capability to extract information at the state changes (created)->PENDING, PENDING->RUNNING, and (whatever)->(completed). For the PENDING->RUNNING transition we want information about allocated resources, in particular GPU cards, since this information is lost once the job has completed. I'm not sure Slurm gives us anything else interesting at those stages, though clearly the info we get with `sonar slurm` would be interesting to have once the job reaches the completed state.
This job runs on only a single host on the cluster (maybe with some redundancy) and will not normally overload the compute nodes; it can run on a login or admin node. It should run as often as `sonar ps`, ideally at roughly the same time.
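To make the idea of extracting information at state changes a bit more concrete, here is a minimal sketch of the polling logic, not a proposal for the actual implementation. It assumes each poll yields a snapshot mapping job ID to Slurm state (how that snapshot is obtained is left open), and the names `diff_snapshots` and `needs_allocation_info` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (job_id, old_state, new_state). old_state is None for a job seen for the
# first time; new_state is None for a job that vanished from the snapshot.
Transition = Tuple[str, Optional[str], Optional[str]]


def diff_snapshots(previous: Dict[str, str], current: Dict[str, str]) -> List[Transition]:
    """Compare two polls (job_id -> state) and list the state changes."""
    transitions: List[Transition] = []
    # Jobs that are new or whose state changed since the previous poll.
    for job_id in sorted(current):
        old = previous.get(job_id)
        if old != current[job_id]:
            transitions.append((job_id, old, current[job_id]))
    # Jobs that disappeared between polls; their final state is unknown here
    # and would have to come from accounting data (assumption).
    for job_id in sorted(previous):
        if job_id not in current:
            transitions.append((job_id, previous[job_id], None))
    return transitions


def needs_allocation_info(transition: Transition) -> bool:
    """True when a job has just entered RUNNING, which is the point where
    allocated resources (e.g. GPU cards) should be recorded."""
    _, old, new = transition
    return new == "RUNNING" and old != "RUNNING"
```

A poller running at the `sonar ps` cadence would keep the last snapshot, call `diff_snapshots` on each new one, and only query allocation details for the transitions where `needs_allocation_info` is true, which keeps the per-poll cost low.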