Added support for executing sacct to get historic job data for metrics #857
base: master
Struggling to make it Slurm-agnostic. Currently it returns the raw response from Slurm as an array of hashes.
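One way to get an "array of hashes" out of sacct is to request pipe-delimited output with `--parsable2` and key every row by the header line. The helper below is a hypothetical sketch (the name `parse_sacct_output` and the made-up sample rows are illustrative, not the PR's code), assuming the header is included and no field contains a literal `|`.

```ruby
# Hypothetical helper: turn `sacct --parsable2` output (first line is the
# header, fields separated by "|") into an array of hashes keyed by column name.
def parse_sacct_output(output)
  lines = output.lines.map(&:chomp).reject(&:empty?)
  return [] if lines.empty?

  header = lines.shift.split("|")
  lines.map do |line|
    # A limit of -1 keeps trailing empty fields, so each row has one value per column
    values = line.split("|", -1)
    header.zip(values).to_h
  end
end

# Made-up sample in the shape sacct prints with --parsable2
sample = <<~OUT
  JobID|State|MaxRSS
  123|COMPLETED|
  123.batch|COMPLETED|2048K
OUT

rows = parse_sacct_output(sample)
# rows[0]["MaxRSS"] is "" and rows[1]["MaxRSS"] is "2048K"
```

Keeping the header-driven keys means the hash layout follows whatever `--format` fields were requested, which is convenient while the final set of metrics is still undecided.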
Sample response from the sacct command:
I don't think OnDemand supports getting job step data, so we may need
Thanks, Trey. We are only interested in overall job data, but we need the data in the job/batch steps for memory usage. We could do the merging inside the adapter code and create a partially populated
We will look at using Grafana for other metrics after completing the MVP of the Slurm metrics widget.
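The idea of merging step data inside the adapter could look roughly like this: group rows by the parent job ID (the part before the first `.`) and keep the largest `MaxRSS` seen across the steps. This is a hypothetical sketch; it assumes `MaxRSS` values carry optional `K`/`M`/`G`/`T` suffixes that are powers of 1024, and the names `merge_job_steps`, `max_rss_bytes` and `MaxRSSBytes` are illustrative only.

```ruby
# Assumed suffix multipliers for sacct memory values (powers of 1024)
RSS_UNIT_MULTIPLIERS = { "" => 1, "K" => 1024, "M" => 1024**2, "G" => 1024**3, "T" => 1024**4 }.freeze

# Convert a MaxRSS string such as "2048K" or "1.5G" to bytes (0 if blank or unparseable)
def max_rss_bytes(value)
  match = /\A(\d+(?:\.\d+)?)([KMGT]?)\z/.match(value.to_s.strip)
  return 0 unless match

  (match[1].to_f * RSS_UNIT_MULTIPLIERS.fetch(match[2])).round
end

# Hypothetical: fold step rows ("123.batch", "123.0", ...) into their parent job row
def merge_job_steps(rows)
  rows.group_by { |row| row["JobID"].split(".").first }.map do |job_id, group|
    # Fall back to a partially populated row if the parent job line is missing
    parent = group.find { |row| row["JobID"] == job_id } || { "JobID" => job_id }
    parent.merge("MaxRSSBytes" => group.map { |row| max_rss_bytes(row["MaxRSS"]) }.max)
  end
end

rows = [
  { "JobID" => "123", "State" => "COMPLETED", "MaxRSS" => "" },
  { "JobID" => "123.batch", "State" => "COMPLETED", "MaxRSS" => "2048K" },
  { "JobID" => "123.0", "State" => "COMPLETED", "MaxRSS" => "1M" }
]
merged = merge_job_steps(rows)
# One merged row for job 123, with MaxRSSBytes equal to 2048 * 1024
```

Taking the maximum across steps is only one reasonable aggregation for peak memory; a widget that reports per-step usage would skip this merge.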
lib/ood_core/job/adapters/slurm.rb
```ruby
def sacct_metrics(job_ids: [], states: [], from: nil, to: nil)
  @slurm.sacct_metrics(job_ids, states, from, to)
end
```
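For reference, the keyword arguments above map naturally onto standard sacct options (`--jobs`, `--state`, `--starttime`, `--endtime`, `--parsable2`, `--format`). The helper below is a hypothetical sketch of that translation, not the PR's `@slurm.sacct_metrics`; it assumes `from` and `to` are already strings in a format sacct accepts, such as `2023-01-01` or `2023-01-01T12:00:00`.

```ruby
# Hypothetical helper: build the sacct argument list from sacct_metrics-style keywords.
# from/to are assumed to be preformatted date strings accepted by sacct.
def sacct_args(job_ids: [], states: [], from: nil, to: nil, fields: %w[JobID State MaxRSS])
  args = ["--parsable2", "--format=#{fields.join(',')}"]
  args << "--jobs=#{job_ids.join(',')}" unless job_ids.empty?
  args << "--state=#{states.join(',')}" unless states.empty?
  args << "--starttime=#{from}" if from
  args << "--endtime=#{to}" if to
  args
end

args = sacct_args(job_ids: %w[123 124], states: %w[COMPLETED], from: "2023-01-01")
# ["--parsable2", "--format=JobID,State,MaxRSS", "--jobs=123,124",
#  "--state=COMPLETED", "--starttime=2023-01-01"]
```

The resulting array could then be passed to something like `Open3.capture3("sacct", *args)`, which avoids shell quoting issues with the comma-separated lists.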
I think we should have an actual `historic_info` API on the adapter class itself to mimic and extend the `info` API.
I'm not 100% sure about the method signature here, but these are all keyword arguments, so it should be OK for now.
Note that this API should probably respond with an array of `Info` objects (it currently returns an array of hashes?).
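Responding with `Info` objects mostly means mapping each sacct row onto a common status vocabulary. The sketch below uses a hypothetical `Struct` stand-in rather than the real ood_core `Info` class, and its state table is an illustrative, partial mapping rather than the adapter's actual one; the raw row is kept in `native` so nothing is lost.

```ruby
# Hypothetical stand-in for an Info object; the real class has many more attributes.
HistoricInfo = Struct.new(:id, :status, :native, keyword_init: true)

# Illustrative, partial mapping from sacct states to generic status symbols
SACCT_STATE_MAP = {
  "PENDING" => :queued,
  "RUNNING" => :running,
  "SUSPENDED" => :suspended,
  "COMPLETED" => :completed,
  "FAILED" => :completed,
  "CANCELLED" => :completed,
  "TIMEOUT" => :completed
}.freeze

def to_infos(rows)
  rows.map do |row|
    # sacct can report states like "CANCELLED by 1000"; only the first word is mapped
    state = row["State"].to_s.split.first
    HistoricInfo.new(id: row["JobID"], status: SACCT_STATE_MAP.fetch(state, :undetermined), native: row)
  end
end

infos = to_infos([{ "JobID" => "123", "State" => "CANCELLED by 1000" }])
# infos.first.status is :completed
```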
I made some changes to return an array of `Info` objects and added support for disabling job steps.
Still, when job steps are enabled, they are returned as regular `Info` objects. I'm not sure this is the best approach at this point, but for our use case we need the steps for the memory metrics calculations.
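A switch for job steps can be as simple as dropping rows whose `JobID` contains a `.` (step IDs look like `123.batch`, `123.extern` or `123.0`). The `include_steps:` keyword below is a hypothetical name for such an option. Alternatively, sacct's own `--allocations` (`-X`) flag limits output to the job allocations, so the steps are never fetched at all.

```ruby
# Hypothetical option: drop job-step rows unless include_steps is true.
# Parent job IDs contain no "."; step IDs such as "123.batch" do.
def filter_steps(rows, include_steps: true)
  return rows if include_steps

  rows.reject { |row| row["JobID"].include?(".") }
end

rows = [
  { "JobID" => "123" },
  { "JobID" => "123.batch" },
  { "JobID" => "123.extern" }
]
filter_steps(rows, include_steps: false) # keeps only the "123" row
```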
I think I also wanted a top-level `historic_info` on the `Adapter` class, which this would then implement.
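A top-level `historic_info` would follow the usual abstract-method pattern: the base class declares the method and raises `NotImplementedError`, and each scheduler adapter overrides it. The classes below are hypothetical stand-ins (`FakeSlurmAdapter` filters in-memory rows instead of running sacct), sketched under the keyword signature used in this PR.

```ruby
# Hypothetical stand-in for an Info-like result
HistoricJob = Struct.new(:id, :state)

class Adapter
  # Return an array of Info-like objects for jobs that have already run
  def historic_info(job_ids: [], states: [], from: nil, to: nil)
    raise NotImplementedError, "subclass did not define #historic_info"
  end
end

class FakeSlurmAdapter < Adapter
  def initialize(jobs)
    @jobs = jobs # stands in for parsed sacct output
  end

  # from/to are accepted but ignored here; a real adapter would pass them to sacct
  def historic_info(job_ids: [], states: [], from: nil, to: nil)
    @jobs.select do |job|
      (job_ids.empty? || job_ids.include?(job.id)) &&
        (states.empty? || states.include?(job.state))
    end
  end
end

adapter = FakeSlurmAdapter.new([HistoricJob.new("1", "COMPLETED"), HistoricJob.new("2", "FAILED")])
adapter.historic_info(states: ["FAILED"]).map(&:id) # ["2"]
```

Callers can then depend on `Adapter#historic_info` without knowing which scheduler is behind it, while adapters that cannot provide history fail loudly instead of silently returning nothing.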
OK, I will make the changes and see how that looks.
I updated the PR with a first implementation of the `historic_info` interface.
https://osc.github.io/ood-documentation/latest/customizations.html?highlight=grafana#grafana-support
https://grafana.com/grafana/dashboards/12093-ondemand-clusters/
Draft implementation that adds support to the Slurm adapter for executing the sacct command to get historic job data for calculating metrics.
Fixes: #856