MGS could record a metric for the number of crash dumps on each SP #6791

Open
hawkw opened this issue Oct 7, 2024 · 1 comment
Labels: enhancement (New feature or request), Metrics

Comments

@hawkw
Member

hawkw commented Oct 7, 2024

@mkeeter points out to me that it could be quite useful for the MGS metrics subsystem to report a counter metric tracking the number of task crash dumps present on an SP. This would let us easily see which SPs on the rack have crash dumps. It would also indicate the time[1] at which a new crash dump appeared, which approximates when the task crash occurred.

Footnotes

  1. Wall clock time as understood by MGS.
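To illustrate how such a metric could be read back, here is a minimal Rust sketch, assuming the metric is recorded as a series of (timestamp, dump count) samples. `DumpCountSample` and `dump_appearance_times` are hypothetical names made up for this example, not existing MGS or oximeter APIs. The timestamp of the first sample showing a higher count only bounds the crash time to within one polling interval.

```rust
use std::time::SystemTime;

/// One sample of the hypothetical per-SP dump counter: the MGS wall-clock
/// time of the poll and the number of task dumps the SP reported.
/// (Assumed shape, not a type from the MGS codebase.)
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct DumpCountSample {
    pub observed_at: SystemTime,
    pub dump_count: u32,
}

/// Returns the timestamp of every sample whose dump count is greater than
/// the count in the sample before it.
///
/// The first sample is treated as a baseline: dumps already present when
/// polling began have an unknown arrival time, so they are not reported.
/// A drop in the count (for example, dumps being cleared) is ignored.
pub fn dump_appearance_times(samples: &[DumpCountSample]) -> Vec<SystemTime> {
    samples
        .windows(2)
        .filter(|pair| pair[1].dump_count > pair[0].dump_count)
        .map(|pair| pair[1].observed_at)
        .collect()
}
```

Each returned timestamp is the MGS wall-clock time of the poll that first saw the new dump, so the task crash itself happened at some point between that poll and the previous one.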

hawkw self-assigned this on Oct 7, 2024
hawkw added the enhancement and Metrics labels on Oct 7, 2024
@hawkw
Member Author

hawkw commented Oct 7, 2024

This would probably be recorded under a new target, rather than the hardware_component target used by sensor metrics, as it's scoped to the whole SP rather than a particular hardware device known to the SP.
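As a rough sketch of what an SP-scoped target might look like, the Rust below keys the dump count on the SP alone, with no per-component fields. Everything here is an assumption for illustration: `SpTarget`, `ChassisKind`, `DumpCounts`, and their fields are hypothetical and are not the actual oximeter target definitions or the fields of the existing hardware_component target.

```rust
use std::collections::BTreeMap;

/// Hypothetical chassis kinds; the real set of SP types may differ.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum ChassisKind {
    Sled,
    Switch,
    Power,
}

/// Hypothetical target identifying a whole SP. Unlike a per-component
/// target, it carries no component identifier, so every dump-count
/// sample from one SP shares a single key.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct SpTarget {
    pub rack_id: String,
    pub chassis_kind: ChassisKind,
    pub slot: u16,
}

/// Latest dump count observed for each SP.
#[derive(Default)]
pub struct DumpCounts {
    latest: BTreeMap<SpTarget, u32>,
}

impl DumpCounts {
    /// Stores the most recent dump count for an SP, replacing any older value.
    pub fn record(&mut self, target: SpTarget, dump_count: u32) {
        self.latest.insert(target, dump_count);
    }

    /// Lists the SPs whose latest count is non-zero, in target order.
    pub fn sps_with_dumps(&self) -> Vec<&SpTarget> {
        self.latest
            .iter()
            .filter(|(_, count)| **count > 0)
            .map(|(target, _)| target)
            .collect()
    }
}
```

Keying on the SP rather than on a component is what makes the "which SPs on the rack have crash dumps" query a simple filter over one value per SP.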
