WIP: mgr/devicehealth: added life_expectancy_response() #61
base: wip-devicehealth
git rebase HEAD^ --onto liewegas/wip-devicehealth
]
DEFAULTS = {
    'active': True,
    'scrape_frequency': '86400',
    'retention_period': '86400*14',
    'pool_name': 'device_health_metrics',
    'mark_out_threshold': 86400*14, # TODO: change back to string
str(86400*14)
etc
fixed
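The `str(86400*14)` suggestion keeps the default as a string while still evaluating the arithmetic. A minimal sketch of the difference, assuming (the diff does not confirm this) that option values are later converted with `int()`; the `as_seconds` helper is hypothetical and not part of this PR:

```python
# Defaults stored as strings, as the review suggests.
DEFAULTS = {
    'scrape_frequency': str(86400),         # '86400'
    'retention_period': str(86400 * 14),    # '1209600'
    'mark_out_threshold': str(86400 * 14),  # '1209600'
}


def as_seconds(value):
    """Hypothetical helper: convert a stored option string to an int."""
    return int(value)


# A quoted expression such as '86400*14' is never evaluated,
# so int() rejects it with ValueError.
try:
    as_seconds('86400*14')
    literal_parses = True
except ValueError:
    literal_parses = False
```

If the options are parsed this way, a quoted expression like `'86400*14'` would need the same `str(...)` treatment.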
5b0b1db to c138c85
'MGR_DEVICE_HEALTH': {
    'severity': 'warning',
    'summary': 'Imminent failure was detected on some devices.'
               ' Run "ceph health detail".',
don't need this line
fixed
self.set_health_checks({
    'MGR_DEVICE_HEALTH': {
        'severity': 'warning',
        'summary': 'Imminent failure was detected on some devices.'
Imminent failure anticipated for device(s)
fixed
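For reference, a sketch of the check payload with the revised wording. The `severity` and `summary` keys come from the diff; the `detail` list, the `build_device_health_check` helper, and the device names are assumptions for illustration and are not taken from this PR:

```python
def build_device_health_check(devices):
    """Hypothetical helper: build the dict passed to set_health_checks()."""
    if not devices:
        # Assumption: an empty dict means no checks are raised.
        return {}
    return {
        'MGR_DEVICE_HEALTH': {
            'severity': 'warning',
            'summary': 'Imminent failure anticipated for device(s)',
            # Assumed field: one line per affected device.
            'detail': ['%s is expected to fail soon' % d
                       for d in sorted(devices)],
        }
    }


checks = build_device_health_check(['dev_b', 'dev_a'])
```

Inside the module the result would presumably be handed to `self.set_health_checks(checks)`, as in the snippet under review.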
r, outb, outs = result.wait()
# TODO what else should we do in case marking out failed?
if r != 0:
    self.log.error('Error: Could not mark OSD %s out. r: [%s], outb: [%s], outs: [%s]' % (osd_ids, r, outb, outs))
Log a warning, but this isn't a critical error (ceph osd out will never really fail, and if it does, we'll do it again the next time).
changed
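A minimal sketch of the requested change, using Python's standard `logging` module as a stand-in for `self.log`. The `report_mark_out` helper and the logger name are hypothetical; the actual change in this PR may look different:

```python
import logging

log = logging.getLogger('devicehealth_sketch')  # stand-in for self.log


def report_mark_out(osd_ids, r, outb, outs):
    """Hypothetical helper: return True on success, warn on failure."""
    if r != 0:
        # Not critical: the next pass is expected to retry the mark-out.
        log.warning(
            'Could not mark OSD %s out. r: [%s], outb: [%s], outs: [%s]',
            osd_ids, r, outb, outs)
        return False
    return True
```

Passing the values as logger arguments instead of pre-formatting with `%` defers string formatting until the record is actually emitted.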
Let's avoid "SMART" since it's misleading (it refers specifically to ATA). Signed-off-by: Sage Weil <[email protected]>
Signed-off-by: Sage Weil <[email protected]>
This is more useful than the current local device name. Clean up some formatting. Signed-off-by: Sage Weil <[email protected]>
This is slightly different than the usual pattern because it is parameterized. I want to avoid fetching *all* devices if we don't need it. Signed-off-by: Sage Weil <[email protected]>
Signed-off-by: Sage Weil <[email protected]>
- command to fetch smart info
- command to scrape a device and store the metrics
- command to scrape all devices (and store)
- command to dump stored metrics
- purging of old metrics

This is based on code originally written by Yaarit.

Signed-off-by: Yaarit Hatuka [email protected]
Signed-off-by: Sage Weil <[email protected]>
… stored Signed-off-by: Sage Weil <[email protected]>
Signed-off-by: Sage Weil <[email protected]>
9053199 to 5a44d4f
- if mark_out_threshold is met we write to log.warn instead of raising a health warning.
- check that OSD is 'in' before calling mark_out().
- raise a health warning in case OSD is marked 'out' but still has PGs attached to it.
- cast thresholds default values to string.
- add SCSI multipath support to health warning message.
- change health warning message.

Signed-off-by: Yaarit Hatuka [email protected]
Signed-off-by: Yaarit Hatuka [email protected]
9c4880f to dd6ad72