WIP: mgr/devicehealth: added life_expectancy_response() #61

Open: wants to merge 10 commits into base: wip-devicehealth

@yaarith yaarith commented Jun 11, 2018

No description provided.

@liewegas (Owner):

git rebase HEAD^ --onto liewegas/wip-devicehealth

]
DEFAULTS = {
'active': True,
'scrape_frequency': '86400',
'retention_period': '86400*14',
'pool_name': 'device_health_metrics',
'mark_out_threshold': 86400*14, # TODO: change back to string
Owner:
str(86400*14)
etc

Author:
fixed
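
For illustration, the string-default convention suggested in this thread might look like the minimal sketch below. The `DAY` constant and the `get_seconds` helper are hypothetical names and not part of the module under review; the sketch assumes option values arrive as strings and are converted with `int()` where they are used. A literal such as `'86400*14'` would not survive that conversion, because `int('86400*14')` raises `ValueError`.

```python
# Hypothetical sketch: defaults are stored as strings and converted to
# integers only at the point of use.
DAY = 86400  # seconds in one day

DEFAULTS = {
    'scrape_frequency': str(DAY),
    'retention_period': str(DAY * 14),
    'mark_out_threshold': str(DAY * 14),
}


def get_seconds(options, key):
    """Return the option as an int, falling back to the string default."""
    return int(options.get(key, DEFAULTS[key]))
```

Calling `get_seconds({}, 'mark_out_threshold')` falls back to the default, while a configured value such as `{'scrape_frequency': '3600'}` takes precedence.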

@liewegas liewegas force-pushed the wip-devicehealth branch 2 times, most recently from 5b0b1db to c138c85 Compare June 11, 2018 15:16
'MGR_DEVICE_HEALTH': {
'severity': 'warning',
'summary': 'Imminent failure was detected on some devices.'
' Run "ceph health detail".',
Owner:
don't need this line

Author:
fixed

self.set_health_checks({
'MGR_DEVICE_HEALTH': {
'severity': 'warning',
'summary': 'Imminent failure was detected on some devices.'
Owner:
Imminent failure anticipated for device(s)

Author:
fixed

r, outb, outs = result.wait()
# TODO what else should we do in case marking out failed?
if r != 0:
self.log.error('Error: Could not mark OSD %s out. r: [%s], outb: [%s], outs: [%s]' % (osd_ids, r, outb, outs))
Owner:
log a warning, but this isn't a critical error (ceph osd out will never really fail... and if it does, we'll do it again the next time)

Author:
changed
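
A minimal sketch of the requested change, assuming a standard `logging` logger in place of the module's `self.log`. The function name `report_mark_out_result` and the logger name are hypothetical; only the `r`, `outb`, `outs` triple and the message text come from the diff above.

```python
import logging

log = logging.getLogger('devicehealth')  # stand-in for self.log


def report_mark_out_result(osd_ids, r, outb, outs):
    """Log a warning if marking out failed; return True on success."""
    if r != 0:
        # Not critical: the mark-out is attempted again on the next pass.
        log.warning(
            'Could not mark OSD %s out. r: [%s], outb: [%s], outs: [%s]',
            osd_ids, r, outb, outs)
        return False
    return True
```

Passing the values as logger arguments rather than pre-formatting with `%` defers string formatting until the record is actually emitted.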

liewegas added 8 commits June 12, 2018 08:27
Let's avoid "SMART" since it's misleading (it refers specifically to ATA).

Signed-off-by: Sage Weil <[email protected]>
This is more useful than the current local device name.

Clean up some formatting.

Signed-off-by: Sage Weil <[email protected]>
This is slightly different than the usual pattern because it is
parameterized.  I want to avoid fetching *all* devices if we don't need
it.

Signed-off-by: Sage Weil <[email protected]>
- command to fetch smart info
- command to scrape a device and store the metrics
- command to scrape all devices (and store)
- command to dump stored metrics
- purging of old metrics

This is based on code originally written by Yaarit.

Signed-off-by: Yaarit Hatuka <[email protected]>
Signed-off-by: Sage Weil <[email protected]>
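
As a rough illustration of the "purging of old metrics" item in the commit message above, the sketch below drops records older than a retention period. The storage layout is an assumption made for the example (a plain dict keyed by scrape time in epoch seconds), and `purge_old_metrics` is a hypothetical name; the actual module may store and expire metrics differently.

```python
def purge_old_metrics(records, now, retention_period):
    """Return only the records scraped within the retention period.

    records: dict mapping scrape time (epoch seconds) to a metrics blob
             (assumed layout, for illustration only)
    now: current time in epoch seconds
    retention_period: maximum record age in seconds
    """
    cutoff = now - retention_period
    return {ts: metrics for ts, metrics in records.items() if ts >= cutoff}
```

With a retention period of `86400 * 14`, a record scraped 15 days before `now` is dropped and one scraped a day before `now` is kept.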
- if mark_out_threshold is met we write to log.warn instead of raising a
  health warning.
- check that OSD is 'in' before calling mark_out().
- raise a health warning in case OSD is marked 'out' but still has PGs
  attached to it.
- cast thresholds default values to string.
- add SCSI multipath support to health warning message.
- change health warning message.

Signed-off-by: Yaarit Hatuka <[email protected]>
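
The first three bullets of the commit message above can be read as a small decision procedure, sketched below under stated assumptions. The function `plan_actions`, its parameters, and the action labels are hypothetical; in particular, the sketch assumes that life expectancy is expressed in seconds and that the threshold is "met" when life expectancy falls below it.

```python
def plan_actions(life_expectancy, mark_out_threshold, osd_in, num_pgs):
    """Return the list of actions to take for one OSD.

    Assumption: the threshold is met when life_expectancy (seconds)
    is below mark_out_threshold (seconds).
    """
    actions = []
    if life_expectancy >= mark_out_threshold:
        return actions  # device is not expected to fail soon
    actions.append('log_warning')  # logged, not raised as a health warning
    if osd_in:
        actions.append('mark_out')  # only mark out OSDs that are still 'in'
    elif num_pgs > 0:
        # Already 'out' but still holding PGs: raise a health warning.
        actions.append('health_warning')
    return actions
```

An OSD that is already out and holds no PGs produces only the log entry.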
@liewegas liewegas force-pushed the wip-devicehealth branch 6 times, most recently from 9c4880f to dd6ad72 Compare June 23, 2018 22:02
liewegas pushed a commit that referenced this pull request Nov 5, 2021
sub-menu component, --shadow-black custom property, icons.