(fleet/dashboards) fix some dashboards and add new ones #387

Open
wants to merge 13 commits into
base: master

Conversation

fbegyn
Contributor

@fbegyn fbegyn commented May 10, 2024

The Grafana dashboards in the first design reference specific datasources. This breaks the datasource UIDs when the dashboards are imported through provisioning files into other Grafana instances. The fix is to use a datasource variable.
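
As a rough illustration of the datasource-variable approach, here is a minimal Python sketch that rewrites a dashboard's JSON model. It is not the script used in this PR. The function names, the variable name `datasource`, the sample UID, and the sample dashboard are all hypothetical. It assumes panels and targets carry a `datasource` object with `type` and `uid` keys, as recent Grafana exports do.

```python
import json

DS_VAR = "datasource"  # hypothetical variable name, chosen for illustration


def use_datasource_variable(node, ds_type="prometheus", var=DS_VAR):
    """Recursively replace hard-coded datasource UIDs with a variable reference."""
    if isinstance(node, dict):
        ds = node.get("datasource")
        if isinstance(ds, dict) and ds.get("type") == ds_type:
            ds["uid"] = "${" + var + "}"
        for value in node.values():
            use_datasource_variable(value, ds_type, var)
    elif isinstance(node, list):
        for item in node:
            use_datasource_variable(item, ds_type, var)
    return node


def add_datasource_variable(dashboard, ds_type="prometheus", var=DS_VAR):
    """Add a template variable of type 'datasource' if it is not already defined."""
    variables = dashboard.setdefault("templating", {}).setdefault("list", [])
    if not any(v.get("name") == var for v in variables):
        variables.insert(
            0,
            {"name": var, "type": "datasource", "query": ds_type, "label": "Datasource"},
        )
    return dashboard


# Hypothetical dashboard fragment with a hard-coded (made-up) UID.
dashboard = {
    "title": "example",
    "panels": [
        {
            "datasource": {"type": "prometheus", "uid": "HARDCODED-UID"},
            "targets": [
                {"datasource": {"type": "prometheus", "uid": "HARDCODED-UID"}, "expr": "up"}
            ],
        }
    ],
}

converted = add_datasource_variable(use_datasource_variable(dashboard))
print(json.dumps(converted, indent=2))
```

After conversion, each instance that imports the dashboard chooses its own Prometheus datasource from the variable dropdown, so the dashboard no longer depends on a UID that exists only on the instance it was exported from.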

@fbegyn fbegyn requested a review from jhoblitt May 10, 2024 16:50
@fbegyn fbegyn self-assigned this May 10, 2024
@fbegyn fbegyn changed the title (fleet/dashboards) fix snmp dashboars (fleet/dashboards) fix snmp dashboards May 10, 2024
Member

@jhoblitt jhoblitt left a comment

Comments including screenshots are in this thread on slack: https://lsstc.slack.com/archives/C05JLDDAPAR/p1715360161601749

@jhoblitt
Member

The commit message also contains a typo: "dashboars"

@fbegyn fbegyn changed the title (fleet/dashboards) fix snmp dashboards (fleet/dashboards) fix some dashboards and add new ones May 14, 2024
@fbegyn
Contributor Author

fbegyn commented May 14, 2024

I've cleaned up a lot of dashboards and I believe they are back in a working state. The only 2 broken parts are the PDU monitoring and Base to summit temporary. I'd prefer to get this PR merged in its current state; further tweaks to those 2 and other dashboards (improving network overview) can be done in subsequent PRs.

@fbegyn fbegyn requested a review from jhoblitt May 14, 2024 17:01
@fbegyn fbegyn force-pushed the IT-5331-fix-snmp-dashboards branch from 4343112 to a2ef94f Compare May 26, 2024 22:18
Member

@jhoblitt jhoblitt left a comment

This is only a partial re-review. There are still 8 remaining dashboards I haven't looked at yet and won't have time to look at until tomorrow morning.

Member

The interface status plots don't render unless I scale my browser display size down to 80%. At that point they render but the interface names are illegible. E.g.
image

Member

This looks OK. Is there a way to control the order in which grafana sorts lists?
image

Member

I have to scale my browser fonts down to 80% to get this to render. The device names are sort of legible, but it's difficult to read and we have a lot more switches to add.

image

Member

This seems to only be working to list the number of OSDs on a host.
image

Member

The mds_hosts and rgw_hosts dropdowns don't have any values.

image

Member

The physical device metrics aren't working for me on ruka and I don't understand why. The metrics are present in Prometheus.
image
image

Member

I have to scale the font sizes way down to remove the horizontal scroll bars from the latency tables.

image

At 60% the values become visible:

image

Member

@jhoblitt jhoblitt left a comment

I have finished inspecting all dashboards in this PR and commented where I have concerns. WRT the Ceph dashboards: if the titles are direct copies of upstream dashboards, I think it's OK not to rename them to have "Ceph" in the title, but I think they need to be in a Ceph (or Rook) folder to group them logically.

Member

The pool overview table is hard to use. It has many columns of duplicated or unimportant data that can be removed. The important information is on the right side of the table. The vertical height should be increased, as we have a few clusters with a fair number of pools.

image

Member

This dashboard appears to be broken on both ruka and ayekan.

image

Member

This dashboard seems to be broken on ruka and ayekan for all clusters.

image

Member

We don't currently have rgw zone replication enabled on any cluster. We have tested it in the past and will probably use it later this year. It currently isn't possible to test this dashboard.

Member

This dashboard seems to be broken on both ruka and ayekan.

image

Member

The tables on this dashboard are difficult to read with my browser fonts at 100%. I have to scale it down to 60% to get all the data to render without horizontal scroll bars.

100%:
image

60%:
image

Member

This dashboard seems to be broken on both ruka and ayekan for all cluster / rgw_server combos I tried.

image

fleet/lib/grafana-dashboards/dashboards/rook-ceph.json Outdated (resolved)
@jhoblitt
Member

@KrisBuytaert @fbegyn What is the status of this PR? Is it ready for re-review?
