Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade from grafonnet-lib to grafonnet 10.4.0 #90

Merged
merged 39 commits into from
Apr 12, 2024

Conversation

yuvipanda
Copy link
Collaborator

@yuvipanda yuvipanda commented Oct 7, 2023

Update

This PR was worked by Yuvi and Erik, with notes from Erik in #90 (comment).

Original PR description

It's mostly a mechanical translation. I'll do it one at a time, and eventually it'll be done.

  • Custom layout code in our deployer is no longer required, as grafonnet supports automatic grid
    layout! But it doesn't support mixed widths (boo)

  • It will also get rid of this ugly notice:
    image

  • Remove the google cloud specific bits in our labeling, and make them work with at EKS provisioned with eksctl and AKS on Azure

    image Here we see that on AWS too we will get info split out per-nodegroup, rather than all aggregated together!
  • Everything that displays just node names will now also have the nodepool it is a part of displayed alongside. Makes identification much easier, especially on AWS with its just-ip nodenames

    image

    Would've been practically impossible to see that there was an OOM kill in a core node here.

TODO

Files to migrate

  • cluster.jsonnet
  • jupyterhub.jsonnet
  • jupyterhub.libsonnet
  • support.jsonnet
  • usage-report.jsonnet
  • user.jsonnet
  • global-usage-stats.jsonnet

Remaining work

  • Review visual changes
    • cluster.jsonnet
    • jupyterhub.jsonnet
    • jupyterhub.libsonnet
    • support.jsonnet
    • usage-report.jsonnet
    • user.jsonnet
    • global-usage-stats.jsonnet
  • Fix remaining commented out things like sort, thresholds, etc.

Things to catch

Fixes #89

@yuvipanda
Copy link
Collaborator Author

I am tempted to reorganize some of the panels while at it, but in order to make review and merging easier, I'll hold off and do that separately!

yuvipanda added a commit to yuvipanda/grafana-dashboards that referenced this pull request Oct 19, 2023
Currently, on latest grafana, the dashboards will currently
show 'no data' because the `$hub` variable is not set correctly
(see 2i2c-org/infrastructure#3237 for
details). The longer term solution to this is:

1. Move away from the deprecated graphPanel to timeseries objects
   for panel
2. jupyterhub#89

Both of these are being undertaken in
jupyterhub#90,
but in the meantime, this PR implements
[this
workaround](jupyterhub#90)
so end users can at least see data.
GeorgianaElena added a commit to GeorgianaElena/grafana-dashboards that referenced this pull request Feb 8, 2024
yuvipanda added a commit to yuvipanda/grafana-dashboards that referenced this pull request Feb 29, 2024
A follow-up to jupyterhub#91,
as the openscapes community reported this is causing issues
not just for the $hub variable but everything else as well.

jupyterhub#90 is
the long term fix
@consideRatio consideRatio force-pushed the grafonnet-migration branch 4 times, most recently from 3e8a781 to 44d77f3 Compare April 5, 2024 13:39
@consideRatio
Copy link
Member

@yuvipanda I went ahead and rebased this on main after merging some pre-commit stuff to resolve the merge conflicts, and I pushed two commits to ensure linting of the new common.libsonnet file and fix a typo in a docstring that otherwise caused black to format things weirdly.

@consideRatio consideRatio force-pushed the grafonnet-migration branch 3 times, most recently from 5b662f4 to affa248 Compare April 6, 2024 19:04
@consideRatio consideRatio force-pushed the grafonnet-migration branch 3 times, most recently from 65b5ec8 to 9f1c40a Compare April 6, 2024 19:27
@consideRatio
Copy link
Member

I've migrated all dashboards and panels so they render now, with some commented out options still.

I've not done any visual testing yet of how dashboards look before after change, and instead relied on ensuring jsonnet files render at all, and that jsonnetfmt and jsonnet-lint are happy (which now makes pre-commit pass).

Considering how I should best review changes visually, thinking about for example:

  • use deploy.py against a 2i2c cluster
  • port-forward a 2i2c clusters prometheus server and run grafana locally in a docker container

I figure I'll go for the slow but known-to-work strategy of using deploy.py.

@consideRatio consideRatio force-pushed the grafonnet-migration branch 4 times, most recently from 2d2941f to ed3efc3 Compare April 7, 2024 15:11
@consideRatio
Copy link
Member

consideRatio commented Apr 7, 2024

I'm currently testing this at https://grafana.pilot.2i2c.cloud and can conclude there are things to fix still, but now its easy to compare side by side with the main branch's changes as I've deployed these to a dedicated folder.

 export GRAFANA_TOKEN=<...>
./deploy.py --folder-name "dev dashboards" --folder-uid "70E5EE84-1217-4021-A89E-1E3DE0566D94" https://grafana.pilot.2i2c.cloud && ./deploy.py --folder-name "dev dashboards" --folder-uid "70E5EE84-1217-4021-A89E-1E3DE0566D94" --dashboards-dir global-dashboards https://grafana.pilot.2i2c.cloud

image

@consideRatio
Copy link
Member

consideRatio commented Apr 8, 2024

Observations

Stacking bug fixed

Stacking could misbehave previously if one value went from something to null or similar, but that issue seems to go away after the upgrade. In the gif below, note 7+1+1+1 should be 10, but is shown as either 7 as it is before this PR, and 10 after.

stacking-fail-old-but-not-new

  • Check this to indicate its OK now

Discrepancy - what is right?

The numbers shown in the global dashboard differ between grafana dashboards using the same prometheus query - what is going on here. Are one correct and one wrong, and if so, which is which - or are both wrong?

Before After
image image
  • Check this to indicate its OK now

Defaults to UTC now

Previously the dashboards showed local time by default, now they go with UTC time by default.

  • Check this to indicate its OK now

Decisions

  • To consistently stack things counted (running users, node count)
  • To consisntely not stack separate things measured in percent (node cpu utilization etc)

consideRatio and others added 2 commits April 12, 2024 07:54
I think we could manage to get prometheus server cpu/metrics etc
presented with the same helper functions if we retain these things.
@consideRatio
Copy link
Member

Thank you for reviewing this @GeorgianaElena!!!!

I started by checking out the code in the jsonnet files, but felt overwhelmed, so I ended up doing a side-by-side visual inspection of the panels. I feel confident that we should merge this though, as the panels either rendered the same, or were fixed by this PR.

If you also feel confident in the changes here @consideRatio, I believe you should feel empowered to go for a merge 🚀

Thank you for going for the review anyhow - I think the visual review is the key thing here as its the actual result. I think this is in a mergable state, and I opened #116 and #117 to track related work discussed in comments.

Thank you @yuvipanda for starting this work!

@consideRatio consideRatio merged commit 40a094f into jupyterhub:main Apr 12, 2024
2 of 3 checks passed
yuvipanda added a commit to yuvipanda/grafana-dashboards that referenced this pull request Jul 25, 2024
Currently, on latest grafana, the dashboards will currently
show 'no data' because the `$hub` variable is not set correctly
(see 2i2c-org/infrastructure#3237 for
details). The longer term solution to this is:

1. Move away from the deprecated graphPanel to timeseries objects
   for panel
2. jupyterhub#89

Both of these are being undertaken in
jupyterhub#90,
but in the meantime, this PR implements
[this
workaround](jupyterhub#90)
so end users can at least see data.
yuvipanda added a commit to yuvipanda/grafana-dashboards that referenced this pull request Jul 25, 2024
A follow-up to jupyterhub#91,
as the openscapes community reported this is causing issues
not just for the $hub variable but everything else as well.

jupyterhub#90 is
the long term fix
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Move from grafonnet-lib to grafonnet
3 participants