feat: updated Prometheus dashboard #7

Merged: 5 commits into master on Sep 28, 2019

@FUSAKLA (Owner) commented Sep 10, 2019

resolves #5

@christoph-buente I ended up doing a bigger refactoring of the dashboard.

PTAL and let me know whether it works for you; I'd also be glad for any opinions on it.

Mainly, it now supports displaying data for multiple instances at once.
There is also a whole bunch of new information I managed to find in the metrics, which I believe is useful, as you will see.

The new JSON can be found here

@christoph-buente (Contributor) commented:

Thanks for the effort you put in. Indeed, we use more than one Prometheus server, and it's nice to switch between them. However, the variables $custom_label and $custom_label_name confuse me, and Grafana actually complains about them.

max(prometheus_tsdb_head_series{instance=~"$instance",$custom_label_name=~"$custom_label_value"}) by (instance,${custom_label_name})

It reports: parse error at char 65: unexpected character inside braces: '$'

So right now the whole dashboard seems to be broken.
See the attached screenshot.
[Screenshot 2019-09-11 09 30 01]
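Why this query fails can be illustrated with a simplified model of template interpolation: Grafana replaces $name and ${name} with each variable's current value before the query is sent to Prometheus, so if custom_label_name is never substituted, a raw '$' reaches the PromQL parser. The sketch below is a hypothetical Python reimplementation (not Grafana's code); the values ".*" and "job" are assumed examples.

```python
import re

# Matches ${name} or $name, the two variable syntaxes used in the query above.
# Simplification: Grafana also supports [[name]] and ${name:format}, ignored here.
_VAR = re.compile(r"\$\{(\w+)\}|\$(\w+)")


def interpolate(query: str, variables: dict[str, str]) -> str:
    """Substitute known variables; leave unknown ones untouched as raw '$name'."""
    def repl(match: re.Match) -> str:
        name = match.group(1) or match.group(2)
        return variables.get(name, match.group(0))
    return _VAR.sub(repl, query)


QUERY = (
    'max(prometheus_tsdb_head_series{instance=~"$instance",'
    '$custom_label_name=~"$custom_label_value"}) by (instance,${custom_label_name})'
)

# All variables resolved: a valid PromQL query.
resolved = interpolate(QUERY, {
    "instance": ".*",
    "custom_label_name": "job",
    "custom_label_value": ".*",
})
print(resolved)
# max(prometheus_tsdb_head_series{instance=~".*",job=~".*"}) by (instance,job)

# custom_label_name unresolved: the leftover '$' is what Prometheus rejects.
unresolved = interpolate(QUERY, {"instance": ".*", "custom_label_value": ".*"})
print("$" in unresolved)  # True
```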

@FUSAKLA (Owner, Author) commented Sep 14, 2019

Hi, sorry for the delay. Hm... yeah, this is unfortunate; Grafana does not provide any way of documenting the import process.

To explain what it is used for: the dashboard needs to distinguish Prometheus instances from each other. It uses the instance label, but if, for example, you have multiple Prometheus instances each scraping itself, they would all have the same instance label value, localhost:9090, so you need an additional label.
That is the custom label.

The dashboard loads all values of the instance label and of the label specified as custom_label_name from the prometheus_build_info metric and uses them as values of those variables.

So... I see there are no options for the custom label in your screenshot, which means you probably set custom_label_name to a label that does not exist on the prometheus_build_info metric?

The default should be job, since Prometheus adds that label to every metric it scrapes; alternatively, try setting it to instance.

But I have to think of a way to document it or make it more robust. Thanks a lot for testing!
Please let me know if that helps.
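To make the role of the custom label concrete, here is a small Python sketch over invented prometheus_build_info series (the job values and the datacenter label are hypothetical). The label_values function mimics, in simplified form, the idea behind a Grafana label_values(metric, label) variable query; whether the dashboard uses exactly that query is an assumption. It shows that instance alone yields a single option when every server scrapes itself, that job tells them apart, and that a label missing from the metric produces an empty dropdown.

```python
# Hypothetical prometheus_build_info series: three Prometheus servers, each
# scraping itself, so every one reports instance="localhost:9090".
SERIES = [
    {"__name__": "prometheus_build_info", "instance": "localhost:9090", "job": "prometheus-eu"},
    {"__name__": "prometheus_build_info", "instance": "localhost:9090", "job": "prometheus-us"},
    {"__name__": "prometheus_build_info", "instance": "localhost:9090", "job": "prometheus-ap"},
]


def label_values(series, label):
    """Distinct values of `label` across the series (simplified, hypothetical)."""
    return sorted({s[label] for s in series if label in s})


def distinct_targets(series, custom_label):
    """How many servers can be told apart using instance plus the custom label."""
    return len({(s.get("instance"), s.get(custom_label)) for s in series})


print(label_values(SERIES, "instance"))    # ['localhost:9090'], only one option
print(label_values(SERIES, "job"))         # ['prometheus-ap', 'prometheus-eu', 'prometheus-us']
print(label_values(SERIES, "datacenter"))  # [], an empty dropdown
print(distinct_targets(SERIES, "job"))     # 3
```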

@FUSAKLA (Owner, Author) commented Sep 15, 2019

So I found time to look into the issue more, and it seems you are probably using dashboard provisioning, right? There is a bit of an issue with exported dashboards that does not show up when importing through the GUI but does when using provisioning, and I managed to reproduce it. See grafana/grafana#10786.

So now it should hopefully work for you:
https://raw.githubusercontent.com/FUSAKLA/Prometheus2-grafana-dashboard/f4f81e9e1cf33ad72ed91890fc93163684035b8b/dashboard/prometheus2-dashboard.json

@christoph-buente (Contributor) commented:

This is awesome @FUSAKLA!!!!
The dashboard works now, just by copying and pasting the JSON into Grafana. I guess the same will be true once you release a new revision of the existing dashboard on grafana.com.
There are good insights now. One hint: you might want to add the 0.95 percentile as a selector option; it seems to be a popular one. Thanks again, great stuff!

@christoph-buente (Contributor) commented:

One thing though: the rule evaluation errors are not there anymore. I guess people want to know if rule evaluations fail and for what reason. It would be good to bring them back.

@FUSAKLA (Owner, Author) commented Sep 17, 2019

Hi, that's great news! :)

Re: rule evaluation errors
All errors are now shown in the panel Prometheus errors in ... in one place if that is ok with you?
You should see any issue happening in your Prometheus instance there.

Re: 0.95 percentile
Yeah, I totally understand that demand. To explain how it works: most of the metrics exposed by Prometheus are summaries with predefined quantiles, and only a few of the latency metrics are histograms, for which you use the histogram_quantile function and can choose whichever quantile you prefer. This variable controls both the summaries and the histograms. The values it offers come from the summaries, since those are more restrictive. So if you would use 0.95 quantile most of the graphs won't work :/ Not sure if this is acceptable limitation in exchange for simplicity?
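The difference can be sketched in Python: a summary exposes only the quantiles configured when the metric was defined, so selecting 0.95 finds no series, while a histogram keeps cumulative bucket counts from which any quantile can be estimated. The histogram_quantile below is a simplified reimplementation of the linear interpolation that the PromQL function performs (it skips edge cases such as negative bucket bounds); the 0.5/0.9/0.99 quantile set and all numbers are assumed example values.

```python
import math

# Example summary (assumed values): only the configured quantiles exist as series.
SUMMARY = {"0.5": 0.012, "0.9": 0.040, "0.99": 0.210}

# Example histogram: (upper_bound, cumulative_count) buckets, ending with +Inf.
BUCKETS = [(0.1, 50.0), (0.5, 90.0), (1.0, 100.0), (math.inf, 100.0)]


def summary_quantile(q, summary):
    """Look up a precomputed quantile; None if the summary never exposed it."""
    return summary.get(q)


def histogram_quantile(q, buckets):
    """Simplified PromQL-style histogram_quantile: find the bucket containing the
    q-th rank and interpolate linearly between its bounds (assumes bounds >= 0)."""
    if len(buckets) < 2:
        return math.nan
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0 or not 0 < q <= 1:
        return math.nan
    rank = q * total
    # First bucket whose cumulative count reaches the rank (always exists).
    i = next(idx for idx, (_, count) in enumerate(buckets) if count >= rank)
    upper, count = buckets[i]
    if math.isinf(upper):
        # Rank lands in the +Inf bucket: fall back to the highest finite bound.
        return buckets[-2][0]
    lower, prev_count = (0.0, 0.0) if i == 0 else buckets[i - 1]
    return lower + (upper - lower) * (rank - prev_count) / (count - prev_count)


print(summary_quantile("0.95", SUMMARY))  # None: there is no 0.95 series to select
print(histogram_quantile(0.95, BUCKETS))  # 0.75
```

This is why one shared quantile variable has to offer only the summaries' values: any quantile works for the histograms, but only 0.5, 0.9 and 0.99 (in this example) exist for the summaries.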

@christoph-buente (Contributor) commented Sep 17, 2019

All errors are now shown in the panel Prometheus errors in ... in one place if that is ok with you?

Absolutely, yeah!

So if you would use 0.95 quantile most of the graphs won't work :/ Not sure if this is acceptable limitation in exchange for simplicity?

Understood. I wasn't aware it's a limitation of the metrics. I assumed it was a simple change to the list of available quantiles in Grafana.

Ready for Prod, I'd say :)

@FUSAKLA (Owner, Author) commented Sep 18, 2019

Great, I'll release it ASAP; I just found some remaining issues with importing.

Could you please verify, hopefully for the last time, that this works for you? :)
https://raw.githubusercontent.com/FUSAKLA/Prometheus2-grafana-dashboard/f13edfefece2a0b988fa9326a6d4fc2e14a82129/dashboard/prometheus2-dashboard.json

Just check that you can import it.

@christoph-buente (Contributor) commented:

Import worked perfectly. I just realized that the error panel does not have a very nice tooltip legend. Maybe it is enough to show {{metric_name}}, because job and instance are already selected in the variable dropdown. Other than that, I'm very happy with the new dashboard.
[Screenshot 2019-09-19 11 26 34]

@christoph-buente (Contributor) commented Sep 24, 2019

@FUSAKLA: I found another little glitch. The top-left graph does not use the $datasource variable but the default data source. I just found out by accident, when the graphs looked different than expected.
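A glitch like this is easy to miss by eye; one way to catch it is to scan the dashboard JSON for panels whose datasource is not the $datasource variable. The check below is a hypothetical helper written for illustration. It assumes the 2019-era JSON layout, where a panel's datasource is a string or null (null makes Grafana use the default data source) and collapsed rows nest their panels, and it also unwraps the newer object form with a uid field.

```python
import json

# Values that count as "uses the dashboard's datasource variable".
DATASOURCE_VAR = {"$datasource", "${datasource}"}


def panels_without_datasource_var(dashboard):
    """Titles of panels that do not use the $datasource variable.

    Row panels are not checked themselves, but panels nested inside
    collapsed rows are.
    """
    offenders = []

    def walk(panels):
        for panel in panels:
            if panel.get("type") == "row":
                walk(panel.get("panels", []))
                continue
            ds = panel.get("datasource")
            if isinstance(ds, dict):  # newer object form: {"type": ..., "uid": ...}
                ds = ds.get("uid")
            if ds not in DATASOURCE_VAR:
                offenders.append(panel.get("title") or f"panel id={panel.get('id')}")

    walk(dashboard.get("panels", []))
    return offenders


# Hypothetical two-panel export: the first panel silently uses the default source.
example = json.loads("""
{"panels": [
  {"id": 1, "title": "Uptime", "type": "singlestat", "datasource": null},
  {"id": 2, "title": "Head series", "type": "graph", "datasource": "$datasource"}
]}
""")
print(panels_without_datasource_var(example))  # ['Uptime']
```

Text panels legitimately have no datasource, so a real check would probably whitelist them.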

@FUSAKLA (Owner, Author) commented Sep 28, 2019

Thanks, I'll fix the datasource right away.

Yes, the error panel legend is a bit unfortunate: since the panel shows multiple different metrics, each with different labels, I can't use legend formatting. Removing the instance and custom labels from it is definitely possible, but then, when you have multiple Prometheus instances selected, you won't know which one is having these issues and would have to go through them one by one to find out.

Not sure about the trade-off 😕
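The trade-off can be made concrete with a small model of a legend template: once the instance and custom labels (job here) are dropped, series from different Prometheus servers collapse into the same legend text. The renderer below is a hypothetical simplification of Grafana's {{label}} legend syntax, not Grafana's implementation; it assumes the metric name is read from the __name__ label, and the series are invented example data.

```python
import re

_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_legend(template, labels):
    """Hypothetical legend renderer: replace {{label}} with the series' label
    value, or with an empty string when the series lacks that label."""
    return _PLACEHOLDER.sub(lambda m: labels.get(m.group(1), ""), template)


# The same error metric from two Prometheus servers (example data).
series = [
    {"__name__": "prometheus_rule_evaluation_failures_total",
     "instance": "localhost:9090", "job": "prom-a"},
    {"__name__": "prometheus_rule_evaluation_failures_total",
     "instance": "localhost:9090", "job": "prom-b"},
]

with_labels = [render_legend("{{__name__}} {{instance}} {{job}}", s) for s in series]
name_only = [render_legend("{{__name__}}", s) for s in series]
print(len(set(with_labels)))  # 2: each server gets its own legend entry
print(len(set(name_only)))    # 1: the servers can no longer be told apart
```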

@FUSAKLA (Owner, Author) commented Sep 28, 2019

@christoph-buente thanks a lot for testing and feedback! Much appreciated :)
I just released the new version on Grafana.com.

@FUSAKLA merged commit dc58a8f into master on Sep 28, 2019
Development: successfully merging this pull request may close these issues:
Upgrade dashboard to support Prometheus 2.11.x and newer

2 participants