This document aims to help users solve common problems they might encounter while installing or using the rating-operator. Problems listed below might be fixed in upcoming releases; the document will be updated accordingly. If you encounter an error not documented here, feel free to open an issue to discuss it, and we will add it here.
- When installing the dependencies, it is STRONGLY recommended to use the exact versions listed below. Because Kubernetes and operators evolve quickly, we pinned the version of each external component. Using other versions can cause undocumented incompatibility errors; do so at your own risk. Here is the list of versions we use:
- Helm 3.1.2
- Grafana 7.0.3
- JSON Datasource plugin 0.2.0
- Rook-ceph 1.2.6
- Prometheus latest
- Metering-operator 4.2
- After installing the metering-operator, it is STRONGLY recommended to wait for the first Reports to be generated before installing the rating.
- After the rating-operator installation, it is advised to wait approximately 10 minutes before starting to use it. The initialization of the rating-operator-api can take a long time, depending on the allocated resources.
- To test whether the rating-operator-api is responding, you can try:
$> kubectl get pods -l app.kubernetes.io/component=api -o name | cut -d/ -f2 | xargs -I{} kubectl port-forward {} 5012:5012
Forwarding from 127.0.0.1:5012 -> 5012
Forwarding from [::1]:5012 -> 5012
# From another terminal
$> curl http://localhost:5012/alive
I'm alive!
- After adding a new configuration, always verify that it is accepted. If the message below does not appear, the RatingRules have not been validated and thus will not be used.
$> kubectl -n $RATING_NAMESPACE describe ratingrules.rating.smile.fr test-rules
[...]
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Logging 43m kopf RatingRule test-rules created, valid from 2020-04-22T12:46:41Z.
- Do NOT create or modify RatedMetrics yourself; they are not designed to be used that way.
- If a custom resource is stuck while deleting, the cause is probably the finalizer. To solve this problem, use the patch command of kubectl to clear the finalizers:
$ kubectl delete ratingrules.rating.smile.fr rating-rating-default-rules
# The command hangs
# On another terminal
$ kubectl get ratingrules.rating.smile.fr
NAME AGE
rating-rating-default-rules 3d
$ kubectl patch ratingrules.rating.smile.fr/rating-rating-default-rules -p '{"metadata":{"finalizers":[]}}' --type=merge
ratingrule.rating.smile.fr/rating-rating-default-rules patched
# Then
$ kubectl get ratingrules.rating.smile.fr
No resources found in rating namespace.
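Several of the checks in the list above can be scripted. The helpers below are a minimal sketch, not part of the rating-operator: the function names (wait_for_alive, rules_accepted, show_finalizers) are hypothetical, and the defaults of 60 attempts every 10 seconds are only an assumption chosen to match the roughly 10 minutes of initialization mentioned above. wait_for_alive expects the port-forward shown earlier to be running in another terminal.

```shell
# Poll a URL until it answers, or give up.
# Arguments: URL, max attempts (default 60), delay in seconds (default 10).
wait_for_alive() {
  url="$1"
  attempts="${2:-60}"
  delay="${3:-10}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: fail on HTTP errors, -s: silent, --max-time: never hang on one try
    if curl -fs --max-time 5 "$url" >/dev/null 2>&1; then
      echo "API answered after $i attempt(s)"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "API did not answer after $attempts attempt(s)" >&2
  return 1
}

# Succeed if the events of a RatingRules object contain the "valid from" message.
# Arguments: namespace, RatingRules name.
rules_accepted() {
  kubectl -n "$1" describe ratingrules.rating.smile.fr "$2" | grep -q "valid from"
}

# Print the finalizers still attached to a RatingRules object (empty if none).
# Arguments: namespace, RatingRules name.
show_finalizers() {
  kubectl -n "$1" get ratingrules.rating.smile.fr "$2" -o jsonpath='{.metadata.finalizers}'
}

# Example usage (uncomment to run against your cluster):
# wait_for_alive http://localhost:5012/alive
# rules_accepted "$RATING_NAMESPACE" test-rules && echo "rules accepted"
# show_finalizers "$RATING_NAMESPACE" rating-rating-default-rules
```

Note that Kubernetes only keeps events for a limited time, so rules_accepted may report a failure for rules that were accepted long ago; in that case, check the operator logs instead.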
The rook operator pod is not running correctly
You might see this message:
$ kubectl -n rook-ceph describe pods -l app=rook-ceph-operator
[...]
Message: failed to run operator. Error starting agent daemonset: error starting agent daemonset: failed to create rook-ceph-agent daemon set. DaemonSet.apps "rook-ceph-agent" is invalid: spec.template.spec.containers[0].securityContext.privileged: Forbidden: disallowed by cluster policy
[...]
The fix is to run kube-apiserver with the --allow-privileged flag.
This configuration detail is specific to rook-ceph and you may not need it with other storage plugins.
In our case with Juju:
$ juju config kubernetes-master allow-privileged=true
If that's not the issue you have, look here:
- https://github.com/rook/rook/blob/master/Documentation/common-issues.md
- https://github.com/rook/rook/blob/master/Documentation/ceph-common-issues.md
- https://www.ibm.com/support/knowledgecenter/en/SSBS6K_3.2.0/troubleshoot/rook_ts.html
- https://www.cloudops.com/2019/05/the-ultimate-rook-and-ceph-survival-guide/
I just installed the rating and nothing is happening
As the rating is watching the Reports generated by the metering-operator, you might have to wait as long as an hour before you start seeing metrics.
Reports are generated every hour at HH:00, so you can expect to get RatedMetrics soon after (within seconds: in our case, rating 500 frames per metric takes 1.4 seconds).
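To estimate how long you still have to wait, you can compute the time remaining until the next HH:00. This is only a sketch: the helper name seconds_until_next_hour is hypothetical, and it assumes that the Reports are produced on round hours of a timezone with a whole-hour offset from UTC.

```shell
# Seconds between a Unix timestamp and the next top of the hour (HH:00).
# Argument: Unix timestamp in seconds (defaults to the current time).
# Assumption: round hours line up with multiples of 3600 seconds since the epoch.
seconds_until_next_hour() {
  now="${1:-$(date +%s)}"
  echo $(( 3600 - now % 3600 ))
}

# Example usage:
# echo "Next Reports expected in $(seconds_until_next_hour) seconds"
```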
I waited until the next hour but nothing is happening
It might be related to the configuration versioning system. If you deploy the rating before any Reports have been generated, the operator will keep trying to get frames from a timeframe where none will ever exist.
Natively, the operator looks for frames with a timestamp between 1970-01-01T00:00:00 and the moment the first RatingRules was deployed (the installation time, by default), and tries to rate those with the base configuration. If no frames are found, the operator will just wait.
You can fix this situation by removing and recreating the base RatingRules (rating-rating-default-rules by default).
To get a better understanding of why this happens, read this.
I cannot connect to Grafana, what is the password?
If you use the Grafana installed by the Prometheus operator, the credentials are:
- Login: admin
- Password: prom-operator
In case it doesn't work for you, use the following:
$ kubectl get secret prometheus-grafana -o yaml -n monitoring
apiVersion: v1
kind: Secret
data:
admin-password: cHJvbS1vcGVyYXRvcg==
admin-user: YWRtaW4=
[...]
$ echo "cHJvbS1vcGVyYXRvcg==" | base64 -d
prom-operator
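The lookup and decoding steps above can be combined into one command. The helper below is a sketch under the same assumptions as the example (secret prometheus-grafana in the monitoring namespace); the function name secret_field is hypothetical. It relies on the JSONPath output of kubectl and on base64 -d, as used above.

```shell
# Print one decoded field of a Kubernetes Secret.
# Arguments: namespace, secret name, field name under .data
secret_field() {
  kubectl -n "$1" get secret "$2" -o "jsonpath={.data.$3}" | base64 -d
}

# Example usage (uncomment to run against your cluster):
# secret_field monitoring prometheus-grafana admin-user
# secret_field monitoring prometheus-grafana admin-password
```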
After configuration, I don't see any data in Grafana
If you can list and query the endpoints in Grafana but do not get any results, check the time range selector in the top-right corner of the Grafana screen. The scalable rating produces data frames that always have a round timestamp, and Grafana's time range is non-inclusive. To see data, query AT LEAST the last 3 hours. You will never encounter this problem if you are using the reactive mechanism.
I cannot get multi-tenancy to work through Grafana
We use cookie-based sessions to authenticate user queries to the rating-operator-api.
You have to log in through the /login endpoint of the rating-operator-api, THEN log into Grafana.
If you have configured the datasource properly, enabled the session cookie and activated Basic authentication, you can verify that the cookie is present after login, using your web browser's interface.
If you cannot, go back to configuring Grafana or check your browser's cookie settings.