Steps to Reproduce
1. Activate a Clustered license type. You can fetch the kube-system UID via: kubectl get ns kube-system -o jsonpath='{.metadata.uid}'.
2. Configure and create a TVK Target for backup storage.
3. Create a TVK Namespaced backup for Prometheus (the default namespace is monitoring, as per the Starter Kit).
4. Wait for the backup to complete successfully, then delete the Prometheus Helm release: helm delete kube-prom-stack -n monitoring
5. Initiate a restore directly from the S3 Target using the TVK web management console.
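For reference, the flow above can be sketched as a small shell script. The release and namespace names are the Starter Kit defaults used throughout this issue; the backup and restore steps themselves go through TVK (CRDs or the web console), so the cluster-touching commands are left as comments:

```shell
# Starter Kit defaults used throughout this issue
ns=monitoring
release=kube-prom-stack

# UID needed when activating a Clustered license:
# kubectl get ns kube-system -o jsonpath='{.metadata.uid}'

# After the TVK Namespaced backup reports success, delete the original release:
# helm delete "$release" -n "$ns"

echo "reproduce against release '$release' in namespace '$ns'"
```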
Expected Results
Backup and restore of the monitoring namespace applications (including Prometheus) should complete without issues. All Prometheus stack components (Pods, Services, etc.) should be up and running.
Actual Results
The restore process completes successfully, but the Prometheus Operator (kube-prome-operator) refuses to start. Running kubectl get pods -n monitoring yields:
NAME READY STATUS RESTARTS AGE
kube-prom-szubu-grafana-5754d5b7b7-v97v2 2/2 Running 0 16m
kube-prom-szubu-kube-prome-operator-8649bb7b47-9qs8j 0/1 ContainerCreating 0 16m
kube-prom-szubu-kube-state-metrics-7f6f67d67f-8zfkh 1/1 Running 0 16m
kube-prom-szubu-prometheus-node-exporter-dlb44 1/1 Running 0 16m
kube-prom-szubu-prometheus-node-exporter-wktv7 1/1 Running 0 16m
Going further, describing the failing Pod via kubectl describe pod/kube-prom-szubu-kube-prome-operator-8649bb7b47-9qs8j -n monitoring yields:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 6m6s default-scheduler Successfully assigned monitoring/kube-prom-szubu-kube-prome-operator-8649bb7b47-9qs8j to flux-test-mt-pool-ug7di
Warning FailedMount 116s (x10 over 6m6s) kubelet MountVolume.SetUp failed for volume "tls-secret" : secret "kube-prom-szubu-kube-prome-admission" not found
Warning FailedMount 106s (x2 over 4m3s) kubelet Unable to attach or mount volumes: unmounted volumes=[tls-secret], unattached volumes=[tls-secret kube-api-access-bngnb]: timed out waiting for the condition
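The FailedMount event names a Secret the Pod mounts but which does not exist. One way to list exactly which Secret names a Pod references is a jsonpath query; the pod name below is the one from this issue, and the second half is an offline illustration using the relevant volume snippet, so it runs without a cluster:

```shell
# On the live cluster:
# kubectl get pod kube-prom-szubu-kube-prome-operator-8649bb7b47-9qs8j \
#   -n monitoring -o jsonpath='{.spec.volumes[*].secret.secretName}'

# Offline illustration with the relevant volume snippet from the Pod spec:
spec='
volumes:
- name: tls-secret
  secret:
    secretName: kube-prom-szubu-kube-prome-admission
'
printf '%s\n' "$spec" | sed -n 's/^ *secretName: *//p'
# -> kube-prom-szubu-kube-prome-admission
```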
It seems that kube-prome-operator fails to find the secret named kube-prom-szubu-kube-prome-admission. Listing all the secrets from the monitoring namespace via kubectl get secrets -n monitoring yields the following (notice that there's a secret named kube-prom-stack-kube-prome-admission, which seems to be the right one):
NAME TYPE DATA AGE
alertmanager-kube-prom-szubu-kube-prome-alertmanager Opaque 1 19m
default-token-tsjk5 kubernetes.io/service-account-token 3 98m
kube-prom-stack-kube-prome-admission Opaque 3 97m
kube-prom-szubu-grafana Opaque 3 19m
...
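The mismatch is a pure prefix rename: the Deployment references the szubu prefix while the existing Secret keeps the original stack prefix. A quick check of the two names (taken verbatim from the listings above):

```shell
referenced="kube-prom-szubu-kube-prome-admission"  # mounted by the restored Deployment
existing="kube-prom-stack-kube-prome-admission"    # actually present in the namespace

# Swapping the release prefix back makes the names identical:
fixed=$(printf '%s' "$referenced" | sed 's/szubu/stack/')
[ "$fixed" = "$existing" ] && echo "names differ only in the release prefix"
# -> names differ only in the release prefix
```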
Looking at the Prometheus Operator deployment via kubectl get deployment kube-prom-szubu-kube-prome-operator -n monitoring -o yaml, you can see that the secret name was changed to kube-prom-szubu-kube-prome-admission (TVK replaced stack with szubu):
Next, after editing the deployment via kubectl edit deployment kube-prom-szubu-kube-prome-operator -n monitoring and replacing the secret name with the correct one, kube-prom-stack-kube-prome-admission, the Prometheus Operator starts successfully:
Everything seems back to normal now, as seen above.
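As a non-interactive alternative to kubectl edit, the same fix can be applied with kubectl patch. The volume index 0 is an assumption here (verify it against the Deployment's spec.template.spec.volumes before patching), so the cluster command itself is left commented:

```shell
ns=monitoring
deploy=kube-prom-szubu-kube-prome-operator
good_secret=kube-prom-stack-kube-prome-admission

# JSON patch pointing the tls-secret volume at the Secret that actually exists
# (volume index 0 is assumed; check your Deployment's volume order first)
patch='[{"op":"replace","path":"/spec/template/spec/volumes/0/secret/secretName","value":"'"$good_secret"'"}]'
echo "$patch"

# On the live cluster:
# kubectl patch deployment "$deploy" -n "$ns" --type=json -p "$patch"
```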
After analysing everything that happened so far, it seems that TVK renames Kubernetes resources during the backup/restore process using some internal logic or naming convention, but applies the renaming inconsistently on restore: the Deployment's secret reference was rewritten to the new name, while the Secret itself kept its original one.
@v-ctiutiu Thank you for raising this issue. TVK Engg team is looking into it.
We will reproduce it in-house with kube-prom-stack and keep you updated with the progress on it.
We have found the root cause of the issue. It seems to be a logic error in our restore hooks for native Helm chart support.
We will fix this in the upcoming patch release v2.6.4, and as of today we will update our release notes to list this as a known issue.
@v-ctiutiu Thank you for your patience on this one.
We have fixed this issue end-to-end: the Helm application is now restored the Helm way and is ready to use once the restore completes. This fix is released as part of the TVK 2.7.1 release. Here are the release notes.
Let me know if you face any issues.
Problem Description
When trying to restore a full backup that includes Prometheus as one of the backed up components, the kube-prome-operator component fails to start.
Impacted Areas
TrilioVault for Kubernetes Namespaced or Multi-Namespaced restore operations.
Prerequisites
A Prometheus instance running in your DOKS cluster, deployed as per the Starter Kit guide.
TrilioVault for Kubernetes installed and configured, as described in the Installing TrilioVault for Kubernetes chapter.