Problem
If I have a CR in a chart and remove its definition from the cluster, it may result in a broken operator state:
If I have a single revision, the operator constantly prints the error:
rollback failed: release: not found: original upgrade error: unable to build kubernetes objects from current release manifest: [resource mapping not found for name: "stackrox-central" namespace: "" from "": no matches for kind "PodSecurityPolicy" in version "policy/v1beta1" ensure CRDs are installed first, resource mapping not found for name: "stackrox-scanner" namespace: "" from "": no matches for kind "PodSecurityPolicy" in version "policy/v1beta1" ensure CRDs are installed first]
If I have more than one revision and the previous revision also contains a CR, the rollback fails and the release gets stuck in the pending-rollback state. The change "Allow marking releases stuck in a pending state as failed" (#116) recovers the release.
We discovered this issue on clusters that had been upgraded to Kubernetes 1.25 and had the PSPs removed. However, this applies to any CRD.
Root cause
getReleaseState calls actionClient.Upgrade with the DryRun flag. actionClient.Upgrade tries to infer whether the release was changed in storage from the return value of Upgrade.Run. From the comment, it seems to me that the returned release is expected to be nil when nothing was written to storage, as with DryRun, but apparently that is not the case (at least with Helm v3.12.1):
helm-operator-plugins/pkg/client/actionclient.go, lines 237 to 241 in a775742
Thus, when the dry-run upgrade fails, action client performs a non-dry-run rollback to the previous revision.
From the Helm upgrade source code:
https://github.com/helm/helm/blob/main/pkg/action/upgrade.go#L293-L298
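The failure mode described above can be modeled in a few lines of Go. This is only a simplified sketch, not the code from helm-operator-plugins or Helm: `Release`, `upgradeFunc`, `failingUpgrade`, `client`, `upgradeNaive`, and `upgradeGuarded` are hypothetical names introduced for illustration. It assumes, as the issue reports, that the upgrade call can return a non-nil release together with an error even in a dry run. The guarded variant shows one possible mitigation (skipping the rollback for dry runs); it is an assumption, not necessarily the fix the project would adopt.

```go
package main

import (
	"errors"
	"fmt"
)

// Release is a hypothetical stand-in for Helm's release type.
type Release struct {
	Name    string
	Version int
}

// upgradeFunc models the upgrade call: it may return a non-nil release
// together with an error, even when dryRun is true (assumption taken
// from the behaviour reported in this issue).
type upgradeFunc func(dryRun bool) (*Release, error)

// failingUpgrade models the failure from this issue: the manifest of the
// current release can no longer be built, yet a release is still returned.
func failingUpgrade(dryRun bool) (*Release, error) {
	rel := &Release{Name: "example", Version: 2}
	return rel, errors.New("unable to build kubernetes objects from current release manifest")
}

// client counts how many rollbacks were triggered.
type client struct {
	rollbacks int
}

// upgradeNaive treats any non-nil release returned with an error as
// "changed in storage" and rolls back, regardless of dryRun.
func (c *client) upgradeNaive(run upgradeFunc, dryRun bool) error {
	rel, err := run(dryRun)
	if err != nil {
		if rel != nil {
			c.rollbacks++ // a real rollback would happen here
		}
		return err
	}
	return nil
}

// upgradeGuarded additionally skips the rollback for dry runs, since a
// dry run is not expected to have written anything to storage.
func (c *client) upgradeGuarded(run upgradeFunc, dryRun bool) error {
	rel, err := run(dryRun)
	if err != nil {
		if rel != nil && !dryRun {
			c.rollbacks++
		}
		return err
	}
	return nil
}

func main() {
	naive, guarded := &client{}, &client{}
	_ = naive.upgradeNaive(failingUpgrade, true)
	_ = guarded.upgradeGuarded(failingUpgrade, true)
	fmt.Println("naive rollbacks:", naive.rollbacks)
	fmt.Println("guarded rollbacks:", guarded.rollbacks)
}
```

In this model, the naive variant triggers a rollback for a failed dry run, while the guarded variant does not.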