✨ topology: implement BeforeClusterUpgrade annotation hook (#11922)
* topology: implement BeforeClusterUpgrade annotation hook

* review fixes

* add e2e test coverage

* review fixes
chrischdi authored Mar 6, 2025
1 parent 45974cd commit 9ed56b1
Showing 6 changed files with 221 additions and 12 deletions.
7 changes: 7 additions & 0 deletions api/v1beta1/common_types.go
@@ -200,6 +200,13 @@ const (

// CRDMigrationObservedGenerationAnnotation indicates on a CRD for which generation CRD migration is completed.
CRDMigrationObservedGenerationAnnotation = "crd-migration.cluster.x-k8s.io/observed-generation"

// BeforeClusterUpgradeHookAnnotationPrefix is the prefix that annotation keys are matched against
// during the before-upgrade lifecycle hook to block propagating the new version to the control plane.
// This hook can be used to execute pre-upgrade add-on tasks and block upgrades of the ControlPlane and Workers.
// Note: While the upgrade is blocked, propagation of changes made to the Cluster Topology to the underlying
// objects is delayed until the upgrade is unblocked.
BeforeClusterUpgradeHookAnnotationPrefix = "before-upgrade.hook.cluster.cluster.x-k8s.io"
)

// MachineSetPreflightCheck defines a valid MachineSet preflight check.
9 changes: 5 additions & 4 deletions docs/book/src/reference/api/labels-and-annotations.md
@@ -4,7 +4,7 @@
|:------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------|:-------------------------|
| cluster.x-k8s.io/cluster-name | It is set on machines linked to a cluster and external objects (bootstrap and infrastructure providers). | User | Machines |
| cluster.x-k8s.io/control-plane | It is set on machines or related objects that are part of a control plane. | Cluster API | Machines |
| cluster.x-k8s.io/control-plane-name | It is set on machines if they're controlled by a control plane. The value of this label may be a hash if the control plane name is longer than 63 characters. | Cluster API | Machines |
| cluster.x-k8s.io/deployment-name | It is set on machines if they're controlled by a MachineDeployment. | Cluster API | Machines |
| cluster.x-k8s.io/drain | If set with the value "skip" on a Pod in the workload cluster, the Pod will not be evicted during Node drain. | User | Pods (workload cluster) |
| cluster.x-k8s.io/interruptible | It is used to mark the nodes that run on interruptible instances. | User | Nodes (workload cluster) |
@@ -20,6 +20,7 @@

| Annotation | Note | Managed By | Applies to |
|:-----------------------------------------------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------|:-----------------------------------------------|
| before-upgrade.hook.cluster.cluster.x-k8s.io | It specifies the prefix that annotation keys are matched against during the before-upgrade lifecycle hook to block propagating the new version to the control plane. While any such annotation is set, these hooks delay propagation of changes made to the Cluster Topology to the underlying objects. | User | Clusters |
| cluster.x-k8s.io/cloned-from-groupkind | It is the annotation that stores the group-kind of the template from which the current resource has been cloned from. | Cluster API | All Cluster API objects cloned from a template |
| cluster.x-k8s.io/cloned-from-name | It is the annotation that stores the name of the template from which the current resource has been cloned from. | Cluster API | All Cluster API objects cloned from a template |
| cluster.x-k8s.io/cluster-name | It is set on nodes identifying the name of the cluster the node belongs to. | Cluster API | Nodes (workload cluster) |
@@ -53,9 +54,9 @@
| machineset.cluster.x-k8s.io/skip-preflight-checks | It can be applied on MachineDeployment and MachineSet resources to specify a comma-separated list of preflight checks that should be skipped during MachineSet reconciliation. Supported preflight checks are: All, KubeadmVersionSkew, KubernetesVersionSkew, ControlPlaneIsStable. | User | MachineDeployments, MachineSets |
| pre-drain.delete.hook.machine.cluster.x-k8s.io | It specifies the prefix we search each annotation for during the pre-drain.delete lifecycle hook to pause reconciliation of deletion. These hooks will prevent draining of the associated node until all are removed. | User | Machines |
| pre-terminate.delete.hook.machine.cluster.x-k8s.io | It specifies the prefix we search each annotation for during the pre-terminate.delete lifecycle hook to pause reconciliation of deletion. These hooks will prevent removal of an instance from an infrastructure provider until all are removed. | User | Machines |
| topology.cluster.x-k8s.io/defer-upgrade | It can be used to defer the Kubernetes upgrade of a single MachineDeployment topology. If the annotation is set on a MachineDeployment topology in Cluster.spec.topology.workers, the Kubernetes upgrade for this MachineDeployment topology is deferred. It doesn't affect other MachineDeployment topologies. | Cluster API | MachineDeployments in Cluster.topology |
| topology.cluster.x-k8s.io/dry-run | It is an annotation that gets set on objects by the topology controller only during a server side dry run apply operation. It is used for validating update webhooks for objects which get updated by template rotation (e.g. InfrastructureMachineTemplate). When the annotation is set and the admission request is a dry run, the webhook should deny validation due to immutability. By that the request will succeed (without any changes to the actual object because it is a dry run) and the topology controller will receive the resulting object. | Cluster API | Template rotation objects |
| topology.cluster.x-k8s.io/hold-upgrade-sequence | It can be used to hold the entire MachineDeployment upgrade sequence. If the annotation is set on a MachineDeployment topology in Cluster.spec.topology.workers, the Kubernetes upgrade for this MachineDeployment topology and all subsequent ones is deferred. | Cluster API | MachineDeployments in Cluster.topology |
| topology.cluster.x-k8s.io/upgrade-concurrency | It can be used to configure the maximum concurrency while upgrading MachineDeployments of a classy Cluster. It is set as a top level annotation on the Cluster object. The value should be >= 1. If unspecified the upgrade concurrency will default to 1. | Cluster API | Clusters |
| unsafe.topology.cluster.x-k8s.io/disable-update-class-name-check | It can be used to disable the webhook check on update that disallows a pre-existing Cluster to be populated with Topology information and Class. | User | Clusters |
| unsafe.topology.cluster.x-k8s.io/disable-update-version-check | It can be used to disable the webhook checks on update that disallow updating the .spec.topology.version under certain conditions. | User | Clusters |
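The new table row says the annotation is managed by the user and applies to Clusters. As a rough illustration of what "managing" it could look like, the sketch below builds an annotation key from the prefix plus a hook name and adds or removes it on a plain annotations map. The helper names `hookKey`, `addHook` and `removeHook`, and the hook name `etcd-backup`, are hypothetical and not part of this commit; `hookPrefix` is a local copy of the constant value added in `api/v1beta1/common_types.go`. The `prefix + "/" + name` shape follows the key used in the commit's unit test.

```go
package main

import "fmt"

// hookPrefix is a local copy of the value of
// clusterv1.BeforeClusterUpgradeHookAnnotationPrefix from this commit.
const hookPrefix = "before-upgrade.hook.cluster.cluster.x-k8s.io"

// hookKey builds an annotation key of the form "<prefix>/<name>".
// Hypothetical helper for illustration only.
func hookKey(name string) string {
	return hookPrefix + "/" + name
}

// addHook sets the hook annotation, allocating the map if needed.
// The value is left empty because the controller code in this commit
// only inspects annotation keys.
func addHook(annotations map[string]string, name string) map[string]string {
	if annotations == nil {
		annotations = map[string]string{}
	}
	annotations[hookKey(name)] = ""
	return annotations
}

// removeHook deletes the hook annotation; deleting from a nil map is a no-op.
func removeHook(annotations map[string]string, name string) {
	delete(annotations, hookKey(name))
}

func main() {
	// Stand-in for Cluster.ObjectMeta.Annotations.
	var annotations map[string]string

	annotations = addHook(annotations, "etcd-backup")
	fmt.Println("after add:", annotations)

	removeHook(annotations, "etcd-backup")
	fmt.Println("after remove:", annotations)
}
```

In a real controller or script the map would be the Cluster object's annotations, and the change would be written back through the API server; that part is omitted here.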
31 changes: 31 additions & 0 deletions exp/topology/desiredstate/desired_state.go
@@ -20,6 +20,8 @@ package desiredstate
import (
"context"
"fmt"
"slices"
"strings"

"github.com/pkg/errors"
corev1 "k8s.io/api/core/v1"
@@ -531,6 +533,35 @@ func (g *generator) computeControlPlaneVersion(ctx context.Context, s *scope.Sco
}

if feature.Gates.Enabled(feature.RuntimeSDK) {
var hookAnnotations []string
for key := range s.Current.Cluster.Annotations {
if strings.HasPrefix(key, clusterv1.BeforeClusterUpgradeHookAnnotationPrefix) {
hookAnnotations = append(hookAnnotations, key)
}
}
if len(hookAnnotations) > 0 {
slices.Sort(hookAnnotations)
message := fmt.Sprintf("annotations [%s] are set", strings.Join(hookAnnotations, ", "))
if len(hookAnnotations) == 1 {
message = fmt.Sprintf("annotation [%s] is set", strings.Join(hookAnnotations, ", "))
}
// Add the hook with a response to the tracker so we can later update the condition.
s.HookResponseTracker.Add(runtimehooksv1.BeforeClusterUpgrade, &runtimehooksv1.BeforeClusterUpgradeResponse{
CommonRetryResponse: runtimehooksv1.CommonRetryResponse{
// RetryAfterSeconds needs to be set because having only hooks without RetryAfterSeconds
// would lead to not updating the condition. We can rely on getting an event when the
// annotation gets removed, so we set twice the default sync-period to not cause additional reconciles.
RetryAfterSeconds: 20 * 60,
CommonResponse: runtimehooksv1.CommonResponse{
Message: message,
},
},
})

log.Info(fmt.Sprintf("Cluster upgrade to version %q is blocked by %q hook (via annotations)", desiredVersion, runtimecatalog.HookName(runtimehooksv1.BeforeClusterUpgrade)), "hooks", strings.Join(hookAnnotations, ","))
return *currentVersion, nil
}

// At this point the control plane and the machine deployments are stable and we are almost ready to pick
// up the desiredVersion. Call the BeforeClusterUpgrade hook before picking up the desired version.
hookRequest := &runtimehooksv1.BeforeClusterUpgradeRequest{
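The annotation check added to `computeControlPlaneVersion` can be tried in isolation. The following is a minimal standalone sketch, not the commit's code: it collects the annotation keys that start with the prefix, sorts them, and builds the same singular or plural message. The function names `blockingHooks` and `blockingMessage` are hypothetical, `hookPrefix` is a local copy of the constant value, and `sort.Strings` is used in place of `slices.Sort` only to keep the sketch buildable on older Go toolchains.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// hookPrefix is a local copy of the value of
// clusterv1.BeforeClusterUpgradeHookAnnotationPrefix from this commit.
const hookPrefix = "before-upgrade.hook.cluster.cluster.x-k8s.io"

// blockingHooks returns, in sorted order, every annotation key that starts
// with hookPrefix. Annotation values are ignored, as in the commit.
func blockingHooks(annotations map[string]string) []string {
	var keys []string
	for key := range annotations {
		if strings.HasPrefix(key, hookPrefix) {
			keys = append(keys, key)
		}
	}
	sort.Strings(keys) // map iteration order is random; sort for stable output
	return keys
}

// blockingMessage returns the message reported while the upgrade is blocked,
// or an empty string if no hook annotation is set.
func blockingMessage(annotations map[string]string) string {
	keys := blockingHooks(annotations)
	switch len(keys) {
	case 0:
		return ""
	case 1:
		return fmt.Sprintf("annotation [%s] is set", keys[0])
	default:
		return fmt.Sprintf("annotations [%s] are set", strings.Join(keys, ", "))
	}
}

func main() {
	annotations := map[string]string{
		hookPrefix + "/backup":  "true",
		hookPrefix + "/addons":  "",
		"example.com/unrelated": "x",
	}
	fmt.Println(blockingMessage(annotations))

	// Removing hooks one by one eventually unblocks the upgrade.
	delete(annotations, hookPrefix+"/addons")
	fmt.Println(blockingMessage(annotations))

	delete(annotations, hookPrefix+"/backup")
	fmt.Printf("%q\n", blockingMessage(annotations))
}
```

Because the match is a plain `strings.HasPrefix`, any key that begins with the prefix counts as a hook, including the bare prefix without a `/name` suffix. In the commit, a non-empty result additionally records a `BeforeClusterUpgradeResponse` with `RetryAfterSeconds: 20 * 60` and returns the current control plane version instead of the desired one.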
30 changes: 29 additions & 1 deletion exp/topology/desiredstate/desired_state_test.go
@@ -758,6 +758,7 @@ func TestComputeControlPlaneVersion(t *testing.T) {
name string
hookResponse *runtimehooksv1.BeforeClusterUpgradeResponse
topologyVersion string
clusterModifier func(c *clusterv1.Cluster)
controlPlaneObj *unstructured.Unstructured
upgradingMachineDeployments []string
upgradingMachinePools []string
@@ -868,7 +869,7 @@ func TestComputeControlPlaneVersion(t *testing.T) {
expectedVersion: "v1.2.3",
},
{
name: "should return the controlplane.spec.version if a BeforeClusterUpgradeHook returns a blocking response",
hookResponse: blockingBeforeClusterUpgradeResponse,
topologyVersion: "v1.2.3",
controlPlaneObj: builder.ControlPlane("test1", "cp1").
@@ -906,6 +907,30 @@ func TestComputeControlPlaneVersion(t *testing.T) {
expectedVersion: "v1.2.2",
wantErr: true,
},
{
name: "should return the controlplane.spec.version if a BeforeClusterUpgradeHook annotation is set",
hookResponse: nonBlockingBeforeClusterUpgradeResponse,
topologyVersion: "v1.2.3",
controlPlaneObj: builder.ControlPlane("test1", "cp1").
WithSpecFields(map[string]interface{}{
"spec.version": "v1.2.2",
"spec.replicas": int64(2),
}).
WithStatusFields(map[string]interface{}{
"status.version": "v1.2.2",
"status.replicas": int64(2),
"status.updatedReplicas": int64(2),
"status.readyReplicas": int64(2),
"status.unavailableReplicas": int64(0),
}).
Build(),
clusterModifier: func(c *clusterv1.Cluster) {
c.Annotations = map[string]string{
clusterv1.BeforeClusterUpgradeHookAnnotationPrefix + "/test": "true",
}
},
expectedVersion: "v1.2.2",
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
@@ -930,6 +955,9 @@ func TestComputeControlPlaneVersion(t *testing.T) {
UpgradeTracker: scope.NewUpgradeTracker(),
HookResponseTracker: scope.NewHookResponseTracker(),
}
if tt.clusterModifier != nil {
tt.clusterModifier(s.Current.Cluster)
}
if len(tt.upgradingMachineDeployments) > 0 {
s.UpgradeTracker.MachineDeployments.MarkUpgrading(tt.upgradingMachineDeployments...)
}