
Panic in OpenStackMachineReconciler if OpenStackCluster.Status.Network is nil (Hosted Control Plane scenario) #2380

Open
bnallapeta opened this issue Jan 21, 2025 · 1 comment · May be fixed by #2381
Labels: kind/bug Categorizes issue or PR as related to a bug.

bnallapeta commented Jan 21, 2025

/kind bug

What steps did you take and what happened:

In a “hosted control plane” setup (where the control plane runs outside of OpenStack, and only worker nodes are provisioned in OpenStack), OpenStackCluster.Status.Network can remain nil. Currently, the CAPO code in OpenStackMachineReconciler.getOrCreateMachineServer() assumes openStackCluster.Status.Network is always non-nil. This leads to a nil pointer dereference (panic) when calling:

machineServerSpec := openStackMachineSpecToOpenStackServerSpec(
    &openStackMachine.Spec,
    identityRef,
    compute.InstanceTags(&openStackMachine.Spec, openStackCluster),
    failureDomain,
    userDataRef,
    getManagedSecurityGroup(openStackCluster, machine),
    openStackCluster.Status.Network.ID,  // <- panic if .Network is nil
)

The controller then crashes, making it impossible to provision worker nodes.

  • In hosted control plane (HCP) scenarios, there is no control-plane node running in OpenStack, so CAPO never populates OpenStackCluster.Status.Network.
  • Machine reconciliation then panics in openstackmachine_controller.go with a nil pointer dereference on openStackCluster.Status.Network.ID.
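A minimal sketch of the guard that would avoid the panic. The types below are simplified stand-ins for the CAPO API types, and `networkID` is a hypothetical helper, not an actual function in the controller:

```go
package main

import (
	"errors"
	"fmt"
)

// Simplified stand-ins for the CAPO types involved.
type Network struct{ ID string }
type Status struct{ Network *Network }
type OpenStackCluster struct{ Status Status }

// networkID returns the cluster network ID, or an error instead of
// dereferencing a nil Status.Network (the panic described above).
func networkID(c *OpenStackCluster) (string, error) {
	if c.Status.Network == nil {
		return "", errors.New("cluster network is not ready yet")
	}
	return c.Status.Network.ID, nil
}

func main() {
	hostedCP := &OpenStackCluster{} // Status.Network stays nil in an HCP setup
	if _, err := networkID(hostedCP); err != nil {
		fmt.Println("handled gracefully:", err)
	}
}
```

With a check like this in place, `getOrCreateMachineServer()` could return an error (or requeue) instead of crashing the controller.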

Logs:

I0116 03:44:55.377796       1 openstackmachine_controller.go:361] "Reconciling Machine" controller="openstackmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="OpenStackMachine" OpenStackMachine="kcm-system/openstack-dev-hosted-cp-md-fcpqk-8l4q5" namespace="kcm-system" name="openstack-dev-hosted-cp-md-fcpqk-8l4q5" reconcileID="b00cfcbb-ae39-4bb9-aa87-0bcde7cb350d" openStackMachine="openstack-dev-hosted-cp-md-fcpqk-8l4q5" machine="openstack-dev-hosted-cp-md-fcpqk-8l4q5" cluster="openstack-dev-hosted-cp" openStackCluster="openstack-dev-hosted-cp"
I0116 03:44:55.378942       1 controller.go:110] "Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference" controller="openstackmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="OpenStackMachine" OpenStackMachine="kcm-system/openstack-dev-hosted-cp-md-fcpqk-8l4q5" namespace="kcm-system" name="openstack-dev-hosted-cp-md-fcpqk-8l4q5" reconcileID="b00cfcbb-ae39-4bb9-aa87-0bcde7cb350d"
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
        panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x1baafba]

goroutine 357 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:111 +0x1e5
panic({0x1dccbe0?, 0x362a670?})
        /usr/local/go/src/runtime/panic.go:770 +0x132
sigs.k8s.io/cluster-api-provider-openstack/controllers.(*OpenStackMachineReconciler).getOrCreateMachineServer(0xc00043a2a0, {0x2440550, 0xc0005b7230}, 0xc0004deb08, 0xc0006bc508, 0xc0008fa008)
        /workspace/controllers/openstackmachine_controller.go:586 +0x35a
sigs.k8s.io/cluster-api-provider-openstack/controllers.(*OpenStackMachineReconciler).reconcileMachineServer(0x24467c8?, {0x2440550?, 0xc0005b7230?}, 0xc0006d1560, 0x13?, 0x0?, 0x0?)
        /workspace/controllers/openstackmachine_controller.go:544 +0x3d
sigs.k8s.io/cluster-api-provider-openstack/controllers.(*OpenStackMachineReconciler).reconcileNormal(0xc00043a2a0, {0x2440550, 0xc0005b7230}, 0xc0006d1560, {0xc000059500, 0x22}, 0xc0004deb08, 0xc0008fa008, 0xc0006bc508)
        /workspace/controllers/openstackmachine_controller.go:363 +0x178
sigs.k8s.io/cluster-api-provider-openstack/controllers.(*OpenStackMachineReconciler).Reconcile(0xc00043a2a0, {0x2440550, 0xc0005b7230}, {{{0xc0006b5576?, 0x0?}, {0xc00059d050?, 0xc0008f1d10?}}})
        /workspace/controllers/openstackmachine_controller.go:161 +0xbd8
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x24467c8?, {0x2440550?, 0xc0005b7230?}, {{{0xc0006b5576?, 0xb?}, {0xc00059d050?, 0x0?}}})
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:114 +0xb7
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0004f2160, {0x2440588, 0xc00022f810}, {0x1e96420, 0xc00003d920})
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:311 +0x3bc
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0004f2160, {0x2440588, 0xc00022f810})
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:261 +0x1be
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:222 +0x79
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2 in goroutine 203
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:218 +0x486

What did you expect to happen:
That CAPO would handle the absence of status.network gracefully, e.g. by marking the OpenStackMachine with a condition or requeueing, rather than panicking.

Environment:

  • Cluster API Provider OpenStack version (Or git rev-parse HEAD if manually built):
  • Cluster-API version:
  • OpenStack version:
  • Minikube/KIND version:
  • Kubernetes version (use kubectl version):
  • OS (e.g. from /etc/os-release):
@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Jan 21, 2025
@bnallapeta
Author

/assign bnallapeta
