Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should CSIStorageCapacity ownerReference use node when node-deployment=true #1337

Closed
yibozhuang opened this issue Jan 29, 2025 · 3 comments
Closed
Assignees

Comments

@yibozhuang
Copy link

Today the logic that sets the ownerReference for CSIStorageCapacity is here:
https://github.com/kubernetes-csi/external-provisioner/blob/master/cmd/csi-provisioner/csi-provisioner.go#L462-L479

Where it uses the configured capacity-ownerref-level and traverses from the pod (2 for Deployment and 1 for StatefulSet/DaemonSet)

However, when using node-deployment=true for deploying the external-provisioner as a DaemonSet for handling node-local volumes, what ends up happening is that the CSIStorageCapacity would end up having the DaemonSet as its ownerReference

for e.g.

apiVersion: storage.k8s.io/v1
kind: CSIStorageCapacity
capacity: 20437544Ki
maximumVolumeSize: 10218772Ki
metadata:
  labels:
    csi.storage.k8s.io/drivername: foo
    csi.storage.k8s.io/managed-by: external-provisioner-node1
  name: csisc-cpzrc
  namespace: kube-system
  ownerReferences:
  - apiVersion: apps/v1
    controller: true
    kind: DaemonSet
    name: foo-csi-controller
    uid: d2c8a05e-6395-42a3-8881-6f407210cfdd
nodeTopology:
  matchLabels:
    kubernetes.io/hostname: node1
storageClassName: foo

And as a result if the node gets deleted or consolidated by cluster-autoscaler for e.g. this CSIStorageCapacity resource becomes orphaned as it won't be garbage-collected (since the DaemonSet still exists) but will never be used as the node is gone and when cluster-autoscaler brings up a new node, then it likely won't have the same name.

So in the case of using node-deployment=true with enable-capacity=true, should the CSIStorageCapacity's ownerReference use node resource instead? This way, it will be properly garbage collected when the node no longer exists.

@yibozhuang
Copy link
Author

The change should be pretty simple. I can make a PR.

/assign

@pohly
Copy link
Contributor

pohly commented Jan 29, 2025

A simpler fix would be to use the pod as owner. Then if the driver gets uninstalled, the capacity objects also get removed, which isn't the case when the node is the owner.

I vaguely remember that this was how node-local storage was meant to be managed. It might even have code already.

@yibozhuang
Copy link
Author

A simpler fix would be to use the pod as owner. Then if the driver gets uninstalled, the capacity objects also get removed, which isn't the case when the node is the owner.

I vaguely remember that this was how node-local storage was meant to be managed. It might even have code already.

Thanks @pohly ! I just tested this with using --capacity-ownerref-level=0 to set Pod as the owner and it is working well. I will close the PR and the issue then. Thanks for the suggestion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants