Should CSIStorageCapacity ownerReference use node when node-deployment=true #1337

yibozhuang · 2025-01-29T01:42:39Z

Today the logic that sets the ownerReference for CSIStorageCapacity is here:
https://github.com/kubernetes-csi/external-provisioner/blob/master/cmd/csi-provisioner/csi-provisioner.go#L462-L479

Where it uses the configured capacity-ownerref-level and traverses from the pod (2 for Deployment and 1 for StatefulSet/DaemonSet)

However, when using node-deployment=true for deploying the external-provisioner as a DaemonSet for handling node-local volumes, what ends up happening is that the CSIStorageCapacity would end up having the DaemonSet as its ownerReference

for e.g.

apiVersion: storage.k8s.io/v1
kind: CSIStorageCapacity
capacity: 20437544Ki
maximumVolumeSize: 10218772Ki
metadata:
  labels:
    csi.storage.k8s.io/drivername: foo
    csi.storage.k8s.io/managed-by: external-provisioner-node1
  name: csisc-cpzrc
  namespace: kube-system
  ownerReferences:
  - apiVersion: apps/v1
    controller: true
    kind: DaemonSet
    name: foo-csi-controller
    uid: d2c8a05e-6395-42a3-8881-6f407210cfdd
nodeTopology:
  matchLabels:
    kubernetes.io/hostname: node1
storageClassName: foo

And as a result if the node gets deleted or consolidated by cluster-autoscaler for e.g. this CSIStorageCapacity resource becomes orphaned as it won't be garbage-collected (since the DaemonSet still exists) but will never be used as the node is gone and when cluster-autoscaler brings up a new node, then it likely won't have the same name.

So in the case of using node-deployment=true with enable-capacity=true, should the CSIStorageCapacity's ownerReference use node resource instead? This way, it will be properly garbage collected when the node no longer exists.

The text was updated successfully, but these errors were encountered:

yibozhuang · 2025-01-29T01:43:34Z

The change should be pretty simple. I can make a PR.

/assign

pohly · 2025-01-29T06:32:53Z

A simpler fix would be to use the pod as owner. Then if the driver gets uninstalled, the capacity objects also get removed, which isn't the case when the node is the owner.

I vaguely remember that this was how node-local storage was meant to be managed. It might even have code already.

yibozhuang · 2025-01-29T18:30:18Z

A simpler fix would be to use the pod as owner. Then if the driver gets uninstalled, the capacity objects also get removed, which isn't the case when the node is the owner.

I vaguely remember that this was how node-local storage was meant to be managed. It might even have code already.

Thanks @pohly ! I just tested this with using --capacity-ownerref-level=0 to set Pod as the owner and it is working well. I will close the PR and the issue then. Thanks for the suggestion.

k8s-ci-robot assigned yibozhuang Jan 29, 2025

yibozhuang mentioned this issue Jan 29, 2025

Set ownerRef of CSIStorageCapacity to node when using node deployment #1338

Closed

yibozhuang closed this as completed Jan 29, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Should CSIStorageCapacity ownerReference use node when node-deployment=true #1337

Should CSIStorageCapacity ownerReference use node when node-deployment=true #1337

yibozhuang commented Jan 29, 2025

yibozhuang commented Jan 29, 2025

pohly commented Jan 29, 2025

yibozhuang commented Jan 29, 2025

Should CSIStorageCapacity ownerReference use node when node-deployment=true #1337

Should CSIStorageCapacity ownerReference use node when node-deployment=true #1337

Comments

yibozhuang commented Jan 29, 2025

yibozhuang commented Jan 29, 2025

pohly commented Jan 29, 2025

yibozhuang commented Jan 29, 2025