fix(controller): Fix race of csi controller calls on the same volume #588

Open
wants to merge 2 commits into develop

Conversation


@Lucaber Lucaber commented Sep 17, 2024

Why is this PR required? What issue does it fix?:

When a CreateVolume request takes longer than 10 seconds, the request runs into a client-side timeout and gets retried by the kubelet. While the new request is being processed, the old request is still being canceled on the server side, and the ZFSVolume CR it created gets deleted.

| Request 1 | Request 2 |
| --- | --- |
| CreateVolume call | |
| ZFSVolume CR created | |
| waiting for volume creation | |
| client timeout + context cancel | |
| | CreateVolume call |
| | ZFSVolume CR already exists |
| ZFSVolume CR deleted | |
| Return ERR | Return OK without creating a ZFSVolume |
```
I0916 15:01:49.227721       1 grpc.go:72] GRPC call: /csi.v1.Controller/CreateVolume requests {"accessibility_requirements":{"preferred":[{"segments":{"beta.kubernetes.io/arch":"amd64","beta.kubernetes.io/os":"linux","kubernetes.io/arch":"amd64","kubernetes.io/hostname":"i04-1-b-node-i041b-0-0","kubernetes.io/os":"linux","m3.services/cluster":"i04-1-b","m3.services/worker-group":"i041b-0","node.systems.mittwald.cloud/placement-group":"i041b-0","openebs.io/nodeid":"i04-1-b-node-i041b-0-0","openebs.io/nodename":"i04-1-b-node-i041b-0-0"}}],"requisite":[{"segments":{"beta.kubernetes.io/arch":"amd64","beta.kubernetes.io/os":"linux","kubernetes.io/arch":"amd64","kubernetes.io/hostname":"i04-1-b-node-i041b-0-0","kubernetes.io/os":"linux","m3.services/cluster":"i04-1-b","m3.services/worker-group":"i041b-0","node.systems.mittwald.cloud/placement-group":"i041b-0","openebs.io/nodeid":"i04-1-b-node-i041b-0-0","openebs.io/nodename":"i04-1-b-node-i041b-0-0"}}]},"capacity_range":{"required_bytes":5368709120},"name":"pvc-eb11c446-f26c-4333-b3da-4ecb9c5b2457","parameters":{"compression":"off","csi.storage.k8s.io/pv/name":"pvc-eb11c446-f26c-4333-b3da-4ecb9c5b2457","csi.storage.k8s.io/pvc/name":"test-pvc0","csi.storage.k8s.io/pvc/namespace":"default","dedup":"off","fstype":"zfs","poolname":"kluster-pool/i04-1-b-128k","recordsize":"128k","shared":"yes","thinprovision":"yes"},"volume_capabilities":[{"AccessType":{"Mount":{"fs_type":"zfs"}},"access_mode":{"mode":1}}]}
I0916 15:01:58.853180       1 controller.go:289] zfs: trying volume creation kluster-pool/i04-1-b-128k/pvc-eb11c446-f26c-4333-b3da-4ecb9c5b2457 on node [i04-1-b-node-i041b-0-0]
I0916 15:01:59.104207       1 volume.go:139] zfs: waiting for volume kluster-pool/i04-1-b-128k/pvc-eb11c446-f26c-4333-b3da-4ecb9c5b2457 to be created on nodeid i04-1-b-node-i041b-0-0
I0916 15:02:00.105273       1 volume.go:170] zfs: volume kluster-pool/i04-1-b-128k/pvc-eb11c446-f26c-4333-b3da-4ecb9c5b2457 provisioning failed on nodeid i04-1-b-node-i041b-0-0 err: zfs: context deadline reached
I0916 15:02:00.235113       1 grpc.go:72] GRPC call: /csi.v1.Controller/CreateVolume requests {"accessibility_requirements":{"preferred":[{"segments":{"beta.kubernetes.io/arch":"amd64","beta.kubernetes.io/os":"linux","kubernetes.io/arch":"amd64","kubernetes.io/hostname":"i04-1-b-node-i041b-0-0","kubernetes.io/os":"linux","m3.services/cluster":"i04-1-b","m3.services/worker-group":"i041b-0","node.systems.mittwald.cloud/placement-group":"i041b-0","openebs.io/nodeid":"i04-1-b-node-i041b-0-0","openebs.io/nodename":"i04-1-b-node-i041b-0-0"}}],"requisite":[{"segments":{"beta.kubernetes.io/arch":"amd64","beta.kubernetes.io/os":"linux","kubernetes.io/arch":"amd64","kubernetes.io/hostname":"i04-1-b-node-i041b-0-0","kubernetes.io/os":"linux","m3.services/cluster":"i04-1-b","m3.services/worker-group":"i041b-0","node.systems.mittwald.cloud/placement-group":"i041b-0","openebs.io/nodeid":"i04-1-b-node-i041b-0-0","openebs.io/nodename":"i04-1-b-node-i041b-0-0"}}]},"capacity_range":{"required_bytes":5368709120},"name":"pvc-eb11c446-f26c-4333-b3da-4ecb9c5b2457","parameters":{"compression":"off","csi.storage.k8s.io/pv/name":"pvc-eb11c446-f26c-4333-b3da-4ecb9c5b2457","csi.storage.k8s.io/pvc/name":"test-pvc0","csi.storage.k8s.io/pvc/namespace":"default","dedup":"off","fstype":"zfs","poolname":"kluster-pool/i04-1-b-128k","recordsize":"128k","shared":"yes","thinprovision":"yes"},"volume_capabilities":[{"AccessType":{"Mount":{"fs_type":"zfs"}},"access_mode":{"mode":1}}]}
I0916 15:02:00.449153       1 volume.go:221] zfs: deleted the volume pvc-eb11c446-f26c-4333-b3da-4ecb9c5b2457
I0916 15:02:00.449299       1 controller.go:466] created the volume kluster-pool/i04-1-b-128k/pvc-eb11c446-f26c-4333-b3da-4ecb9c5b2457 on node i04-1-b-node-i041b-0-0
I0916 15:02:00.553778       1 grpc.go:81] GRPC response: {"volume":{"accessible_topology":[{"segments":{"openebs.io/nodeid":"i04-1-b-node-i041b-0-0"}}],"capacity_bytes":5368709120,"volume_context":{"openebs.io/cas-type":"localpv-zfs","openebs.io/poolname":"kluster-pool/i04-1-b-128k"},"volume_id":"pvc-eb11c446-f26c-4333-b3da-4ecb9c5b2457"}}
```

What this PR does?: Adds a per-volume mutex to CSI calls to prevent simultaneous requests on the same volume (sketched below).
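
A minimal sketch of the locking approach, for illustration (the `volMutexes` field and the unlock-returning shape of `LockVolume` match the diff fragments quoted in the review below; the surrounding types are simplified assumptions):

```go
import "sync"

type controller struct {
	// volMutexes maps a volume name to its *sync.Mutex; LoadOrStore
	// creates the mutex lazily on first use.
	volMutexes sync.Map
}

// LockVolume serializes CSI calls on the same volume: it blocks until the
// volume's mutex is acquired and returns the unlock function, which the
// caller is expected to defer.
func (cs *controller) LockVolume(volume string) func() {
	value, _ := cs.volMutexes.LoadOrStore(volume, &sync.Mutex{})
	mtx := value.(*sync.Mutex)
	mtx.Lock()
	return mtx.Unlock
}
```

Each CSI method then brackets its critical section with `unlockVol := cs.LockVolume(volumeID)` followed by `defer unlockVol()`, so a retried CreateVolume cannot interleave with the still-cancelling previous call.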

Does this PR require any upgrade changes?: No

If the changes in this PR are manually verified, list the scenarios covered:

Any additional information for your reviewer?:
Mention if this PR is part of any design or a continuation of previous PRs

Checklist:

  • Fixes #
  • PR Title follows the convention of <type>(<scope>): <subject>
  • Has the change log section been updated?
  • Commit has unit tests
  • Commit has integration tests
  • (Optional) Are upgrade changes included in this PR? If not, mention the issue/PR to track:
  • (Optional) If documentation changes are required, which issue on https://github.com/openebs/website is used to track them:

@codecov-commenter

⚠️ Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.37%. Comparing base (e3d55d7) to head (0521ba4).
Report is 2 commits behind head on develop.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
```
@@           Coverage Diff            @@
##           develop     #588   +/-   ##
========================================
  Coverage    96.37%   96.37%
========================================
  Files            1        1
  Lines          496      496
========================================
  Hits           478      478
  Misses          14       14
  Partials         4        4
```
| Flag | Coverage Δ |
| --- | --- |
| bddtests | 96.37% <ø> (ø) |

Flags with carried forward coverage won't be shown.


@niladrih
Member

niladrih commented Sep 25, 2024

Hmm... this is upsetting. I suppose all of the tests on our Go repos need to use the -race flag on the CI pipeline.
I've created this ticket: openebs/openebs#3784


@tiagolobocastro tiagolobocastro left a comment


Thanks!
I wonder if we're missing this in other places...

```
@@ -705,6 +722,10 @@ func (cs *controller) CreateSnapshot(
	if err != nil {
		return nil, err
	}
	unlockVol := cs.LockVolume(volumeID)
```


OK for now, but if we have more uses of the volume & snapshot locks, it might be worth having a function that takes both, ensuring the locks are always acquired in the same order to avoid deadlocks.

Author

Good point, added a new function `LockVolumeWithSnapshot`.
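
A hypothetical sketch of what such a combined lock could look like (the actual `LockVolumeWithSnapshot` body is not shown in this thread): acquiring the parent volume's lock before the snapshot's lock in a fixed order means two concurrent callers can never each hold one lock while waiting for the other.

```go
// Hypothetical sketch; the real implementation may differ. The volume lock
// is always taken before the snapshot lock, so every caller acquires the
// two mutexes in the same order and cannot deadlock.
func (cs *controller) LockVolumeWithSnapshot(volume, snapshot string) func() {
	unlockVol := cs.LockVolume(volume)
	unlockSnap := cs.LockVolume(snapshot)
	return func() {
		unlockSnap()
		unlockVol()
	}
}
```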

```
}

func (cs *controller) LockVolume(volume string) func() {
	value, _ := cs.volMutexes.LoadOrStore(volume, &sync.Mutex{})
```


Just wondering, are entries automatically GC'd?
Otherwise this may grow "forever"?

Author

No, they are not GC'd. A mutex can only be removed from the map if there are no other references to it. I'm not sure this relatively small leak (the volume-name key is probably larger than the mutex itself) is worth the additional complexity.

Member

Well, it might grow to a large size? Also, what about deleted volumes: their entries will also stay in the map, right?


Seems most drivers on kubernetes-csi actually use the same exact implementation: https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/blob/master/pkg/common/volume_lock.go
It handles this by not using N locks, but rather keeping a list of in-flight volumes and deleting a volume's entry when its operation completes.

Author

They handle it a bit differently: instead of waiting for the lock, the CSI call errors instantly with `An operation with the given Volume ID %s already exists`: https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/blob/master/pkg/gce-pd-csi-driver/node.go#L140
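
For reference, the fail-fast pattern described above looks roughly like this (an illustrative sketch, not the actual GCP driver code; the `VolumeLocks` name follows the linked file):

```go
import "sync"

// VolumeLocks tracks the volume IDs that have an operation in flight.
// Instead of blocking, a second caller is rejected, and the CSI method
// returns an ABORTED error so the CO retries later.
type VolumeLocks struct {
	mu       sync.Mutex
	inFlight map[string]struct{}
}

func NewVolumeLocks() *VolumeLocks {
	return &VolumeLocks{inFlight: make(map[string]struct{})}
}

// TryAcquire reports whether the caller may operate on volumeID.
func (v *VolumeLocks) TryAcquire(volumeID string) bool {
	v.mu.Lock()
	defer v.mu.Unlock()
	if _, busy := v.inFlight[volumeID]; busy {
		return false
	}
	v.inFlight[volumeID] = struct{}{}
	return true
}

// Release removes volumeID from the in-flight set when the operation ends,
// so the map never outgrows the number of concurrent operations.
func (v *VolumeLocks) Release(volumeID string) {
	v.mu.Lock()
	defer v.mu.Unlock()
	delete(v.inFlight, volumeID)
}
```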


Yeah, I guess we'd need to do the same here as well. Perhaps that's also the better approach, letting the caller retry.
Should we use this new approach instead? @Lucaber @Abhinandan-Purkait

Author

I have looked at a few different implementations. TopoLVM also uses a single map to add and remove keys, but uses a sync.Cond to block the second concurrent call, like my implementation does: https://github.com/topolvm/topolvm/blob/main/internal/driver/lock.go This seems to be the best way to handle this.

Alternatively, we could use the keymutex implementation from kubernetes/utils, which uses a small pool of mutexes shared between volumes. But this would require special handling to ensure that a snapshot and its parent volume use different mutexes (to prevent deadlocks): https://github.com/kubernetes/utils/blob/master/keymutex/hashed.go
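
The sync.Cond pattern described above might look like this (an illustrative sketch; see the TopoLVM link for the real implementation). Because the key is deleted again on unlock, the map only ever holds in-flight volumes and does not leak:

```go
import "sync"

// CondLocks blocks concurrent callers on the same key using one sync.Cond
// instead of one mutex per key.
type CondLocks struct {
	cond   *sync.Cond
	locked map[string]struct{}
}

func NewCondLocks() *CondLocks {
	return &CondLocks{
		cond:   sync.NewCond(&sync.Mutex{}),
		locked: make(map[string]struct{}),
	}
}

// Lock waits until no other caller holds key, then marks it as held.
func (l *CondLocks) Lock(key string) {
	l.cond.L.Lock()
	defer l.cond.L.Unlock()
	for {
		if _, held := l.locked[key]; !held {
			break
		}
		l.cond.Wait()
	}
	l.locked[key] = struct{}{}
}

// Unlock releases key, deletes its map entry, and wakes all waiters so
// they can re-check their own keys.
func (l *CondLocks) Unlock(key string) {
	l.cond.L.Lock()
	defer l.cond.L.Unlock()
	delete(l.locked, key)
	l.cond.Broadcast()
}
```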

Member

The topolvm one looks reasonable, wdyt?

@Lucaber
Author

Lucaber commented Sep 25, 2024

> Hmm... this is upsetting. I suppose all of the tests on our Go repos need to use the -race flag on the CI pipeline. I've created this ticket: openebs/openebs#3784

This isn't a Go data race; -race can't detect this issue.
To reproduce this issue I created 50 PVCs with WaitForFirstConsumer, then started a single pod with all 50 volumes to create high load on both the kubelet and OpenEBS.
