fix(controller): Fix race of csi controller calls on the same volume #588
base: develop
Changes from 1 commit
@@ -4,6 +4,7 @@ import (
 	"fmt"
 	"strconv"
 	"strings"
+	"sync"
 	"time"

 	"github.com/container-storage-interface/spec/lib/go/csi"
@@ -63,6 +64,15 @@ type controller struct {

 	k8sNodeInformer cache.SharedIndexInformer
 	zfsNodeInformer cache.SharedIndexInformer
+
+	volMutexes sync.Map
 }
+
+func (cs *controller) LockVolume(volume string) func() {
+	value, _ := cs.volMutexes.LoadOrStore(volume, &sync.Mutex{})
+	mtx := value.(*sync.Mutex)
+	mtx.Lock()
+	return func() { mtx.Unlock() }
+}

 // NewController returns a new instance
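For context, the guarantee this helper gives: two CSI calls that pass the same volume name serialize on one mutex, while calls on different volumes proceed in parallel. A minimal, runnable sketch of the same pattern in isolation (the reduced controller struct and the volume name pvc-6a3f are illustrative, not the driver's real types or data):

package main

import (
	"fmt"
	"sync"
	"time"
)

// controller is cut down here to just the lock map from the diff above.
type controller struct {
	volMutexes sync.Map // volume name -> *sync.Mutex
}

// LockVolume blocks until the per-volume mutex is held and returns the
// matching unlock function, mirroring the diff above.
func (cs *controller) LockVolume(volume string) func() {
	value, _ := cs.volMutexes.LoadOrStore(volume, &sync.Mutex{})
	mtx := value.(*sync.Mutex)
	mtx.Lock()
	return func() { mtx.Unlock() }
}

func main() {
	cs := &controller{}
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			unlock := cs.LockVolume("pvc-6a3f") // same key, so these serialize
			defer unlock()
			fmt.Println("call", i, "holds the lock")
			time.Sleep(10 * time.Millisecond)
		}(i)
	}
	wg.Wait()
}

Note that LoadOrStore makes the lookup race-free: when two goroutines race to create the mutex for a new volume, both receive the same *sync.Mutex instance.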
@@ -448,6 +458,9 @@ func (cs *controller) CreateVolume(
 	contentSource := req.GetVolumeContentSource()
 	pvcName := helpers.GetInsensitiveParameter(&parameters, "csi.storage.k8s.io/pvc/name")

+	unlock := cs.LockVolume(volName)
+	defer unlock()
+
 	if contentSource != nil && contentSource.GetSnapshot() != nil {
 		snapshotID := contentSource.GetSnapshot().GetSnapshotId()
@@ -491,6 +504,8 @@ func (cs *controller) DeleteVolume(
 	}

 	volumeID := strings.ToLower(req.GetVolumeId())
+	unlock := cs.LockVolume(volumeID)
+	defer unlock()

 	// verify if the volume has already been deleted
 	vol, err := zfs.GetVolume(volumeID)
@@ -609,6 +624,8 @@ func (cs *controller) ControllerExpandVolume(
 			"ControllerExpandVolume: no volumeID provided",
 		)
 	}
+	unlock := cs.LockVolume(volumeID)
+	defer unlock()

 	/* round off the new size */
 	updatedSize := getRoundedCapacity(req.GetCapacityRange().GetRequiredBytes())
@@ -705,6 +722,10 @@ func (cs *controller) CreateSnapshot(
 	if err != nil {
 		return nil, err
 	}
+	unlockVol := cs.LockVolume(volumeID)
+	defer unlockVol()
+	unlockSnap := cs.LockVolume(snapName)
+	defer unlockSnap()

 	snapTimeStamp := time.Now().Unix()
 	var state string

Comment: OK for now, but if we get more call sites that lock both the volume and the snapshot, it might be worth having a function that always takes the locks in the same order, to avoid deadlocks.

Reply: Good point, added a new function.
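The new function from that reply is not visible in this single commit's diff. A sketch of what a consistent-order helper could look like, where the name LockVolumes and its exact shape are assumptions rather than the PR's actual code; it builds on LockVolume above and needs the standard sort package:

// LockVolumes (hypothetical) sorts and deduplicates the keys, so every
// caller acquires the per-volume mutexes in one global order and
// lock-ordering deadlocks cannot occur.
func (cs *controller) LockVolumes(volumes ...string) func() {
	keys := append([]string(nil), volumes...)
	sort.Strings(keys)
	unlocks := make([]func(), 0, len(keys))
	for i, key := range keys {
		if i > 0 && keys[i-1] == key {
			continue // a sync.Mutex is not reentrant, so skip duplicates
		}
		unlocks = append(unlocks, cs.LockVolume(key))
	}
	return func() {
		for i := len(unlocks) - 1; i >= 0; i-- {
			unlocks[i]() // release in reverse acquisition order
		}
	}
}

With this, CreateSnapshot could take both locks via a single cs.LockVolumes(volumeID, snapName) call regardless of argument order.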
@@ -801,6 +822,10 @@ func (cs *controller) DeleteSnapshot(
 		// should succeed when an invalid snapshot id is used
 		return &csi.DeleteSnapshotResponse{}, nil
 	}
+	unlockVol := cs.LockVolume(snapshotID[0])
+	defer unlockVol()
+	unlockSnap := cs.LockVolume(snapshotID[1])
+	defer unlockSnap()
 	if err := zfs.DeleteSnapshot(snapshotID[1]); err != nil {
 		return nil, status.Errorf(
 			codes.Internal,
Comment: Just wondering, are entries automatically GC'd? Otherwise this map may grow "forever"?
Reply: No, they are not getting GC'd. A mutex could only be removed from the map once no other goroutine still holds a reference to it, and I'm not sure this relatively small leak (the volume-name key is probably larger than the mutex itself) is worth the additional complexity.
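Not something this PR implements, but for illustration: one way to make entries collectable is to reference-count holders and waiters under a map-level mutex and delete the key once the count drops to zero. A sketch, assuming only the standard sync package:

// lockEntry pairs a mutex with the number of goroutines that still
// reference it; the refs field is guarded by volumeLocks.mu.
type lockEntry struct {
	mtx  sync.Mutex
	refs int
}

type volumeLocks struct {
	mu      sync.Mutex
	entries map[string]*lockEntry
}

func newVolumeLocks() *volumeLocks {
	return &volumeLocks{entries: map[string]*lockEntry{}}
}

// Lock behaves like LockVolume but garbage-collects the map entry when
// the last holder or waiter releases it.
func (l *volumeLocks) Lock(volume string) func() {
	l.mu.Lock()
	e, ok := l.entries[volume]
	if !ok {
		e = &lockEntry{}
		l.entries[volume] = e
	}
	e.refs++
	l.mu.Unlock()

	e.mtx.Lock()
	return func() {
		e.mtx.Unlock()
		l.mu.Lock()
		e.refs--
		if e.refs == 0 {
			delete(l.entries, volume) // nobody references this mutex anymore
		}
		l.mu.Unlock()
	}
}

Because refs counts waiters as well as the holder, a key is only deleted when no goroutine can still reach its mutex, which also covers the deleted-volume case raised below.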
Comment: Well, it might still grow to a large extent? Also, what about deleted volumes? Their entries will stay in the map as well, right?
Comment: It seems most kubernetes-csi drivers actually use the same implementation: https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/blob/master/pkg/common/volume_lock.go. It handles this by not keeping N locks: it tracks a set of in-flight volumes and deletes a volume's entry as soon as its operation finishes.
Reply: They handle it a bit differently: instead of waiting for the lock, the CSI call fails instantly with "An operation with the given Volume ID %s already exists": https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/blob/master/pkg/gce-pd-csi-driver/node.go#L140
Comment: Yeah, I guess we'd need to do the same here as well. Perhaps that's also the better approach, letting the caller retry. Should we use this new way instead? @Lucaber @Abhinandan-Purkait
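A sketch of that fail-fast pattern follows; operationLocks, TryAcquire, and Release are illustrative names here, not the linked driver's actual API:

// operationLocks tracks volume IDs that have an operation in flight.
type operationLocks struct {
	mu       sync.Mutex
	inFlight map[string]struct{}
}

func newOperationLocks() *operationLocks {
	return &operationLocks{inFlight: map[string]struct{}{}}
}

// TryAcquire marks the volume busy, failing immediately instead of
// blocking when another operation already runs on the same volume.
func (l *operationLocks) TryAcquire(volumeID string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if _, busy := l.inFlight[volumeID]; busy {
		return false
	}
	l.inFlight[volumeID] = struct{}{}
	return true
}

// Release clears the marker so the next call on the volume can proceed.
func (l *operationLocks) Release(volumeID string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	delete(l.inFlight, volumeID)
}

A call site would then return a gRPC Aborted error, e.g. status.Errorf(codes.Aborted, "an operation with the given Volume ID %s already exists", volumeID), and rely on the CSI sidecars to retry.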
Reply: I have looked at a few different implementations. TopoLVM also uses a single map whose keys are added and removed, but it uses a sync.Cond to block the second concurrent call, like my implementation does: https://github.com/topolvm/topolvm/blob/main/internal/driver/lock.go. This seems to be the best way to handle it.

Alternatively, we could use the keymutex implementation from kubernetes/utils, which hashes each key onto a small fixed pool of mutexes shared between volumes. But that would require special handling to ensure the snapshot and its parent volume never land on the same mutex (to prevent deadlocks): https://github.com/kubernetes/utils/blob/master/keymutex/hashed.go
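For reference, the blocking-plus-cleanup behavior described there can be built from one map and a sync.Cond; this is a sketch of the idea, not TopoLVM's actual code:

// condLocks blocks like a per-key mutex but stores only in-flight keys,
// so map entries disappear as soon as the guarded operation ends.
type condLocks struct {
	cond     *sync.Cond
	inFlight map[string]struct{}
}

func newCondLocks() *condLocks {
	return &condLocks{
		cond:     sync.NewCond(&sync.Mutex{}),
		inFlight: map[string]struct{}{},
	}
}

// Lock waits until no operation runs on key, then claims the key.
func (l *condLocks) Lock(key string) {
	l.cond.L.Lock()
	defer l.cond.L.Unlock()
	for {
		if _, busy := l.inFlight[key]; !busy {
			break
		}
		l.cond.Wait()
	}
	l.inFlight[key] = struct{}{}
}

// Unlock frees the key and wakes all waiters; each waiter re-checks
// whether its own key is now free.
func (l *condLocks) Unlock(key string) {
	l.cond.L.Lock()
	defer l.cond.L.Unlock()
	delete(l.inFlight, key)
	l.cond.Broadcast()
}

Broadcast rather than Signal is deliberate: waiters may be queued on different keys, and only the ones whose key was just freed should proceed.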
Comment: The TopoLVM one looks reasonable, wdyt?