fix(controller): Fix race of csi controller calls on the same volume #588

Open · wants to merge 2 commits into base: develop

Changes from 1 commit
25 changes: 25 additions & 0 deletions pkg/driver/controller.go
@@ -4,6 +4,7 @@ import (
"fmt"
"strconv"
"strings"
"sync"
"time"

"github.com/container-storage-interface/spec/lib/go/csi"
@@ -63,6 +64,15 @@ type controller struct {

k8sNodeInformer cache.SharedIndexInformer
zfsNodeInformer cache.SharedIndexInformer

volMutexes sync.Map
}

func (cs *controller) LockVolume(volume string) func() {
value, _ := cs.volMutexes.LoadOrStore(volume, &sync.Mutex{})


Just wondering, are entries automatically GC'd?
Otherwise this may grow "forever"?

Author

No, they are not getting GC'd. A mutex can only be safely removed from the map if no other goroutine still holds a reference to it. I'm not sure this relatively small leak (the volume string key is probably larger than the mutex) is worth the additional complexity.
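For illustration only (this is not code from the PR): a hypothetical variant that deletes the entry on unlock would break mutual exclusion, which is the reference problem described above.

```go
// Hypothetical variant, NOT in this PR: delete the map entry on unlock.
func (cs *controller) lockVolumeWithCleanup(volume string) func() {
	value, _ := cs.volMutexes.LoadOrStore(volume, &sync.Mutex{})
	mtx := value.(*sync.Mutex)
	mtx.Lock()
	return func() {
		mtx.Unlock()
		// Racy: another goroutine may already hold a reference to this same
		// mutex (from its own LoadOrStore) and be blocked in Lock(). Deleting
		// the entry now lets a third caller store a brand-new mutex for the
		// same volume, so two operations on that volume can run concurrently.
		cs.volMutexes.Delete(volume)
	}
}
```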

Member

Well, it might still grow quite large? Also, what about deleted volumes? Their entries will also stay in the map, right?


It seems most kubernetes-csi drivers actually use the exact same implementation: https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/blob/master/pkg/common/volume_lock.go
It handles this by not keeping N locks, but rather a set of in-flight volume IDs, and deleting the entry for a volume once its operation finishes.
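For reference, the pattern there is roughly the following. This is a self-contained sketch of the idea, not the exact code from that repo (type and method names are illustrative):

```go
// In-flight set pattern: one mutex guarding a set of volume IDs that
// currently have an operation running. Entries are removed on Release,
// so the map never outgrows the number of concurrent operations.
type VolumeLocks struct {
	mu    sync.Mutex
	inUse map[string]struct{}
}

func NewVolumeLocks() *VolumeLocks {
	return &VolumeLocks{inUse: make(map[string]struct{})}
}

// TryAcquire reports whether the caller got the "lock"; false means an
// operation for volumeID is already in flight and the call should be rejected.
func (vl *VolumeLocks) TryAcquire(volumeID string) bool {
	vl.mu.Lock()
	defer vl.mu.Unlock()
	if _, ok := vl.inUse[volumeID]; ok {
		return false
	}
	vl.inUse[volumeID] = struct{}{}
	return true
}

func (vl *VolumeLocks) Release(volumeID string) {
	vl.mu.Lock()
	defer vl.mu.Unlock()
	delete(vl.inUse, volumeID)
}
```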

Author

They handle it a bit differently: instead of waiting for the lock, the CSI call fails instantly with "An operation with the given Volume ID %s already exists": https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/blob/master/pkg/gce-pd-csi-driver/node.go#L140
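A handler in that style would look roughly like the sketch below (assuming a volumeLocks field of the type sketched above; ABORTED is the code the CSI spec uses for an operation already pending on the same volume):

```go
// Reject instead of queueing: the CSI sidecar will retry with backoff.
if !cs.volumeLocks.TryAcquire(volumeID) {
	return nil, status.Errorf(codes.Aborted,
		"an operation with the given Volume ID %s already exists", volumeID)
}
defer cs.volumeLocks.Release(volumeID)
```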


Yeah, I guess we'd need to do the same here as well. Perhaps that's also the better approach, letting the caller retry.
Should we use this new way instead? @Lucaber @Abhinandan-Purkait

Author

I have looked at a few different implementations. TopoLVM also uses a single map to add and remove keys, but uses a sync.Cond so that a second concurrent call blocks, like my implementation does: https://github.com/topolvm/topolvm/blob/main/internal/driver/lock.go This seems to be the best way to handle it.
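The core of that pattern, as a rough sketch (not TopoLVM's exact code): a single set of busy keys plus a sync.Cond, so a concurrent caller blocks until the key is released, and the entry is always removed on unlock.

```go
// Per-key lock built on sync.Cond: blocks concurrent callers for the same
// key and leaves nothing behind in the map once the key is released.
type keyLock struct {
	mu   sync.Mutex
	cond *sync.Cond
	busy map[string]struct{}
}

func newKeyLock() *keyLock {
	kl := &keyLock{busy: make(map[string]struct{})}
	kl.cond = sync.NewCond(&kl.mu)
	return kl
}

func (kl *keyLock) Lock(key string) {
	kl.mu.Lock()
	defer kl.mu.Unlock()
	for {
		if _, ok := kl.busy[key]; !ok {
			break
		}
		kl.cond.Wait() // releases kl.mu while waiting, reacquires before returning
	}
	kl.busy[key] = struct{}{}
}

func (kl *keyLock) Unlock(key string) {
	kl.mu.Lock()
	defer kl.mu.Unlock()
	delete(kl.busy, key)
	kl.cond.Broadcast() // wake all waiters so they re-check the busy set
}
```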

Alternatively, we could use the keymutex implementation from kubernetes/utils, which uses a small, fixed pool of mutexes shared between volumes. But this would require special handling to ensure that a snapshot and its parent volume end up on different mutexes (to prevent deadlocking): https://github.com/kubernetes/utils/blob/master/keymutex/hashed.go
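A minimal sketch of that alternative, assuming k8s.io/utils/keymutex (the pool size and helper name are illustrative). Because keys are hashed onto a fixed pool, a snapshot and its parent volume can land on the same mutex, which is where the deadlock risk comes from.

```go
import "k8s.io/utils/keymutex"

// A small, fixed pool of mutexes; volume IDs are hashed onto one of them.
var volLocks = keymutex.NewHashed(32)

func withVolumeLocked(volumeID string, fn func()) {
	volLocks.LockKey(volumeID)
	// UnlockKey returns an error; ignored here to keep the sketch short.
	defer volLocks.UnlockKey(volumeID)
	// Locking a snapshot ID here as well would deadlock whenever it hashes
	// onto the same mutex as volumeID, hence the special handling mentioned above.
	fn()
}
```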

Member

The topolvm one looks reasonable, wdyt?

mtx := value.(*sync.Mutex)
mtx.Lock()
return func() { mtx.Unlock() }
}

// NewController returns a new instance
@@ -448,6 +458,9 @@ func (cs *controller) CreateVolume(
contentSource := req.GetVolumeContentSource()
pvcName := helpers.GetInsensitiveParameter(&parameters, "csi.storage.k8s.io/pvc/name")

unlock := cs.LockVolume(volName)
defer unlock()

if contentSource != nil && contentSource.GetSnapshot() != nil {
snapshotID := contentSource.GetSnapshot().GetSnapshotId()

@@ -491,6 +504,8 @@ func (cs *controller) DeleteVolume(
}

volumeID := strings.ToLower(req.GetVolumeId())
unlock := cs.LockVolume(volumeID)
defer unlock()

// verify if the volume has already been deleted
vol, err := zfs.GetVolume(volumeID)
@@ -609,6 +624,8 @@ func (cs *controller) ControllerExpandVolume(
"ControllerExpandVolume: no volumeID provided",
)
}
unlock := cs.LockVolume(volumeID)
defer unlock()

/* round off the new size */
updatedSize := getRoundedCapacity(req.GetCapacityRange().GetRequiredBytes())
@@ -705,6 +722,10 @@ func (cs *controller) CreateSnapshot(
if err != nil {
return nil, err
}
unlockVol := cs.LockVolume(volumeID)


OK for now, but if we get more places that use both the volume and the snapshot, it might be worth having a single function that takes both locks, ensuring they are always acquired in the same order to avoid deadlocks.

Author

Good point, I added a new function LockVolumeWithSnapshot.
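That second commit is not part of this view, but such a helper would presumably look something like the sketch below, built on LockVolume above (the actual commit may differ): taking the volume lock first and the snapshot lock second gives every caller the same lock order.

```go
// LockVolumeWithSnapshot locks the parent volume and then the snapshot,
// always in that order, so concurrent snapshot calls cannot deadlock.
// The returned function releases both locks in reverse order.
func (cs *controller) LockVolumeWithSnapshot(volume string, snapshot string) func() {
	unlockVol := cs.LockVolume(volume)
	unlockSnap := cs.LockVolume(snapshot)
	return func() {
		unlockSnap()
		unlockVol()
	}
}
```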

defer unlockVol()
unlockSnap := cs.LockVolume(snapName)
defer unlockSnap()

snapTimeStamp := time.Now().Unix()
var state string
@@ -801,6 +822,10 @@ func (cs *controller) DeleteSnapshot(
// should succeed when an invalid snapshot id is used
return &csi.DeleteSnapshotResponse{}, nil
}
unlockVol := cs.LockVolume(snapshotID[0])
defer unlockVol()
unlockSnap := cs.LockVolume(snapshotID[1])
defer unlockSnap()
if err := zfs.DeleteSnapshot(snapshotID[1]); err != nil {
return nil, status.Errorf(
codes.Internal,