Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CBG-4108 acquire lock to prevent NewWaiter data race #7012

Merged
merged 3 commits into from
Jul 24, 2024
Merged

Conversation

torcolvin
Copy link
Collaborator

@torcolvin torcolvin commented Jul 24, 2024

I can not actually reproduce the data race with high -count -race.

WARNING: DATA RACE
18:41:55 Write at 0x00c0008ecb98 by goroutine 57618:
18:41:55   github.com/couchbase/sync_gateway/db.(*changeListener).NotifyCheckForTermination()
18:41:55       /Users/couchbase/workspace/sgw-unix-build/3.2.0/community/sync_gateway/db/change_listener.go:237 +0xf8
18:41:55   github.com/couchbase/sync_gateway/db.(*DatabaseContext).NotifyTerminatedChanges()
18:41:55       /Users/couchbase/workspace/sgw-unix-build/3.2.0/community/sync_gateway/db/database.go:793 +0x1d0
18:41:55   github.com/couchbase/sync_gateway/db.(*blipHandler).sendChanges()
18:41:55       /Users/couchbase/workspace/sgw-unix-build/3.2.0/community/sync_gateway/db/blip_handler.go:521 +0xa54
18:41:55   github.com/couchbase/sync_gateway/db.(*blipHandler).handleSubChanges.func2()
18:41:55       /Users/couchbase/workspace/sgw-unix-build/3.2.0/community/sync_gateway/db/blip_handler.go:352 +0x939
18:41:55 
18:41:55 Previous read at 0x00c0008ecb98 by goroutine 57732:
18:41:55   github.com/couchbase/sync_gateway/db.(*changeListener).NewWaiter()
18:41:55       /Users/couchbase/workspace/sgw-unix-build/3.2.0/community/sync_gateway/db/change_listener.go:311 +0x526
18:41:55   github.com/couchbase/sync_gateway/db.(*changeListener).NewWaiterWithChannels()
18:41:55       /Users/couchbase/workspace/sgw-unix-build/3.2.0/community/sync_gateway/db/change_listener.go:330 +0x3bd
18:41:55   github.com/couchbase/sync_gateway/db.(*Database).NewUserWaiter()
18:41:55       /Users/couchbase/workspace/sgw-unix-build/3.2.0/community/sync_gateway/db/change_listener.go:416 +0x2d0
18:41:55   github.com/couchbase/sync_gateway/db.NewBlipSyncContext()
18:41:55       /Users/couchbase/workspace/sgw-unix-build/3.2.0/community/sync_gateway/db/blip_sync_context.go:52 +0x4e9
18:41:55   github.com/couchbase/sync_gateway/rest.(*handler).handleBLIPSync()
18:41:55       /Users/couchbase/workspace/sgw-unix-build/3.2.0/community/sync_gateway/rest/blip_sync.go:64 +0xa64
18:41:55   github.com/couchbase/sync_gateway/rest.(*handler).invoke()
18:41:55       /Users/couchbase/workspace/sgw-unix-build/3.2.0/community/sync_gateway/rest/handler.go:282 +0x32f
18:41:55   github.com/couchbase/sync_gateway/rest.createCommonRouter.makeHandler.makeHandlerWithOptions.func35()
18:41:55       /Users/couchbase/workspace/sgw-unix-build/3.2.0/community/sync_gateway/rest/handler.go:127 +0x15d
18:41:55   net/http.HandlerFunc.ServeHTTP()
18:41:55       /Users/couchbase/cbdeps/go1.22.5/src/net/http/server.go:2171 +0x47
18:41:55   github.com/couchbase/sync_gateway/rest.NewRouter.withServerType.func1.1()
18:41:55       /Users/couchbase/workspace/sgw-unix-build/3.2.0/community/sync_gateway/rest/routing.go:367 +0x97
18:41:55   net/http.HandlerFunc.ServeHTTP()
18:41:55       /Users/couchbase/cbdeps/go1.22.5/src/net/http/server.go:2171 +0x47
18:41:55   github.com/gorilla/mux.(*Router).ServeHTTP()
18:41:55       /Users/couchbase/workspace/sgw-unix-build/3.2.0/community/godeps/pkg/mod/github.com/gorilla/[email protected]/mux.go:212 +0x371
18:41:55   github.com/couchbase/sync_gateway/rest.CreateAdminHandler.wrapRouter.func1()
18:41:55       /Users/couchbase/workspace/sgw-unix-build/3.2.0/community/sync_gateway/rest/routing.go:418 +0x27b
18:41:55   net/http.HandlerFunc.ServeHTTP()
18:41:55       /Users/couchbase/cbdeps/go1.22.5/src/net/http/server.go:2171 +0x47
18:41:55   net/http.serverHandler.ServeHTTP()
18:41:55       /Users/couchbase/cbdeps/go1.22.5/src/net/http/server.go:3142 +0x2a1
18:41:55   net/http.(*conn).serve()
18:41:55       /Users/couchbase/cbdeps/go1.22.5/src/net/http/server.go:2044 +0x13c4
18:41:55   net/http.(*Server).Serve.gowrap3()
18:41:55       /Users/couchbase/cbdeps/go1.22.5/src/net/http/server.go:3290 +0x4f
18:41:55 
18:41:55 Goroutine 57618 (running) created at:
18:41:55   github.com/couchbase/sync_gateway/db.(*blipHandler).handleSubChanges()
18:41:55       /Users/couchbase/workspace/sgw-unix-build/3.2.0/community/sync_gateway/db/blip_handler.go:334 +0x110e
18:41:55   github.com/couchbase/sync_gateway/db.init.collectionBlipHandler.func6()
18:41:55       /Users/couchbase/workspace/sgw-unix-build/3.2.0/community/sync_gateway/db/blip_handler.go:202 +0x5d0
18:41:55   github.com/couchbase/sync_gateway/db.init.userBlipHandler.func7()
18:41:55       /Users/couchbase/workspace/sgw-unix-build/3.2.0/community/sync_gateway/db/blip_handler.go:122 +0x70
18:41:55   github.com/couchbase/sync_gateway/db.NewBlipSyncContext.(*BlipSyncContext).register.func2()
18:41:55       /Users/couchbase/workspace/sgw-unix-build/3.2.0/community/sync_gateway/db/blip_sync_context.go:203 +0x311
18:41:55   github.com/couchbase/go-blip.(*Context).dispatchRequest()
18:41:55       /Users/couchbase/workspace/sgw-unix-build/3.2.0/community/godeps/pkg/mod/github.com/couchbase/[email protected]/context.go:304 +0x2a5
18:41:55   github.com/couchbase/go-blip.(*receiver).getPendingRequest.func1()
18:41:55       /Users/couchbase/workspace/sgw-unix-build/3.2.0/community/godeps/pkg/mod/github.com/couchbase/[email protected]/receiver.go:276 +0x6d
18:41:55   github.com/couchbase/go-blip.(*Message).asyncRead.func1()
18:41:55       /Users/couchbase/workspace/sgw-unix-build/3.2.0/community/godeps/pkg/mod/github.com/couchbase/[email protected]/message.go:381 +0x127
18:41:55 
18:41:55 Goroutine 57732 (running) created at:
18:41:55   net/http.(*Server).Serve()
18:41:55       /Users/couchbase/cbdeps/go1.22.5/src/net/http/server.go:3290 +0x8ec
18:41:55   net/http/httptest.(*Server).goServe.func1()
18:41:55       /Users/couchbase/cbdeps/go1.22.5/src/net/http/httptest/server.go:310 +0xbb
18:41:55 ==================

@@ -308,7 +309,7 @@ func (listener *changeListener) NewWaiter(keys []string, trackUnusedSequences bo
listener: listener,
keys: keys,
lastCounter: listener.CurrentCount(keys),
lastTerminateCheckCounter: listener.terminateCheckCounter,
lastTerminateCheckCounter: listener.terminateCheckCounter.Load(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think your PR comment suggests that that using listener.L.Lock to safely fetch the value of terminateCheckCounter here will deadlock. Can you clarify why that's the case, or what the issue of that approach is? It would be preferable to avoid acquiring two locks in changeListener.Wait, since that's used highly concurrently.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I misanalysed the situation but it was caught in tests. I pushed an update that removes atomic and just acquires the lock once for NewWaiter instead of once for listener._currentCount and listener.terminateCheckCounter.

As far as I could tell, everywhere else we already acquire the lock on read or write of listener._terminateCheckCounter.

The lock was already used for listener.CurrentCount(keys), so mark
_terminateCheckCounter as needing a lock as well.
@torcolvin torcolvin changed the title CBG-4108 use an atomic to avoid data race CBG-4108 acquire lock to prevent NewWaiter data race Jul 24, 2024
@torcolvin torcolvin assigned adamcfraser and unassigned torcolvin Jul 24, 2024
@adamcfraser adamcfraser merged commit 366b045 into main Jul 24, 2024
38 checks passed
@adamcfraser adamcfraser deleted the CBG-4108 branch July 24, 2024 21:33
torcolvin added a commit that referenced this pull request Oct 22, 2024
* CBG-4108 use an atomic to avoid data race

* Grab one lock in NewWaiter

The lock was already used for listener.CurrentCount(keys), so mark
_terminateCheckCounter as needing a lock as well.

* Avoid double lock on NewWaiterWithChannels
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants