[da-vinci] Bug fix for peer discovery in DVC with multi-stores #1503
base: main
Looks good, thanks!
LGTM overall, left one comment.
 * @return the store client
 */
AbstractAvroStoreClient getStoreClient(String storeName) {
  if (!storeToClientMap.containsKey(storeName)) {
Since it's using a concurrent map, would it be better to use computeIfAbsent instead of containsKey and put?
Good call! Updated to use computeIfAbsent.
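For readers following the review thread, here is a minimal sketch of the suggested pattern. It assumes storeToClientMap is a ConcurrentHashMap keyed by store name; the StoreClient class, the clientsCreated counter, and the class name StoreClientCacheSketch are hypothetical stand-ins and are not taken from the Venice codebase.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the per-store client cache discussed in the review.
class StoreClientCacheSketch {
  // Placeholder for AbstractAvroStoreClient; it only records which store it targets.
  static final class StoreClient {
    final String storeName;

    StoreClient(String storeName) {
      this.storeName = storeName;
    }
  }

  // Assumption: the real map is a ConcurrentHashMap, as the review comment implies.
  private final Map<String, StoreClient> storeToClientMap = new ConcurrentHashMap<>();

  // Counts how many clients were actually constructed (for illustration only).
  final AtomicInteger clientsCreated = new AtomicInteger();

  StoreClient getStoreClient(String storeName) {
    // A separate containsKey check followed by put lets two threads both see
    // "absent" and both create a client. computeIfAbsent does the lookup and
    // the insert as one atomic step per key.
    return storeToClientMap.computeIfAbsent(storeName, name -> {
      clientsCreated.incrementAndGet();
      return new StoreClient(name);
    });
  }

  public static void main(String[] args) {
    StoreClientCacheSketch cache = new StoreClientCacheSketch();
    StoreClient first = cache.getStoreClient("storeA");
    StoreClient second = cache.getStoreClient("storeA");
    StoreClient other = cache.getStoreClient("storeB");
    System.out.println(first == second);            // same cached instance
    System.out.println(other.storeName);            // client bound to storeB
    System.out.println(cache.clientsCreated.get()); // one client per store
  }
}
```

With ConcurrentHashMap, the mapping function runs at most once per key, so each store ends up with exactly one client even under concurrent calls.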
[da-vinci] Bug fix for peer discovery in DVC with multi-stores
When testing blob transfer on DVC hosts with multiple stores, it was found that some stores encountered errors while finding peers. The error message is as follows:
2025/01/30 18:41:32.990 ERROR [DaVinciBlobFinder] [ForkJoinPool.commonPool-worker-58] [flip-war] [] Error finding DVC peers for blob transfer in store: FeedEngagementCounts4dByRootObjectUrn, version: 191, partition: 198
java.lang.IllegalArgumentException: argument "src" is null
    at com.fasterxml.jackson.databind.ObjectMapper._assertNotNull(ObjectMapper.java:5072) ~[com.fasterxml.jackson.core.jackson-databind-2.18.0.jar:2.18.0]
    at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3920) ~[com.fasterxml.jackson.core.jackson-databind-2.18.0.jar:2.18.0]
    at com.linkedin.venice.blobtransfer.DaVinciBlobFinder.lambda$discoverBlobPeers$0(DaVinciBlobFinder.java:45)
The root cause is that the peerFinder was initialized with the clientConfig held by the daVinciBackend. The daVinciBackend is initialized only once, for the first store; subsequent stores reuse the existing instance (see AvroGenericDaVinciClient.java). As a result, the peerFinder only ever receives the first store's clientConfig, so it sends its peer-discovery requests to the first store's router, even when looking up peers for a different store.
This PR updates the logic to pass the clientConfig to the peerFinder and creates a separate store client for each store.
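The difference between the old and new behavior can be illustrated with a simplified sketch. Every name here (PeerFinderSketch, ClientConfig, SharedConfigPeerFinder, PerStorePeerFinder, targetStoreFor) is hypothetical and heavily reduced; it models only which store a lookup ends up targeting, not the actual Venice classes or their router calls.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified model of the bug and the fix; not the Venice API.
class PeerFinderSketch {
  // Reduced client config: only the store it is bound to.
  static final class ClientConfig {
    final String storeName;

    ClientConfig(String storeName) {
      this.storeName = storeName;
    }
  }

  // Before: the finder keeps the single config it was constructed with,
  // which comes from whichever store initialized the shared backend first.
  static final class SharedConfigPeerFinder {
    private final ClientConfig config;

    SharedConfigPeerFinder(ClientConfig firstStoreConfig) {
      this.config = firstStoreConfig;
    }

    // The requested store is ignored; the lookup always targets the first store.
    String targetStoreFor(String requestedStore) {
      return config.storeName;
    }
  }

  // After: the finder resolves a separate config (and hence client) per store.
  static final class PerStorePeerFinder {
    private final Map<String, ClientConfig> storeToConfig = new ConcurrentHashMap<>();

    String targetStoreFor(String requestedStore) {
      return storeToConfig.computeIfAbsent(requestedStore, ClientConfig::new).storeName;
    }
  }

  public static void main(String[] args) {
    SharedConfigPeerFinder before = new SharedConfigPeerFinder(new ClientConfig("storeA"));
    PerStorePeerFinder after = new PerStorePeerFinder();
    System.out.println(before.targetStoreFor("storeB")); // targets storeA (the bug)
    System.out.println(after.targetStoreFor("storeB"));  // targets storeB
  }
}
```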
How was this PR tested?
Integration test.
Does this PR introduce any user-facing changes?