Set random pk in kinesis source #362

Merged (4 commits) on Aug 28, 2024

@colmsnowplow (Collaborator) commented Aug 26, 2024

Jira ref: PDP-1431

This PR:

  • Updates the UUID package we use from the now-neglected "github.com/twinj/uuid" to "github.com/google/uuid".
  • Calls uuid.EnableRandPool() on initialisation wherever the package is used. Tests revealed that without this setting, partition key skew is roughly 0.5-1% higher. That is small, but it did produce a small difference in how data was distributed across shards when sending data to Kinesis in tests.
  • Changes the Kinesis source to generate a new UUID for each partition key instead of re-using the existing one. This fixes the issue described here
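
For reviewers who want some intuition for the third bullet, here is a rough sketch, not the code in this PR. Kinesis routes each record by taking the MD5 hash of its partition key and matching the resulting 128-bit value against shard hash key ranges, so records that share a key always land on the same shard. The sketch assumes the shards split the hash key space evenly (real streams need not), and the helper names `newRandomKey`, `shardFor` and `countPerShard` are hypothetical. `newRandomKey` is a stdlib-only stand-in for a UUID library call, used to keep the example dependency-free.

```go
package main

import (
	"crypto/md5"
	"crypto/rand"
	"fmt"
	"math/big"
)

// newRandomKey returns a random version-4 UUID string.
// Hypothetical stand-in for a UUID library call, so the sketch
// needs nothing outside the standard library.
func newRandomKey() string {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		panic(err)
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set version 4
	b[8] = (b[8] & 0x3f) | 0x80 // set RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16])
}

// shardFor maps a partition key to a shard index the way Kinesis does in
// principle: MD5 the key, read it as a 128-bit integer, find the owning range.
// Assumption: numShards shards with equal-sized hash key ranges.
func shardFor(partitionKey string, numShards int) int {
	sum := md5.Sum([]byte(partitionKey))
	h := new(big.Int).SetBytes(sum[:])
	// index = floor(h * numShards / 2^128)
	h.Mul(h, big.NewInt(int64(numShards)))
	h.Rsh(h, 128)
	return int(h.Int64())
}

// countPerShard tallies how many of the given keys map to each shard.
func countPerShard(keys []string, numShards int) []int {
	counts := make([]int, numShards)
	for _, k := range keys {
		counts[shardFor(k, numShards)]++
	}
	return counts
}

func main() {
	const records, shards = 10000, 4

	// Re-using one key for every record: everything goes to a single shard.
	reused := make([]string, records)
	for i := range reused {
		reused[i] = "existing-partition-key"
	}
	fmt.Println("re-used key:", countPerShard(reused, shards))

	// A fresh random key per record: records spread across all shards.
	fresh := make([]string, records)
	for i := range fresh {
		fresh[i] = newRandomKey()
	}
	fmt.Println("fresh keys: ", countPerShard(fresh, shards))
}
```

The per-shard counts for fresh keys vary from run to run because the keys are random. The point is only that a fresh key per record decouples shard placement from whatever key the record arrived with.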

@colmsnowplow colmsnowplow merged commit 4967542 into release/2.4.2 Aug 28, 2024
2 checks passed
@colmsnowplow colmsnowplow deleted the kinesis-source-pk branch August 28, 2024 11:47