Fix data race due to trimming ManagedFields in resource syncer #891
This was observed in a unit test elsewhere that configures a resync period. On re-sync, the K8s `DeltaFIFO` retrieves every object from the cache store and re-queues it. It also invokes the transform function on each object, which results in a data race when the resource syncer sets `ManagedFields` to nil. The object instance is the same one stored in the cache, which may already have been accessed by another goroutine, so mutating it without holding a lock is unsafe. This is really an issue with the `DeltaFIFO`: it is documented that the "TransformFunc sees the object before any other actor, and it is now safe to mutate the object in place instead of making a copy", but this is not the case on a re-sync. The `DeltaFIFO` should either skip the `TransformFunc` on re-sync or make a copy of the object retrieved from the cache store.

As a workaround, the resource syncer now sets `ManagedFields` to nil only if it is non-nil. On a re-sync the cached object has already been trimmed, so the field is nil and the object is not written to. Added a unit test to cover this case.

Also, the object passed to the `TransformFunc` could be a `DeletedFinalStateUnknown`, so we need to handle that as well.

Signed-off-by: Tom Pantelis <[email protected]>
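A minimal, self-contained sketch of the workaround described above, assuming a simplified object model. It does not use client-go: `Object`, `ManagedFieldsEntry`, `DeletedFinalStateUnknown` and `trimManagedFields` are hypothetical stand-ins (the tombstone type is only shaped like client-go's `cache.DeletedFinalStateUnknown`), not the resource syncer's actual code. It shows the guarded write and the tombstone unwrapping.

```go
package main

import "fmt"

// ManagedFieldsEntry is a hypothetical stand-in for metav1.ManagedFieldsEntry.
type ManagedFieldsEntry struct {
	Manager string
}

// Object is a hypothetical stand-in for a Kubernetes object; only the field
// relevant to this fix is modeled.
type Object struct {
	Name          string
	ManagedFields []ManagedFieldsEntry
}

// DeletedFinalStateUnknown is a hypothetical stand-in shaped like client-go's
// cache.DeletedFinalStateUnknown tombstone: a key plus the last known object.
type DeletedFinalStateUnknown struct {
	Key string
	Obj interface{}
}

// trimManagedFields is a hypothetical transform function. It unwraps a
// tombstone if needed and writes ManagedFields only when it is non-nil, so an
// object that was already trimmed is only read, never mutated.
func trimManagedFields(obj interface{}) (interface{}, error) {
	target := obj
	if tombstone, ok := obj.(DeletedFinalStateUnknown); ok {
		target = tombstone.Obj
	}

	o, ok := target.(*Object)
	if !ok {
		return nil, fmt.Errorf("unexpected object type %T", target)
	}

	// Guarded write: skipped on re-sync because the cached object is already trimmed.
	if o.ManagedFields != nil {
		o.ManagedFields = nil
	}

	// Return the original value so a tombstone stays a tombstone.
	return obj, nil
}

func main() {
	live := &Object{Name: "a", ManagedFields: []ManagedFieldsEntry{{Manager: "kubectl"}}}
	out, err := trimManagedFields(live)
	fmt.Println(err == nil, out.(*Object).ManagedFields == nil) // true true

	tomb := DeletedFinalStateUnknown{
		Key: "ns/b",
		Obj: &Object{Name: "b", ManagedFields: []ManagedFieldsEntry{{Manager: "m"}}},
	}
	out, err = trimManagedFields(tomb)
	fmt.Println(err == nil, out.(DeletedFinalStateUnknown).Obj.(*Object).ManagedFields == nil) // true true
}
```

Returning the original argument rather than the unwrapped object keeps the tombstone intact for downstream delete handling, while the pointer inside it is still trimmed.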
🤖 Created branch: z_pr891/tpantelis/transform_fn_data_race
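```go
Expect(resourceutils.MustToMeta(obj).GetManagedFields()).Should(BeNil())

// Sleep a little so a re-sync occurs and doesn't cause a data race.
// time.Sleep takes a time.Duration, so a bare 200 would be 200ns; milliseconds are assumed here.
time.Sleep(200 * time.Millisecond)
```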
Shouldn’t something be checked after the sleep?
There's nothing we can check here, but Ginkgo (run with the race detector enabled) will fail if a data race occurs. So we're just trying to trigger a data race here.
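To illustrate why triggering a re-sync is enough, here is a hypothetical stdlib-only sketch (not the PR's test) that mimics one: a goroutine reads an already-trimmed cached object while another passes the same pointer to the guarded transform. Run with `go run -race`, it should report nothing because both accesses are reads. Replacing `trim` with an unconditional `cached.ManagedFields = nil` turns one of them into an unsynchronized write, which the race detector would typically flag. `Object` and `trim` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// Object is a hypothetical stand-in for a cached Kubernetes object.
type Object struct {
	ManagedFields []string
}

// trim is the guarded transform: it writes ManagedFields only when non-nil.
func trim(o *Object) {
	if o.ManagedFields != nil {
		o.ManagedFields = nil
	}
}

func main() {
	// The cached object was already trimmed when it was first added.
	cached := &Object{}

	var wg sync.WaitGroup
	wg.Add(2)

	// Another goroutine reads the cached object without holding a lock.
	var readerSawNil bool
	go func() {
		defer wg.Done()
		readerSawNil = cached.ManagedFields == nil
	}()

	// A re-sync passes the same pointer to the transform again. Because the
	// field is already nil, trim only reads it, so there is no concurrent write.
	go func() {
		defer wg.Done()
		trim(cached)
	}()

	// wg.Wait establishes happens-before, so reading readerSawNil here is safe.
	wg.Wait()
	fmt.Println(readerSawNil, cached.ManagedFields == nil) // true true
}
```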
Has this been reported upstream?
🤖 Closed branches: [z_pr891/tpantelis/transform_fn_data_race]