Fix data race due to trimming ManagedFields in resource syncer #891

tpantelis · 2024-04-16T12:38:11Z

This was observed in a unit test elsewhere that configures a resync period. On re-sync, the K8s DeltaFIFO retrieves every object from the cache store and re-queues it. It also invokes the transform function, which results in a data race when the resource syncer tries to nil the ManagedFields. The object instance passed in is the same one stored in the cache, which another thread has previously accessed, so mutating it without the protection of a lock is unsafe. This is really an issue with the DeltaFIFO: it is documented that the "TransformFunc sees the object before any other actor, and it is now safe to mutate the object in place instead of making a copy"; however, this is not the case on a re-sync. The DeltaFIFO should either elide the TransformFunc on re-sync or make a copy of the object retrieved from the cache store.

As a workaround, the resource syncer should only set ManagedFields to nil if it is non-nil. On a re-sync it is already nil, so nothing is written. Added a unit test to cover this case.
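
A minimal sketch of the guard described above, using hypothetical stand-in types rather than the actual client-go or resource syncer code: `object`, `trimManagedFields`, and the returned `wrote` flag are illustrative assumptions. It shows that the first pass clears the field, while a re-sync of the same cached instance only reads it.

```go
package main

// object is a hypothetical stand-in for a Kubernetes object; only the
// field relevant to this sketch is modeled.
type object struct {
	ManagedFields []string
}

// trimManagedFields clears ManagedFields only when it is non-nil and reports
// whether it wrote to the object. On a re-sync the cached instance was
// already trimmed, so the function only reads the field and returns false.
func trimManagedFields(obj *object) (wrote bool) {
	if obj.ManagedFields == nil {
		return false
	}

	obj.ManagedFields = nil

	return true
}
```

Because the second call performs no write, it cannot conflict with other readers of the cached instance.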

Also, the object passed to the TransformFunc could be a DeletedFinalStateUnknown, so we need to handle that as well.
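
One way the unwrapping might look, sketched with stand-in types: `deletedFinalStateUnknown` here is a local type modeled on client-go's `cache.DeletedFinalStateUnknown` (a `Key` plus the last known `Obj`), and `object` and `transform` are hypothetical names, not the code in this PR.

```go
package main

// object is a hypothetical stand-in for a Kubernetes object.
type object struct {
	Name          string
	ManagedFields []string
}

// deletedFinalStateUnknown is a local stand-in modeled on client-go's
// cache.DeletedFinalStateUnknown, which wraps the last known state of an
// object whose delete event was missed.
type deletedFinalStateUnknown struct {
	Key string
	Obj any
}

// transform trims ManagedFields whether obj arrives directly or wrapped in a
// deletedFinalStateUnknown. Unrecognized types are returned unchanged.
func transform(obj any) any {
	target := obj
	if d, ok := obj.(deletedFinalStateUnknown); ok {
		target = d.Obj
	}

	// Only write when there is something to clear.
	if o, ok := target.(*object); ok && o.ManagedFields != nil {
		o.ManagedFields = nil
	}

	return obj
}
```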

Signed-off-by: Tom Pantelis <[email protected]>

submariner-bot · 2024-04-16T12:38:14Z

🤖 Created branch: z_pr891/tpantelis/transform_fn_data_race

skitt · 2024-04-16T12:58:29Z

pkg/syncer/resource_syncer_test.go

+		Expect(resourceutils.MustToMeta(obj).GetManagedFields()).Should(BeNil())
+
+		// Sleep a little so a re-sync occurs and doesn't cause a data race.
+		time.Sleep(200)


Shouldn’t something be checked after the sleep?

There's nothing we can check here but Ginkgo will fail with a data race if it occurs. So we're just trying to trigger a data race here.
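
For context, a minimal sketch of why no assertion is needed: Go's race detector (enabled with the `-race` flag) reports unsynchronized concurrent accesses to the same memory when at least one of them is a write. The `object` type and the `trim` and `resyncWhileReading` functions below are hypothetical stand-ins, not the test in this PR. One goroutine reads an already-trimmed object while another re-runs the guarded trim, which only reads.

```go
package main

import "sync"

// object is a hypothetical stand-in for a cached Kubernetes object.
type object struct {
	ManagedFields []string
}

// trim writes to the object only when ManagedFields is non-nil.
func trim(obj *object) {
	if obj.ManagedFields != nil {
		obj.ManagedFields = nil
	}
}

// resyncWhileReading simulates a re-sync: one goroutine reads the cached
// object while another passes the same instance through trim again. It
// returns the number of managed field entries the reader observed.
func resyncWhileReading(cached *object) int {
	var wg sync.WaitGroup

	var seen int

	wg.Add(2)

	go func() { // reader, standing in for another actor using the cache
		defer wg.Done()
		seen = len(cached.ManagedFields)
	}()

	go func() { // re-sync path invoking the transform on the same instance
		defer wg.Done()
		trim(cached)
	}()

	wg.Wait()

	return seen
}
```

If the nil check in `trim` were dropped, the second goroutine's access would become a write, which is the kind of conflict the race detector can flag.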

skitt · 2024-04-16T13:06:52Z

Has this been reported upstream?

tpantelis · 2024-04-16T13:18:47Z

Has this been reported upstream?

kubernetes/kubernetes#124337

submariner-bot · 2024-04-17T06:28:07Z

🤖 Closed branches: [z_pr891/tpantelis/transform_fn_data_race]

tpantelis requested review from dfarrell07 and aswinsuryan April 16, 2024 12:38

tpantelis self-assigned this Apr 16, 2024

tpantelis requested review from Oats87, skitt, sridhargaddam and vthapar as code owners April 16, 2024 12:38

skitt reviewed Apr 16, 2024

skitt approved these changes Apr 16, 2024

tpantelis enabled auto-merge (rebase) April 16, 2024 22:35

tpantelis mentioned this pull request Apr 17, 2024

Retry ipset DestroySet/DestroyAllSets if error indicates in use submariner-io/submariner#2978

Merged

yboaron approved these changes Apr 17, 2024

tpantelis merged commit 5cfcef6 into submariner-io:devel Apr 17, 2024
16 checks passed

skitt mentioned this pull request Apr 17, 2024

Add reusable TrimManagedFields transform function #893

Merged

tpantelis deleted the transform_fn_data_race branch July 17, 2024 16:41
