Race condition in terraformPluginSDKExternal causing provider restarts #472
Comments
We've seen another failure related to this code, this time a write to an uninitialized map. It occurred in the S3 provider. See:
I managed to reproduce this by creating 2500 SQS Queues via a composition. I then created 500 more queues with the same composition, each targeting the same downstream resource as an existing queue via external-name but with different tags. The race triggers reliably every 20 minutes or so. I also replaced the json-iter import with the fix someone mentioned on Slack, but the race detector still reports the same race condition. If it's helpful, I can share the composition I used. It can likely be triggered with raw MRs as well. The problem manifests when multiple MRs with differing definitions mutate tags on the same downstream resource.
What happened?
This has been previously reported here and here. Unfortunately, it hasn't been resolved yet.
This issue is prevalent in large environments where the provider manages many resources. The end result is that the Upjet provider pod restarts, which hurts stability in large Crossplane-managed environments that use Upjet-generated providers.
How can we reproduce it?
This is observed in a large Upjet-managed environment, for example one with a few hundred SQS Queues.
Root cause
I looked into this a bit and ran the race detector on Upjet itself. It fails with:
This looks like the probable cause of the problem: the controller's Create and Update call the SetObservation method concurrently on a plain Go map, which is not safe for concurrent use.
Solution
Introduce synchronization or use a thread-safe data structure like sync.Map.
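To make the first option concrete, here is a minimal sketch of a mutex-guarded map in Go. It is not Upjet's actual code: observationStore, Set, Get, Len and runConcurrentWriters are hypothetical names used only for illustration, and the real fix would have to guard whatever map SetObservation writes to. The sketch also initializes the map lazily under the lock, which would avoid a write to an uninitialized map in this toy setting.

```go
package main

import (
	"strconv"
	"sync"
)

// observationStore is a hypothetical stand-in for state that is written
// from several goroutines (for example, from Create and Update paths).
// The zero value is ready to use.
type observationStore struct {
	mu   sync.RWMutex
	data map[string]any
}

// Set stores a value under the write lock. The map is created lazily
// inside the lock, so a write never hits a nil map.
func (s *observationStore) Set(key string, value any) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.data == nil {
		s.data = make(map[string]any)
	}
	s.data[key] = value
}

// Get reads a value under the read lock. Reading a nil map is safe in Go.
func (s *observationStore) Get(key string) (any, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.data[key]
	return v, ok
}

// Len returns the number of stored keys under the read lock.
func (s *observationStore) Len() int {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return len(s.data)
}

// runConcurrentWriters starts n goroutines that each write a distinct key
// and returns the number of keys present once all of them have finished.
func runConcurrentWriters(n int) int {
	var store observationStore
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			store.Set(strconv.Itoa(id), id)
		}(i)
	}
	wg.Wait()
	return store.Len()
}
```

Calling runConcurrentWriters(200) from a main function or a test built with the -race flag should complete without a race report, whereas the same loop writing to an unguarded map would be flagged. sync.Map is the alternative mentioned above; a plain mutex is usually the simpler choice when several related fields have to change together.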