Auto promoted Freight Error Patching ArgoCD Application #2473
Comments
This seems to have been caused by having the ArgoCD auto-sync policy enabled on the Applications managed by the stage. Up to this point, I hadn't seen any particular reason to turn it on or off. I assumed it didn't matter what triggered the sync, since the branch the ArgoCD Application tracks only gets updated when new freight is ready and pushed to it. With the optimistic locking for ArgoCD app sync added in 0.8.5, I now see that auto-sync interferes with Kargo's ability to trigger the sync.
Never mind: this persists sporadically with auto-sync disabled. I'm wondering whether it's caused by ArgoCD refreshing (not syncing) the Application while Kargo is trying to trigger its own refresh/sync. The Applications get refreshed fairly quickly because they are configured with webhooks. If that were the cause, though, I imagine that even without webhooks a stage responsible for enough Applications would eventually run into it, whenever a polling refresh coincided with a Kargo promotion. Either way, would it make sense to retry here (get the resource, generate the patch, and attempt to apply it again) so that a single occurrence doesn't fail the promotion?
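A minimal sketch of the retry suggested above, written in Python for illustration only. Everything in it is hypothetical: Application, FakeStore, RacyStore, retry_on_conflict and request_sync are not Kargo or ArgoCD APIs. The sketch assumes each update carries the resource version it was computed from, so any write that lands in between (for example, a refresh updating status) makes the update fail with a conflict. Each attempt re-reads the object, regenerates the change, and re-applies it, with a small bounded exponential backoff. In Go, client-go offers the same pattern as retry.RetryOnConflict.

```python
import copy
import time
from dataclasses import dataclass, field
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class ConflictError(Exception):
    """The caller's resource_version is stale: another writer got there first."""


@dataclass
class Application:
    # Hypothetical, heavily simplified stand-in for an ArgoCD Application.
    name: str
    resource_version: int = 1
    operation: Optional[dict] = None
    status: dict = field(default_factory=dict)


class FakeStore:
    """Hypothetical in-memory API server that enforces optimistic locking."""

    def __init__(self) -> None:
        self._apps: dict = {}

    def create(self, app: Application) -> None:
        self._apps[app.name] = copy.deepcopy(app)

    def get(self, name: str) -> Application:
        return copy.deepcopy(self._apps[name])

    def update(self, app: Application) -> Application:
        current = self._apps[app.name]
        if app.resource_version != current.resource_version:
            raise ConflictError(
                f"{app.name}: have v{app.resource_version}, "
                f"server has v{current.resource_version}"
            )
        app.resource_version += 1
        self._apps[app.name] = copy.deepcopy(app)
        return app


class RacyStore(FakeStore):
    """Simulates another writer (e.g. a refresh updating status) landing
    between our read and our write, `races` times in a row."""

    def __init__(self, races: int) -> None:
        super().__init__()
        self.races = races

    def update(self, app: Application) -> Application:
        if self.races > 0:
            self.races -= 1
            current = self._apps[app.name]
            current.status["refreshes"] = current.status.get("refreshes", 0) + 1
            current.resource_version += 1  # our copy is now stale
        return super().update(app)


def retry_on_conflict(fn: Callable[[], T], attempts: int = 5,
                      base_delay: float = 0.01) -> T:
    """Call fn, retrying only on ConflictError, with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except ConflictError:
            if attempt == attempts - 1:
                raise  # give up and surface the conflict to the caller
            time.sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be >= 1")


def request_sync(store: FakeStore, name: str, revision: str) -> Application:
    def attempt() -> Application:
        app = store.get(name)  # fresh read on every attempt
        app.operation = {"sync": {"revision": revision}}  # regenerate the change
        return store.update(app)  # fails if another write happened in between

    return retry_on_conflict(attempt)


if __name__ == "__main__":
    store = RacyStore(races=2)
    store.create(Application(name="my-app"))
    app = request_sync(store, "my-app", "abc123")
    print(app.resource_version, app.operation)
```

Because every attempt starts from a fresh read, the status written by the other party is preserved rather than overwritten. The attempt cap means an object that is constantly contended still fails the promotion instead of looping forever.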
Any chance you can capture the individual versions of the object from around the time the issue occurs? I'd like to understand the core of the issue better before trying to address it.
I captured all apps before and the failed app after. The only thing that stands out in the diff is status.resources with an HPA recommendation. The latest item in …
I'll try to reproduce this tomorrow to get a better idea. I'm a bit suspicious about what causes a race here. Given how this is written, the chance of an external factor triggering it should be quite slim (i.e., I would not expect ArgoCD itself to cause it). At the same time, if Kargo itself issues the patch twice, I would expect the client to catch up fairly quickly, so the chance of this happening is also slim. In any of these scenarios, retrying would indeed be the best solution, but we need to be sure about the actual cause.
Sounds good; let me know if I can provide any more details to help reproduce it. Even if the chance of it happening is slim, I'm curious: in what case would we not want to retry? Maybe I'm misunderstanding the goal of the write lock. If it is only meant to prevent overwriting changes made by other tooling, should something else interacting with the ArgoCD Application be able to block a promotion?
Description
After updating from 0.8.4 to 0.8.6, I'm consistently hitting this error, but only for automatically promoted freight. A subsequent manual promotion of the same freight succeeds, as does an initial manual promotion of freight into the same stage.
Steps to Reproduce
Configure a stage with promotionMechanisms.gitRepoUpdate and promotionMechanisms.argoCDAppUpdates, with argoCDAppUpdates covering multiple ArgoCD Applications.
Version
Client Version: v0.8.7
Server Version: v0.8.6
Logs