Reduce errors in workers #3962
-    return basic_retry_wrapper(
-        make_slack_api_rate_limited(_make_slack_api_call_logged(call))
-    )(**kwargs)
+    return basic_retry_wrapper(make_slack_api_rate_limited(call))(**kwargs)
Do we want to remove the logging completely, or should we truncate or shorten it? I can see us wanting debug logging here occasionally for troubleshooting, but the current level of spam is definitely not helpful as a default.
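One way to keep the logging without the spam, as a rough sketch only: log at DEBUG level and cap the size of what gets logged. The names `make_call_debug_logged`, `truncate_for_log`, and `MAX_LOGGED_CHARS` are hypothetical and are not taken from this PR; the real `_make_slack_api_call_logged` may look quite different.

```python
import logging

logger = logging.getLogger("slack_api_sketch")  # hypothetical logger name

MAX_LOGGED_CHARS = 200  # hypothetical cap on logged payload size


def truncate_for_log(value, limit=MAX_LOGGED_CHARS):
    """Return repr(value), shortened to at most `limit` characters plus a suffix."""
    text = repr(value)
    if len(text) <= limit:
        return text
    return f"{text[:limit]}... ({len(text) - limit} more chars)"


def make_call_debug_logged(call):
    """Wrap `call` so its kwargs are logged at DEBUG level only, in truncated form."""

    def wrapped(**kwargs):
        # Skip building the log string entirely unless DEBUG is enabled.
        if logger.isEnabledFor(logging.DEBUG):
            logger.debug(
                "Slack API call %s kwargs=%s",
                getattr(call, "__name__", "call"),
                truncate_for_log(kwargs),
            )
        return call(**kwargs)

    return wrapped
```

With something like this, the wrapper could stay in the call chain and would be silent unless the logger is switched to DEBUG for troubleshooting.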
# Then we upsert the document's external permissions in postgres
try:
    # Add the users to the DB if they don't exist
    batch_add_ext_perm_user_if_not_exists(
I think it's more appropriate for the function itself to be more resilient to the type of error that is occurring here rather than the caller being responsible. It should be entirely possible for this function to "just work" and it seems better if we write it as such.
Couple of ideas for batch_add_ext_perm_user_if_not_exists:
- Catch the IntegrityError if the batch commit fails and go one by one while deliberately ignoring the failure we know can happen.
- Use Postgres ON CONFLICT DO NOTHING ... could work?
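To make the two ideas concrete, here is a minimal sketch of both. It uses the standard-library `sqlite3` module as a stand-in for Postgres so that it runs on its own, and the table name `ext_perm_user`, its single `email` column, and the function names are assumptions for illustration. The actual `batch_add_ext_perm_user_if_not_exists` goes through the project's own DB layer and will differ.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory table with a unique email column (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ext_perm_user (email TEXT PRIMARY KEY)")
    return conn


def add_users_batch_with_fallback(conn, emails):
    """Idea 1: try the batch first; on IntegrityError, retry one row at a time."""
    try:
        with conn:  # commits on success, rolls back the whole batch on error
            conn.executemany(
                "INSERT INTO ext_perm_user (email) VALUES (?)",
                [(e,) for e in emails],
            )
    except sqlite3.IntegrityError:
        for email in emails:
            try:
                with conn:
                    conn.execute(
                        "INSERT INTO ext_perm_user (email) VALUES (?)", (email,)
                    )
            except sqlite3.IntegrityError:
                # Known, expected failure: another task already created this user.
                continue


def add_users_on_conflict(conn, emails):
    """Idea 2: let the database skip rows that already exist."""
    with conn:
        conn.executemany(
            "INSERT INTO ext_perm_user (email) VALUES (?) "
            "ON CONFLICT (email) DO NOTHING",
            [(e,) for e in emails],
        )
```

The second variant is shorter and avoids the retry loop, but it depends on a unique constraint on the conflicting column. The first variant only ignores the one error we expect and lets anything else propagate.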
5a709b5 to 2c36dd1
* reduce errors in workers
* quick nit
* update
* nit
Description
This is a general pass at reducing the number of unnecessary / spammy error logs in our background pods. Each exception has a link to a relevant example in Grafana plus the relevant change.
Primary Worker
Tenant ID exceptions
Fixed miscellaneous tenant ID exceptions
Monitoring worker
Entity exception
Heavy Worker
Dupe external email issue
The error turned out to be concurrent tasks creating the same external user email.
Some pretty excessive Slack spam (https://g-dfa4b7be0c.grafana-workspace.us-east-2.amazonaws.com/goto/Jnc1ujFHR?orgId=1)
Light Worker
View in Grafana
How Has This Been Tested?
Backporting (check the box to trigger backport action)
Note: You have to check that the action passes, otherwise resolve the conflicts manually and tag the patches.