api_skyportal: no_retry version #442

Theodlz · 2024-10-09T15:39:45Z

Create a no-retry version of the api_skyportal method to avoid concurrency issues when the session retries a request to SkyPortal (because SkyPortal took longer than the specified timeout to respond to the client) while the initial request is still being processed. Such retries are fine and can safely happen for pretty much everything except follow-up requests.

If we tell SkyPortal to trigger an instrument and, after 5 seconds without a response (the default timeout), decide to resend that request while SkyPortal is still processing the first one (even if it is almost done), we might end up with 2 requests in flight at the same time, creating duplication issues.

With this PR, we can use much longer timeouts (here we try 30 seconds) while avoiding any retries when sending a follow-up request.
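
To illustrate the failure mode described above, here is a minimal, self-contained sketch. It does not reproduce the actual Kowalski code: `FakeSkyPortal`, `send`, `send_with_retries`, and `send_once` are hypothetical names, the retry count of 3 is an assumption, and the server is simulated (no network calls, no real waiting). The only values taken from this PR are the 5 second default timeout and the 30 second timeout for follow-up requests.

```python
from dataclasses import dataclass, field


class RequestTimeout(Exception):
    """Raised when the client gives up waiting for a response."""


@dataclass
class FakeSkyPortal:
    """Simulated server (hypothetical): every request it receives is
    processed, even if the client has stopped waiting for the answer."""

    processing_seconds: float
    received: list = field(default_factory=list)

    def handle(self, payload: dict, timeout: float) -> dict:
        self.received.append(payload)  # the server keeps working regardless
        if self.processing_seconds > timeout:
            raise RequestTimeout(f"no response within {timeout} s")
        return {"status": "success"}


def send(server: FakeSkyPortal, payload: dict, *, timeout: float, max_attempts: int) -> dict:
    """Send `payload`, trying at most `max_attempts` times on timeout."""
    for attempt in range(1, max_attempts + 1):
        try:
            return server.handle(payload, timeout)
        except RequestTimeout:
            if attempt == max_attempts:
                raise  # out of attempts: surface the timeout to the caller


def send_with_retries(server: FakeSkyPortal, payload: dict, timeout: float = 5) -> dict:
    # Assumed retry count of 3; fine for idempotent calls.
    return send(server, payload, timeout=timeout, max_attempts=3)


def send_once(server: FakeSkyPortal, payload: dict, timeout: float = 30) -> dict:
    # Single attempt with a longer timeout, as proposed for follow-up requests.
    return send(server, payload, timeout=timeout, max_attempts=1)


payload = {"obj_id": "example-object"}  # made-up payload for illustration

retried = FakeSkyPortal(processing_seconds=6)
try:
    send_with_retries(retried, payload)
except RequestTimeout:
    pass
print(len(retried.received))  # 3: the same trigger reached the server three times

patient = FakeSkyPortal(processing_seconds=6)
send_once(patient, payload)
print(len(patient.received))  # 1
```

In this sketch the retrying client never sees a response, yet the server has accepted the same trigger three times. The single-attempt client with the longer timeout delivers it exactly once.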

PS:
We can still run into a concurrency issue, of course (where 2 alerts of the same object get processed at the same time, and worker B tries to trigger on alert 2 while worker A is triggering on alert 1). We already have logic in SkyPortal to avoid that, but if the remote server that SkyPortal is sending the request to takes too long, it becomes a risk. In a future set of SkyPortal+Kowalski+Fritz PRs, we can consider posting the request in the DB in a "processing" state as soon as possible, so that even before we start waiting for the remote server to answer, other processes know that we are actively trying to send an identical request and should cancel sending anything.
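
The "processing" state idea could look roughly like the sketch below. This is only an illustration of the claim-before-send pattern, not a design for the future PRs: `PendingRequests` and `trigger_followup` are hypothetical names, an in-memory dict guarded by a lock stands in for the DB, and identifying an "identical request" by the pair `(obj_id, allocation_id)` is an assumption.

```python
import threading
from typing import Callable


class PendingRequests:
    """In-memory stand-in (hypothetical) for a DB table that records
    follow-up requests as soon as a worker starts sending them."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._state: dict = {}  # (obj_id, allocation_id) -> "processing" | "submitted"

    def try_claim(self, obj_id: str, allocation_id: int) -> bool:
        """Record the request as 'processing' unless an identical one exists."""
        key = (obj_id, allocation_id)
        with self._lock:
            if key in self._state:
                return False  # someone else is sending, or already sent, this request
            self._state[key] = "processing"
            return True

    def mark_submitted(self, obj_id: str, allocation_id: int) -> None:
        with self._lock:
            self._state[(obj_id, allocation_id)] = "submitted"

    def release(self, obj_id: str, allocation_id: int) -> None:
        """Drop the claim so a later attempt is allowed (e.g. after a failure)."""
        with self._lock:
            self._state.pop((obj_id, allocation_id), None)


def trigger_followup(
    pending: PendingRequests,
    obj_id: str,
    allocation_id: int,
    submit: Callable[[], None],
) -> str:
    """Claim first, then do the slow call to the remote server."""
    if not pending.try_claim(obj_id, allocation_id):
        return "skipped"
    try:
        submit()  # the potentially slow remote call
    except Exception:
        pending.release(obj_id, allocation_id)
        raise
    pending.mark_submitted(obj_id, allocation_id)
    return "submitted"
```

With this ordering, a second worker handling another alert of the same object sees the existing claim and skips its trigger, even while the first worker is still waiting on the remote server.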

Theodlz · 2024-10-09T15:40:25Z

@nabeelre just so you know, that issue is being addressed.

Theodlz · 2024-10-09T15:41:03Z

We might want to keep using a shorter timeout though (maybe 10 s) to try to minimize these concurrency issues when 2 alerts are being triggered on at once (for the same object).

mcoughlin

LGTM

nabeelre · 2024-10-09T17:48:31Z

kowalski/alert_brokers/alert_broker.py

                                "POST",
                                "/api/followup_request",
                                passed_filter["auto_followup"]["data"],
+                                timeout=30,


Jamie said last night's request took 42 seconds to process. With a 30 second timeout we'd mark the trigger as failed via timeout but it would actually succeed later? Maybe up this to 60 seconds to be conservative?

Theodlz · 2024-10-14

I'm honestly not quite sure where his number came from, since we got a response within 5 to 6 seconds and definitely didn't wait 42 seconds. I would proceed with these numbers while we investigate this further with Jamie.

create a no retry version of the api_skyportal method, to avoid running into concurrency issues when the session retries sending a request to SkyPortal that might still be processing the first one. We use it paired with a longer timeout when sending followup requests

24bebc5

Theodlz self-assigned this Oct 9, 2024

Theodlz requested a review from mcoughlin October 9, 2024 15:39

mcoughlin approved these changes Oct 9, 2024

nabeelre reviewed Oct 9, 2024

Theodlz merged commit 1d08773 into skyportal:main Oct 14, 2024
4 checks passed
