Do not write to the cache file if the run status is not final #611

alarthast · 2024-10-09T14:55:11Z

Currently, Bennett Bot caches the status of workflow runs after each call to the GitHub API, along with the timestamp of that call.
Workflow runs with a "running" or "queued" status at the time of the API call will have their status updated later on, but Bennett Bot never picks up that change.
Skip caching these pending statuses so that Bennett Bot sees these runs again in its next API call.

rebkwok · 2024-10-10T10:22:32Z

workspace/workflows/jobs.py

+
+        pending = "running" in conclusions.values() or "queued" in conclusions.values()
+        if not pending:  # Only write cache to file if the status is final
+            self.write_cache_to_file()


I wonder if we need to rethink the caching strategy a bit. With this change, if any workflow is running/queued, we have to fetch conclusions for every workflow again.
Actually, we already have fill_in_conclusions_for_missing_ids, which is intended to update workflows that weren't present at the last retrieval. So if we just skip writing that specific workflow to the cache, it should get registered as a missing ID at the next run and be re-fetched.
That is, change the self.cache assignment a couple of lines up from here to:

    self.cache = {
        "timestamp": timestamp,
        # only cache conclusions that are not pending or queued
        "conclusions": {str(k): v for k, v in conclusions.items() if v not in ["pending", "queued"]},
    }
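To make the effect of this per-workflow filter concrete, here is a minimal standalone sketch. It assumes the non-final statuses are the "running" and "queued" strings checked in the diff above; `build_cache`, `NON_FINAL_STATUSES`, and the workflow IDs are hypothetical names and data used only for illustration, not Bennett Bot's actual code.

```python
from datetime import datetime, timezone

# Assumption: the non-final status strings are the ones checked in the diff above.
NON_FINAL_STATUSES = {"running", "queued"}


def build_cache(conclusions, timestamp):
    """Return a cache dict that keeps only workflows with a final conclusion."""
    return {
        "timestamp": timestamp,
        "conclusions": {
            str(workflow_id): conclusion
            for workflow_id, conclusion in conclusions.items()
            if conclusion not in NON_FINAL_STATUSES
        },
    }


# Hypothetical workflow IDs and conclusions, for illustration only.
conclusions = {101: "success", 102: "running", 103: "failure", 104: "queued"}
timestamp = datetime(2024, 10, 10, tzinfo=timezone.utc).isoformat()
cache = build_cache(conclusions, timestamp)

# IDs that were seen in this call but deliberately left out of the cache.
uncached_ids = sorted(set(conclusions) - {int(k) for k in cache["conclusions"]})
```

In this sketch the finished workflows (101 and 103) are cached as usual, while 102 and 104 stay out of the cache, so only those two would need another look on the next call.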

I think the use of the word "missing" was a bit misleading here. "Missing" is relative to the API response, so fill_in_conclusions_for_missing_ids uses the cache to fill in IDs that the API did not return; a better name for it would be use_cached_conclusions_for_missing_ids (or something similar).
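The behaviour described in this reply can be sketched as follows. This is only one reading of the comment, not the actual implementation: the function body, its parameters, and the example data are assumptions made for illustration.

```python
def use_cached_conclusions_for_missing_ids(api_conclusions, cached_conclusions, workflow_ids):
    """Fill in conclusions for IDs the API response did not include, using the cache."""
    merged = dict(api_conclusions)
    for workflow_id in workflow_ids:
        if workflow_id in merged:
            continue  # The API response takes precedence over the cache.
        cached = cached_conclusions.get(str(workflow_id))  # Cache keys are strings.
        if cached is not None:
            merged[workflow_id] = cached
    return merged


# Hypothetical data: the API returned only workflow 101, workflow 102 was cached
# earlier, and workflow 103 was left out of the cache because it was still running.
workflow_ids = [101, 102, 103]
api_conclusions = {101: "success"}
cached_conclusions = {"102": "failure"}

merged = use_cached_conclusions_for_missing_ids(api_conclusions, cached_conclusions, workflow_ids)
unresolved = [workflow_id for workflow_id in workflow_ids if workflow_id not in merged]
```

Under this reading, workflow 103 is absent from both the API response and the cache, so leaving it out of the cache would not, on its own, cause it to be re-fetched.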

(To be discussed in a call)

Conclusion from call: The proposed changes are the way to go for now because the GitHub workflow runs API does not make it easy to:

Check against the updated_at timestamp (It only allows created as a query parameter)

Target a specific workflow ID (the GraphQL API, on the other hand, does allow this, as tried in the experiment-graphql branch, but it does not allow us to restrict queries to the main branch, so it can't be used for our purpose)
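For reference, the first limitation can be illustrated with a small sketch that builds a query for the REST "list workflow runs for a repository" endpoint, which accepts `branch` and `created` filters. The helper name, the owner and repository, and the timestamp are hypothetical, and this is not necessarily the request Bennett Bot sends.

```python
from urllib.parse import urlencode


def workflow_runs_url(owner, repo, created_since, branch="main"):
    """Build a URL listing workflow runs created on or after a given timestamp."""
    params = {
        "branch": branch,
        # Runs can be filtered by creation time; there is no equivalent
        # filter on updated_at, which is the limitation noted above.
        "created": f">={created_since}",
        "per_page": 100,
    }
    return f"https://api.github.com/repos/{owner}/{repo}/actions/runs?{urlencode(params)}"


# Hypothetical owner, repository, and timestamp, for illustration only.
url = workflow_runs_url("example-org", "example-repo", "2024-10-09T00:00:00Z")
```

Because the filter applies to creation time only, a run that was created before the cached timestamp and finished afterwards would not be picked up by a query like this one.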

Merging for now, although it would be great if we found a way in the future to remove the duplicate work Bennett Bot is doing as a result of this merge.

Do not write to the cache file if the run status is not final

8add4a0

rebkwok reviewed Oct 10, 2024

rebkwok approved these changes Oct 10, 2024

alarthast merged commit aee24a4 into main Oct 10, 2024
6 checks passed

alarthast deleted the skip-cache-writing-if-running branch October 10, 2024 11:53
