Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not write to the cache file if the run status is not final #611

Merged
merged 1 commit into from
Oct 10, 2024

Conversation

alarthast
Copy link
Contributor

  • Currently Bennett Bot caches the status of workflow runs after each call to the GitHub API along with the timestamp
  • For workflow runs with "running" or "queued" status at the time of the API call, their status will be updated later down the line, but this is oblivious to Bennett Bot
  • Skip cache-ing these pending statuses so that Bennett Bot will see these runs again in its next API call


pending = "running" in conclusions.values() or "queued" in conclusions.values()
if not pending: # Only write cache to file if the status is final
self.write_cache_to_file()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we need to rethink the caching strategy a bit. This means that if any workflow is running/queued, we have to fetch conclusions for every workflow again.
Actually, we already have fill_in_conclusions_for_missing_ids which is intended to update workflows that weren't present at the last retrieval. So if we just skip writing that specific workflow to the cache, it should get registered as a missing id at the next run, and re-fetched.
i.e. change the self.cache assignment a couple of lines up from here to:

self.cache = {
    "timestamp": timestamp,
    # only cache conclusions that are not pending or queued
    "conclusions": {str(k): v for k, v in conclusions.items() if v not in ["pending", "queued"]},
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the use of the word "missing" was a bit misleading here. "Missing" is relative to the API response, so fill_in_conclusions_for_missing_ids uses the cache - a better name for it would be use_cached_conclusions_for_missing_ids (or something similar).

(To be discussed in a call)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conclusion from call: The proposed changes are the way to go for now because the GitHub workflow runs API does not make it easy to:

  • Check against the updated_at timestamp (It only allows created as a query parameter)
  • Target a specific workflow ID (The GraphQL API on the other hand does allow this - see the experiment-graphql branch, but does not allow us to restrict queries to the main branch so it can't be used for our purpose)

Merging for now, although it would be great if we found a way in the future to remove the duplicate work Bennett Bot is doing as a result of this merge.

@alarthast alarthast merged commit aee24a4 into main Oct 10, 2024
6 checks passed
@alarthast alarthast deleted the skip-cache-writing-if-running branch October 10, 2024 11:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants