-
Notifications
You must be signed in to change notification settings - Fork 682
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix add-on store repository getting removed without internet #5717
base: main
Are you sure you want to change the base?
Conversation
Currently, when a git command error happens in `pull()`, we declare the repository as corrupt. Subsequent system autofix runs then execute the reset resolution, which essentially removes the git repository from the system. In situations where the Internet fails right between the last Supervisor connectivity check and the add-on store repository update (the connectivity checks are throttled to once every 10 minutes while connectivity is considered good), or if the outage is only partial (e.g. reaching connectivity check works but the store repository is not reachable), this leads to a git command error which declares the repository as corrupt just as well, and ultimately leads to the removal of the add-on store repository from the local system. Handle the git fetch errors separately. This prevents add-on store repository from being declared corrupted just because the repository is not reachable.
📝 WalkthroughWalkthroughThe pull method in the GitRepo class was modified to improve its robustness. An additional error handling block was introduced to catch Changes
Sequence Diagram(s)sequenceDiagram
participant Caller
participant GitRepo
participant Remote
Caller ->> GitRepo: call pull()
GitRepo ->> Remote: perform fetch operation
alt Fetch fails (git.CommandError)
GitRepo -->> GitRepo: log warning
GitRepo -->> Caller: raise StoreGitError
else Fetch succeeds
GitRepo ->> GitRepo: assign branch variable within try block
GitRepo -->> Caller: return success response
end
✨ Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🔭 Outside diff range comments (1)
supervisor/store/git.py (1)
121-129
: 🛠️ Refactor suggestionReturn value consistency issue in early returns.
The
pull()
method returns a boolean value indicating whether the repository changed, but the early return statements (when the lock is already acquired or the repo is None) don't return any value. This could lead toNone
being returned instead of a boolean.Fix this by ensuring consistent return values:
if self.lock.locked(): _LOGGER.warning("There is already a task in progress") - return + return False if self.repo is None: _LOGGER.warning("No valid repository for %s", self.url) - return + return False
🧹 Nitpick comments (2)
supervisor/store/git.py (2)
141-143
: Improved error handling for git fetch operations.This change correctly distinguishes between connectivity errors during fetch operations and actual repository corruption issues. By logging a warning instead of creating a corruption issue, temporary network issues won't trigger unnecessary repository resets.
Consider enhancing the error message to make it clearer that this could be a temporary network issue. For example:
- _LOGGER.warning("Wasn't able to update %s repo: %s.", self.url, err) + _LOGGER.warning("Network/connectivity issue while updating %s repo: %s. Will retry later.", self.url, err)
123-188
: Exception hierarchy could be improved.Both network issues and corruption issues raise the same
StoreGitError
exception type, making it difficult for callers to distinguish between temporary network issues and actual corruption. Consider creating a more specific exception type for network-related errors.You could introduce a new exception like
StoreNetworkError
that extendsStoreGitError
:class StoreNetworkError(StoreGitError): """Error when network issues prevent Git operations."""Then update the fetch error handling:
except git.CommandError as err: _LOGGER.warning("Wasn't able to update %s repo: %s.", self.url, err) - raise StoreGitError() from err + raise StoreNetworkError() from errThis would allow callers to handle network issues differently if needed.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
supervisor/store/git.py
(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: Build armv7 supervisor
- GitHub Check: Build armhf supervisor
- GitHub Check: Build aarch64 supervisor
- GitHub Check: Run tests Python 3.13.2
🔇 Additional comments (2)
supervisor/store/git.py (2)
145-146
: Good restructuring of the branch access logic.Moving the branch access into a separate try block is a smart improvement. This ensures that if the fetch operation fails, the code won't attempt to access the potentially invalid branch, which could lead to further errors.
121-129
:✅ Verification successful
Verify the overall error handling flow.
The changes look good, but let's verify that the implementation properly prevents marking the repository as corrupted when it's just a network issue. The critical improvement is that network errors during
fetch
now avoid triggering theIssueType.CORRUPT_REPOSITORY
resolution, which would have incorrectly marked a repository with network issues as corrupted.Also applies to: 182-188
🏁 Script executed:
#!/bin/bash # Verify that network errors during fetch won't create corruption issues # Look for any other places in the codebase that might create CORRUPT_REPOSITORY issues echo "Places where CORRUPT_REPOSITORY is created:" rg "IssueType.CORRUPT_REPOSITORY" --type py -C 2 # Check for any tests of this functionality echo -e "\nTests related to this functionality:" rg "test.*pull" --type pyLength of output: 5474
Confirmed: Network errors during fetch are now handled correctly
After reviewing the changes and verifying with our grep outputs and tests, the error handling flow in
supervisor/store/git.py
(including the related block at lines 182–188) properly distinguishes between genuine repository corruption and transient network errors. The implementation now ensures that if a network issue occurs during the fetch, it does not trigger theIssueType.CORRUPT_REPOSITORY
resolution. The grep results indicate that the creation ofIssueType.CORRUPT_REPOSITORY
is isolated to the intended error paths, and related tests (e.g., intests/docker/test_interface.py
) further confirm the correct behavior.
Proposed change
Currently, when a git command error happens in
pull()
, we declare the repository as corrupt. Subsequent system autofix runs then execute the reset resolution, which essentially removes the git repository from the system.In situations where the Internet fails right between the last Supervisor connectivity check and the add-on store repository update (the connectivity checks are throttled to once every 10 minutes while connectivity is considered good), or if the outage is only partial (e.g. reaching connectivity check works but the store repository is not reachable), this leads to a git command error which declares the repository as corrupt just as well, and ultimately leads to the removal of the add-on store repository from the local system.
Handle the git fetch errors separately. This prevents add-on store repository from being declared corrupted just because the repository is not reachable.
Fixes: #5668
Type of change
Additional information
Checklist
ruff format supervisor tests
)If API endpoints or add-on configuration are added/changed:
Summary by CodeRabbit