Quick explanation: Given multiple repository definitions within a single Project (different branches) and across Projects (same branch + different branches), the repository refresh task and API endpoint work less than optimally: git operations are repeated per definition and the local repo cache can be discarded unnecessarily. The effect is compounded by a higher number of repository definitions per remote URL and by larger git repos.
Repo removed from cache when branch does not exist
Say there are three repos defined with the same git remote URL and different branches: my-repo:A, my-repo:B, and my-repo:C. The branch for my-repo:B does not exist (e.g. someone tested something and forgot to remove it). The refresh process (on a GitHub event or a repo definition modification) will go like this with a clean cache:
create local repo, fetch remote, checkout branch A
fail to checkout branch B, delete local repo
create local repo, fetch remote, checkout branch C
Possible solution: Don't recursively delete the local repo on an unknown branch error.
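To make the cost of the current behaviour concrete, here is a minimal simulation of the refresh sequence described above. It is only a sketch: the class name, method name, and the boolean flag are hypothetical and do not correspond to anything in the Concord code base. It counts how many times the local repo has to be created when the cache starts out clean.

```java
import java.util.List;
import java.util.Set;

public class RefreshSimulation {

    /**
     * Counts how many times the local repo is (re)created while refreshing
     * the given branches in order, starting from a clean cache.
     *
     * @param wanted                branches referenced by the repo definitions
     * @param existing              branches that actually exist on the remote
     * @param deleteOnUnknownBranch true mimics the behaviour described in the issue,
     *                              false mimics the proposed fix
     */
    public static int countClones(List<String> wanted, Set<String> existing,
                                  boolean deleteOnUnknownBranch) {
        boolean cached = false;
        int clones = 0;
        for (String branch : wanted) {
            if (!cached) {
                // create local repo + fetch remote
                clones++;
                cached = true;
            }
            if (!existing.contains(branch) && deleteOnUnknownBranch) {
                // checkout failed: the whole local repo is thrown away
                cached = false;
            }
        }
        return clones;
    }
}
```

For the branches A, B, C with B missing on the remote, this returns 2 when the repo is deleted on an unknown branch and 1 when it is kept.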
All repo definitions refreshed individually, including duplicates
This one is trickier, and less of an issue if the previous issue is addressed. Say there are three Projects, each defining the same repository, my-repo:A. The refresh process will handle them all individually.
This means the git operations for refreshing are repeated for each repo definition. The repo cache does help, assuming all of the defined repo branches exist.
Possible solution: A new refresh method that can coalesce on the repo URL + branch/commitID for the git operations, then update each relevant Concord repo definition as needed.
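A rough sketch of what the coalescing step could look like: group the repo definitions by URL + branch/commitID so the git operations run once per group, then fan the result out to every definition in the group. The RepoDef and FetchKey types and the coalesce method are hypothetical, simplified stand-ins and not Concord's actual data model.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

public class RefreshCoalescer {

    // Hypothetical, simplified view of a repository definition.
    public record RepoDef(UUID id, String url, String ref) {}

    // Grouping key: one round of git operations per (url, ref) pair.
    public record FetchKey(String url, String ref) {}

    /** Groups definitions that can share a single fetch/checkout. */
    public static Map<FetchKey, List<RepoDef>> coalesce(List<RepoDef> defs) {
        // LinkedHashMap keeps the groups in first-seen order.
        Map<FetchKey, List<RepoDef>> groups = new LinkedHashMap<>();
        for (RepoDef def : defs) {
            groups.computeIfAbsent(new FetchKey(def.url(), def.ref()),
                    key -> new ArrayList<>()).add(def);
        }
        return groups;
    }
}
```

With three Projects that all define my-repo:A, this yields a single group of three definitions, so the fetch and checkout would happen once instead of three times.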
Miscellaneous Ideas
GitHub events have a size (in KB) attribute. Perhaps we can leverage this to skip background refreshing of gigantic repos (threshold configurable in server.conf and/or policy).
Kill Switch: The ConcordSystem/concordTriggers/triggers repo can only be disabled by doing it directly in the DB. There may be situations where disabling refreshing is desired or necessary. Perhaps it's worth having a slightly safer way of doing that?
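Both ideas could be combined into a single check that runs before a background refresh is scheduled. The following is only a sketch: the Settings record and its two fields are hypothetical and are not existing server.conf keys or policy attributes.

```java
public class RefreshGate {

    // Hypothetical settings, not existing Concord configuration keys.
    // maxRepoSizeKb <= 0 is treated as "no size limit".
    public record Settings(boolean refreshEnabled, long maxRepoSizeKb) {}

    /**
     * Decides whether a background refresh should run for an event
     * that reports the repository size in KB.
     */
    public static boolean shouldRefresh(Settings settings, long eventRepoSizeKb) {
        if (!settings.refreshEnabled()) {
            return false; // kill switch: refreshing is turned off entirely
        }
        if (settings.maxRepoSizeKb() <= 0) {
            return true; // no threshold configured
        }
        return eventRepoSizeKb <= settings.maxRepoSizeKb();
    }
}
```

Keeping the kill switch and the size threshold in configuration would avoid having to edit the DB directly to stop refreshing.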
Alternative solution: use repository IDs as cache keys. Currently, we store the cached repositories as ${repositoryCache.cacheDir}/${urlEncode(repoUrl)} directories, one directory per Git repository. If we use a directory per repository ID, we can avoid having to switch back and forth between branches, tags or commits when a single Git URL is registered multiple times with different branch configurations.
The trade-off is disk space: repositories registered several times will be cached multiple times.
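The difference between the two cache layouts can be sketched as follows. The class and method names are hypothetical, and the sketch assumes repository IDs are UUIDs; only the ${repositoryCache.cacheDir}/${urlEncode(repoUrl)} layout comes from the description above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.UUID;

public class CacheDirs {

    /** Current layout: one directory per Git URL, shared by all definitions of that URL. */
    public static Path byUrl(Path cacheDir, String repoUrl) {
        return cacheDir.resolve(URLEncoder.encode(repoUrl, StandardCharsets.UTF_8));
    }

    /** Proposed layout: one directory per repository definition (assumes UUID IDs). */
    public static Path byRepoId(Path cacheDir, UUID repoId) {
        return cacheDir.resolve(repoId.toString());
    }
}
```

Two definitions of the same URL with different branches map to the same directory under byUrl and to separate directories under byRepoId, which is where both the benefit (no branch switching) and the extra disk usage come from.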
Code reference for the per-definition refresh handling: concord/server/impl/src/main/java/com/walmartlabs/concord/server/org/project/RepositoryResourceV2.java, line 64 in 46ea7f6