-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
If applied, this will add options including recursive functionality to File Client Get(Gitlab), and add Tree Client with create,get,and list functionalities #153
Conversation
github/client_repository_file.go
Outdated
} | ||
subdirectoryPath := fmt.Sprintf("%v%v/", path, *file.Name) | ||
// recursive call for child directories to get their content | ||
childFiles, err := c.Get(ctx, subdirectoryPath, branch, optFns...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a pretty expensive operation (each file requested is a separate API call).
You are not providing any way to limit it, and there's no way to know up front how many files will be fetched?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using the repository content get in github API there is no option to know sub-directory contents, but I believe a call can be added to fetch the tree to know how many files will be fetched.
But then what decides the limitations or the max number of files/sub-directories to be fetched? will an extra parameter of the max tree length be a good idea?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- You could provide a
maxFiles
andmaxDepth
options. - You could accept a
func(files *gitprovider.CommitFile)
that would let user count the number of files and exit early.
github/client_repository_file.go
Outdated
} | ||
subdirectoryPath := fmt.Sprintf("%v%v/", path, *file.Name) | ||
// recursive call for child directories to get their content | ||
childFiles, err := c.Get(ctx, subdirectoryPath, branch, optFns...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- You could provide a
maxFiles
andmaxDepth
options. - You could accept a
func(files *gitprovider.CommitFile)
that would let user count the number of files and exit early.
5ae2be2
to
b92b5fc
Compare
PR updated to only add recursive option in gitlab where it is passed to the gitlab API call so no recursive API calls are performed. |
github/client_repository_tree.go
Outdated
return &responseTreeInfo, nil | ||
} | ||
|
||
// Get returns a tree |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Get returns a tree | |
// Get returns a single tree using the SHA1 value for that tree. | |
// | |
// The number of files is limited to 100,000 after which `Truncated` will be set to true on the returned | |
// TreeInfo. |
github/client_repository_tree.go
Outdated
return nil, err | ||
} | ||
|
||
treeEntries := make([]*gitprovider.TreeEntry, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would benefit from using len(githubTree.Entries)
as the size, because you know up front how many entries there are?
github/client_repository_tree.go
Outdated
return treeEntries, nil | ||
} | ||
|
||
func createTreeEntry(githubTreeEntry github.TreeEntry) *gitprovider.TreeEntry { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't used?
It looks like it could be used in both Get and List?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
initially it was for that reason, but the treeEntries in both Get and List are of different types. I will simply remove it.
gitprovider/options.go
Outdated
|
||
func (opts *FilesGetOptions) ApplyFilesGetOptions(target *FilesGetOptions) { | ||
// Go through each field in opts, and apply it to target if set | ||
if opts.Recursive == true { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could simplify this to...
target.Recursive = opts.Recursive
Because if target.Recursive
is true
then it'll be set to true
, otherwise it would be set to false
?
@@ -36,11 +36,18 @@ type FileClient struct { | |||
} | |||
|
|||
// Get fetches and returns the contents of a file from a given branch and path | |||
func (c *FileClient) Get(_ context.Context, path, branch string) ([]*gitprovider.CommitFile, error) { | |||
func (c *FileClient) Get(ctx context.Context, path, branch string, optFns ...gitprovider.FilesGetOption) ([]*gitprovider.CommitFile, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get you change description here too?
github/client_repository_file.go
Outdated
@@ -34,13 +34,18 @@ type FileClient struct { | |||
ref gitprovider.RepositoryRef | |||
} | |||
|
|||
// Get fetches and returns the contents of a file from a given branch and path | |||
func (c *FileClient) Get(ctx context.Context, path, branch string) ([]*gitprovider.CommitFile, error) { | |||
// Get fetches and returns the contents of a file or directory from a given branch and path with possible options of FilesGetOption |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After reading the API, It says it will always returns a directory content if the path reference a directory. So I think that the description should say that we return the content of a file, because we expect a directory path, right? We should also have a statement of what path do we expect (file or directory or both).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will update it with the content either being a file content given a file path, or content of files in a directory given the directory path.
github/client_repository_tree.go
Outdated
ref gitprovider.RepositoryRef | ||
} | ||
|
||
// Create creates,updates,deletes a tree |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does it do that?
If true, the function name is wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gitlab/client_repository_tree.go
Outdated
|
||
// List files (blob) in a tree | ||
func (c *TreeClient) List(ctx context.Context, sha string, recursive bool) ([]*gitprovider.TreeEntry, error) { | ||
return nil, fmt.Errorf("error listing tree items %s. not implemented in gitlab yet", sha) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GitLab forces pagination on this API and limits to 100 per_page, however it does allow you to start at a subpath of the repo: https://docs.gitlab.com/ee/api/repositories.html
Proposal
In other places in this repository we auto-paginate results so we could follow that pattern here too.
Notes
- For our particular case of deleting all the files in a subpath that should run on a cluster, the worst case could be 10000+ files, which is 100 pagination requests and will probably timeout. Asking the user to manually remove those files would be the workaround and seems reasonable. In the future we could add an
...Options
param to this API to allow more control over pagination for other users of this library with other use cases. - I would guess the average case would be more on the order of < 100 files which we could handle in a single request or two.
- As we can scope down to a subpath when requesting tree-entries it won't be the entire repo, just the path which makes things a bit better.
- As Gitlab does not have a tree api for doing cleverer manipulations, the only other options to perform recursive deletion operations efficiently for gitlab seems to be:
- Grab file archive (limited to 5 requests / minute on gitlab.com)
- Git clone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In other places in this repository we auto-paginate results so we could follow that pattern here too.
This is the main reason I recommend against using go-git-providers, it has no control over pages being loaded.
This pattern is probably a DoS risk, because you can't control how many pages might be loaded for any resource.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- with the current limitation mentioned in this issue and sticking to the limit of 100 files when gitlab and giving a warning of this limitation so that if there are more files in the specific directory they are to be deleted manually, is this a possible temporary solution?
- how about the number of pages loaded for a resource to be limited with a max_pages flag passed
github/client_repository_tree.go
Outdated
"strings" | ||
|
||
"github.com/fluxcd/go-git-providers/gitprovider" | ||
"github.com/google/go-github/v42/github" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems to be a clash here, might have to merge in main to your fork and update this to v45 now I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated, thanks
…ality to FileClient().Get(..) Signed-off-by: Rana Tarek Hassan <[email protected]>
…e Client Get Signed-off-by: Rana Tarek Hassan <[email protected]>
…d recursive options in Files Get Signed-off-by: Rana Tarek Hassan <[email protected]>
…ursive calls in Files Get Add more test cases for recursive, maxDepth, maxFiles options in Files Get Signed-off-by: Rana Tarek Hassan <[email protected]>
…ntegration tests Signed-off-by: Rana Tarek Hassan <[email protected]>
Add create,get, and list functionalities to tree client of github not implemented in gitlab and stash clients Signed-off-by: Rana Tarek Hassan <[email protected]>
LICENSE typo Co-authored-by: Kevin McDermott <[email protected]> Signed-off-by: Rana Tarek Hassan <[email protected]>
…om code review Signed-off-by: Rana Tarek Hassan <[email protected]>
Co-authored-by: souleb <[email protected]> Signed-off-by: Rana Tarek Hassan <[email protected]>
It will add go docs based on suggestions from code review Signed-off-by: Rana Tarek Hassan <[email protected]>
…n gitlab client It will add path parameter used in github tree list to optionally filter files in a certain path Signed-off-by: Rana Tarek Hassan <[email protected]>
Signed-off-by: Rana Tarek Hassan <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking really good!
We should have a simple test for tree.Create
I think. But alternatively we could remove that interface for now and do it in another PR one day if you're keen to get this landed. wdyt?
Expect(treeEntries).To(HaveLen(3)) | ||
for ind, treeEntry := range treeEntries { | ||
Expect(treeEntry.Path).To(Equal(*files[ind].Path)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
github/client_repository_tree.go
Outdated
} | ||
|
||
// Create creates,updates,deletes a tree | ||
func (c *TreeClient) Create(ctx context.Context, tree *gitprovider.TreeInfo) (*gitprovider.TreeInfo, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't really have any tests for this function. Would that be easy to write? The other option would be to remove the Create
for now as we don't really use it and make this a more Read-only (get/list) PR. wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think making it Read-only get/list is a good option, because the functionality needed so far from the create is implemented using the commit client where the tree is updated with the new tree structure
Signed-off-by: Rana Tarek Hassan <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯
Awesome job, thanks for being patient with all the back and forth on this one.
Very clean ✨ and very useful functionality to have!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
If applied, this commit will add options including recursive functionality to FileClient().Get(..) in Gitlab
If applied, this commit will add Tree Client with create,get,and list functionalities
Signed-off-by: Rana Tarek Hassan [email protected]
Description
Test results
Github test results:
Gitlab test results: