libfetchers/git: Support export-ignore #9480
@edolstra How do you want source filtering input accessors to work? |
@roberth #9497 adds
Maybe something similar can be done for export-ignore, i.e. wrap the |
🎉 All dependencies have been resolved !
5d8c99c to a5f5744
There is also |
2f41fa4 to 1ea9930
@@ -714,6 +729,11 @@ struct GitInputScheme : InputScheme

        auto repoInfo = getRepoInfo(input);

        if (getExportIgnoreAttr(input)
            && getSubmodulesAttr(input)) {
            throw UnimplementedError("exportIgnore and submodules are not supported together yet");
Note that this combination isn't used in any existing expressions, because the previous implementation did not apply export-ignore when submodules were enabled.
Apart from this check, why don't they work together yet? Given that submodules are also implemented using Git accessors, I would expect the filtering to just work for submodules.
I've added the comment:
/* In this situation, we don't have a git CLI behavior that we can copy.
`git archive` does not support submodules, so it is unclear whether
rules from the parent should affect the submodule or not.
When git may eventually implement this, we need Nix to match its
behavior. */
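The guard described in that comment can be sketched in isolation. This is only an illustration under stated assumptions, not the Nix source: `FetchOptions`, `checkSupported` and `isSupported` are hypothetical names, and `UnimplementedError` is a local stand-in for the Nix error type.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Local stand-in for Nix's UnimplementedError (assumption: the real type
// is defined elsewhere in Nix and carries more context).
struct UnimplementedError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical bundle of the two fetcher attributes involved.
struct FetchOptions {
    bool exportIgnore = false;
    bool submodules = false;
};

// Reject the one combination for which `git archive` offers no reference
// behavior; every other combination passes through unchanged.
inline void checkSupported(const FetchOptions & opts)
{
    if (opts.exportIgnore && opts.submodules)
        throw UnimplementedError(
            "exportIgnore and submodules are not supported together yet");
}

// Convenience wrapper for callers that prefer a boolean over an exception.
inline bool isSupported(const FetchOptions & opts)
{
    try {
        checkSupported(opts);
        return true;
    } catch (const UnimplementedError &) {
        return false;
    }
}
```

Failing early keeps the behavior undefined rather than silently picking one interpretation, which leaves room to match Git later if it defines the semantics.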
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/2024-01-08-nix-team-meeting-minutes-114/38156/1
Some comment, but LGTM overall.
A Boolean parameter that specifies whether `export-ignore` from `.gitattributes` should be applied.
This approximates part of the `git archive` behavior.

Enabling this option is not recommended because it is unknown whether the Git developers commit to the reproducibility of `export-ignore` in newer Git versions.
It's a bit odd to say that enabling this option is not recommended, and then having the default as "enabled". Probably better to say "We recommend disabling this option because bla bla".
It's not enabled for `fetchTree` though; just for `fetchGit`.
(And we can't unrecommend `fetchGit` until this is stable.)
src/libfetchers/git-utils.cc
Outdated
    std::string pathStr {path.rel()};
    const char * pathCStr = pathStr.c_str();
Better to add a `rel_c_str()` to `CanonPath`. I.e.
    const char * rel_c_str() const
    { return path.c_str() + 1; }
Done.
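For readers following along, here is a minimal sketch of the suggested accessor. It assumes a simplified path class that always stores its string with a leading slash; `MiniCanonPath` is a hypothetical stand-in, not the `CanonPath` class in Nix.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <utility>

// Hypothetical, simplified stand-in for CanonPath. Assumption: the stored
// string is always absolute, i.e. it starts with '/'.
class MiniCanonPath {
    std::string path;

public:
    explicit MiniCanonPath(std::string p) : path(std::move(p))
    {
        assert(!path.empty() && path[0] == '/');
    }

    // Path without the leading slash, as an owning copy.
    std::string rel() const { return path.substr(1); }

    // Same text as rel(), but as a pointer into the existing buffer, so
    // callers of C APIs need no temporary std::string. The pointer is
    // valid only while this object is alive and unmodified.
    const char * rel_c_str() const { return path.c_str() + 1; }
};
```

The pointer form avoids the temporary `pathStr` copy from the outdated hunk when the relative path is handed to a C function.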
    }

    bool isAllowed(const CanonPath & path) override {
        return !isExportIgnored(path);
What's the performance penalty for export-ignore lookups? Should this be cached? The lazy-trees branch has a `CachingFilteringInputAccessor` that caches `isAllowed()`, which might be useful here.
It takes about 2× as long on my local nixpkgs clone. Not great.
Caching could reduce the overhead by a factor of three, so we may expect no better than 1.3× from that.
Using the batch variation of the libgit2 call could bring down the overhead some more though. That makes the lookups more eager, which in turn means that it's not a great match with the `CachingFilteringInputAccessor` interface (the "protected" interface part, as it were).
- I've picked `CachingFilteringInputAccessor` from the diff, and that brought it down to a 15% penalty.
- "batch variation": I misremembered while libgit2.org was down. It retrieves multiple attrs, not multiple files, so this is not a solution.
- New idea: turn off filter when `export-ignore` is not present at all.
  I've tried this, but it seems to make fetching Nixpkgs with `fetchGit` slower (local repo, no fetcher cache, probably low single digit %). That's probably because I did an extra traversal of the whole repo before returning the accessor.
  It's possible that a per-directory, on-the-fly approach does go below that 15%, but I don't have a lot of confidence right now.
- Another possible strategy: cache it in a table like `(parent_dir, filename, allow_bool)`, which could be queried quite efficiently for either the whole repo or particular directories when lazy. Nonetheless, there's a risk that it's not much better, and this is a lot more work to pull off, so again I'd say not for this release.

Conclusion: will stop optimizing now.
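The caching idea discussed in this thread can be sketched as plain memoization of a per-path predicate. This is an illustration only: `CachingFilter` and `cachedEntries` are hypothetical names, paths are plain strings instead of `CanonPath`, and the code is not the `CachingFilteringInputAccessor` from the lazy-trees branch.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical wrapper that memoizes an expensive per-path predicate,
// for example an export-ignore attribute lookup.
class CachingFilter {
    std::function<bool(const std::string &)> lookup;
    std::unordered_map<std::string, bool> cache;

public:
    explicit CachingFilter(std::function<bool(const std::string &)> f)
        : lookup(std::move(f)) {}

    // Returns the cached decision if present; otherwise runs the lookup
    // once and remembers the result. Not thread-safe.
    bool isAllowed(const std::string & path)
    {
        auto it = cache.find(path);
        if (it != cache.end())
            return it->second;
        bool allowed = lookup(path);
        cache.emplace(path, allowed);
        return allowed;
    }

    // Number of distinct paths decided so far.
    std::size_t cachedEntries() const { return cache.size(); }
};
```

Each distinct path triggers the underlying lookup at most once, so the saving depends on how often the same path is queried; the sketch never evicts entries, which is an assumption that may not hold for very large trees.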
...with the intention to prevent future regressions in fetchGit
Enabled for fetchGit, which historically had this behavior, among other behaviors we do not want in fetchGit. fetchTree disables this parameter by default. It can choose the simpler behavior, as it is still experimental. I am not confident that the filtering implementation is future proof. It should reuse a source filtering wrapper, which I believe Eelco has already written, but not merged yet.
Intentionally dumb change ahead of architectural improvements.
This will be needed because the accessor will be wrapped, and therefore not be an instance of GitInputAccessor anymore.
Also fingerprint and some preparatory improvements. Testing is still not up to scratch because lots of logic is duplicated between the workdir and commit cases.
efa08de to 469cf26
Co-authored-by: Eelco Dolstra <[email protected]>
Defensively because isRoot() is also defensive.
Motivation
Reintroduce export-ignore processing for fetchGit, which historically had this behavior, among other behaviors that I believe we do not want in fetchGit.

Change
- `exportIgnore` parameter to the git fetcher
- `fetchTree` disables this parameter by default. It can choose the simpler behavior, as it is still experimental.
- `fetchTree` too!) Nix. `fetchGit` surprisingly uses `export-ignore` git attribute #7195
- `fetchGit` enables this parameter by default, to approximate legacy behavior.

TODO
- I was not confident that the filtering implementation is future proof. It should reuse a source filtering wrapper, such as `FilteringInputAccessor` from Lazy trees #6530. Not a straightforward backport because we don't have `virtual bool isAllowed(const CanonPath & path) = 0;` (yet?)
- `exportIgnore` (Nix) parameter should be inherited)
- `_ext` git attributes functions, or reimplement (no, half of semantics already hardcoded; not worth it). Need to pass `rev`
- `fetchGit` does not export-ignore when it submodules.

Context
Priorities
Add 👍 to pull requests you find important.