Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libfetchers/git: Support export-ignore #9480

Merged
merged 17 commits into from
Jan 16, 2024
Merged

Conversation

roberth
Copy link
Member

@roberth roberth commented Nov 27, 2023

Motivation

Reintroduce export-ignore processing for fetchGit, which historically had this behavior, among other behaviors that I believe we do not want in fetchGit.

Change

  • Add exportIgnore parameter to the git fetcher
  • fetchTree disables this parameter by default. It can choose the
    simpler behavior, as it is still experimental.
  • fetchGit enables this parameter by default, to approximate legacy behavior.

TODO

  • I am was not confident that the filtering implementation is future proof. It should reuse a source filtering wrapper, such as FilteringInputAccessor from Lazy trees #6530. Not a straightforward backport because we don't have virtual bool isAllowed(const CanonPath & path) = 0; (yet?)
  • POSTPONED Test export-ignore in submodule (exportIgnore (Nix) parameter should be inherited)
    • basic tests
    • somewhat decent coverage (perhaps via DRY item below?)
  • Release note
  • Use _ext git attributes functions, or reimplement (no, half of semantics already hardcoded; not worth it). Need to pass rev
  • DRY workdir vs rev more
  • Discourage use, because this assumes that the feature is as stable as the tree data model and interpretation. I don't expect the git developer to apply the same level of rigor to it, although it's probably still pretty good.
  • POSTPONED Decide how the interaction works between submodules and export-ignore. Currently root ignores affect the submodule, and I'm not happy about that. Maybe just forbid it for now. fetchGit does not export-ignore when it submodules.

Context

Priorities

Add 👍 to pull requests you find important.

@github-actions github-actions bot added with-tests Issues related to testing. PRs with tests have some priority fetching Networking with the outside (non-Nix) world, input locking labels Nov 27, 2023
@roberth roberth added the bug label Nov 27, 2023
@roberth roberth marked this pull request as ready for review November 29, 2023 20:37
@roberth roberth requested a review from edolstra as a code owner November 29, 2023 20:37
@roberth
Copy link
Member Author

roberth commented Nov 29, 2023

@edolstra How do you want source filtering input accessors to work?

@edolstra
Copy link
Member

@roberth #9497 adds FilteringInputAccessor and AllowListInputAccessor. The latter is used by the Git input type for workdirs to allow access to the tracked files in the workdir only, like this:

        ref<InputAccessor> accessor =
            AllowListInputAccessor::create(
                makeFSInputAccessor(CanonPath(repoInfo.url)),
                std::move(repoInfo.workdirInfo.files),
                makeNotAllowedError(repoInfo.url));

Maybe something similar can be done for export-ignore, i.e. wrap the GitAccessor into an AllowListInputAccessor that contains the unfiltered paths. Or it can subclass FilteringInputAccessor (where isExportIgnored() would become the mandatory isAllowed() method).

@roberth roberth added the regression Something doesn't work anymore label Dec 1, 2023
Copy link

dpulls bot commented Dec 8, 2023

🎉 All dependencies have been resolved !

@roberth roberth self-assigned this Dec 8, 2023
@roberth roberth force-pushed the libfetchers-git-exportIgnore branch from 5d8c99c to a5f5744 Compare December 11, 2023 18:35
@roberth roberth marked this pull request as draft December 11, 2023 21:40
@roberth roberth mentioned this pull request Dec 29, 2023
@DavHau
Copy link
Member

DavHau commented Dec 30, 2023

There is also export-subst which we would need to re-implemented for compatibility.
I added a test for it at: #9391

@roberth roberth force-pushed the libfetchers-git-exportIgnore branch 3 times, most recently from 2f41fa4 to 1ea9930 Compare January 5, 2024 18:04
@roberth roberth marked this pull request as ready for review January 5, 2024 18:07
@@ -714,6 +729,11 @@ struct GitInputScheme : InputScheme

auto repoInfo = getRepoInfo(input);

if (getExportIgnoreAttr(input)
&& getSubmodulesAttr(input)) {
throw UnimplementedError("exportIgnore and submodules are not supported together yet");
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this combination isn't used in any existing expressions, because the previous implementation did not apply export-ignore when submodules were enabled.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from this check, why don't they work together yet? Given that submodules are also implemented using Git accessors, I would expect the filtering to just work for submodules.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added the comment:

            /* In this situation, we don't have a git CLI behavior that we can copy.
               `git archive` does not support submodules, so it is unclear whether
               rules from the parent should affect the submodule or not.
               When git may eventually implement this, we need Nix to match its
               behavior. */

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2024-01-08-nix-team-meeting-minutes-114/38156/1

Copy link
Member

@edolstra edolstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comment, but LGTM overall.

doc/manual/rl-next/git-fetcher.md Outdated Show resolved Hide resolved
doc/manual/rl-next/git-fetcher.md Outdated Show resolved Hide resolved
doc/manual/rl-next/git-fetcher.md Outdated Show resolved Hide resolved
src/libexpr/primops/fetchTree.cc Outdated Show resolved Hide resolved
A Boolean parameter that specifies whether `export-ignore` from `.gitattributes` should be applied.
This approximates part of the `git archive` behavior.

Enabling this option is not recommended because it is unknown whether the Git developers commit to the reproducibility of `export-ignore` in newer Git versions.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit odd to say that enabling this option is not recommended, and then having the default as "enabled". Probably better to say "We recommend disabling this option because bla bla".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not enabled for fetchTree though; just for fetchGit.

(And we can't unrecommend fetchGit until this is stable.)

Comment on lines 676 to 678
std::string pathStr {path.rel()};
const char * pathCStr = pathStr.c_str();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to add a rel_c_str() to CanonPath. I.e.

    const char * raw_c_str() const
    { return path.c_str() + 1; }

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

src/libfetchers/git-utils.cc Outdated Show resolved Hide resolved
@@ -714,6 +729,11 @@ struct GitInputScheme : InputScheme

auto repoInfo = getRepoInfo(input);

if (getExportIgnoreAttr(input)
&& getSubmodulesAttr(input)) {
throw UnimplementedError("exportIgnore and submodules are not supported together yet");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from this check, why don't they work together yet? Given that submodules are also implemented using Git accessors, I would expect the filtering to just work for submodules.

src/libfetchers/git-utils.cc Outdated Show resolved Hide resolved
}

bool isAllowed(const CanonPath & path) override {
return !isExportIgnored(path);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the performance penalty for export-ignore lookups? Should this be cached? The lazy-trees branch has a CachingFilteringInputAccessor that caches isAllowed(), which might be useful here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It takes about 2× as long on my local nixpkgs clone. Not great.
Caching could reduce the overhead by a factor three, so we may expect no better than 1.3× from that.
Using the batch variation of the libgit2 call could bring down the overhead some more though. That makes the lookups more eager, which in turns means that it's not a great match with the CachingFilteringInputAccessor interface ("protected" interface part as it were).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I've picked CachingFilteringInputAccessor from the diff, and that brought it down to a 15% penalty.

  • batch variation

    I misremembered while libgit2.org was down. It retrieves multiple attrs, not multiple files, so this is not a solution.

  • New idea: turn off filter when export-ignore is not present at all

    I've tried this, but it seems to make fetching Nixpkgs with fetchGit slower (local repo, no fetcher cache, probably low single digit %) That's probably because I did an extra traversal of the whole repo before returning the accessor.

    It's possible that a per-directory, on-the-fly approach does go below that 15%, but I don't have a lot of confidence right now.

  • Another possible strategy:
    Cache it in a table like (parent_dir, filename, allow_bool), which could be queried quite efficiently for either the whole repo or particular directories when lazy. Nonetheless, there's a risk that it's not much better, and this is a lot more work to pull of, so again I'd say not for this release.

Conclusion: will stop optimizing now.

DavHau and others added 6 commits January 12, 2024 15:31
...with the intention to prevent future regressions in fetchGit
Enabled for fetchGit, which historically had this behavior,
among other behaviors we do not want in fetchGit.

fetchTree disables this parameter by default. It can choose the
simpler behavior, as it is still experimental.

I am not confident that the filtering implementation is future
proof. It should reuse a source filtering wrapper, which I believe
Eelco has already written, but not merged yet.
Intentionally dumb change ahead of architectural improvements.
This will be needed because the accessor will be wrapped, and therefore
not be an instance of GitInputAccessor anymore.
@roberth roberth force-pushed the libfetchers-git-exportIgnore branch from efa08de to 469cf26 Compare January 12, 2024 15:03
@roberth roberth requested a review from edolstra January 12, 2024 21:57
@roberth roberth merged commit 2a3c5e6 into master Jan 16, 2024
15 checks passed
@roberth roberth deleted the libfetchers-git-exportIgnore branch January 16, 2024 22:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug fetching Networking with the outside (non-Nix) world, input locking regression Something doesn't work anymore with-tests Issues related to testing. PRs with tests have some priority
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

5 participants