Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce libgit2 #9240

Merged
merged 23 commits into from
Nov 20, 2023
Merged

Introduce libgit2 #9240

merged 23 commits into from
Nov 20, 2023

Conversation

edolstra
Copy link
Member

@edolstra edolstra commented Oct 25, 2023

Motivation

This makes most of GitInputScheme use libgit2 instead of the "git" binary. Extracted from the lazy-trees branch, where GitInputAccessor is used to provide direct access to git repos without copying them to the Nix store first.

Depends on #9239.

Context

Priorities

Add 👍 to pull requests you find important.

@github-actions github-actions bot added with-tests Issues related to testing. PRs with tests have some priority fetching Networking with the outside (non-Nix) world, input locking labels Oct 25, 2023
Copy link

dpulls bot commented Oct 31, 2023

🎉 All dependencies have been resolved !

…or()

This is for graceful migration to lazy-trees fetchers (which are all
accessor-based). Eventually fetch() will be removed.
This replaces most calls to the "git" binary with libgit2.
Instead of making a complete copy of the repo, fetching the
submodules, and writing the result to the store (which is all
superexpensive), we now fetch the submodules recursively using the Git
fetcher, and return a union accessor that "mounts" the accessors for
the submodules on top of the root accessor.
@edolstra edolstra marked this pull request as ready for review October 31, 2023 13:56
@lovesegfault

This comment was marked as resolved.

@edolstra

This comment was marked as resolved.

@knedlsepp

This comment was marked as resolved.

@lovesegfault

This comment was marked as resolved.

@Ericson2314

This comment was marked as resolved.

@Ericson2314

This comment was marked as resolved.

src/libfetchers/git-utils.cc Outdated Show resolved Hide resolved
src/libfetchers/git-utils.cc Outdated Show resolved Hide resolved
src/libfetchers/git.cc Outdated Show resolved Hide resolved
src/libfetchers/union-input-accessor.cc Outdated Show resolved Hide resolved
src/libfetchers/union-input-accessor.cc Outdated Show resolved Hide resolved
@fricklerhandwerk
Copy link
Contributor

fricklerhandwerk commented Nov 6, 2023

Triaged in Nix team meeting:

  • @Ericson2314: would like to land Git hashing first
    • @edolstra: that seems to have no overlap
    • @Ericson2314: if we want Git fetching to be tree-hash oriented, we have to align all the internal book keeping
    • @thufschmitt: it seems rather backwards to make the small change depend on the large new feature
    • @edolstra: we would only have to make Git hashing end-to-end if we actually stored the fetched result under Git hashes, but we currently don't
    • @roberth: sounds great but doesn't sound like it has to be blocking each other
  • @Ericson2314: worried that if care isn't taken with the Git hashes that I'll have to re-implement the libgit part
    • (some discussion of the trust model for hashes)
    • @edolstra: lazy trees currently will clone the entire repo and then resolve subdirs based on tree hashes using libgit
      • it can in principle become more lazy in the future, and fetch subdirs on demand
    • @Ericson2314: alright, since the whole graph is offline, hash validation can be done locally
  • @thufschmitt: wonder how to roll put changes like that. it's a big rewrite of fetchers, so it's easy to sneak in regressions
    • making it experimental would be a pain as that would make for a lot duplication
      • @roberth: it may be worth it though
      • @edolstra: making it experimental could result in no one using it
      • another option is to run things in parallel, but for git fetching it may be too costly
        • @roberth: given we're intending to fix the smudging bug in a backwards-incompatible way that won't work anyway
    • @Ericson2314: personally like the way of running things in parallel
      • @edolstra: the point of this is to reduce the diff to the lazy trees branch, and we don't want to maintain two versions
        • @thufschmitt: we don't want to maintain both, just prevent regressions and then drop the old one once it's safe
        • @edolstra: similarly signature verification was merged the other day, and keeping both would mean the versions to diverge
        • @thufschmitt as opposed to the ssh vs. ssh-ng situation (where ssh-ng never reached feature parity) we're not at risk of stalling out
  • shall we roll it out and risk breakage or run the two versions in parallel until
    • @Ericson2314: hate the old one, love the new one; we may run into people discovering bugs in the old implementation that they rely on
      • must get rid of shelling out at all cost, as fast as possible
    • @roberth: should be okay with releasing the new one alone
    • @edolstra: want to get rid of the old one fast
    • @tomberek: the question is if the bugs we expect are such that we can fix them in a timely fashion. then we can keep the new code and proceed. if we're concerned that's not possible, the alternative to keep both becomes more appropriate in terms of cost-benefit ratio
      • @roberth: we can expect some hashes to change. that would be an appropriate breakage for an experimental feature
        • @edolstra: what needs to be done in this PR is testing the current smudging behavior to ensure there won't be a regression
  • @roberth: could do a call for testing when we deem it good enough
    • @edolstra would prefer that. get it out with the next release and fix bugs, at worst re-introduce the old fetcher
    • @Ericson2314: if there is a bug that can't be fixed easily we can revert, and possibly keep the two variants side by side
    • @thufschmitt: some time ago we had the idea to ask a large consumer such as NixOS or Home Manager to run their code against master, we could try that out here
    • @tomberek: one could run some percentage of install actions with master
      • @edolstra: that is something the Determinate Systems installer tests are doing in fact, they pick versions at random
  • @Ericson2314 summary: the status quo is that fetchGit is stable, and the only serious backwards-incompatible change we expect is fixing the unspecified, inconsistent smudging behavior. minor breakages can be fixed up later
    • @tomberek: there are also potential changes to submodules behavior, and we had those in the future. it may be difficult to work around breakages there
      • @thufschmitt: hash changes will be a real bad breakage for users, it's breaking a fundamental stability property
      • @edolstra: don't expect breakages due to submodules
      • @tomberek: how can we increase our confidence that there won't be breakages? can we do a large evaluation that depends on fetchGit and check the outcome is the same?
        • @thufschmitt: we have a Nixpkgs regression test, but Nixpkgs uses build-time fetching
        • @tomberek: maybe one of the language support tools? crane, naersk...
  • @roberth: we could also have a completely new fetcher type and don't change the old one
    • @edolstra: possible, but that would result in keeping the two fetchers around forever
  • @roberth: another thing we should consider is very clearly communicating the rationale for the decision and informing users about the consequences
  • options:
    • running both code paths side by side
      • will predictably produce different results in some cases
      • also expensive in a few ways
    • call for testing against master
    • including large consumers: NixOS, Home Manager, language support tooling ...
    • only add the new code to fetchTree and keep fetchGit as is
      • keeps fetchGit stable bug for bug
    • make the new code opt-out
    • add the new code opt-in as an experimental feature
    • release new code, fix forward
  • decision:
    • merge when ready
    • call for testing master on Discourse
    • fix up regressions until the next release
    • release if no severe issues remain and fix the minor ones forward
    • otherwise fall back to having both implementations side-by-side (opt-in or opt-out the new one), and see from there

@Ericson2314 Ericson2314 self-requested a review November 6, 2023 18:27
@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2023-11-03-nix-team-meeting-minutes-100/35245/1

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2023-11-06-nix-team-meeting-minutes-101/35247/1

std::vector<Submodule> submodules;
};

virtual WorkdirInfo getWorkdirInfo() = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about bare repos?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK bare repos just work. There is a test in functional/flakes/flakes.sh.

Copy link
Member

@roberth roberth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No significant issues. LGTM if you could implement/address the suggestions (this review + previous one + John's).

.expired = !locked && (settings.tarballTtl.get() == 0 || timestamp + settings.tarballTtl < time(0)),
.infoAttrs = jsonToAttrs(nlohmann::json::parse(infoJSON)),
};
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This cache behavior is hard to follow, but I suggest we fix that forward.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar for the many interactions with the Inputs object. I can't claim to fully understand how the attributes "evolve", let alone prove to myself that requirements about caching and reproducibility are upheld, but I figure I will have a better understanding after completing something based on

Not a blocker for merging this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may consider using an existing wrapper, such as https://github.com/AndreyG/libgit2cpp, but this can be done later.

src/libfetchers/union-input-accessor.cc Outdated Show resolved Hide resolved
Copy link
Member

@roberth roberth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question about shallow+caching. Otherwise LGTM

Comment on lines +563 to +564
if (isShallow && !getShallowAttr(input))
throw Error("'%s' is a shallow Git repository, but shallow repositories are only allowed when `shallow = true;` is specified", repoInfo.url);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this check needed?
What if the repo was fetched shallowly and then later we need it without shallow?

If it's part of the cache key that would be a solution. I'd consider that a performance bug still, but could be fixed after merge.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is much likely a bug in this (already before this PR). If:

  • a shallow remote is fetched with shallow = true
  • later the remote unshallows
  • another revision is fetched from the same remote with shallow = false
    ...then this error would trigger and the only way to fix it is deleting the nix cache directory.

This bug could only ever trigger on remote repos, as local repos don't get cached via ~/.cache.
One could argue that this bug is very unlikely to trigger because remote repos are usually never shallow.


It seems to me that the only reason this check has to exist is because on shallow clones the revCount breaks. If revCount was ignored in general and removed from the API, we wouldn't need to care much about repos being shallow or not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To cover potential bugs like this, I would like to improve the testing infra. I opened an issue where this can be discussed: #9388

};

git_fetch_options opts = GIT_FETCH_OPTIONS_INIT;
opts.depth = shallow ? 1 : GIT_FETCH_DEPTH_FULL;
Copy link
Member

@DavHau DavHau Nov 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fetching with depth 1 will fail as soon as the requested rev is not the latest rev of the fetched ref, as any other rev cannot be reached with --depth 1. Basically as soon as another commit is made on the remote repo this will certainly fail.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To solve this, instead of fetching $ref:$ref, simply fetching $rev with depth=1 is sufficient. But still, the current shallowbehavior of builtins.fetchGit would be broken for which I just clarified the documentation here.

@Ericson2314 Ericson2314 merged commit f9970fd into NixOS:master Nov 20, 2023
8 checks passed
@DavHau
Copy link
Member

DavHau commented Nov 20, 2023

@Ericson2314 @edolstra You might have overlooked it, therefore pinging you.
This PR breaks backward compat for builtins.fetchGit.
As mentioned in this comment here, the --depth 1 introduced by this PR leads to a different result than the previous shallow behavior as documented here.

@edolstra edolstra deleted the libgit2 branch November 20, 2023 16:23
@tomberek tomberek mentioned this pull request Nov 22, 2023
13 tasks
@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2023-11-20-nix-team-meeting-minutes-105/35902/1

roberth added a commit to hercules-ci/nix that referenced this pull request Jan 18, 2024
libgit2 is not capable of using git-credentials helpers yet.
This prevents private repositories from being used.

Based on code that was replaced in NixOS#9240
(Introduce libgit2); hence:

Co-authored-by: Eelco Dolstra <[email protected]>
@roberth roberth mentioned this pull request Jul 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fetching Networking with the outside (non-Nix) world, input locking with-tests Issues related to testing. PRs with tests have some priority
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

8 participants