Store path provenance tracking #11749
Backward-compatible schema changes (e.g. those that add tables or nullable columns) now no longer need a change to the global schema file (/nix/var/nix/db/schema). Thus, old Nix versions can continue to access the database. This is especially useful for schema changes required by experimental features. In particular, it replaces the ad-hoc handling of the schema changes for CA derivations (i.e. the file /nix/var/nix/db/ca-schema). Schema versions 8 and 10 could have been handled by this mechanism in a backward-compatible way as well.
This looks like a cool idea. How does it help me determine which expression (which line of which file) in the checkout of some repository defines the Like, you implied this would support tracking the store path back to the expression. And, in the
Nix historically has been bad at answering the question "where did this store path come from", i.e. at providing traceability from a store path back to the Nix expression from which it was built. Nix tracks the "deriver" of a store path (the `.drv` file that built it) but that's pretty useless in practice, since it doesn't link back to the Nix expressions.

So this PR adds a "provenance" field (a JSON object) to the `ValidPaths` table and to `.narinfo` files that describes where the store path came from and how it can be reproduced.

There are currently 3 types of provenance:

* `copied`: Records that the store path was copied or substituted from another store (typically a binary cache). Its "from" field is the URL of the origin store. Its "provenance" field propagates the provenance of the store path on the origin store.
* `derivation`: Records that the store path is the output of a `.drv` file. This is equivalent to the "deriver" field, but it has a nested "provenance" field that records how the `.drv` file was created.
* `flake`: Records that the store path was created during the evaluation of a flake output.

Example:

```
$ nix path-info --json /nix/store/xcqzb13bd60zmfw6wv0z4242b9mfw042-patchelf-0.18.0
{
  "/nix/store/xcqzb13bd60zmfw6wv0z4242b9mfw042-patchelf-0.18.0": {
    "provenance": {
      "from": "https://cache.example.org",
      "provenance": {
        "drv": "rlabxgjx88bavjkc694v1bqbwslwivxs-patchelf-0.18.0.drv",
        "output": "out",
        "provenance": {
          "flake": {
            "lastModified": 1729856604,
            "narHash": "sha256-obmE2ZI9sTPXczzGMerwQX4SALF+ABL9J0oB371yvZE=",
            "owner": "NixOS",
            "repo": "patchelf",
            "rev": "689f19e499caee8e5c3d387008bbd4ed7f8dc3a9",
            "type": "github"
          },
          "output": "packages.x86_64-linux.default",
          "type": "flake"
        },
        "type": "derivation"
      },
      "type": "copied"
    },
    ...
  }
}
```

This specifies that the store path was copied from the binary cache https://cache.example.org and that it's the "out" output of a store derivation that was produced by evaluating the flake output `packages.x86_64-linux.default` of some revision of the patchelf GitHub repository.
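To make the nesting in the example easier to follow, here is a minimal sketch of the three record types as a chain that can be walked from the outermost record down to its root. All names (`Copied`, `Derivation`, `Flake`, `makeProv`, `describe`) are hypothetical and are not the types used in this PR; the flake attributes are collapsed into a single string for brevity.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <variant>

// Hypothetical in-memory model of the three provenance types described above.
struct Provenance;
using ProvPtr = std::shared_ptr<const Provenance>;

struct Copied {
    std::string from;   // URL of the origin store
    ProvPtr next;       // provenance of the path on the origin store (may be null)
};

struct Derivation {
    std::string drv;    // store derivation that built the path
    std::string output; // derivation output name, e.g. "out"
    ProvPtr next;       // how the .drv file itself was created (may be null)
};

struct Flake {
    std::string flakeRef; // simplified stand-in for the flake attributes
    std::string output;   // flake output attribute path
};

struct Provenance {
    std::variant<Copied, Derivation, Flake> value;
};

// Convenience constructor for one link of the chain.
ProvPtr makeProv(std::variant<Copied, Derivation, Flake> v)
{
    return std::make_shared<Provenance>(Provenance{std::move(v)});
}

// Render the chain from the outermost record to its root, one step per line.
std::string describe(const ProvPtr & prov)
{
    std::string out;
    for (const Provenance * p = prov.get(); p != nullptr; ) {
        if (auto c = std::get_if<Copied>(&p->value)) {
            out += "copied from " + c->from + "\n";
            p = c->next.get();
        } else if (auto d = std::get_if<Derivation>(&p->value)) {
            out += "output '" + d->output + "' of " + d->drv + "\n";
            p = d->next.get();
        } else {
            auto & f = std::get<Flake>(p->value);
            out += "flake output " + f.output + " of " + f.flakeRef + "\n";
            p = nullptr; // a flake record is the root in this sketch
        }
    }
    return out;
}
```

Building a `Copied` record that wraps a `Derivation` record that wraps a `Flake` record mirrors the JSON above: each "provenance" key in the JSON corresponds to one `next` pointer here.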
It doesn't currently, since that information wouldn't be enough to reproduce the store derivation (i.e. a package function in Nixpkgs requires arguments to be able to reproduce its output, not to mention stuff like overrides). But storing the top-level flake + flake output name that caused the store derivation to be created does allow the store derivation to be reproduced.
The problem there is that evaluation of non-flake expressions is not hermetic, so we really do need something like flakes for provenance.
It will be less likely that you can verify the provenance, but something could be recorded nonetheless.
(I haven't read the whole diff yet, so apologies for questions I could have answered myself, but these will need to be documented anyway, so also you're welcome :) )
Many evaluations will produce the same paths. How do we deal with that? I suppose we only need a

Another solution is to only store the first provenance, but this is too arbitrary IMO, and can also be achieved with a first referrer field if we feel like storing all referrer edges is too expensive or impractical for "non-enumerating" stores like the binary cache stores.

Putting new appendable data into the stores, including the binary cache stores, is quite a step. Do we really need this to be in the binary cache? A lot of the value of this feature could instead be produced by a local database, since that's where evaluation and realisation ultimately happen anyway.

Some questions

Things to be documented and/or implemented
```cpp
struct ProvFlake
{
    std::shared_ptr<nlohmann::json> flake; // FIXME: change to Attrs
    std::string flakeOutput;
```

Suggested change:

```diff
-    std::string flakeOutput;
+    std::vector<std::string> flakeOutput;
```
```cpp
 * derivation input source) that was produced by the evaluation of
 * a flake.
 */
struct ProvFlake
```
This is a layer violation. We could define something like `struct ProvOther { std::string type; nlohmann::json value; }` at the store layer and refine this in upper layers.
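A rough sketch of what that layering could look like: the store layer carries only a type tag and an opaque payload, and a flake-aware layer refines it into a typed view. To keep the example dependency-free, a flat string map stands in for `nlohmann::json`; `FlakeView`, `asFlake` and the key names are hypothetical and only illustrate the idea.

```cpp
#include <map>
#include <optional>
#include <string>

// Store layer: knows only a type tag and an opaque payload.
// (Assumption: a string map replaces nlohmann::json in this sketch.)
struct ProvOther {
    std::string type;
    std::map<std::string, std::string> value;
};

// Upper (flake-aware) layer: hypothetical typed view of a "flake" record.
struct FlakeView {
    std::string flakeRef;
    std::string flakeOutput;
};

// Refine a generic record into the typed view. Returns nullopt for other
// record types or when an expected key is missing.
std::optional<FlakeView> asFlake(const ProvOther & p)
{
    if (p.type != "flake") return std::nullopt;
    auto ref = p.value.find("flake");
    auto out = p.value.find("output");
    if (ref == p.value.end() || out == p.value.end()) return std::nullopt;
    return FlakeView{ref->second, out->second};
}
```

With this split, the store layer can persist and forward records it does not understand, while only the evaluator needs to know what a flake is.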
I'm thinking about getting rid of all the `Prov*` types and just passing provenance around as a JSON value.
Indeed provenance doesn't need to be hermetic or reproducible, so we could certainly have a provenance type for non-flake evaluations.
The provenance is the evaluation that produced the store path, i.e. the first one. There can of course be many other evaluations that produce the same store path, but those are not the provenance for that particular store / binary cache. (The same applies to other types of provenance like substitution: a path can be substituted from many binary caches, but we only record the one we actually used.) Recording other provenances makes the metadata for a store path potentially grow without bounds. And in the case of .narinfo files, we really don't want to update them after creation due to caching etc. This is the same semantics as the deriver field BTW.
I think so, because without that you can't query the ultimate provenance of a store path in a binary cache like cache.nixos.org.
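The "first provenance only" semantics described above can be illustrated with a small sketch: the record is written when a path first becomes valid and is never overwritten afterwards. `PathRegistry` is a hypothetical stand-in for the store's metadata table, the provenance is kept as an opaque string, and the store paths used with it are made up.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for per-path store metadata.
class PathRegistry {
    std::map<std::string, std::string> provenance_;

public:
    // Register a path. Returns true if this call recorded the provenance,
    // false if the path was already valid and the existing record was kept.
    bool registerPath(const std::string & path, const std::string & prov)
    {
        return provenance_.try_emplace(path, prov).second;
    }

    // Look up the recorded provenance, if the path is known.
    std::optional<std::string> query(const std::string & path) const
    {
        auto it = provenance_.find(path);
        if (it == provenance_.end()) return std::nullopt;
        return it->second;
    }
};
```

Because later registrations are no-ops, the metadata for a path has a fixed size and a `.narinfo`-style record never needs to be rewritten after creation.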
Depends on #11668.
Context
Priorities and Process
Add 👍 to pull requests you find important.
The Nix maintainer team uses a GitHub project board to schedule and track reviews.