Package management questions #1965

jneem · 2024-06-18T22:12:41Z

jneem
Jun 18, 2024
Collaborator

I've been working on a package manager prototype here, and the result is something that works but also raises a bunch of questions. I've collected some here, together with the decisions made by the prototype.

Manifest vs "inline"

Nix and Dhall allow for importing dependencies dynamically, using things like fetchGit. Many (most?) other languages require dependencies to be specified in some sort of package manifest. The inline form has the advantage of being lightweight -- you can get it all done with only one file, and you only need to download packages that you actually use during evaluation -- but having a manifest makes it easier to use nickel in a sandboxed environment, and it makes lockfiles easier to manage.

The current prototype uses a manifest.

Manifest auto-detection

If we're using an external manifest, how do we find it? It would be nice if the user could just type nickel export foo.ncl and have it just work. The most common method of autodetection appears to be to look in parent directories until we find the manifest file, which has a well-known name. There is a small backwards-compatibility concern with this: if a user's system happens to have a file with that well-known name, adding package support to nickel could lead us to misinterpret that file as a manifest.

The current prototype's well-known name is "package.ncl", and the lock-file is called "package.lock". At least the lock-file name should probably change, or no one will be able to use nickel and node in the same project...
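To make the parent-directory search concrete, here is a minimal sketch in Python (not the prototype's actual code). The function name find_manifest is made up for illustration; only the file name "package.ncl" comes from the prototype.

```python
from pathlib import Path
from typing import Optional

MANIFEST_NAME = "package.ncl"  # the prototype's well-known manifest name


def find_manifest(start: Path) -> Optional[Path]:
    """Return the nearest manifest at or above `start`, or None if there is none.

    `find_manifest` is a hypothetical helper, not part of nickel.
    """
    start = start.resolve()
    # If we were given a file such as foo.ncl, begin in its containing directory.
    directory = start if start.is_dir() else start.parent
    # Check the directory itself, then each ancestor up to the filesystem root.
    for candidate_dir in (directory, *directory.parents):
        candidate = candidate_dir / MANIFEST_NAME
        if candidate.is_file():
            return candidate
    return None
```

Note that this sketch has exactly the backwards-compatibility problem described above: any stray file called package.ncl in an ancestor directory will be picked up.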

Kinds of dependencies

Where can dependencies come from? Dhall allows imports from arbitrary urls. Nix supports fetching from a variety of VCSs, paths, and archive formats.

The current nickel prototype allows for importing from

paths (relative or absolute)
git repositories (currently only from HEAD, but the idea would be to also support branches, tags, and revisions specified by hashes)
a central registry, that can identify packages by name and version number

The current prototype also requires the imported package to contain a manifest file. This might not be necessary, but I guess it will be necessary if the imported package wants to have its own dependencies.

Version compatibility and resolution

How do we choose package versions, and how do we handle a package that gets imported multiple times in the dependency tree? This has to depend on the dependency type, I think.

For path dependencies, there is no version choice: we import the version of the dependency that is present on the filesystem at that path. Path dependencies do present some annoyances for the lock-file, though: a path dependency's dependencies can change at any time. Therefore the lock-file should record the existence of a path dependency, but not record its dependencies. (This is consistent with what cargo does.)

Git dependencies can be immutable (if some hash is specified) or not (if a branch or a tag is specified). Immutable git dependencies are easy for the resolver. For mutable git dependencies, if they are not yet present in the lock-file then they are fetched and the tree hash is recorded. After that, the tree hash is looked up in the lock-file and the dependency is treated as an immutable git dependency.
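The git rule above can be sketched roughly as follows. This is an illustration under assumptions, not the prototype's code: GitDep, resolve_git, and the fetch callback are hypothetical names, and the lock-file is modelled as a plain dictionary from dependency to tree hash.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class GitDep:
    """Hypothetical model of a git dependency as written in the manifest."""
    url: str
    ref: Optional[str] = None          # branch or tag name; None means HEAD
    pinned_hash: Optional[str] = None  # set when the manifest specifies a hash


def resolve_git(dep: GitDep, lock: Dict[GitDep, str],
                fetch: Callable[[GitDep], str]) -> str:
    """Return the hash to use for `dep`, recording it in `lock` if needed."""
    if dep.pinned_hash is not None:
        return dep.pinned_hash   # immutable dependency: nothing to resolve
    if dep not in lock:
        lock[dep] = fetch(dep)   # first time we see it: fetch and record the hash
    return lock[dep]             # afterwards: treated like a pinned dependency
```

The fetch callback stands in for whatever actually clones the repository and computes the tree hash, so it only runs the first time a mutable dependency is seen.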

Dependencies from the registry are the most interesting. Fortunately, there are fairly well-established conventions for specifying ranges of versions (like ">=1.0 <3.0", or "^1.2"). What's less clear is how to handle multiple packages with overlapping ranges. Some languages (e.g. python) insist that each package resolves to a single version across the whole dependency tree. Other languages allow multiple versions, keeping track of which package in the dependency tree needs to import which version of a package.
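For reference, here is a small sketch of how range strings like the two above could be checked against a concrete version. It assumes cargo-style semantics for "^" and supports only the ">=", "<", and "^" operators; the function names are hypothetical, and edge cases (pre-release tags, partially specified zero versions like "^0.0") are not handled faithfully.

```python
from typing import Tuple

Version = Tuple[int, int, int]


def parse(text: str) -> Version:
    """Parse "1", "1.2" or "1.2.3", padding missing components with zeros."""
    parts = [int(p) for p in text.split(".")]
    parts += [0] * (3 - len(parts))
    return (parts[0], parts[1], parts[2])


def caret_upper(low: Version) -> Version:
    """Exclusive upper bound of ^low: bump the leftmost non-zero component."""
    major, minor, patch = low
    if major > 0:
        return (major + 1, 0, 0)
    if minor > 0:
        return (0, minor + 1, 0)
    return (0, 0, patch + 1)


def matches(version: str, requirement: str) -> bool:
    """True if `version` satisfies every space-separated clause of `requirement`."""
    v = parse(version)
    for clause in requirement.split():
        if clause.startswith("^"):
            low = parse(clause[1:])
            ok = low <= v < caret_upper(low)
        elif clause.startswith(">="):
            ok = v >= parse(clause[2:])
        elif clause.startswith("<"):
            ok = v < parse(clause[1:])
        else:
            raise ValueError(f"unsupported clause: {clause}")
        if not ok:
            return False
    return True
```

For example, matches("1.5.0", ">=1.0 <3.0") and matches("1.9.3", "^1.2") both hold, while matches("2.0.0", "^1.2") does not.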

I think we want to allow multiple versions of a package; the alternative can be fragile and annoying. But then we need to figure out how many different versions to allow. There's a trade-off: if we allow pulling in a different version every time a package gets imported, solving the dependency graph is easy. But it increases the chance of getting incompatibilities at runtime: we might accidentally get a value from one version of a package and try to pass it to an incompatible function defined in another version of the same package.

The current prototype uses a strategy similar to cargo: it divides package versions into semver-delimited "bins" and allows resolution to choose at most one version from each bin. That is, the same dependency tree can contain two versions of a package that fall in different bins, but not two versions that fall in the same bin.
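A rough sketch of the binning idea, assuming the bins follow cargo's compatibility rule (versions share a bin when their leftmost non-zero component agrees). The exact boundaries used by the prototype may differ, and bin_key and conflicting_bins are hypothetical names.

```python
from typing import Dict, Iterable, List, Set, Tuple

Version = Tuple[int, int, int]


def parse_version(text: str) -> Version:
    """Parse a full "major.minor.patch" version string."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def bin_key(version: Version) -> Tuple[int, ...]:
    """Bin identifier: everything up to the leftmost non-zero component."""
    major, minor, patch = version
    if major > 0:
        return (major,)
    if minor > 0:
        return (0, minor)
    return (0, 0, patch)


def conflicting_bins(versions: Iterable[str]) -> Dict[Tuple[int, ...], List[str]]:
    """Map each bin that holds more than one distinct version to those versions."""
    bins: Dict[Tuple[int, ...], Set[str]] = {}
    for text in versions:
        bins.setdefault(bin_key(parse_version(text)), set()).add(text)
    return {key: sorted(members) for key, members in bins.items() if len(members) > 1}
```

With these assumed boundaries, the (made-up) versions 1.1.0 and 2.0.0 of one package land in different bins and can coexist, while 1.1.0 and 1.2.0 share the bin (1,) and would be reported as a conflict.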

Lock-file behavior and updates

What happens if we have a lock-file, but we modify the manifest? We don't want to be too strict about requiring the exact versions in the lock-file, or we'll end up forcing the user to re-create the lock-file from scratch.

The current prototype treats the lock-file as a suggestion: during resolving, when choosing the next package version to try, it picks the locked version first. But if the locked version leads to a conflict, it will try another version without complaining. If nothing has changed since the lock-file was created, it should always resolve the same versions.
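The "lock-file as a suggestion" behavior amounts to a candidate ordering for the resolver. Here is a minimal sketch; candidate_order is a hypothetical name, and trying the remaining versions newest-first is my assumption rather than something the prototype promises.

```python
from typing import List, Optional, Tuple


def _as_tuple(version: str) -> Tuple[int, ...]:
    """Turn "1.2.0" into (1, 2, 0) so versions sort numerically."""
    return tuple(int(part) for part in version.split("."))


def candidate_order(available: List[str], locked: Optional[str]) -> List[str]:
    """Order in which the resolver tries versions: locked first, then newest first."""
    ordered = sorted(available, key=_as_tuple, reverse=True)
    if locked in ordered:
        # Prefer the locked version, but keep the others as silent fallbacks.
        ordered.remove(locked)
        ordered.insert(0, locked)
    return ordered
```

If nothing has changed since the lock-file was written, the locked version is always tried first and still satisfies every constraint, so resolution picks the same versions again.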

Registry updates, and submitting packages

How should we manage the global registry? There's a potential for incurring substantial maintenance costs here, so we should be careful.

The registry is currently implemented as a git repo with a bunch of files (one per package, with one line per version). Each entry specifies the location of the package (currently required to be on github) and its git tree hash. This ensures that packages are immutable, but it doesn't stop them from disappearing: we don't keep a copy of the actual package contents.
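To illustrate the one-file-per-package, one-line-per-version layout, here is a sketch of a reader for such a file. The line format is purely an assumption: I'm pretending each line is a JSON object with hypothetical "version", "location", and "tree" fields, which is not necessarily what the prototype writes.

```python
import json
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class RegistryEntry:
    version: str
    location: str   # where the package lives, e.g. a GitHub "org/repo"
    tree_hash: str  # git tree hash that makes the entry immutable


def parse_package_file(text: str) -> Dict[str, RegistryEntry]:
    """Parse one registry file (assumed JSON-lines) into a version -> entry map."""
    entries: Dict[str, RegistryEntry] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        raw = json.loads(line)
        entry = RegistryEntry(raw["version"], raw["location"], raw["tree"])
        entries[entry.version] = entry
    return entries
```

Since only the location and the hash are stored, a reader like this can verify that a fetched package matches its entry, but it cannot recover a package whose repository has been deleted.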

The current prototype doesn't have any automatic way of introducing new packages. There is a command to scrape package repos and update the list of available versions, so the initial plan is to add new packages manually, and use a cron job to keep them sort of up-to-date.

Registry namespacing

I think packages in the registry should be namespaced, probably with a depth of 2. That is, they should be identified as organization/package-name. This maps nicely to github names, and so if we enable automatic package submission in the future, it will allow us to outsource authorization: you can publish tweag/foo if you're in the github tweag organization.

There is a possible downside of tying this too tightly to github. Maybe there should be depth-3 names, like github/tweag/foo?

Manifest file format

The prototype has its manifest in nickel format. This seemed like a fun choice (and it allows us to use a contract for validation and auto-complete), but a plain-data format like toml might be better for tooling.

Specifying dependency names

How should we refer to package names in the manifest, and in nickel code? The syntax should be light-weight and unambiguous, but it should also support package renaming.

In the current prototype, the manifest explicitly assigns an identifier to every package. For example, your manifest could include

{
  # ...
  dependencies = {
    foo = 'Index { package = "tweag/foo", version = "1.2.0" },
    bar = 'Path "../my-bar",
  }
} | std.package.Manifest

Then the actual nickel code can write import foo or import bar.

This choice has the advantage that renaming packages is trivial, but the disadvantage that the manifest syntax is redundant in the common case. Another possibility would be to allow

dependencies = {
  "tweag/foo" = "1.2.0"
}

and then import it with import tweag/foo.

Package entry points

Packages might consist of multiple files, and they might not want to publicly expose the detail of how they're structured. How do we know what part is public?

node allows the manifest to specify the entry point(s). Our prototype hard-codes "main.ncl"; when you type import foo, you get the file main.ncl in the package's root directory.

What kind of tooling do we need?

The current prototype doesn't have much. We probably want

a command for adding a new dependency to the manifest (checking if it exists, and picking the most recent version)
a command for downloading the dependency tree (for use in build systems that expect different "fetch" and "build" phases)
a command that checks for new dependency versions and updates the manifest

Anything else?
