Clarifications needed re packaging flow from the perspective of a build backend #1779

Open
1 task done
zahlman opened this issue Jan 7, 2025 · 2 comments
Labels
type: discussion Discussion of general ideas, design, etc. type: question A user question that needs/needed an answer

Comments

@zahlman
Contributor

zahlman commented Jan 7, 2025

Issue Description

I'm writing a build backend which aims to deliver these key features relevant to the discussion (among others):

  1. Sdists will only contain static metadata;
  2. The code for sdist-building and wheel-building is separable, such that, when an sdist is downloaded and installed automatically (e.g. by Pip), only the wheel-building code is needed as a build dependency;
  3. There is no legacy support - sdists contain a PKG-INFO specifying a core metadata version of 2.2 or higher (most likely 2.4) and a pyproject.toml.

There are several confusing points I've encountered in the description of pyproject.toml and of the core metadata format, and in how they are used in source trees (pyproject.toml only), sdists (both) and wheels (core metadata only). My aim here is to verify that I can accomplish these goals while remaining standards-compliant.

The main conceptual problem I'm having is that pyproject.toml and core metadata are described as canonical metadata formats, yet a non-legacy sdist is expected to contain both. I have many questions as a result.

First, regarding sdist creation: my understanding is that in this process, the build backend:

  • MUST faithfully represent static metadata (if any) from the source tree's pyproject.toml in the PKG-INFO;
  • MAY compute values for dynamic metadata and include these in the PKG-INFO as well.

The question is, what happens to the version of pyproject.toml that ends up in the sdist?

It seems to me that it cannot in general be an exact copy of the source tree's pyproject.toml, because if I compute dynamic metadata then there is a conflict - the field is marked dynamic in pyproject.toml but provided statically in PKG-INFO.

Am I at liberty to create an entirely new pyproject.toml, as long as it follows the spec? For example, can I remove the [project] table (since in general this table isn't required to be present, and I've already fully "compiled" its information into PKG-INFO)? Can I change the [build-system] table, such that a different build backend will be used to create the wheel? (One implementation idea I had was to incorporate an in-tree, wheel-specific backend into the sdist.) Should [project] at least be edited to reflect the dynamic metadata values that were calculated (e.g. add the computed values as static keys, and remove the corresponding names from project.dynamic)?
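
To make the last option concrete, here is a rough sketch of what "editing [project] to reflect the computed values" could mean. Whether a backend is allowed to do this at all is exactly the open question; `freeze_project_table` is a hypothetical name, and it works on an already-parsed table because the standard library can read TOML (`tomllib`) but has no writer, so serializing the result would need a third-party library.

```python
import copy


def freeze_project_table(project, computed):
    """Return a copy of a parsed [project] table with computed values made static.

    Hypothetical helper: `computed` maps names from project.dynamic to the
    values worked out at sdist-build time.
    """
    frozen = copy.deepcopy(project)
    declared = frozen.get("dynamic", [])
    for key in declared:
        if key in computed:
            frozen[key] = computed[key]   # add the computed value as a static key
    remaining = [key for key in declared if key not in computed]
    if remaining:
        frozen["dynamic"] = remaining     # still-unresolved names stay dynamic
    else:
        frozen.pop("dynamic", None)       # nothing left to defer
    return frozen
```

The input table is left untouched, so the source tree's own pyproject.toml data can still be compared against the frozen copy.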


Next, regarding wheel building: the core metadata specification says that "Fields defined in the following specification should be considered valid, complete and not subject to change."

Does that imply that the wheel's METADATA MUST be a copy of the sdist's PKG-INFO?

Doesn't that prevent computing metadata values at wheel creation time? (Not applicable to me, but still worth raising the question.)

Doesn't that in turn imply that non-legacy sdists need to have all the dynamic metadata values computed, and they can't be deferred to wheel-building? (I think this is intentional, so that e.g. installers can figure out basic information about the package without building it. But as of 24.3.1, Pip still does the build first anyway, even when PKG-INFO declares the latest metadata version.)
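
One reading of the core metadata 2.2 rules is that a field not marked Dynamic in the sdist must have the same value in any wheel built from it, while fields marked Dynamic may differ. Under that assumption, a consistency check between PKG-INFO and METADATA might look like the sketch below. `changed_static_fields` is a hypothetical name; the check ignores the `Dynamic` header itself and does not compare the message body (the long description).

```python
from email.parser import HeaderParser


def changed_static_fields(pkg_info_text, metadata_text):
    """List (lowercased) fields that differ although the sdist did not mark them Dynamic."""
    sdist = HeaderParser().parsestr(pkg_info_text)
    wheel = HeaderParser().parsestr(metadata_text)
    # Fields the sdist explicitly allows to change, plus the Dynamic header itself.
    ignored = {name.lower() for name in sdist.get_all("Dynamic", [])} | {"dynamic"}
    names = {name.lower() for name in list(sdist.keys()) + list(wheel.keys())}

    def values(message, name):
        # Multi-use fields (e.g. Requires-Dist) are compared as sorted lists.
        return sorted(str(value) for value in message.get_all(name, []))

    return sorted(
        name for name in names - ignored
        if values(sdist, name) != values(wheel, name)
    )
```

An empty result means the wheel only changed fields that the sdist had declared Dynamic.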

Doesn't that cause a problem for PEP 725 – Specifying external dependencies in pyproject.toml, since they propose to give semantics to Requires-External metadata whereby the wheel's version could differ? (In particular: the wheel-building process could use a tool like cibuildwheel to vendor a compiled shared C library whose source is not included in the sdist; by my reading of the PEP, the intent is that PKG-INFO would describe the library as an external requirement, but METADATA would not.)

Also: when building the wheel, is it required to look at pyproject.toml at all, or to validate it? My understanding is that the only mandatory purpose pyproject.toml actually serves at this point in the process is to tell an installer what build backend to use (and what its statically-known dependencies are); the backend itself is free to use other files for configuration (i.e. the config isn't required to be in [tool], and other tools simply won't be invoked at this point), and the [project] metadata is either redundant with PKG-INFO or erroneous.


Bonus round:

Given that PEP 725 isn't accepted yet, is there any circumstance in which it would make sense for a modern build backend to output Requires-External or Supported-Platform values in core metadata? I can't think of any.

Code of Conduct

  • I am aware that participants in this repository must follow the PSF Code of Conduct.
@webknjaz webknjaz added type: question A user question that needs/needed an answer type: discussion Discussion of general ideas, design, etc. labels Jan 7, 2025
@webknjaz
Member

webknjaz commented Jan 7, 2025

FYI, discussions are typically more lively @ https://discuss.python.org/c/packaging.

@webknjaz
Member

webknjaz commented Jan 7, 2025

> It seems to me that it cannot in general be an exact copy of the source tree's pyproject.toml, because if I compute dynamic metadata then there is a conflict - the field is marked dynamic in pyproject.toml but provided statically in PKG-INFO.

I'm pretty sure that pyproject.toml is to be kept “as is”. The build backend for building the wheels will rely on it. This usually means including all the files necessary to build wheels (and I usually prefer the entire Git repo work dir) into sdists. sdists are often regarded as a close enough source of truth, almost equivalent to Git checkouts by various parties (downstream redistributors, for example). Said parties would be using sdists not just for building wheels, but also for running the tests and building the docs.

I'd expect a build backend for building from sdist to behave as close to building from Git as possible. setuptools-scm, for example, injects some metadata (I don't remember where) so that when it's executed from sdist, it outputs the same version (since there's no Git to consult with). It also has a mechanism for extracting that metadata from Git archives (requiring some additional configuration from the users).

My understanding is that pyproject.toml is human-writable, while PKG-INFO is machine-writable. With the points above, as a user, I would expect that it remains unchanged.

As for mixing up static+dynamic metadata, I recall @henryiii presenting something during PyPA Packaging Summit about two years ago.

Also, cc @pradyunsg for PEP 725 opinions.
