Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate incorporating extras merging into the resolver #14

Open
uranusjr opened this issue Jan 31, 2020 · 4 comments
Open

Investigate incorporating extras merging into the resolver #14

uranusjr opened this issue Jan 31, 2020 · 4 comments

Comments

@uranusjr
Copy link
Member

Cargo directly merges extras (features in Rust terminology) into existing, already resolved candidates.

https://github.com/rust-lang/cargo/blob/4fbd644bc76e25815a8e3a2189b6430fe69fd0d6/src/cargo/core/resolver/mod.rs#L646-L657

This means the provider needs a way to “talk back” to the resolver “I want to add edges to another node,” which is a departure to the current design, in which all communication is one-sided (the resolver asks, and provider responds). But it would make handling extras more natural, and could help with marker merging (which might require a similar talk-back mechanism).

Another thing to consider is that merging extras could make it harder to answer where exactly does that dependency came from? because all extra dependencies got connected to a common parent, losing context. Cargo handles this by maintaining a separate record of requested features. I guess this eventually is a choice of poison; you gotta introduce the complexity somewhere.

One benefit of not merging extras in the resolver is that the graph becomes strictly insert-only. Once a node is inserted you know how many (out-going) edges it has; it won’t change. Sounds like a nice property, but is it even useful at all in practice?

@sanderr
Copy link
Contributor

sanderr commented Jun 28, 2023

This means the provider needs a way to “talk back” to the resolver

I came here to create a ticket for exactly this. Since this existing ticket has the same motivation (extras for pip) and the same high level idea, I'll elaborate here instead.

I think an interesting approach could be to provide a way for the provider to indicate that certain identifiers are connected. A rough first draft of what that could look like:

  • We introduce the concept of a direct dependency between identifiers, meaning one identifier is directly influenced by constraints on / candidates picked for another. Formally, a is dependent on b => when b receives new requirements, candidates for a are recalculated (/ existing candidates are invalidated for lazy recalculation).
  • To this end we extend the Provider interface with a influenced_by(identifier: KT) -> Optional[KT] (not a great name but it gets the point across for now). This method is called whenever a new identifier is encountered. This achieves the "talking back" by polling for a concrete relation, similar to how get_preference works.
  • Whenever a new requirement is added the the criteria, if it is marked is influencing any other identifiers, those others' candidates are refreshed.

For the pip use case this would mean that mypackage[myextra] would be marked as influenced by mypackage. If the latter receives a new constraint, it would be correctly propagated to the candidate selection for the identifier with the extra.

The above is not necessarily a concrete proposal for implementation. I have given this quite some thought and I like the general idea, but I have not fully considered the specifics yet. It is more of a PoC proposal in the hopes of moving this discussion forward. If I could get some feedback I'd be willing to spend more time on this.

Related to pypa/pip#12095 in the sense that that pull request makes an attempt to improve candidate selection for packages with extras but it falls short when additional constraints on the base package appear transitively, after the first find_matches has been called for the package with extra. As far as I can see there is no clean way to address this on pip's side alone, hence this proposal.

@sanderr
Copy link
Contributor

sanderr commented Jun 28, 2023

Other rough brainstorm ideas I had, but that I think are less interesting than the one above (and less concrete):

  1. Concept of volatile requirements where the list of candidates is not cached but recalculated whenever needed. In the case of pip, the candidate calculation for a package with an extra should be extremely cheap, and it would ensure we don't work with outdated information.
  2. Concept of derived requirements, that aren't stand-alone but provide a candidate only relative to a candidate for their base: mypackage[myextra] would only be able to provide a candidate when one has been picked for mypackage.

@uranusjr
Copy link
Member Author

uranusjr commented Jun 30, 2023

I’m not sure having candidates recalculated is helpful, since the problem here is we want a requirement (package) to be able to calculate candidates for another requirement (package[extra]). But I may be misunderstanding the idea here; the description is pretty brief as you mentioned.

The derived requirement idea is essentially the same as all the above ideas, but instead of having the hook to provide the information, the relation is directly encoded in the requirement. That could work if the requirement can eagerly derive the information—which is true for the case for extras as it is designed now, but could be less flexible for other use cases such as Provides-Dist, which is a similar idea but much more dynamic since it is encoded in downstream information, instead of upstream as extras.

@sanderr
Copy link
Contributor

sanderr commented Jun 30, 2023

I’m not sure having candidates recalculated is helpful

Well, if I understand the current pip implementation correctly, it always checks constraints and/or picked candidates for the base (package) when it is requested to find candidates for the extra package[extra]. So I do think it would resolve the issue: checking again would take any new constraints into account, reducing the result set. But I have to agree it may not be the most elegant solution. I'm having trouble coming up with a better one that is sufficiently generic. I'll have to give it some more thought

I agree with your assessment of the derived requirement idea, though I am not very familiar with Provides-Dist so I'm not sure if I see all implications.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants