Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Back jump doesn't do well in some cases #180

Open
frostming opened this issue Dec 2, 2024 · 9 comments · May be fixed by #183
Open

Back jump doesn't do well in some cases #180

frostming opened this issue Dec 2, 2024 · 9 comments · May be fixed by #183

Comments

@frostming
Copy link
Member

frostming commented Dec 2, 2024

Here are the package set to be resolved

1. A==1
2. B  # B@2 depends on D@2 and B@1 depends on D@1
3. C
4. D  # D@2 depends on A@2 and D@1 depends on A@1

Assume the resolver resolves packages from top to down, when resolving D, the current pins are:

A@1
B@2
C@x

Because B@2 is pinned which constrains D==2, we can't find a compatible version of D that is also happy with A@1, so conflicts happen and we back jump and pop the last pinned package C, which is totally unrelated to the incompatibilities.

However, due to the line in backjump implementation:

# Only backjump if the current broken state is
# an incompatible dependency
if name not in incompatible_deps:
break

The backjump quits very early and fallbacks to backtrack. It would be extremely slow if traversing C is super expensive. But it's obvious we should jump over to the state of B and try other versions. My question is, does the quit condition make sense in any case I am not aware of or should we fix it?

@notatallshaw


To be concrete, A: h11, B: httpcore, C: Pillow, D: httpx

@notatallshaw
Copy link
Collaborator

Yeah, the exit is probably too aggressive, but there are definitely cases that incorrectly reach resolution impossible without it.

I have at least a few real world examples to try against if you have an idea how to loosen this without causing incorrect resolutions.

@frostming
Copy link
Member Author

frostming commented Dec 2, 2024

I have at least a few real world examples to try against if you have an idea how to loosen this without causing incorrect resolutions.

I think we previously proposed to save the states first, and if no state that can jump is found, restore the states and fallback to backtrack.

@notatallshaw
Copy link
Collaborator

notatallshaw commented Dec 2, 2024

Btw, I have an intuition that if the backtrack causes are disjoint, i.e every cause is incompatible with some other cause (or set of causes), resolvelib could aggressively backjump. But currently the provider has no way to indicate to the resolver disjointness.

I will experiment this week with disjointness and aggressively backjumping, to see if the intuition is correct.

@notatallshaw
Copy link
Collaborator

I think we previously proposed to save the states first, and if no state that can jump is found, restore the states and fallback to backtrack.

Yeah, it wasn't very elegant though, for example it made complicated impossible resolutions much slower.

@notatallshaw
Copy link
Collaborator

notatallshaw commented Dec 10, 2024

My understanding is the way backjumping was originally implemented in resolvelib did not actually meet the prerequisites for backjumping to be logically sound. In particular a resolvelib "cause" is distinct from what a "cause" might be in satisfiability theory, for example you might end up with "causes" in resolvelib that have the constraints numpy>1.3, numpy>2, numpy<2, but I think the logical causes should be reduced to the constraints numpy>2, numpy<2.

I was hoping to introduce a new API that allowed the provider to filter these causes to only the logical ones, but I realized there's another case that I wasn't considering (#171 (comment)) and further I'm also concerned that the "mathematics" of pre-releases will have an unknown impact on trying to implement classical satisfiability theory (e.g. https://discuss.python.org/t/proposal-intersect-and-disjoint-operations-for-python-version-specifiers/71888/1).

All this is a long winded way to say I'm moving towards the opinion of introducing a save state. My thoughts are:

  1. Only create 1 save state
  2. Create that save state as late as possible
  3. Only revert to the save state if resolution impossible
  4. Give the provider no options(?), to simplify the logic

I think we would call this "optimistic backjumping", we backjump on the assumption that resolvelib causes are close enough to logical satisfiability causes, and we revert in the face of resolution impossible.

I am going to work on making a PR and running it against scenarios in https://github.com/notatallshaw/Pip-Resolution-Scenarios-and-Benchmarks to see what the performance wins are.

@notatallshaw
Copy link
Collaborator

notatallshaw commented Dec 14, 2024

I made a branch for pip that adds a commit for optimistic backjumping on top of resolvelib 1.1 (pypa/pip#13001) and then ran scenarios to compare them.

It comes with big wins and a big loss, here are the statistics comparing the resolvelib 1.1 branch to the optimistic backjumping branch:

Output from comparison using tools from Pip-Resolution-Scenarios-and-Benchmarks:

Difference for scenario scenarios/problematic.toml - autogluon:
        Both failed: Build Failure
        Distinct Wheels visisted: 162 -> 163 (100.62%)
        Total visisted packages: 2398 -> 2184 (91.08%)
        Total visisted requirements: 22286 -> 19708 (88.43%)
        Total rejected requirements: 12791 -> 11089 (86.69%)
        Total pinned packages: 729 -> 740 (101.51%)
        Total rounds: 1283 -> 1220 (95.09%)

Difference for scenario scenarios/problematic.toml - pip-issue-12754:
        Both failed: Resolution Too Deep
        Distinct Sdists visisted: 40 -> 81 (202.50%)
        Distinct Wheels visisted: 82 -> 23 (28.05%)
        Total visisted packages: 15004 -> 19031 (126.84%)
        Total visisted requirements: 57744 -> 60022 (103.94%)
        Total rejected requirements: 55920 -> 52594 (94.05%)
        Total pinned packages: 8029 -> 8551 (106.50%)
        Total rounds: 15019 -> 15012 (99.95%)

Difference for scenario scenarios/problematic.toml - TTS:
        Total visisted packages: 398 -> 383 (96.23%)
        Total visisted requirements: 1099 -> 1084 (98.64%)
        Total rejected requirements: 581 -> 533 (91.74%)
        Total rounds: 300 -> 295 (98.33%)

Difference for scenario scenarios/problematic.toml - boto3-urllib3-transient:
        Success: False -> True
        Failure Reason: Resolution Too Deep -> None
        Distinct Sdists visisted: 28 -> 0 (0.00%)
        Distinct Wheels visisted: 2604 -> 141 (5.41%)
        Total visisted packages: 419687 -> 297 (0.07%)
        Total visisted requirements: 1436990 -> 665 (0.05%)
        Total rejected requirements: 32320 -> 57 (0.18%)
        Total pinned packages: 7584 -> 277 (3.65%)
        Total rounds: 15001 -> 281 (1.87%)

Difference for scenario scenarios/problematic.toml - kedro-test:
        Total visisted packages: 3797 -> 3843 (101.21%)
        Total visisted requirements: 12093 -> 12395 (102.50%)
        Total rejected requirements: 3006 -> 3048 (101.40%)
        Total pinned packages: 2051 -> 2084 (101.61%)
        Total rounds: 2774 -> 2816 (101.51%)

Difference for scenario scenarios/problematic.toml - sphinx:
        Failure Reason: Resolution Too Deep -> Build Failure
        Distinct Sdists visisted: 1 -> 4 (400.00%)
        Distinct Wheels visisted: 124 -> 135 (108.87%)
        Total visisted packages: 97751 -> 46035 (47.09%)
        Total visisted requirements: 135827 -> 64137 (47.22%)
        Total rejected requirements: 137900 -> 64823 (47.01%)
        Total pinned packages: 8118 -> 3904 (48.09%)
        Total rounds: 15013 -> 7145 (47.59%)

Difference for scenario scenarios/problematic.toml - backtracks-to-old-scipy:
        Both failed: Build Failure
        Total visisted packages: 235 -> 211 (89.79%)
        Total visisted requirements: 967 -> 679 (70.22%)
        Total rejected requirements: 316 -> 236 (74.68%)
        Total rounds: 99 -> 91 (91.92%)

Difference for scenario scenarios/problematic.toml - apache-airflow-google-cloud:
        Success: True -> False
        Failure Reason: None -> Resolution Too Deep
        Distinct Wheels visisted: 179 -> 195 (108.94%)
        Total visisted packages: 184 -> 16791 (9125.54%)
        Total visisted requirements: 520 -> 31559 (6069.04%)
        Total rejected requirements: 36 -> 3606 (10016.67%)
        Total pinned packages: 163 -> 14853 (9112.27%)
        Total rounds: 165 -> 15007 (9095.15%)

Difference for scenario scenarios/problematic.toml - django-stubs:
        Total visisted packages: 38 -> 30 (78.95%)
        Total visisted requirements: 190 -> 135 (71.05%)
        Total rejected requirements: 21 -> 10 (47.62%)
        Total rounds: 26 -> 25 (96.15%)

Difference for scenario scenarios/problematic.toml - kauldron-docs:
        Success: False -> True
        Failure Reason: Resolution Too Deep -> None
        Distinct Sdists visisted: 17 -> 6 (35.29%)
        Distinct Wheels visisted: 323 -> 303 (93.81%)
        Total visisted packages: 119167 -> 4565 (3.83%)
        Total visisted requirements: 372168 -> 27670 (7.43%)
        Total rejected requirements: 48082 -> 1812 (3.77%)
        Total pinned packages: 8605 -> 2362 (27.45%)
        Total rounds: 15005 -> 2830 (18.86%)

Some details to note, my tooling stops resolution at ~15k rounds as so far I haven't found a resolution that completes between 15k and 200k (pip's max default) rounds.

There were no resolution differences for either popular packages or big packages, all the resolution differences were related to problematic resolutions.

Generally performance was much better, in-particular boto3-urllib3-transient and kauldron-docs went from resolution too deep to quickly solvable which is the best win. And sphinx went from resolution too deep to build failure, which I consider a win because it gives the user information to apply lower bounds.

However, apache-airflow-google-cloud went from quickly completing to resolution too deep, which I consider to be the worst failure, this is the same error that the current pip 24.3 produces, and resolvelib 1.1 was going to solve this. I suspect that optimistic backjumping has skipped over an important branch in the resolution graph and even with optimistic backjumping it can not reach resolution impossible before the 15k rounds in my tooling (nor even the 200k rounds in pip 24.3).

I have some bug fixes and further optimizations to pip I want to test against to see if this improves the "apache-airflow-google-cloud" situation at all, but due to obligations I don't think I will have anymore time to dedicate to this in 2024. Currently this leaves me on the fence about optimistic backjumping.

@notatallshaw
Copy link
Collaborator

Here's my current thinking:

  1. The aim should be to enable proper backjumping (without the need to fallback) because it's clearly more efficient
  2. Optimistic backjumping is a step forward for many use cases, even if it's not a 100% win, it should probably be enabled

With regards to "1." I think resolvelib needs to be slightly redesigned, specifically rather than attempting to pin one criterion at a time I think we would need to pass all criterion to the provider at a time and ask "which ones are unsatisfiable?" (either because they are mutually exclusive or because no version matches their range). This would be a significant change to the API and would require a resolvelib 2.0.

For "2." I have been testing optimizations on the provider side to see if we can overcome this one known example where it reaches ResolutionTooDeep, implementing something like astral-sh/uv#9843 looks promising, I had one example where a partially implemented version of it resolved, I will keep working on this. In the meantime I will raise a PR on resolvelibs side.

@frostming
Copy link
Member Author

Feel free to do 1, the whole implementation is hidden behind a private Resolution class

@notatallshaw
Copy link
Collaborator

I'm fairly sure it will require a change to the provider API to work though, I'll start taking a more serious look in a few weeks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants