Make the specializing interpreter thread-safe in --disable-gil builds #115999

Open
4 of 17 tasks
Tracked by #108219
swtaarrs opened this issue Feb 27, 2024 · 9 comments
Labels
topic-free-threading type-feature A feature request or enhancement

Comments

@swtaarrs
Member

swtaarrs commented Feb 27, 2024

Feature or enhancement

Proposal:

In free-threaded builds, the specializing adaptive interpreter needs to be made thread-safe. We should start with a small PR to simply disable it in free-threaded builds, which will be correct but will incur a performance penalty. Then we can work out how to properly support specialization in a free-threaded build.

These two commits from Sam's nogil-3.12 branch can serve as inspiration:

  1. specialize: make specialization thread-safe
  2. specialize: optimize for single-threaded programs

There are two primary concerns to balance while implementing this functionality on main:

  1. Runtime overhead: There should be no performance impact on normal builds, and minimal performance impact on single-threaded code running in free-threaded builds.
  2. Reducing code duplication/divergence: We should come up with a design that is minimally disruptive to ongoing work on the specializing interpreter. It should be easy for other devs to keep the free-threaded build working without having to know too much about it.
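One way to satisfy both concerns is to hide the build difference behind a single helper. Default builds then compile it to a plain store, and only free-threaded builds pay for atomics. The sketch below gates on `Py_GIL_DISABLED`, the macro CPython defines in free-threaded builds. Everything else (`opcode_t`, `STORE_OPCODE`, `set_opcode`, the instruction layout, relaxed ordering) is a hypothetical illustration, not CPython's actual code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical: only Py_GIL_DISABLED is a real CPython macro. */
#ifdef Py_GIL_DISABLED
typedef _Atomic uint8_t opcode_t;
/* Free-threaded: other threads may read the opcode concurrently. */
#define STORE_OPCODE(p, v) atomic_store_explicit((p), (v), memory_order_relaxed)
#define LOAD_OPCODE(p)     atomic_load_explicit((p), memory_order_relaxed)
#else
typedef uint8_t opcode_t;
/* Default build: plain accesses, so no extra runtime cost. */
#define STORE_OPCODE(p, v) (*(p) = (v))
#define LOAD_OPCODE(p)     (*(p))
#endif

typedef struct {
    opcode_t opcode;
    uint8_t oparg;
} instr_t;

/* Specializer code calls these helpers identically in both builds. */
void set_opcode(instr_t *instr, uint8_t op) {
    STORE_OPCODE(&instr->opcode, op);
}

uint8_t get_opcode(instr_t *instr) {
    return LOAD_OPCODE(&instr->opcode);
}
```

The design choice is that developers working on the specializer only ever call `set_opcode()`/`get_opcode()`. The free-threaded difference lives in one `#ifdef`.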

Has this already been discussed elsewhere?

I have already discussed this feature proposal on Discourse

Links to previous discussion of this feature:

Specialization Families

Tasks

Linked PRs

@brandtbucher
Member

(subscribing myself)

colesbury pushed a commit that referenced this issue Mar 1, 2024
…aded builds (#116013)

For now, disable all specialization when the GIL might be disabled.
@swtaarrs swtaarrs removed their assignment Mar 1, 2024
@swtaarrs
Member Author

swtaarrs commented Mar 1, 2024

This is now a performance (rather than correctness) issue for free-threaded builds, so I'm going to focus on more time-sensitive issues for a while.

@corona10
Member

@swtaarrs Out of curiosity, is there any progress or plan for this issue?

@Fidget-Spinner
Member

@corona10 I'm planning to work on this after I get the deferred reference stack in. However, there are no concrete plans as of now. I'd be really happy for you or anyone else to propose a design for making the specializing interpreter safe under free threading!

@corona10
Member

@Fidget-Spinner cc @swtaarrs
Nice. I was also thinking about how to make it thread-safe in a seamless way, since I agree with @swtaarrs.
But I don't have a good idea for solving the issue yet, since I'm not working on this full-time :)
So I'd be happy to see a good plan from you.
(I wonder whether we could make this a per-thread mechanism...)

By the way, in the short term, could we enable the specializer only for the main thread if we cannot solve the issue before 3.13 is released?
We could then easily track the performance degradation relative to the default build, because most pyperformance benchmarks are single-threaded :)

@Fidget-Spinner
Member

@corona10 For 3.13, I think we're generally focusing on multicore scalability rather than single-threaded performance. It's a bit too close to the feature freeze for me to feel safe re-enabling specialization at this point. There are still a lot of unsolved problems, even with specialization only on the main thread. Consider the following:

  1. Two threads, A and B, share the same code object. A is the main thread.
  2. Thread B is in LOAD_ATTR_METHOD_WITH_VALUES's action (it has passed the guards and is in the middle of loading the method).
  3. Thread A is in LOAD_ATTR_METHOD_WITH_VALUES's guard, but then deopts, meaning the method reference is now most likely dead/invalid.
  4. Thread B loads LOAD_ATTR_METHOD_WITH_VALUES's method; it is now holding a dangling pointer.
  5. Thread B pushes the dangling pointer onto the stack. Everything crashes.
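To make the interleaving concrete, here is a deliberately simplified, single-threaded replay of the steps above in their stated order. All names (`method_t`, `attr_cache_t`, `thread_a_deopts`, ...) are hypothetical. "Freeing" is modelled by clearing an `alive` flag, because actually reading freed memory would be undefined behaviour. The sketch assumes, as the comment does, that A's deopt means the cached method has lost its last reference while the cache still holds a borrowed pointer to it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cached method object; `alive == false` models "freed". */
typedef struct {
    bool alive;
} method_t;

/* Hypothetical inline cache: a type version checked by the guard and a
 * borrowed (non-owning) method pointer used by the action. */
typedef struct {
    unsigned type_version;
    method_t *method;
} attr_cache_t;

bool guard_passes(const attr_cache_t *cache, unsigned receiver_type_version) {
    return cache->type_version == receiver_type_version;
}

/* Thread A deopts. Per the assumption above, the method has most likely lost
 * its last reference, yet the cache slot still points at it. */
void thread_a_deopts(attr_cache_t *cache) {
    cache->method->alive = false;
}

/* Replays steps 2-5 in order and returns what thread B pushes. */
method_t *replay_interleaving(method_t *method, unsigned type_version) {
    attr_cache_t cache = { type_version, method };
    bool b_guard_ok = guard_passes(&cache, type_version); /* 2: B passed its guards */
    assert(b_guard_ok);
    thread_a_deopts(&cache);                              /* 3: A deopts */
    method_t *loaded = cache.method;                      /* 4: B loads the method */
    return loaded;                                        /* 5: B pushes it */
}
```

The returned pointer refers to a "dead" object even though B's guard passed. A passing guard alone does not keep the cached data valid for the rest of the instruction.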

I'm reading a few papers to get some inspiration and also looking at how CRuby and other runtimes deal with this. Will post back when I have an actual plan.

@mpage mpage self-assigned this Aug 8, 2024
mpage added a commit to mpage/cpython that referenced this issue Sep 13, 2024
mpage added a commit to mpage/cpython that referenced this issue Sep 17, 2024
mpage added a commit to mpage/cpython that referenced this issue Sep 25, 2024
mpage added a commit to mpage/cpython that referenced this issue Sep 26, 2024
mpage added a commit to mpage/cpython that referenced this issue Sep 28, 2024
mpage added a commit to mpage/cpython that referenced this issue Sep 30, 2024
mpage added a commit to mpage/cpython that referenced this issue Oct 5, 2024
mpage added a commit to mpage/cpython that referenced this issue Oct 7, 2024
colesbury pushed a commit that referenced this issue Oct 8, 2024
Stop the world when invalidating function versions

The tier1 interpreter specializes `CALL` instructions based on the values
of certain function attributes (e.g. `__code__`, `__defaults__`). The tier1
interpreter uses function versions to verify that the attributes of a function
during execution of a specialization match those seen during specialization.
A function's version is initialized in `MAKE_FUNCTION` and is invalidated when
any of the critical function attributes are changed. The tier1 interpreter stores
the function version in the inline cache during specialization. A guard is used by
the specialized instruction to verify that the version of the function on the operand
stack matches the cached version (and therefore has all of the expected attributes).
It is assumed that once the guard passes, all attributes will remain unchanged
while executing the rest of the specialized instruction.
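The version-guard protocol described in this commit message can be sketched as follows. The structs and function names are hypothetical simplifications, not CPython's `PyFunctionObject` or its API. The sketch also assumes that version `0` means "no valid version".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified function object. */
typedef struct {
    uint32_t version;   /* 0 means "no valid version" (assumption) */
    const void *code;   /* stands in for __code__ */
    int ndefaults;      /* stands in for __defaults__ */
} func_t;

/* Inline cache entry written when a CALL site is specialized. */
typedef struct {
    uint32_t func_version;
} call_cache_t;

static uint32_t next_version = 1;

/* MAKE_FUNCTION analogue: each new function gets a fresh version. */
void make_function(func_t *f, const void *code, int ndefaults) {
    f->code = code;
    f->ndefaults = ndefaults;
    f->version = next_version++;
}

/* Specialization time: record the version that was observed. */
bool specialize_call(call_cache_t *cache, const func_t *f) {
    if (f->version == 0) {
        return false;  /* cannot specialize without a valid version */
    }
    cache->func_version = f->version;
    return true;
}

/* Guard run by the specialized instruction before relying on attributes. */
bool call_guard(const call_cache_t *cache, const func_t *f) {
    return f->version != 0 && f->version == cache->func_version;
}

/* Changing a critical attribute invalidates the version. */
void set_code(func_t *f, const void *code) {
    f->code = code;
    f->version = 0;
}
```

Once `set_code()` runs, every cached specialization for that function fails its guard and deopts. That failure is what lets the specialized instruction trust `code`/`ndefaults` only after the guard passes.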

Stopping the world when invalidating function versions ensures that all critical
function attributes will remain unchanged after the function version guard passes
in free-threaded builds. It's important to note that this is only true if the remainder
of the specialized instruction does not enter and exit a stop-the-world point.

We will stop the world the first time any of the following function attributes
are mutated:

- defaults
- vectorcall
- kwdefaults
- closure
- code

This should happen rarely and only once per function, so the performance
impact on the majority of code should be minimal.

Additionally, refactor the API for manipulating function versions to more clearly
match the stated semantics.
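The "stop the world at most once per function" rule from this commit message can be sketched as below. `stop_the_world()`/`start_the_world()` are hypothetical stand-ins that only count pauses. They are not CPython's internal functions, and the function struct is a simplification.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stop-the-world primitives that only record pauses. */
static int stw_count = 0;
static int world_stopped = 0;

void stop_the_world(void)  { assert(!world_stopped); world_stopped = 1; stw_count++; }
void start_the_world(void) { assert(world_stopped);  world_stopped = 0; }
int stop_the_world_count(void) { return stw_count; }

typedef struct {
    uint32_t version;   /* 0 = invalidated (assumption) */
    int ndefaults;      /* stands in for __defaults__ */
} func_t;

/* Invalidate with all other threads paused, so no thread can sit between a
 * passed version guard and the end of its specialized instruction. Once the
 * version is 0, later mutations skip the pause. */
void invalidate_version(func_t *f) {
    if (f->version == 0) {
        return;  /* already invalid: no guard can pass on it */
    }
    stop_the_world();
    f->version = 0;
    start_the_world();
}

void set_defaults(func_t *f, int ndefaults) {
    invalidate_version(f);
    f->ndefaults = ndefaults;
}
```

Only the first mutation pays for the pause. Every later mutation sees the version already cleared and proceeds directly.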
efimov-mikhail pushed a commit to efimov-mikhail/cpython that referenced this issue Oct 9, 2024
…ython#124997)

mpage added a commit to mpage/cpython that referenced this issue Oct 18, 2024
mpage added a commit to mpage/cpython that referenced this issue Oct 19, 2024
mpage added a commit to mpage/cpython that referenced this issue Oct 24, 2024
mpage added a commit to mpage/cpython that referenced this issue Oct 31, 2024
mpage added a commit to mpage/cpython that referenced this issue Nov 4, 2024
mpage added a commit to mpage/cpython that referenced this issue Nov 4, 2024
mpage added a commit to mpage/cpython that referenced this issue Nov 4, 2024
mpage added a commit that referenced this issue Nov 4, 2024
…for `BINARY_OP` (#123926)

In free-threaded builds, each thread specializes its own thread-local copy of the bytecode, created on the first RESUME. All copies of the bytecode for a code object are stored in the co_tlbc array on the code object. At thread creation, each thread reserves a globally unique index identifying its copy of the bytecode in all co_tlbc arrays, and it releases the index at thread destruction. The first entry in every co_tlbc array always points to the "main" copy of the bytecode that is stored at the end of the code object. This ensures that no bytecode is copied for programs that do not use threads.

Thread-local bytecode can be disabled at runtime by providing either -X tlbc=0 or PYTHON_TLBC=0. Disabling thread-local bytecode also disables specialization.
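The co_tlbc scheme in this commit message can be sketched roughly as follows. This is a single-threaded model with hypothetical names (`reserve_tlbc_index`, `get_tlbc`, `code_t`), a fixed `MAX_THREADS` capacity, and no locking. A real runtime would need to grow the arrays and synchronize access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define MAX_THREADS 8   /* hypothetical fixed capacity */
#define CODE_LEN 4      /* hypothetical bytecode length */

/* Process-wide pool: each live thread owns one index into every co_tlbc. */
static bool index_in_use[MAX_THREADS];

/* Thread creation: reserve the lowest free index (the first thread gets 0). */
int reserve_tlbc_index(void) {
    for (int i = 0; i < MAX_THREADS; i++) {
        if (!index_in_use[i]) {
            index_in_use[i] = true;
            return i;
        }
    }
    return -1;  /* out of slots */
}

/* Thread destruction: return the index for reuse. */
void release_tlbc_index(int idx) {
    assert(idx >= 0 && idx < MAX_THREADS && index_in_use[idx]);
    index_in_use[idx] = false;
}

typedef struct {
    unsigned char main_bytecode[CODE_LEN];  /* the "main" copy, inline */
    unsigned char *tlbc[MAX_THREADS];       /* per-thread copies */
} code_t;

void code_init(code_t *co, const unsigned char *bytecode) {
    memcpy(co->main_bytecode, bytecode, CODE_LEN);
    memset(co->tlbc, 0, sizeof co->tlbc);
    co->tlbc[0] = co->main_bytecode;  /* entry 0 aliases the main copy */
}

/* RESUME analogue: return this thread's copy, creating it on first use. */
unsigned char *get_tlbc(code_t *co, int idx) {
    assert(idx >= 0 && idx < MAX_THREADS);
    if (co->tlbc[idx] == NULL) {
        unsigned char *copy = malloc(CODE_LEN);
        if (copy == NULL) {
            return NULL;
        }
        memcpy(copy, co->main_bytecode, CODE_LEN);
        co->tlbc[idx] = copy;
    }
    return co->tlbc[idx];
}

void code_free(code_t *co) {
    for (int i = 1; i < MAX_THREADS; i++) {  /* entry 0 is not heap-allocated */
        free(co->tlbc[i]);
    }
}
```

A program with a single thread only ever uses index 0, so `get_tlbc()` never allocates. This mirrors the "no bytecode is copied for programs that do not use threads" property described above.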

Concurrent modifications to the bytecode made by the specializing interpreter and instrumentation use atomics, with specialization taking care not to overwrite an instruction that was instrumented concurrently.
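A common way to "not overwrite an instruction that was instrumented concurrently" is a compare-and-swap on the opcode byte. The specializer installs its opcode only if the instruction still holds the opcode its decision was based on. The sketch below uses C11 `<stdatomic.h>`. The opcode numbers and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical opcode numbers. */
enum {
    OP_BINARY_OP = 10,               /* generic, adaptive form */
    OP_BINARY_OP_ADD_INT = 11,       /* a specialization */
    OP_INSTRUMENTED_BINARY_OP = 200  /* written by instrumentation */
};

typedef struct {
    _Atomic uint8_t opcode;
    uint8_t oparg;
} instr_t;

/* Specializer: succeeds only if the opcode still equals `expected`. If
 * instrumentation replaced it in the meantime, the CAS fails and the
 * instrumented opcode is left untouched. */
bool try_specialize(instr_t *instr, uint8_t expected, uint8_t specialized) {
    return atomic_compare_exchange_strong(&instr->opcode, &expected, specialized);
}

/* Instrumentation: installs its opcode unconditionally. */
void instrument(instr_t *instr) {
    atomic_store(&instr->opcode, OP_INSTRUMENTED_BINARY_OP);
}
```

If instrumentation wins the race, the specializer simply gives up rather than clobbering the instrumented opcode.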
Yhg1s added a commit that referenced this issue Nov 5, 2024
…de change (#126440)

Fix the gdb pretty printer in the face of --enable-shared by delaying the attempt to load the _PyInterpreterFrame definition until after .so files are loaded.
corona10 added a commit to corona10/cpython that referenced this issue Nov 5, 2024
corona10 added a commit to corona10/cpython that referenced this issue Nov 6, 2024
corona10 added a commit that referenced this issue Nov 6, 2024
- The specialization logic determines the appropriate specialization using only the operand's type, which is safe to read non-atomically (changing it requires stopping the world). We are guaranteed that the type will not change between when it is checked and when we specialize the bytecode, because the types involved are immutable (you cannot assign to `__class__` for exact instances of `dict`, `set`, or `frozenset`). The bytecode is mutated atomically using helpers.
- The specialized instructions rely on the operand type not changing in between the `DEOPT_IF` checks and the calls to the appropriate type-specific helpers (e.g. `_PySet_Contains`). This is a correctness requirement in the default builds and there are no changes to the opcodes in the free-threaded builds that would invalidate this.
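The two points above can be sketched as a specializer that looks only at the operand's type, plus a specialized instruction that checks the type (a `DEOPT_IF` analogue) before calling a type-specific helper. The object model, opcode numbers, and `set_contains` (standing in for a helper like `_PySet_Contains`) are all hypothetical simplifications.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical object model: an object's type pointer never changes after
 * creation, mirroring the immutable __class__ of exact set/dict instances. */
typedef struct { const char *name; } type_t;
static const type_t SET_TYPE = { "set" };
static const type_t DICT_TYPE = { "dict" };

typedef struct {
    const type_t *type;
    int items[4];
    int n;
} obj_t;

enum { CONTAINS_OP = 1, CONTAINS_OP_SET = 2 };

/* Stand-in for a type-specific helper. */
bool set_contains(const obj_t *s, int key) {
    for (int i = 0; i < s->n; i++) {
        if (s->items[i] == key) {
            return true;
        }
    }
    return false;
}

/* Specializer: the decision depends only on the operand's type. */
int specialize_contains(const obj_t *container) {
    return container->type == &SET_TYPE ? CONTAINS_OP_SET : CONTAINS_OP;
}

/* Specialized instruction: returns false to signal a deopt if the guard
 * fails. The helper call relies on the type not changing after the guard. */
bool contains_op_set(const obj_t *container, int key, bool *result) {
    if (container->type != &SET_TYPE) {  /* DEOPT_IF analogue */
        return false;
    }
    *result = set_contains(container, key);
    return true;
}
```

Because the type pointer cannot change between the guard and the helper call, no extra locking is needed on this path in the sketch.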
corona10 added a commit to corona10/cpython that referenced this issue Nov 6, 2024
mpage added a commit that referenced this issue Nov 6, 2024
Introduce helpers for (un)specializing instructions

Consolidate the code to specialize/unspecialize instructions into
two helper functions and use them in `_Py_Specialize_BinaryOp`.
The resulting code is more concise and keeps all of the logic at
the point where we decide to specialize/unspecialize an instruction.
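The consolidation described here, where every decision point ends in a call to one of two helpers, can be sketched as follows. The helper names, counter field, cooldown value, and backoff policy are hypothetical. Plain stores are used for brevity, although the commit message for #123926 above notes that free-threaded builds modify bytecode with atomics.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical opcodes and instruction layout. */
enum { BINARY_OP = 1, BINARY_OP_ADD_INT = 2 };

typedef struct {
    uint8_t opcode;
    uint16_t counter;  /* hypothetical warm-up / backoff counter */
} instr_t;

#define COOLDOWN_VALUE 52  /* hypothetical */
#define MAX_BACKOFF 4096   /* hypothetical */

static uint16_t next_backoff(uint16_t counter) {
    int next = counter ? counter * 2 : 1;  /* exponential backoff */
    return (uint16_t)(next > MAX_BACKOFF ? MAX_BACKOFF : next);
}

/* Helper 1: commit to a specialized opcode. */
void specialize(instr_t *instr, uint8_t specialized_opcode) {
    instr->opcode = specialized_opcode;
    instr->counter = COOLDOWN_VALUE;
}

/* Helper 2: stay generic and wait longer before retrying. */
void unspecialize(instr_t *instr, uint8_t generic_opcode) {
    instr->opcode = generic_opcode;
    instr->counter = next_backoff(instr->counter);
}

/* Each path ends in exactly one helper call, at the point of decision. */
void specialize_binary_op(instr_t *instr, int is_add, int lhs_is_int, int rhs_is_int) {
    if (is_add && lhs_is_int && rhs_is_int) {
        specialize(instr, BINARY_OP_ADD_INT);
        return;
    }
    unspecialize(instr, BINARY_OP);
}
```

Funnelling every opcode write through two helpers also gives a free-threaded build a single place to make those writes atomic.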
@serhiy-storchaka
Member

#126414 broke the main branch.

Python/specialize.c: In function ‘_Py_Specialize_ContainsOp’:
Python/specialize.c:2801:5: error: implicit declaration of function ‘SET_OPCODE_OR_RETURN’ [-Werror=implicit-function-declaration]
 2801 |     SET_OPCODE_OR_RETURN(instr, CONTAINS_OP);
      |     ^~~~~~~~~~~~~~~~~~~~
cc1: some warnings being treated as errors
make: *** [Makefile:3116: Python/specialize.o] Error 1
make: *** Waiting for unfinished jobs....

@mpage
Contributor

mpage commented Nov 6, 2024

Ugh sorry. #126414 raced with #126450.

@corona10
Member

corona10 commented Nov 9, 2024

I am working on TO_BOOL and BINARY_SUBSCR

corona10 added a commit to corona10/cpython that referenced this issue Nov 9, 2024
* None / bool / int / str are immutable types, so they are thread-safe.
* list is mutable, but by using ``PyList_GET_SIZE`` we can make it
  thread-safe.
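The reasoning in this commit message can be sketched as follows. Truthiness of an immutable operand needs no synchronization. For a list, truthiness depends only on its size, so one atomic read of the size field is enough. The `list_t` layout and function names are hypothetical, and relaxed ordering is an assumption of the sketch. It models, but does not reproduce, how `PyList_GET_SIZE` is used.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical list whose size other threads may update concurrently. */
typedef struct {
    _Atomic size_t size;
    int *items;
} list_t;

/* Writer side (e.g. append/clear): publish the new size atomically. */
void list_set_size(list_t *l, size_t n) {
    atomic_store_explicit(&l->size, n, memory_order_relaxed);
}

/* TO_BOOL on a list: only the size matters, so a single relaxed atomic load
 * suffices and the items are never touched. */
bool list_to_bool(list_t *l) {
    return atomic_load_explicit(&l->size, memory_order_relaxed) != 0;
}

/* TO_BOOL on an immutable value: it cannot change after creation, so a
 * plain read is safe. */
bool int_to_bool(const long *value) {
    return *value != 0;
}
```

The result may be stale by the time it is used if another thread resizes the list, but it is never torn. That matches what an unsynchronized `if some_list:` can observe in Python code anyway.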