
Fix race condition involving wrapper lookup #865

Merged: 3 commits, Jan 20, 2025
66 changes: 62 additions & 4 deletions src/nb_type.cpp
@@ -40,6 +40,60 @@ static PyObject **nb_weaklist_ptr(PyObject *self) {
    return weaklistoffset ? (PyObject **) ((uint8_t *) self + weaklistoffset) : nullptr;
}

static void nb_enable_try_inc_ref(PyObject *obj) noexcept {
#if 0 && defined(Py_GIL_DISABLED) && PY_VERSION_HEX >= 0x030E00A5
> wjakob (Owner): Is the 0 intentional here?

> Contributor Author: The APIs haven't been added to CPython yet, but I wanted
> to get this PR ready while the issue was still fresh in my mind. We can either:
>
>   1. Delete this code path for now
>   2. Wait until the APIs are added
>   3. Or anything else you prefer

> wjakob (Owner), Jan 18, 2025: I think it's OK to commit the code as is; we can
> remove the #if 0 part once the change to CPython goes in.
>
> Do I understand correctly that the version below now has an optimization
> (specific to object construction) that won't be in the official API?

> Contributor Author: I intend to add a similar optimization in the
> implementation of the CPython API.

    PyUnstable_EnableTryIncRef(obj);
#elif defined(Py_GIL_DISABLED)
    // Since this is called during object construction, we know that we have
    // the only reference to the object and can use a non-atomic write.
    assert(obj->ob_ref_shared == 0);
    obj->ob_ref_shared = _Py_REF_MAYBE_WEAKREF;
#endif
}

static bool nb_try_inc_ref(PyObject *obj) noexcept {
#if 0 && defined(Py_GIL_DISABLED) && PY_VERSION_HEX >= 0x030E00A5
    return PyUnstable_TryIncRef(obj);
#elif defined(Py_GIL_DISABLED)
    // See https://github.com/python/cpython/blob/d05140f9f77d7dfc753dd1e5ac3a5962aaa03eff/Include/internal/pycore_object.h#L761
    uint32_t local = _Py_atomic_load_uint32_relaxed(&obj->ob_ref_local);
    local += 1;
    if (local == 0) {
        // immortal
        return true;
    }
    if (_Py_IsOwnedByCurrentThread(obj)) {
        _Py_atomic_store_uint32_relaxed(&obj->ob_ref_local, local);
#ifdef Py_REF_DEBUG
        _Py_INCREF_IncRefTotal();
#endif
        return true;
    }
    Py_ssize_t shared = _Py_atomic_load_ssize_relaxed(&obj->ob_ref_shared);
    for (;;) {
        // If the shared refcount is zero and the object is either merged
        // or may not have weak references, then we cannot incref it.
        if (shared == 0 || shared == _Py_REF_MERGED) {
            return false;
        }

        if (_Py_atomic_compare_exchange_ssize(
                &obj->ob_ref_shared, &shared, shared + (1 << _Py_REF_SHARED_SHIFT))) {
#ifdef Py_REF_DEBUG
            _Py_INCREF_IncRefTotal();
#endif
            return true;
        }
    }
#else
    if (Py_REFCNT(obj) > 0) {
        Py_INCREF(obj);
        return true;
    }
    return false;
#endif
}
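
The essence of nb_try_inc_ref is a "try-incref" on a possibly-dying object: the
reference count may only move from a nonzero value to value + 1, never from zero
back up. A minimal sketch of the same pattern, using a plain std::atomic counter
instead of CPython's biased (local/shared) refcounting; RefCounted and
try_inc_ref are invented names for illustration:

    #include <atomic>

    struct RefCounted {
        std::atomic<long> refcount{1};

        // Returns true and increments only while the count is still nonzero;
        // a count of zero means destruction has begun and must not be undone.
        bool try_inc_ref() noexcept {
            long count = refcount.load(std::memory_order_relaxed);
            while (count > 0) {
                // On failure, compare_exchange_weak refreshes 'count', and the
                // loop re-checks the > 0 condition before retrying.
                if (refcount.compare_exchange_weak(count, count + 1,
                                                   std::memory_order_acquire,
                                                   std::memory_order_relaxed))
                    return true;
            }
            return false; // being destroyed; the caller makes a new wrapper
        }
    };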

static PyGetSetDef inst_getset[] = {
    { "__dict__", PyObject_GenericGetDict, PyObject_GenericSetDict, nullptr, nullptr },
    { nullptr, nullptr, nullptr, nullptr, nullptr }

@@ -98,6 +152,7 @@ PyObject *inst_new_int(PyTypeObject *tp, PyObject * /* args */,
    self->clear_keep_alive = 0;
    self->intrusive = intrusive;
    self->unused = 0;
    nb_enable_try_inc_ref((PyObject *)self);

    // Update hash table that maps from C++ to Python instance
    nb_shard &shard = internals->shard((void *) payload);

@@ -163,6 +218,7 @@ PyObject *inst_new_ext(PyTypeObject *tp, void *value) {
    self->clear_keep_alive = 0;
    self->intrusive = intrusive;
    self->unused = 0;
    nb_enable_try_inc_ref((PyObject *)self);

    nb_shard &shard = internals->shard(value);
    lock_shard guard(shard);

@@ -1766,16 +1822,18 @@ PyObject *nb_type_put(const std::type_info *cpp_type,
         PyTypeObject *tp = Py_TYPE(seq.inst);

         if (nb_type_data(tp)->type == cpp_type) {
-            Py_INCREF(seq.inst);
-            return seq.inst;
+            if (nb_try_inc_ref(seq.inst)) {
> wjakob (Owner): I think that if nb_try_inc_ref fails, what we actually want to
> do is to break out of the loop that considers the different Python objects
> associated with that pointer address. We tried the only possible match, and it
> cannot be returned.
>
> Ditto below.
>
> There are also two other places where this nb_try_inc_ref will be needed, in
> nb_type_put_p below (which mirrors the function here, but for polymorphic C++
> objects).

> Contributor Author: I think breaking out here would create more wrappers than
> necessary. The case I'm thinking about is where there are multiple lookups
> before the to-be-deallocated wrapper is removed from inst_c2p:
>
> T1: starts to deallocate the wrapper object, but hasn't removed it from
> inst_c2p yet.
> T2: looks up the wrapper and sees a zero refcount. The entry is converted to a
> linked list, and the new wrapper is inserted at the end.
> ...
> T3: looks up the wrapper. The first entry is still the zero-refcount wrapper;
> the second entry is the wrapper we want. If we break out early, we'll create a
> third wrapper, which doesn't seem ideal.
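
To make the control flow in that scenario concrete, here is a sketch of the
chain walk (ChainEntry and find_live are invented names; the real inst_c2p
structure differs):

    #include <Python.h>

    // Assumed shape of one inst_c2p chain entry for a given pointer address.
    struct ChainEntry {
        PyObject   *inst;
        ChainEntry *next;
    };

    // Entry 1 may be the zero-refcount wrapper T1 is still deallocating, and
    // entry 2 the live replacement T2 inserted, so the lookup keeps walking on
    // failure instead of breaking out and creating a third wrapper.
    static PyObject *find_live(ChainEntry *head) noexcept {
        for (ChainEntry *e = head; e; e = e->next) {
            if (nb_try_inc_ref(e->inst)) // defined earlier in this file
                return e->inst;
        }
        return nullptr; // no live match: the caller creates a fresh wrapper
    }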

> wjakob (Owner): In nanobind (and pybind11, for that matter), a single pointer
> can be associated with multiple Python instances.
>
> For example, you could have
>
>     struct A {
>         B b;
>         /* ... */
>     };
>
> with both an A* pointer and a B* pointer being simultaneously alive through
> Python wrappers. These can be created and GCed independently from each other.
>
> The nb_type_put() function converts a C++ pointer into an associated Python
> object (if one exists), or it creates a new one. It walks through a chain of
> objects that are associated with a given pointer address until it finds one of
> the right type. There can only be one of the right type.
>
> Your commit adds a failure mechanism, where it turns out that one wrapper is
> in the process of being deleted, and we need to create a new one. In this
> case, it (AFAIK) doesn't make sense to try the other types associated with the
> current pointer, since we already found the one. Hence the suggestion to break
> and immediately create a new wrapper. Arguably this is pretty minor and
> unlikely to have a big performance impact.
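
A hedged sketch of bindings that produce this situation (module and property
names are invented for illustration): when B is the first member of A, the
wrapper for a.b is registered under the same pointer address as the wrapper for
a itself, just with a different type.

    #include <nanobind/nanobind.h>
    namespace nb = nanobind;

    struct B { int x = 0; };
    struct A { B b; };  // &a.b aliases &a, since B is the first member

    NB_MODULE(example_ext, m) {
        nb::class_<B>(m, "B");
        nb::class_<A>(m, "A")
            .def(nb::init<>())
            // Returning a B& registers a second wrapper under the same
            // address as the enclosing A, with a different type.
            .def_prop_ro("b", [](A &a) -> B & { return a.b; },
                         nb::rv_policy::reference_internal);
    }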

> wjakob (Owner): Hmm, thinking more about this, I think it is currently
> necessary to continue iterating: there could be multiple objects that are in
> the process of being cleaned up.

> Contributor: You can also get multiple nanobind instances wrapping the same
> pointer, with the same type, by using nb::inst_wrap from the low-level API.

+                return seq.inst;
+            }
         }

         if (!lookup_type())
             return nullptr;

         if (PyType_IsSubtype(tp, td->type_py)) {
-            Py_INCREF(seq.inst);
-            return seq.inst;
+            if (nb_try_inc_ref(seq.inst)) {
+                return seq.inst;
+            }
         }

         if (seq.next == nullptr)
5 changes: 5 additions & 0 deletions tests/test_thread.cpp
@@ -12,6 +12,8 @@ struct Counter {
    }
};

struct GlobalData {} global_data;

nb::ft_mutex mutex;

NB_MODULE(test_thread_ext, m) {
@@ -34,4 +36,7 @@ NB_MODULE(test_thread_ext, m) {
        nb::ft_lock_guard guard(mutex);
        c.inc();
    }, "counter");

    nb::class_<GlobalData>(m, "GlobalData")
        .def_static("get", [] { return &global_data; }, nb::rv_policy::reference);
}
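
As a quick sanity check of the new binding outside the stress test (a
hypothetical interactive session; the module name is the one used by the test
suite):

    from test_thread_ext import GlobalData

    a = GlobalData.get()
    b = GlobalData.get()
    # nb::rv_policy::reference makes both calls look up &global_data in
    # nanobind's instance map, so they normally resolve to the same wrapper.
    assert a is b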
15 changes: 14 additions & 1 deletion tests/test_thread.py
@@ -1,5 +1,5 @@
 import test_thread_ext as t
-from test_thread_ext import Counter
+from test_thread_ext import Counter, GlobalData
 from common import parallelize

def test01_object_creation(n_threads=8):
@@ -75,3 +75,16 @@ def f():

    parallelize(f, n_threads=n_threads)
    assert c.value == n * n_threads


def test_06_global_wrapper(n_threads=8):
    # Check wrapper lookup racing with wrapper deallocation
    n = 10000
    def f():
        for i in range(n):
            GlobalData.get()
            GlobalData.get()
            GlobalData.get()
            GlobalData.get()

    parallelize(f, n_threads=n_threads)