
Improve type_caster for complex number types. #854

Open · wants to merge 3 commits into master
Conversation

hpkfft (Contributor) commented Jan 12, 2025

Results for this PR were obtained with Python 3.12.8 and g++ 14.2.0.

Currently, after importing the module defined in tests/test_stl.cpp as t and defining

    class MyComplex:
        def __init__(self, real, imag):
            self.re = real
            self.im = imag
        def __complex__(self):
            return complex(self.re, self.im)

the following raises a TypeError:

    assert t.complex_value_double(MyComplex(1, 2)) == complex(1.0, 2.0)

Instead, it is expected that the __complex__ method be called and the assertion be correct.
Note that the Stable ABI behavior of PyComplex_RealAsDouble() and PyComplex_ImagAsDouble() was fixed in Python 3.13. See CPython issue 109598.
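A pure-Python sketch of the conversion the caster is expected to perform: Python's own complex() constructor consults __complex__, so an argument caster that mirrors it accepts any object implementing that protocol.

```python
class MyComplex:
    def __init__(self, real, imag):
        self.re = real
        self.im = imag

    def __complex__(self):
        # Called by complex(obj) -- and, after this PR, by the caster.
        return complex(self.re, self.im)

# complex() itself honors the protocol; the caster should match this.
assert complex(MyComplex(1, 2)) == complex(1.0, 2.0)
```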

Also, the current complex type_caster does not honor nb::arg().noconvert().
The following test passes

    with pytest.raises(TypeError, match="incompatible function arguments"):
        t.complex_value_double_nc(0)

but given

    class MyInt(int):
        def __new__(cls, value):
            return super().__new__(cls, value)

the following tests fail with DID NOT RAISE <class 'TypeError'>:

    with pytest.raises(TypeError, match="incompatible function arguments"):
        t.complex_value_double_nc(MyInt(7))

    with pytest.raises(TypeError, match="incompatible function arguments"):
        t.complex_value_double_nc(0.0)

    with pytest.raises(TypeError, match="incompatible function arguments"):
        t.complex_value_float_nc(1.1 + 2.0j)  # Inexact narrowing conversion
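For the narrowing case, here is a minimal pure-Python sketch of the exactness check a noconvert std::complex&lt;float&gt; caster needs (narrows_exactly is an illustrative name, not part of nanobind):

```python
import struct

def narrows_exactly(x: float) -> bool:
    # True if x survives a round trip through IEEE-754 single precision.
    return struct.unpack("f", struct.pack("f", x))[0] == x

# 2.0 is exactly representable as a float32, but 1.1 is not, so a
# noconvert caster for std::complex<float> should reject 1.1 + 2.0j.
assert narrows_exactly(2.0)
assert not narrows_exactly(1.1)
```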

This PR rewrites the complex type_caster.

Regarding performance, given the extension

    m.def("cadd", [](std::complex<double> x, std::complex<double> y) {
                          return x + y; });

    m.def("caddf", [](std::complex<float> x, std::complex<float> y) {
                          return x + y; });

and measuring time with

    python3 -m timeit -n 100000000 -r 10 \
        -s "import numpy as np, my_extension as me; x = complex(1); y = complex(2)" \
        "me.cadd(x, y)"

I obtained the following results:

|                     | old 312 | new 312 | old abi3 | new abi3 |
| ------------------- | ------- | ------- | -------- | -------- |
| cadd(complex)       | 27.6 ns | 23.7 ns | 31.0 ns  | 27.5 ns  |
| caddf(complex)      | 28.0 ns | 26.5 ns | 29.2 ns  | 29.8 ns  |
| cadd(np.complex128) | 42.8 ns | 30.0 ns | 45.7 ns  | 44.1 ns  |
| caddf(np.complex64) | 393. ns | 81.1 ns | 403. ns  | 284. ns  |

For the last two lines above, note that

    >>> issubclass(np.complex128, complex)
    True
    >>> issubclass(np.complex64, complex)
    False
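This matters because an exact complex or any subclass of it (such as np.complex128) can be read directly, while np.complex64 must go through the __complex__ protocol. A hedged Python sketch of that dispatch, with illustrative stand-in classes instead of the NumPy types:

```python
def to_c_complex(obj):
    # Fast path: complex and its subclasses (e.g. np.complex128)
    # can be read directly.
    if isinstance(obj, complex):
        return complex(obj)
    # Slow path: fall back to the __complex__ protocol, which is
    # all that np.complex64 offers.
    dunder = getattr(type(obj), "__complex__", None)
    if dunder is not None:
        return dunder(obj)
    raise TypeError("object is not convertible to complex")

class Sub(complex):      # stands in for np.complex128
    pass

class Proto:             # stands in for np.complex64
    def __complex__(self):
        return 1 + 2j

assert to_c_complex(Sub(1, 2)) == 1 + 2j   # fast path
assert to_c_complex(Proto()) == 1 + 2j     # slow path
```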

I would expect that compiling new abi3 with -DPy_LIMITED_API=0x030D0000 would reduce caddf(np.complex64) by 2X, but I don't have Python 3.13 handy.

wjakob (Owner) commented Jan 14, 2025

Like with the floating point type casters, could you move the (now fairly large) body of the converter to the static library component?

hpkfft (Contributor, Author) commented Jan 14, 2025

Sure. Do you prefer not to include <complex> in the static library, or is it OK to do so for a single cpp file?

wjakob (Owner) commented Jan 14, 2025

It's fine to include in the cpp file.

hpkfft (Contributor, Author) commented Jan 15, 2025

I ran the caddf(np.complex64) benchmark with Python 3.13.1

| build                | caddf(np.complex64) |
| -------------------- | ------------------- |
| old 313 non-STABLE   | 380. ns             |
| new 313 non-STABLE   | 82.1 ns             |
| old abi3 0x030C0000  | 384. ns             |
| new abi3 0x030C0000  | 274. ns             |
| new abi3 0x030D0000  | 136. ns             |

The old complex caster used the C API functions from the stable ABI regardless of whether STABLE_ABI was specified. Prior to version 3.13, these functions had a bug (or, at least, some undesirable behavior) when the Python object was neither complex nor a subclass of complex. Since np.complex64 is important to nanobind users, a workaround was needed.

The default from_python</*Recursive=*/true> made a temporary Python object and called from_python</*Recursive=*/false> on it. It did this using the heuristic that an object with an attribute named imag should probably be converted to complex. This is not foolproof, but it works for np.complex64.
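The heuristic can be sketched in Python as follows (looks_complex and the stand-in class are illustrative names, not nanobind code):

```python
def looks_complex(obj) -> bool:
    # The old workaround's heuristic: an object exposing an 'imag'
    # attribute is probably convertible to a Python complex.
    return hasattr(obj, "imag")

class FakeComplex64:
    # Stands in for np.complex64, which is not a subclass of complex
    # but exposes both 'imag' and __complex__.
    def __init__(self, value):
        self._v = complex(value)

    @property
    def imag(self):
        return self._v.imag

    def __complex__(self):
        return self._v

x = FakeComplex64(1 + 2j)
assert looks_complex(x)            # heuristic accepts it
assert complex(x) == 1 + 2j        # temporary object conversion works
assert not looks_complex("text")   # strings are rejected
```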

The new complex caster uses PyComplex_AsCComplex(), which is not in the stable ABI, when STABLE_ABI is not specified. This is faster, and no workaround is needed.

The new caster uses the same complex C API functions as the old one did when STABLE_ABI is specified. A workaround is applied when the object is not complex, is not a subtype of complex, and has a __complex__ attribute. The workaround was also made faster, and the function from_python() is no longer templated.

Also, templating is removed from from_cpp() since there seems to be no reason to distinguish rvalues from lvalues.

For non-STABLE_ABI builds, the new caster is much faster, and for STABLE_ABI builds it's faster.

As this Note mentions, the behavior of a stable ABI function can change. In particular, the workaround is no longer needed in version 3.13, but it is still perfectly correct to define Py_LIMITED_API=0x030C0000 and build with the workaround code in place.

wjakob force-pushed the master branch 2 times, most recently from 98def19 to 92d9cb3 on January 27, 2025.