Recommend cpython agnosticism in cpp custom op tutorial #3250

Merged: 8 commits, Jan 28, 2025

advanced_source/cpp_custom_ops.rst: 187 changes (150 additions, 37 deletions)
Using ``cpp_extension`` is as simple as writing the following ``setup.py``:

   setup(name="extension_cpp",
         ext_modules=[
             cpp_extension.CppExtension(
                 "extension_cpp",
                 ["muladd.cpp"],
                 # define Py_LIMITED_API with min version 3.9 to expose only the stable
                 # limited API subset from Python.h
                 extra_compile_args={"cxx": ["-DPy_LIMITED_API=0x03090000"]},
                 py_limited_api=True)],  # Build 1 wheel across multiple Python versions
         cmdclass={'build_ext': cpp_extension.BuildExtension},
         options={"bdist_wheel": {"py_limited_api": "cp39"}}  # 3.9 is minimum supported Python version
   )

If you need to compile CUDA code (for example, ``.cu`` files), then instead use
`torch.utils.cpp_extension.CUDAExtension <https://pytorch.org/docs/stable/cpp_extension.html#torch.utils.cpp_extension.CUDAExtension>`_.
Please see `extension-cpp <https://github.com/pytorch/extension-cpp>`_ for an
example of how this is set up.
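
For reference, below is a minimal sketch of what the CUDA variant of the
``setup.py`` above might look like; the ``muladd.cu`` file name is our
illustrative assumption, not a file from this tutorial:

.. code-block:: python

   from setuptools import setup
   from torch.utils import cpp_extension

   setup(name="extension_cpp",
         ext_modules=[
             cpp_extension.CUDAExtension(
                 "extension_cpp",
                 # .cu sources are routed through nvcc; file names illustrative
                 ["muladd.cpp", "muladd.cu"],
                 extra_compile_args={"cxx": ["-DPy_LIMITED_API=0x03090000"]},
                 py_limited_api=True)],
         cmdclass={'build_ext': cpp_extension.BuildExtension},
         options={"bdist_wheel": {"py_limited_api": "cp39"}}
   )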

The first ``setup.py`` example above represents what we refer to as a CPython agnostic wheel,
meaning we are building a single wheel that can be run across multiple CPython versions
(similar to pure Python packages); building such wheels is supported starting with PyTorch 2.6.
CPython agnosticism is desirable because it minimizes the number of wheels your custom library
needs to build and release. The minimum version we'd like to support is 3.9, since it is the
oldest CPython version currently supported, so we use the corresponding hex code and specifier
throughout the setup code. We suggest building the extension in the same environment as the
minimum CPython version you'd like to support to minimize unknown behavior, so, here, we build
the extension in a CPython 3.9 environment. Once built, this single wheel will be runnable in
any CPython environment 3.9+. To achieve this, there are three key lines to note.

The first is the specification of ``Py_LIMITED_API`` in ``extra_compile_args`` to the
minimum CPython version you would like to support:

.. code-block:: python

   extra_compile_args={"cxx": ["-DPy_LIMITED_API=0x03090000"]},

Defining the ``Py_LIMITED_API`` flag helps verify that the extension is in fact
only using the `CPython Stable Limited API <https://docs.python.org/3/c-api/stable.html>`_,
which is a requirement for building a CPython agnostic wheel. If this requirement
is not met, it is possible to build a wheel that looks CPython agnostic but will crash,
or worse, be silently incorrect, in another CPython environment. Take care to avoid
using unstable CPython APIs, for example APIs from libtorch_python (in particular,
the PyTorch/Python bindings), and to use only APIs from libtorch (ATen objects, operators,
and the dispatcher). For example, to give access to custom ops from Python, the library
should register the ops through the dispatcher (covered below!). We strongly recommend
defining the ``Py_LIMITED_API`` flag to help ascertain that the extension is compliant
and safe as a CPython agnostic wheel. Note that defining this flag is not a full guarantee
that the built wheel is CPython agnostic, but it is better than the wild wild west. There are
several caveats mentioned in the
`Python docs <https://docs.python.org/3/c-api/stable.html#limited-api-caveats>`_,
and you should test and verify yourself that the wheel is truly agnostic for the relevant
CPython versions.
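
As an aside, the value passed to ``Py_LIMITED_API`` uses the same layout as
CPython's ``PY_VERSION_HEX``: one byte each for the major and minor version,
followed by zeros. A small sketch (the helper name is ours, for illustration):

.. code-block:: python

   # 0xMMmm0000: MM is the major version byte, mm the minor version byte.
   def limited_api_hex(major: int, minor: int) -> str:
       return f"0x{major:02X}{minor:02X}0000"

   assert limited_api_hex(3, 9) == "0x03090000"   # the value used above
   assert limited_api_hex(3, 10) == "0x030A0000"  # if 3.10 were your minimum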

The second and third lines specifying ``py_limited_api`` inform setuptools that you intend
to build a CPython agnostic wheel and will influence the naming of the wheel accordingly:

.. code-block:: python

   setup(name="extension_cpp",
         ext_modules=[
             cpp_extension.CppExtension(
                 ...,
                 py_limited_api=True)],  # Build 1 wheel across multiple Python versions
         ...,
         options={"bdist_wheel": {"py_limited_api": "cp39"}}  # 3.9 is minimum supported Python version
   )

It is necessary to specify ``py_limited_api=True`` both as an argument to
``CppExtension``/``CUDAExtension`` and as an option to the ``"bdist_wheel"`` command
with the minimal supported CPython version (in this case, 3.9). Consequently, the
``setup`` in our tutorial builds one properly named wheel that can be installed
across multiple CPython versions ``>=3.9``. Please see
`torchao <https://github.com/pytorch/ao>`_ for an example.
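
One way to sanity-check the naming is to inspect the built artifact: a
limited-API build carries the ``cp39-abi3`` tag rather than a per-interpreter
tag. A small sketch, assuming the wheel was built into the default ``dist/``
directory:

.. code-block:: python

   from pathlib import Path

   # A CPython agnostic build is named like
   # extension_cpp-0.0.1-cp39-abi3-<platform>.whl, whereas a per-version
   # build would be tagged, e.g., cp39-cp39 or cp310-cp310.
   for whl in Path("dist").glob("*.whl"):
       assert "cp39-abi3" in whl.name, f"{whl.name} is not a limited-API wheel"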

If your extension uses CPython APIs outside the stable limited set, then you cannot
build a CPython agnostic wheel! You should build one wheel per CPython version instead,
like so:

.. code-block:: python

   setup(name="extension_cpp",
         ext_modules=[
             cpp_extension.CppExtension(
                 "extension_cpp",
                 ["muladd.cpp"])],
         cmdclass={'build_ext': cpp_extension.BuildExtension},
   )
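
In this case, each wheel is tagged for the exact interpreter that built it, so
you would repeat the build once per supported CPython version. A sketch of
inspecting the ABI tag the current interpreter would produce (the exact value
varies by platform):

.. code-block:: python

   import sysconfig

   # e.g. 'cpython-310-x86_64-linux-gnu': the extension suffix, and hence the
   # wheel, is bound to this specific interpreter version.
   print(sysconfig.get_config_var("SOABI"))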


Defining the custom op and adding backend implementations
---------------------------------------------------------
…matters (importing in the wrong order will lead to an error).

To use the custom operator with hybrid Python/C++ registrations, we must
first load the C++ library that holds the custom operator definition
and then call the ``torch.library`` registration APIs. This can happen in
three ways:


1. The first way to load the C++ library that holds the custom operator definition
is to define a dummy Python module for ``_C``. Then, in Python, when you import the
module with ``import _C``, the ``.so`` files corresponding to the extension will
be loaded, and the ``TORCH_LIBRARY`` and ``TORCH_LIBRARY_IMPL`` static initializers
will run. One can create a dummy Python module with ``PYBIND11_MODULE`` as shown below,
but you will notice that this does not compile with ``Py_LIMITED_API`` defined, because
``pybind11`` does not promise to use only the stable limited CPython API! With
the code below, you sadly cannot build a CPython agnostic wheel for your extension.
(Foreshadowing: I wonder what the second way is ;) ).

.. code-block:: cpp

   // in, say, not_agnostic/csrc/extension_BAD.cpp
   #include <pybind11/pybind11.h>

   PYBIND11_MODULE(_C, m) {}

.. code-block:: python

   # in, say, extension/__init__.py
   from . import _C

2. In this tutorial, because we value being able to build a single wheel across multiple
CPython versions, we will replace the unstable ``PYBIND11_MODULE`` call with stable API calls.
The below code compiles with ``-DPy_LIMITED_API=0x03090000`` and successfully creates
a dummy Python module for our ``_C`` extension so that it can be imported from Python.
See `extension_cpp/__init__.py <https://github.com/pytorch/extension-cpp/blob/38ec45e/extension_cpp/__init__.py>`_
and `extension_cpp/csrc/muladd.cpp <https://github.com/pytorch/extension-cpp/blob/38ec45e/extension_cpp/csrc/muladd.cpp>`_
for more details:

.. code-block:: cpp

   #include <Python.h>

   extern "C" {
     /* Creates a dummy empty _C module that can be imported from Python.
        The import from Python will load the .so consisting of this file
        in this extension, so that the TORCH_LIBRARY static initializers
        below are run. */
     PyObject* PyInit__C(void)
     {
         static struct PyModuleDef module_def = {
             PyModuleDef_HEAD_INIT,
             "_C",   /* name of module */
             NULL,   /* module documentation, may be NULL */
             -1,     /* size of per-interpreter state of the module,
                        or -1 if the module keeps state in global variables. */
             NULL,   /* methods */
         };
         return PyModule_Create(&module_def);
     }
   }

.. code-block:: python

   # in, say, extension/__init__.py
   from . import _C

3. If you want to avoid ``Python.h`` entirely in your C++ custom operator, you may
use ``torch.ops.load_library("/path/to/library.so")`` in Python to load the ``.so``
file(s) compiled from the extension. Note that, with this method, no ``_C``
Python module is created for the extension, so you cannot call ``import _C`` from Python.
Instead of relying on an import statement to trigger registration of the custom
operators, ``torch.ops.load_library("/path/to/library.so")`` will do the trick.
The challenge then shifts to understanding where the ``.so`` files are
located so that you can load them, which is not always trivial:

.. code-block:: python

   import torch
   from pathlib import Path

   so_files = list(Path(__file__).parent.glob("_C*.so"))
   assert (
       len(so_files) == 1
   ), f"Expected one _C*.so file, found {len(so_files)}"
   torch.ops.load_library(so_files[0])

   from . import ops
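
Whichever of the three mechanisms you use, once the ``.so`` is loaded, the
operators registered via ``TORCH_LIBRARY`` become reachable under the
``torch.ops`` namespace. A usage sketch, assuming the ``mymuladd`` operator
that this tutorial defines below:

.. code-block:: python

   import torch
   import extension_cpp  # triggers one of the loading mechanisms above

   a, b = torch.randn(3), torch.randn(3)
   # mymuladd(Tensor a, Tensor b, float c) -> Tensor, defined later in this tutorial
   out = torch.ops.extension_cpp.mymuladd(a, b, 1.0)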


Adding training (autograd) support for an operator
--------------------------------------------------