Commit

Added documentation
Installing pyarrow in ci workflow
Split Table, RecordBatch and Field/Schema bindings into separate headers
maximiliank committed Jan 17, 2024
1 parent 22c5eba commit d32a07a
Showing 8 changed files with 145 additions and 20 deletions.
5 changes: 5 additions & 0 deletions .github/workflows/ci.yml
@@ -53,6 +53,11 @@ jobs:
   run: |
     python -m pip install numpy scipy
 
+- name: Install pyarrow
+  if: ${{ !startsWith(matrix.python, 'pypy') && !contains(matrix.python, 'alpha') }}
+  run: |
+    python -m pip install pyarrow
+
 - name: Configure
   run: >
     cmake -S . -B build -DNB_TEST_STABLE_ABI=ON -DNB_TEST_SHARED_BUILD="$(python3 -c 'import sys; print(int(sys.version_info.minor>=11))')"
4 changes: 3 additions & 1 deletion cmake/nanobind-config.cmake
@@ -159,10 +159,12 @@ function (nanobind_build_library TARGET_NAME)
 ${NB_DIR}/include/nanobind/pyarrow/chunked_array.h
 ${NB_DIR}/include/nanobind/pyarrow/datatype.h
 ${NB_DIR}/include/nanobind/pyarrow/pyarrow_import.h
+${NB_DIR}/include/nanobind/pyarrow/record_batch.h
 ${NB_DIR}/include/nanobind/pyarrow/scalar.h
 ${NB_DIR}/include/nanobind/pyarrow/sparse_tensor.h
-${NB_DIR}/include/nanobind/pyarrow/tabular.h
+${NB_DIR}/include/nanobind/pyarrow/table.h
 ${NB_DIR}/include/nanobind/pyarrow/tensor.h
+${NB_DIR}/include/nanobind/pyarrow/type.h
 ${NB_DIR}/src/buffer.h
 ${NB_DIR}/src/hash.h
 ${NB_DIR}/src/nb_internals.h
1 change: 1 addition & 0 deletions docs/index.rst
@@ -122,6 +122,7 @@ The nanobind logo was designed by `AndoTwin Studio
 classes
 exceptions
 ndarray_index
+pyarrow
 packaging
 utilities
 
76 changes: 76 additions & 0 deletions docs/pyarrow.rst
@@ -0,0 +1,76 @@
.. _pyarrow:

PyArrow Bindings
================

nanobind can exchange ``pyarrow`` objects with C++ code, which receives them as a ``std::shared_ptr<..>`` to the corresponding Arrow C++ type (for example, a ``pyarrow.Table`` arrives as a ``std::shared_ptr<arrow::Table>``). To get started, include

.. code-block:: cpp

   #include <nanobind/pyarrow/pyarrow_import.h>

and make sure to perform the `pyarrow initialization <https://arrow.apache.org/docs/python/integration/extending.html#_CPPv4N5arrow14import_pyarrowEv>`__ at the top of your module definition:

.. code-block:: cpp

   NB_MODULE(test_pyarrow_ext, m) {
       static nanobind::detail::pyarrow::ImportPyarrow module;
       // ...
   }

The type caster headers mirror the layout of the Arrow C++ headers that ship with ``pyarrow`` (``array_primitive.h``, ``array_binary.h``, etc.):

.. list-table::
   :widths: 42 48
   :header-rows: 1

   * - Types
     - Type caster header
   * - ``Array``, ``DoubleArray``, ``Int64Array``, ...
     - ``#include <nanobind/pyarrow/array_primitive.h>``
   * - ``BinaryArray``, ``LargeBinaryArray``, ``StringArray``, ``LargeStringArray``, ``FixedSizeBinaryArray``
     - ``#include <nanobind/pyarrow/array_binary.h>``
   * - ``ListArray``, ``LargeListArray``, ``MapArray``, ``FixedSizeListArray``, ``StructArray``, ``UnionArray``, ``SparseUnionArray``, ``DenseUnionArray``
     - ``#include <nanobind/pyarrow/array_nested.h>``
   * - ``ChunkedArray``
     - ``#include <nanobind/pyarrow/chunked_array.h>``
   * - ``Table``
     - ``#include <nanobind/pyarrow/table.h>``
   * - ``RecordBatch``
     - ``#include <nanobind/pyarrow/record_batch.h>``
   * - ``Field``, ``Schema``
     - ``#include <nanobind/pyarrow/type.h>``
   * - ``Scalars``
     - ``#include <nanobind/pyarrow/scalar.h>``
   * - ``DataTypes``
     - ``#include <nanobind/pyarrow/datatype.h>``
   * - ``Buffer``, ``ResizableBuffer``, ``MutableBuffer``
     - ``#include <nanobind/pyarrow/buffer.h>``
   * - ``Tensor``, ``NumericTensor<..>``
     - ``#include <nanobind/pyarrow/tensor.h>``
   * - ``SparseCOOTensor``, ``SparseCSCMatrix``, ``SparseCSFTensor``, ``SparseCSRMatrix``
     - ``#include <nanobind/pyarrow/sparse_tensor.h>``

**Example**: The following snippet binds a function that accepts and returns a ``pyarrow.DoubleArray``:

.. code-block:: cpp

   #include <memory>
   #include <nanobind/nanobind.h>
   #include <nanobind/pyarrow/pyarrow_import.h>
   #include <nanobind/pyarrow/array_primitive.h>

   namespace nb = nanobind;

   NB_MODULE(test_pyarrow_ext, m) {
       static nb::detail::pyarrow::ImportPyarrow module;

       m.def("my_pyarrow_function", [](std::shared_ptr<arrow::DoubleArray> arr) {
           // Shallow-copy the array's ArrayData (the data buffers are shared, not duplicated)
           auto data = arr->data()->Copy();
           // Wrap the copy in a new DoubleArray; the type caster converts it back to pyarrow
           return std::make_shared<arrow::DoubleArray>(std::move(data));
       });
   }

To build your own CMake project against the C++ libraries that ship with the ``pyarrow`` package on PyPI,
have a look at `FindPyArrow.cmake <https://github.com/wjakob/nanobind/cmake/FindPyArrow.cmake>`__.
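The module definitions above create the importer as a function-local static (static nanobind::detail::pyarrow::ImportPyarrow module;). In C++, such an object is constructed the first time control passes through its declaration, is not constructed again on later passes, and lives until program exit; since C++11 that first construction is also thread-safe. The stdlib-only sketch below illustrates this behavior. ImportGuard and init_module are hypothetical stand-ins, and it is only an assumption that ImportPyarrow does its one-time work in its constructor.

```cpp
// Hypothetical stand-in for ImportPyarrow. Assumption: the real class performs
// its one-time setup (e.g. importing pyarrow) in its constructor.
struct ImportGuard {
    static inline int constructions = 0;  // C++17 inline static data member
    ImportGuard() { ++constructions; }
};

// Hypothetical stand-in for a module initialization function.
void init_module() {
    // Constructed on the first call only; later calls reuse the same object.
    static ImportGuard guard;
    (void)guard;  // silence unused-variable warnings
}
```

Calling init_module() twice leaves ImportGuard::constructions at 1, because the second call skips the already-initialized static.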
28 changes: 28 additions & 0 deletions include/nanobind/pyarrow/record_batch.h
@@ -0,0 +1,28 @@
/*
nanobind/pyarrow/record_batch.h: conversion between arrow and pyarrow
Copyright (c) 2024 Maximilian Kleinert <[email protected]> and
Wenzel Jakob <[email protected]>
All rights reserved. Use of this source code is governed by a
BSD-style license that can be found in the LICENSE file.
*/
#pragma once

#include <nanobind/nanobind.h>
#include <memory>
#include <nanobind/pyarrow/detail/caster.h>
#include <arrow/record_batch.h>

NAMESPACE_BEGIN(NB_NAMESPACE)
NAMESPACE_BEGIN(detail)

template<>
struct pyarrow::pyarrow_caster_name_trait<arrow::RecordBatch> {
static constexpr auto Name = const_name("RecordBatch");
};
template<>
struct type_caster<std::shared_ptr<arrow::RecordBatch>> : pyarrow::pyarrow_caster<arrow::RecordBatch, arrow::py::is_batch, arrow::py::wrap_batch, arrow::py::unwrap_batch> {};

NAMESPACE_END(detail)
NAMESPACE_END(NB_NAMESPACE)
28 changes: 28 additions & 0 deletions include/nanobind/pyarrow/table.h
@@ -0,0 +1,28 @@
/*
nanobind/pyarrow/table.h: conversion between arrow and pyarrow
Copyright (c) 2024 Maximilian Kleinert <[email protected]> and
Wenzel Jakob <[email protected]>
All rights reserved. Use of this source code is governed by a
BSD-style license that can be found in the LICENSE file.
*/
#pragma once

#include <nanobind/nanobind.h>
#include <memory>
#include <nanobind/pyarrow/detail/caster.h>
#include <arrow/table.h>

NAMESPACE_BEGIN(NB_NAMESPACE)
NAMESPACE_BEGIN(detail)

template<>
struct pyarrow::pyarrow_caster_name_trait<arrow::Table> {
static constexpr auto Name = const_name("Table");
};
template<>
struct type_caster<std::shared_ptr<arrow::Table>> : pyarrow::pyarrow_caster<arrow::Table, arrow::py::is_table, arrow::py::wrap_table, arrow::py::unwrap_table> {};

NAMESPACE_END(detail)
NAMESPACE_END(NB_NAMESPACE)
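Both new headers follow the same two-step pattern, which the renamed type.h keeps for Field and Schema: specialize pyarrow::pyarrow_caster_name_trait for the Arrow type, then derive type_caster<std::shared_ptr<T>> from the generic pyarrow::pyarrow_caster, parameterized by the matching arrow::py check/wrap/unwrap functions (is_table/wrap_table/unwrap_table, is_batch/wrap_batch/unwrap_batch). Because each header includes only the Arrow header it needs and specializes only its own type, an extension can include just the casters it uses. The sketch below imitates that shape with stdlib-only stand-ins so it compiles on its own. FakePyObject, name_trait, generic_caster, is_type/wrap_type/unwrap_type and the two caster structs are all hypothetical; they illustrate the pattern, not nanobind's actual pyarrow_caster implementation.

```cpp
#include <memory>
#include <string>
#include <string_view>

// Hypothetical stand-in for a Python object that wraps an Arrow C++ object.
struct FakePyObject {
    std::string type_name;        // e.g. "Table" or "RecordBatch"
    std::shared_ptr<void> value;  // the wrapped C++ object
};

// Hypothetical stand-ins for arrow::Table and arrow::RecordBatch.
struct Table { int num_rows = 0; };
struct RecordBatch { int num_rows = 0; };

// Primary template is declared but never defined: only types whose header
// provides a specialization can be used with the caster below.
template <typename T> struct name_trait;

// Analogous to what table.h contributes ...
template <> struct name_trait<Table> {
    static constexpr std::string_view Name = "Table";
};
// ... and to what record_batch.h contributes.
template <> struct name_trait<RecordBatch> {
    static constexpr std::string_view Name = "RecordBatch";
};

// Generic check / wrap / unwrap helpers standing in for the arrow::py functions.
template <typename T>
bool is_type(const FakePyObject& o) {
    return std::string_view(o.type_name) == name_trait<T>::Name;
}

template <typename T>
FakePyObject wrap_type(const std::shared_ptr<T>& v) {
    return FakePyObject{std::string(name_trait<T>::Name), v};
}

template <typename T>
std::shared_ptr<T> unwrap_type(const FakePyObject& o) {
    return std::static_pointer_cast<T>(o.value);
}

// Generic caster parameterized by the three functions, mirroring the shape
// pyarrow_caster<T, is_, wrap_, unwrap_> has in the headers above.
template <typename T,
          bool (*Is)(const FakePyObject&),
          FakePyObject (*Wrap)(const std::shared_ptr<T>&),
          std::shared_ptr<T> (*Unwrap)(const FakePyObject&)>
struct generic_caster {
    static constexpr std::string_view name = name_trait<T>::Name;
    std::shared_ptr<T> value;

    // "Python" -> C++: reject objects of the wrong type.
    bool from_python(const FakePyObject& obj) {
        if (!Is(obj)) return false;
        value = Unwrap(obj);
        return value != nullptr;
    }

    // C++ -> "Python".
    static FakePyObject from_cpp(const std::shared_ptr<T>& v) { return Wrap(v); }
};

// One-line caster definitions, analogous to the type_caster specializations.
struct table_caster
    : generic_caster<Table, is_type<Table>, wrap_type<Table>, unwrap_type<Table>> {};
struct batch_caster
    : generic_caster<RecordBatch, is_type<RecordBatch>, wrap_type<RecordBatch>,
                     unwrap_type<RecordBatch>> {};
```

Wrapping a Table with table_caster::from_cpp and reading it back with table_caster succeeds, while batch_caster rejects the same object because its check function compares against a different name.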
19 changes: 1 addition & 18 deletions include/nanobind/pyarrow/{tabular.h → type.h}
@@ -1,5 +1,5 @@
 /*
-nanobind/pyarrow/tabular.h: conversion between arrow and pyarrow
+nanobind/pyarrow/type.h: conversion between arrow and pyarrow
 
 Copyright (c) 2024 Maximilian Kleinert <[email protected]> and
 Wenzel Jakob <[email protected]>
@@ -12,28 +12,11 @@
 #include <nanobind/nanobind.h>
 #include <memory>
 #include <nanobind/pyarrow/detail/caster.h>
-#include <arrow/record_batch.h>
-#include <arrow/table.h>
 #include <arrow/type.h>
 
 NAMESPACE_BEGIN(NB_NAMESPACE)
 NAMESPACE_BEGIN(detail)
 
-template<>
-struct pyarrow::pyarrow_caster_name_trait<arrow::Table> {
-static constexpr auto Name = const_name("Table");
-};
-template<>
-struct type_caster<std::shared_ptr<arrow::Table>> : pyarrow::pyarrow_caster<arrow::Table, arrow::py::is_table, arrow::py::wrap_table, arrow::py::unwrap_table> {};
-
-template<>
-struct pyarrow::pyarrow_caster_name_trait<arrow::RecordBatch> {
-static constexpr auto Name = const_name("RecordBatch");
-};
-template<>
-struct type_caster<std::shared_ptr<arrow::RecordBatch>> : pyarrow::pyarrow_caster<arrow::RecordBatch, arrow::py::is_batch, arrow::py::wrap_batch, arrow::py::unwrap_batch> {};
-
-
 template<>
 struct pyarrow::pyarrow_caster_name_trait<arrow::Schema> {
 static constexpr auto Name = const_name("Schema");
4 changes: 3 additions & 1 deletion tests/test_pyarrow.cpp
@@ -6,7 +6,9 @@
 #include <nanobind/pyarrow/array_binary.h>
 #include <nanobind/pyarrow/array_nested.h>
 #include <nanobind/pyarrow/chunked_array.h>
-#include <nanobind/pyarrow/tabular.h>
+#include <nanobind/pyarrow/table.h>
+#include <nanobind/pyarrow/record_batch.h>
+#include <nanobind/pyarrow/type.h>
 #include <nanobind/pyarrow/scalar.h>
 #include <nanobind/pyarrow/datatype.h>
 #include <nanobind/pyarrow/buffer.h>
