Implement Arrow PyCapsule Interface & make pyarrow optional dependency #268

kylebarron · 2024-07-22T17:37:16Z

The Arrow project recently created a new protocol for sharing Arrow data in Python, the Arrow PyCapsule Interface. One of the goals of the protocol is to allow exporting and importing Arrow data in Python without necessarily using PyArrow as an intermediary.

This allows Arrow-exportable objects to be recognized based on the presence of one of several dunder methods.
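As a rough illustration of that detection step, here is a minimal sketch of how a consumer might check for the dunder methods. The helper name `arrow_export_kind` and the `FakeSheet` class are hypothetical and only for illustration; the three dunder names are the ones defined by the PyCapsule Interface.

```python
from typing import Optional

# Dunder methods defined by the Arrow PyCapsule Interface,
# checked from the most general (stream) to the most specific (schema).
_ARROW_DUNDERS = (
    ("__arrow_c_stream__", "stream"),
    ("__arrow_c_array__", "array"),
    ("__arrow_c_schema__", "schema"),
)


def arrow_export_kind(obj: object) -> Optional[str]:
    """Return which kind of Arrow export `obj` advertises, or None.

    Hypothetical helper: it only inspects the object for the dunder
    methods and never calls them.
    """
    for dunder, kind in _ARROW_DUNDERS:
        if callable(getattr(obj, dunder, None)):
            return kind
    return None


class FakeSheet:
    """Hypothetical stand-in for an object that exports an Arrow stream."""

    def __arrow_c_stream__(self, requested_schema=None):
        raise NotImplementedError("illustration only")
```

A consumer library would typically run a check like this before calling the method, so it never needs to import the producer's package or pyarrow.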

A growing number of Python Arrow libraries are aware of the PyCapsule interface and would be able to read from fastexcel directly, without going through pyarrow or even having it installed in the environment.

For example, I have a PR open for polars in pola-rs/polars#17693, but you could also pass the fastexcel object directly into constructors from pyarrow, nanoarrow, and arro3. I'm advocating for more projects to adopt the PyCapsule interface directly, including duckdb, datafusion, vegafusion, and daft.

In terms of implementation, fastexcel currently uses arrow-rs' default pyarrow integration. Instead, you would need to define one or more dunder methods, probably on the ExcelSheet. If you always return a RecordBatch, you could implement __arrow_c_array__, but if you ever wanted to expose a lazy stream, you could implement __arrow_c_stream__, which would export multiple batches of data.
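To show the shape of the method, here is a minimal Python sketch, not fastexcel's actual implementation (which would live in Rust). `LazySheet` is a hypothetical wrapper that forwards to any object already implementing `__arrow_c_stream__`; the method name, the optional `requested_schema` argument, and the `"arrow_array_stream"` capsule name come from the PyCapsule Interface.

```python
class LazySheet:
    """Hypothetical stand-in for an ExcelSheet-like object.

    It wraps any source that already implements the Arrow PyCapsule
    stream export and forwards the call, so the sketch stays pure Python.
    """

    def __init__(self, source):
        # Assumption: `source` implements __arrow_c_stream__.
        self._source = source

    def __arrow_c_stream__(self, requested_schema=None):
        # Per the protocol, this must return a PyCapsule named
        # "arrow_array_stream"; here the wrapped source produces it.
        return self._source.__arrow_c_stream__(requested_schema)
```

With a recent pyarrow installed, something like `pyarrow.table(LazySheet(some_table))` should consume the wrapper through this method, and no pyarrow import is needed on the producer side.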

I have a helper library, pyo3-arrow, that you can use to implement this; it is separate from arrow-rs for a few reasons. Alternatively, the relevant code is small and self-contained enough to vendor if you don't want to add an external dependency.


lukapeschke · 2024-07-23T08:41:37Z

Thanks for the heads-up, I'll try to look into this when I have the time 🙂

lukapeschke added the "🦀 rust 🦀" (Pull requests that edit Rust code) and "feature request" labels Jul 23, 2024

kylebarron mentioned this issue Jul 23, 2024

[Python] Promote usage of the Arrow PyCapsule Protocol (for the C Data Inteface) apache/arrow#39195

Open

8 tasks

PrettyWood added this to the v0.13.0 milestone Oct 13, 2024
