Arrow PyCapsule Interface support #12530
Looking through the codebase, it seems some basic work is needed to make the Arrow interoperability more generic. Right now the import implementation seems to rely on PyArrow-specific APIs: see polars/py-polars/polars/utils/_construction.py, lines 1472 to 1555 at commit e461ffc.
Sorry for the delay; somehow I missed this. I think this sounds great. Being agnostic to the Arrow consumer, without a hard PyArrow dependency, sounds good. Does your offer still stand?
Yes, I’ve started work on this locally but got distracted. I’ll try to get back to it soon :)
Related to #14208.
I'm still working on the Python part, but ChunkedArray import/export to ArrowArrayStream in C++ just merged (apache/arrow#39455), which should make this more useful when applied to a Series.
FYI, I tried to implement ArrayStream import functionality in r-polars, but found a considerable speed reduction compared to the previous implementation (copied from py-polars), so I reverted it (see the linked comment in pola-rs/r-polars#1078).
I wonder if using the …
@wjones127, curious whether this is still something you're working on?
I haven't had time to finish this, no. I may return to it later this year, if someone else hasn't gotten to it.
I started a PR for data export in #17676.
And a PR for DataFrame import via the C stream interface in #17693.
As mentioned in the Narwhals PR, and in the original post, I think this is still missing in …
Supporting the PyCapsule Interface via a top-level …
A struct Series with two float fields, …
Description
In the Arrow project, we recently created a new protocol for sharing Arrow data in Python, the Arrow PyCapsule Interface. One of its goals is to allow exporting and importing Arrow data in Python without necessarily using PyArrow as an intermediary. For example, DuckDB can read from Polars DataFrames and LazyFrames, but only if PyArrow is installed. Once this protocol is implemented, that integration would be possible without PyArrow.
The protocol allows Arrow-exportable objects to be recognized by the presence of one of several dunder methods: __arrow_c_schema__, __arrow_c_array__, and __arrow_c_stream__.
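To illustrate recognition by method presence, here is a minimal sketch using typing.Protocol with runtime_checkable. The method signatures follow the Arrow PyCapsule Interface specification; the protocol class names and the arrow_interfaces helper are hypothetical and not part of Arrow or Polars. An isinstance check against a runtime-checkable protocol only verifies that the methods exist, not their signatures or return values, which matches the presence-based check described above.

```python
from typing import Any, List, Optional, Protocol, Tuple, runtime_checkable


@runtime_checkable
class ArrowSchemaExportable(Protocol):
    # Should return a PyCapsule named "arrow_schema" wrapping an ArrowSchema.
    def __arrow_c_schema__(self) -> object: ...


@runtime_checkable
class ArrowArrayExportable(Protocol):
    # Should return a pair of PyCapsules named "arrow_schema" and "arrow_array".
    def __arrow_c_array__(
        self, requested_schema: Optional[object] = None
    ) -> Tuple[object, object]: ...


@runtime_checkable
class ArrowStreamExportable(Protocol):
    # Should return a PyCapsule named "arrow_array_stream".
    def __arrow_c_stream__(self, requested_schema: Optional[object] = None) -> object: ...


def arrow_interfaces(obj: Any) -> List[str]:
    """Hypothetical helper: list the PyCapsule dunder methods that obj exposes.

    isinstance() on a runtime_checkable Protocol only checks that the
    methods are present; it does not call them or validate the capsules.
    """
    found = []
    if isinstance(obj, ArrowSchemaExportable):
        found.append("__arrow_c_schema__")
    if isinstance(obj, ArrowArrayExportable):
        found.append("__arrow_c_array__")
    if isinstance(obj, ArrowStreamExportable):
        found.append("__arrow_c_stream__")
    return found
```

A consumer could use a check like this to decide whether an unknown object can be read without importing PyArrow at all.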
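Polars could implement this in two ways:
- Export: DataFrame, Series, and DataType could implement the PyCapsule dunder methods, so that other libraries can consume Polars data directly.
- Import: polars.from_arrow and the polars.DataFrame constructor could accept any object that implements these methods. The polars.DataFrame constructor already accepts a pd.DataFrame, so it would make logical sense to support reading rectangular-shaped Arrow data as well.

I'd be happy to contribute this to the repo, if these ideas sound good.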
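On the import side, a constructor such as polars.from_arrow could dispatch on these methods roughly as in the minimal sketch below. The names read_arrow_object, read_stream, and read_array are hypothetical: in a real implementation the two callables would be native code that takes the PyCapsules and builds Polars columns, and the capsules are not inspected here. Preferring the stream interface is a design choice of this sketch, on the assumption that a stream can carry several record batches, so chunked data does not have to be concatenated before import.

```python
from typing import Any, Callable


def read_arrow_object(
    obj: Any,
    read_stream: Callable[[object], Any],
    read_array: Callable[[object, object], Any],
) -> Any:
    """Hypothetical import dispatch for any Arrow PyCapsule exporter."""
    # Prefer __arrow_c_stream__: it yields an "arrow_array_stream" capsule
    # that may contain multiple record batches.
    stream_export = getattr(obj, "__arrow_c_stream__", None)
    if stream_export is not None:
        return read_stream(stream_export())

    # Fall back to __arrow_c_array__, which returns a single
    # ("arrow_schema", "arrow_array") capsule pair.
    array_export = getattr(obj, "__arrow_c_array__", None)
    if array_export is not None:
        schema_capsule, array_capsule = array_export()
        return read_array(schema_capsule, array_capsule)

    raise TypeError(
        f"{type(obj).__name__} does not implement the Arrow PyCapsule Interface"
    )
```

Because the check is duck-typed, this path needs no PyArrow import; the existing PyArrow-based code could remain as a fallback for objects that do not expose these methods.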