-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add from_arrow (which uses the PyCapsule Interface) #1181
feat: add from_arrow (which uses the PyCapsule Interface) #1181
Conversation
fdcd234
to
f5da0cb
Compare
Thanks @MarcoGorelli, this looks great! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
narwhals/functions.py
Outdated
@@ -406,6 +411,100 @@ def _from_dict_impl( | |||
return from_native(native_frame, eager_only=True) | |||
|
|||
|
|||
def from_pycapsule( | |||
supports_py_capsule: SupportsPyCapsule, *, native_namespace: ModuleType |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering if we could rename this to somehting like py_capsule_obj
or arrow_py_capsule_obj
or similar
At first glance I thought this was a bool (True if something supports pycapsule) π
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
π€ maybe this should just be native_frame
?
narwhals/functions.py
Outdated
class SupportsPyCapsule(Protocol): | ||
def __arrow_c_stream__(self) -> Any: ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't the correct type hint: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html#protocol-typehints
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also personally stick with the upstream naming suggestion and call this ArrowStreamExportable
https://github.com/kylebarron/arro3/blob/45be4a12dd62cee025c5d0ecf8c8c081e13643ea/arro3-core/python/arro3/core/types.py#L52-L75
narwhals/functions.py
Outdated
@@ -406,6 +411,100 @@ def _from_dict_impl( | |||
return from_native(native_frame, eager_only=True) | |||
|
|||
|
|||
def from_pycapsule( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I'd call this from_arrow
and under the hood use the PyCapsule Interface. I think the PyCapsule itself is an implementation detail, and not something the user should need to know about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
based on pola-rs/polars#12530 (comment), i think this risks being confused with Polars' from_arrow
, which does something different
i'd be more inclined to use from_pycapsule
then
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you only have top-level functions? Do you export a DataFrame
-like class? Why not put it on that class?
But there is a difference between yours and the polars function because your function seems to only convert to a DataFrame
while the polars function tries to decide whether it should return a Series
or DataFrame
.
from_pycapsule
still doesn't really resolve the ambiguity, but you could change it to from_arrow_dataframe
or from_arrow_df
or from_arrow_table
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks! yeah maybe class constructors are the way to go for this one
EDIT: my hesitation with class constructors is that then we'd be adding something which is outside the Polars API (which we generally try to adhere to where possible)
I think this is probably OK for now, but we can always revisit later
thanks @EdAbati and @kylebarron for reviews! π any further comments? |
gonna ship this so we can make a release, if there's issues we can always address as a follow-up (it's a new feature so it doesn't affect existing users) - thanks all for comments! |
closes #1158
What type of PR is this? (check all applicable)
Related issues
Checklist
If you have comments or can explain your changes, please do so below.