Pyarrow directly #242

Merged
merged 7 commits into from
Jan 26, 2024


WillAyd
Collaborator

@WillAyd WillAyd commented Jan 25, 2024

closes #xxx

@WillAyd WillAyd added this to the 4.0 milestone Jan 25, 2024
@WillAyd
Collaborator Author

WillAyd commented Jan 25, 2024

@jorwoods we talked about this previously; I would love your input.

This makes it so you can write pyarrow tables and, in theory, anything that implements the Arrow PyCapsule Interface.

The writing is easy: we can just check for the __arrow_c_stream__ dunder on an object and work off of that (pandas will have that in their 3.0 release, and polars has an open request for it).
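As a rough illustration of that dunder check (not the code in this PR), a duck-typed guard could look like the sketch below. The helper names `supports_arrow_stream` and `check_writable`, and the `FakeArrowProducer` stand-in class, are hypothetical and exist only for this example.

```python
from typing import Any


def supports_arrow_stream(obj: Any) -> bool:
    """Return True if obj exposes a callable __arrow_c_stream__ method."""
    return callable(getattr(obj, "__arrow_c_stream__", None))


def check_writable(obj: Any) -> None:
    """Hypothetical guard a writer could run before consuming the stream."""
    if not supports_arrow_stream(obj):
        raise TypeError(
            f"{type(obj).__name__} does not implement __arrow_c_stream__"
        )


class FakeArrowProducer:
    """Stand-in for a pyarrow Table or similar; it exports no real data."""

    def __arrow_c_stream__(self, requested_schema=None):
        # A real producer would return a PyCapsule wrapping an ArrowArrayStream.
        raise NotImplementedError("illustration only")
```

The check only confirms that the method exists; it never calls it, so the writer stays agnostic about which library produced the object.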

I am less sure about what to do when reading the result. Right now this still converts everything back to a pandas DataFrame, simply owing to the history of this library. Other viable alternatives are:

  • Return a PyArrow table; users can convert to whatever from there
  • Accept a keyword argument like return_type = "pandas|pyarrow|polars"

The return_type argument is convenient but maybe too presumptive about how those libraries handle conversions. The base pyarrow type is the safest but maybe the least convenient for end users.
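To make the trade-off concrete, here is a minimal sketch of what a return_type dispatch might look like. The function name `convert_result` is hypothetical, and the sketch assumes the reader already holds a pyarrow Table; it is not the signature this library uses.

```python
from typing import Any


def convert_result(table: Any, return_type: str = "pyarrow") -> Any:
    """Convert an Arrow table to the requested container type.

    `table` is assumed to be a pyarrow.Table (or to behave like one).
    """
    if return_type == "pyarrow":
        # Safest option: hand back the table untouched.
        return table
    if return_type == "pandas":
        # pyarrow.Table.to_pandas() performs the conversion.
        return table.to_pandas()
    if return_type == "polars":
        # Imported lazily so polars stays an optional dependency.
        import polars as pl

        return pl.from_arrow(table)
    raise ValueError(
        f"return_type must be 'pandas', 'pyarrow', or 'polars', got {return_type!r}"
    )
```

Each non-pyarrow branch bakes in one particular conversion call with its default options, which is the "presumptive" part: users who need different conversion settings would still have to start from the pyarrow table.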

Curious what you might have in mind.

@WillAyd WillAyd merged commit 1bb35c6 into main Jan 26, 2024
4 checks passed
@WillAyd WillAyd deleted the pyarrow-directly branch January 26, 2024 15:59
@jorwoods
Collaborator

I think returning the pyarrow table is the least presumptive about what the end user needs. It would give users a little more flexibility to convert to whatever data type they need. We could add some basic convenience functions for quickly reading it into pandas, though.


2 participants