Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat[python]: Add masked array support to numpy interop API #17577

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

saresend
Copy link

What

Aims to close #16398. This adds a masked argument to the PySeries API, which changes the output to create a numpy masked array directly, using the underlying validity buffers of the polars series.

@saresend saresend changed the title feat: add masked type param to numpy interop API feat[python]: Add masked array support to numpy interop API Jul 11, 2024
Copy link

codecov bot commented Jul 11, 2024

Codecov Report

Attention: Patch coverage is 91.48936% with 4 lines in your changes missing coverage. Please review.

Project coverage is 80.45%. Comparing base (1f15e1c) to head (98fd713).
Report is 221 commits behind head on main.

Files Patch % Lines
py-polars/src/interop/numpy/to_numpy_series.rs 91.30% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #17577      +/-   ##
==========================================
- Coverage   81.00%   80.45%   -0.55%     
==========================================
  Files        1448     1484      +36     
  Lines      190551   195362    +4811     
  Branches     2723     2778      +55     
==========================================
+ Hits       154361   157185    +2824     
- Misses      35687    37665    +1978     
- Partials      503      512       +9     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@s-banach
Copy link
Contributor

The main thing you want to test is that integer and boolean series with nulls are converted to masked arrays in a dtype-preserving way.

@saresend
Copy link
Author

@s-banach Sure, I'm looking at adding the tests for that. Would you be able to explain the current numeric conversion logic? The boolean conversion seems rather straightforward (if theres an optional it converts to object dtype, bool otherwise). For the numeric types it seems to convert to float, and I just want to confirm that this is to take advantage of the nan representation (rather than making an object type).

Comment on lines +481 to +487
def test_optional_int_array_to_masked() -> None:
values = [1, 2, 3, 4]
s = pl.Series('a', values, pl.UInt8)
result = s.to_numpy()
print(result.dtype)
assert result.dtype == int

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WIP - will update once numeric conversion semantics are clear to me

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support converting to NumPy masked arrays
2 participants