feat[python]: Add masked array support to numpy interop API #17577

saresend · 2024-07-11T17:25:50Z

What

Aims to close #16398. This adds a masked argument to the PySeries API. With this argument, the conversion builds a NumPy masked array directly from the underlying validity buffers of the Polars series.
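For readers unfamiliar with the idea, here is a minimal pure-NumPy sketch of turning a values buffer plus an Arrow-style validity bitmap into a masked array. It is not the code in this PR: the helper name `masked_from_validity` and the sample buffers are hypothetical, and the sketch assumes an LSB-first bitmap in which a set bit means "valid".

```python
import numpy as np


def masked_from_validity(values: np.ndarray, validity_bitmap: bytes) -> np.ma.MaskedArray:
    """Hypothetical helper: wrap `values` in a masked array using a validity bitmap."""
    # Assumption: Arrow-style bitmap, least-significant bit first, 1 = valid.
    bits = np.unpackbits(
        np.frombuffer(validity_bitmap, dtype=np.uint8), bitorder="little"
    )
    # The bitmap is padded to whole bytes, so trim it to the number of values.
    valid = bits[: len(values)].astype(bool)
    # NumPy uses the opposite convention: True in the mask marks a missing element.
    return np.ma.masked_array(values, mask=~valid)


values = np.array([1, 2, 0, 4], dtype=np.uint8)
bitmap = bytes([0b00001011])  # elements 0, 1 and 3 are valid; element 2 is null
arr = masked_from_validity(values, bitmap)

print(arr.dtype)  # the values buffer keeps its original dtype
print(arr.mask)   # only the null slot is masked
```

The point of the sketch is that the values buffer is wrapped as-is, so the integer dtype survives even when nulls are present.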

codecov · 2024-07-11T18:07:40Z

Codecov Report

Attention: Patch coverage is 91.48936% with 4 lines in your changes missing coverage. Please review.

Project coverage is 80.45%. Comparing base (1f15e1c) to head (98fd713).
Report is 221 commits behind head on main.

Files	Patch %	Lines
py-polars/src/interop/numpy/to_numpy_series.rs	91.30%	4 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main   #17577      +/-   ##
==========================================
- Coverage   81.00%   80.45%   -0.55%     
==========================================
  Files        1448     1484      +36     
  Lines      190551   195362    +4811     
  Branches     2723     2778      +55     
==========================================
+ Hits       154361   157185    +2824     
- Misses      35687    37665    +1978     
- Partials      503      512       +9


s-banach · 2024-07-12T13:01:25Z

The main thing you want to test is that integer and boolean series with nulls are converted to masked arrays in a dtype-preserving way.

saresend · 2024-07-13T18:57:54Z

@s-banach Sure, I'm looking at adding tests for that. Could you explain the current numeric conversion logic? The boolean conversion seems straightforward: if the series contains nulls it converts to object dtype, and to bool otherwise. Numeric types seem to convert to float, and I just want to confirm that this is to take advantage of the NaN representation (rather than falling back to an object dtype).
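To make the three representations in this question concrete, here is a small pure-NumPy comparison. It illustrates the options being discussed and makes no claim about what Polars does internally; the sample data and variable names are made up for the example.

```python
import numpy as np

values = [1, None, 3]  # an integer column with one null

# Option 1: upcast to float and encode the null as NaN (integer dtype is lost).
as_float = np.array(
    [np.nan if v is None else v for v in values], dtype=np.float64
)

# Option 2: fall back to object dtype and keep None in place.
as_object = np.array(values, dtype=object)

# Option 3: keep the integer dtype and track nulls in a separate mask.
mask = [v is None for v in values]
filled = [0 if v is None else v for v in values]  # placeholder under the mask
as_masked = np.ma.masked_array(np.array(filled, dtype=np.int64), mask=mask)

print(as_float.dtype, as_object.dtype, as_masked.dtype)
```

Only the masked variant is dtype-preserving, which is the property the requested tests would check.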

saresend · 2024-07-13T18:59:43Z

py-polars/tests/unit/interop/numpy/test_to_numpy_series.py

+def test_optional_int_array_to_masked() -> None:
+    values = [1, 2, 3, 4]
+    s = pl.Series('a', values, pl.UInt8)
+    result = s.to_numpy()
+    # Without nulls, the UInt8 series should keep its dtype after conversion.
+    assert result.dtype == "uint8"
+


WIP - will update once numeric conversion semantics are clear to me

feat: add masked type param to numpy interop API

b8bd767

saresend requested review from ritchie46, stinodego, c-peters, alexander-beedie, MarcoGorelli and reswqa as code owners July 11, 2024 17:25

github-actions bot added the title needs formatting label Jul 11, 2024

saresend changed the title ~~feat: add masked type param to numpy interop API~~ feat[python]: Add masked array support to numpy interop API Jul 11, 2024

whitespace and import cruft

98fd713

adding tests to ensure dtypes are preserved in the face of null values

60d5e00

ritchie46 force-pushed the main branch from 0a696ff to 9c29683 Compare July 28, 2024 08:11
