PanicException: skip_nulls does not work in map_elements when used on pl.struct #15322

Closed
2 tasks done
lmocsi opened this issue Mar 27, 2024 · 5 comments
Labels: bug (Something isn't working), needs triage (Awaiting prioritization by a maintainer), python (Related to Python Polars)

Comments

lmocsi commented Mar 27, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

dff = pl.DataFrame({'a': ['apple',None], 'b': ['one','two']})
# dff.with_columns(pl.col('a').map_elements(lambda x: x[0:2].upper()).alias('c')
#                  )
dff.with_columns(pl.struct(['a','b']).map_elements(lambda x: x['a']+'-'+x['b'][0:2]).alias('d')
                 )

Log output

thread '<unnamed>' panicked at py-polars/src/map/series.rs:213:19:
python function failed TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'
--- PyO3 is resuming a panic after fetching a PanicException from Python. ---
Python stack trace below:

---------------------------------------------------------------------------
PanicException                            Traceback (most recent call last)
/opt/conda/envs/Python-3.9-Premium/lib/python3.9/site-packages/polars/expr/expr.py in __call__(self, *args, **kwargs)
   4046 
   4047         def __call__(self, *args: Any, **kwargs: Any) -> Any:
-> 4048             result = self.function(*args, **kwargs)
   4049             if _check_for_numpy(result) and isinstance(result, np.ndarray):
   4050                 result = pl.Series(result, dtype=self.return_dtype)

/opt/conda/envs/Python-3.9-Premium/lib/python3.9/site-packages/polars/expr/expr.py in wrap_f(x)
   4381                 with warnings.catch_warnings():
   4382                     warnings.simplefilter("ignore", PolarsInefficientMapWarning)
-> 4383                     return x.map_elements(
   4384                         function, return_dtype=return_dtype, skip_nulls=skip_nulls
   4385                     )

/opt/conda/envs/Python-3.9-Premium/lib/python3.9/site-packages/polars/series/series.py in map_elements(self, function, return_dtype, skip_nulls)
   5294         warn_on_inefficient_map(function, columns=[self.name], map_target="series")
   5295         return self._from_pyseries(
-> 5296             self._s.apply_lambda(function, pl_return_dtype, skip_nulls)
   5297         )
   5298 

PanicException: python function failed TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'

---------------------------------------------------------------------------
PanicException                            Traceback (most recent call last)
/tmp/1000780000/ipykernel_16072/2312718823.py in <module>
      2 # dff.with_columns(pl.col('a').map_elements(lambda x: x[0:2].upper()).alias('c')
      3 #                  )
----> 4 dff.with_columns(pl.struct(['a','b']).map_elements(lambda x: x['a']+'-'+x['b'][0:2]).alias('d')
      5                  )

/opt/conda/envs/Python-3.9-Premium/lib/python3.9/site-packages/polars/dataframe/frame.py in with_columns(self, *exprs, **named_exprs)
   8364         └─────┴──────┴─────────────┘
   8365         """
-> 8366         return self.lazy().with_columns(*exprs, **named_exprs).collect(_eager=True)
   8367 
   8368     def with_columns_seq(

/opt/conda/envs/Python-3.9-Premium/lib/python3.9/site-packages/polars/lazyframe/frame.py in collect(self, type_coercion, predicate_pushdown, projection_pushdown, simplify_expression, slice_pushdown, comm_subplan_elim, comm_subexpr_elim, no_optimization, streaming, background, _eager)
   1941             return InProcessQuery(ldf.collect_concurrently())
   1942 
-> 1943         return wrap_df(ldf.collect())
   1944 
   1945     @overload

PanicException: python function failed TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'

Issue description

The default setting of skip_nulls=True works fine when map_elements() is used on a single column. But when it is used on a pl.struct, rows whose struct fields contain nulls are still passed to the function.

Expected behavior

It should be possible to skip records passed to the map_elements function whenever any input field contains a null, for example with an option like:
skip_nulls = Any
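
Until something like that exists, the requested behavior can be approximated in the UDF itself. The following is a minimal sketch, not a Polars feature: `skip_if_any_null` and `join_fields` are hypothetical names, and it assumes each struct row reaches the function as a Python dict (or as None when the whole struct is null).

```python
from functools import wraps


def skip_if_any_null(func):
    """Hypothetical helper: return None instead of calling func when the
    struct row is None or when any of its fields is None."""
    @wraps(func)
    def wrapper(row):
        if row is None or any(value is None for value in row.values()):
            return None
        return func(row)
    return wrapper


@skip_if_any_null
def join_fields(row):
    # Same logic as the lambda in the reproducible example.
    return row['a'] + '-' + row['b'][0:2]


# Intended use with the frame from the reproducible example (not run here):
# dff.with_columns(pl.struct(['a', 'b']).map_elements(join_fields).alias('d'))
```

The wrapper only guards the Python side, so the function is still invoked once per row; it simply returns early for rows that contain a null.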

Installed versions

--------Version info---------
Polars:               0.20.16
Index type:           UInt32
Platform:             Linux-4.18.0-372.76.1.el8_6.x86_64-x86_64-with-glibc2.28
Python:               3.9.13 (main, Oct 13 2022, 21:15:33) 
[GCC 11.2.0]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          2.0.0
connectorx:           <not installed>
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               2022.02.0
gevent:               <not installed>
hvplot:               <not installed>
matplotlib:           3.8.2
numpy:                1.23.5
openpyxl:             3.0.9
pandas:               2.2.0
pyarrow:              15.0.0
pydantic:             <not installed>
pyiceberg:            <not installed>
pyxlsb:               <not installed>
sqlalchemy:           1.4.27
xlsx2csv:             <not installed>
xlsxwriter:           3.1.9
@cmdlineluser (Contributor)

Some relevant info: #10102 (comment)

I had to add a check to the UDF:

def udf(x):
    # x is the struct row as a dict; return early if any field is null
    if None in x.values():
        return None
    ...
    ...

@lukemanley (Contributor)

This no longer panics on main and produces what appears to be correct output:

┌───────┬─────┬──────────┐
│ a     ┆ b   ┆ d        │
│ ---   ┆ --- ┆ ---      │
│ str   ┆ str ┆ str      │
╞═══════╪═════╪══════════╡
│ apple ┆ one ┆ apple-on │
│ null  ┆ two ┆ null     │
└───────┴─────┴──────────┘

@ritchie46 (Member)

Thanks @lukemanley. Will close it.

dhimmel commented Jan 4, 2025

> This no longer panics on main and produces what appears to be correct output:

@ritchie46 I think this bug still persists in some form. On the just-released 1.19.0:

(
    pl.DataFrame(
        {"my_struct": [None, {"field_a": 1}, {"field_a": 2}]}
    )
    .with_columns(
        pl.col("my_struct").map_elements(
            lambda x: print(x),
            skip_nulls=True,
        )
    )
)

Prints the following:

None
{'field_a': 1}
{'field_a': 2}

Hence, the function is getting called on the null value, despite skip_nulls=True. And I don't think I can use when-then to circumvent these calls on null structs.
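
One way to sidestep the call on null structs is to guard inside the function. This is a rough sketch in plain Python, assuming the values arrive exactly as printed above (None for a null struct, a dict otherwise); `guard_null_struct` is a hypothetical name, not part of Polars.

```python
def guard_null_struct(func):
    """Hypothetical wrapper: skip func when the whole struct value is None."""
    def wrapper(value):
        if value is None:
            return None
        return func(value)
    return wrapper


# Simulate the three values that map_elements passed in the example above.
calls = []
record = guard_null_struct(lambda x: calls.append(x))

for value in [None, {"field_a": 1}, {"field_a": 2}]:
    record(value)

print(calls)  # [{'field_a': 1}, {'field_a': 2}]
```

The wrapped function could then be passed to map_elements in place of the bare lambda, so the null row yields None without running the user logic.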

dhimmel added a commit to dhimmel/openskistats that referenced this issue Jan 4, 2025
@cmdlineluser (Contributor)

@dhimmel A PR has been merged on main.
