Series.diff returns Series['float'] even for nullable integers #1110

Open
MarcoGorelli opened this issue Feb 8, 2025 · 1 comment
Labels
NA - MaskedArrays Related to pd.NA and nullable extension arrays

Comments

@MarcoGorelli
Member

Describe the bug
Series.diff is typed as returning Series[float] even when the series has a nullable integer dtype such as Int64.

To Reproduce

import pandas as pd


# At runtime the result of diff() keeps the Int64 dtype, but mypy reports float.
reveal_type(pd.Series([1, 2, 3], dtype="Int64").diff())
#  note: Revealed type is "pandas.core.series.Series[builtins.float]"

Please complete the following information:

  • OS: Linux
  • OS version: 22.04
  • Python version: 3.12.5
  • Type checker version: mypy 1.15
  • Installed pandas-stubs version: 2.2.3.241126

Additional context
Noticed while looking into #1108.

@Dr-Irv
Collaborator

Dr-Irv commented Feb 10, 2025

This one could take some work to fix, because the stubs don't differentiate between pd.Series([1,2,3], dtype='Int64') and pd.Series([1,2,3], dtype=int). At runtime, pd.Series([1,2,3], dtype='Int64').diff() remains a series of integers (with pd.NA as the missing value), whereas pd.Series([1,2,3], dtype=int).diff() becomes a series of floats, because np.nan becomes the missing value. Both inputs are typed as Series[int] in the stubs, so diff() is typed as Series[float] for either one.
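The runtime difference described above can be checked directly. This is a minimal sketch that only uses pandas calls already shown in this thread; the variable names are illustrative.

```python
import pandas as pd

# Nullable (masked) integers: diff() keeps the Int64 dtype and uses pd.NA.
nullable = pd.Series([1, 2, 3], dtype="Int64").diff()

# NumPy-backed integers: diff() has to upcast to float64 to hold np.nan.
plain = pd.Series([1, 2, 3], dtype=int).diff()

print(nullable.dtype)             # Int64
print(plain.dtype)                # float64
print(nullable.iloc[0] is pd.NA)  # True
print(pd.isna(plain.iloc[0]))     # True
```

A single return annotation on diff() cannot describe both results, which is why the stubs would need to know which kind of integer series they are dealing with.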

It might be possible to introduce a class hierarchy with a base class AllInt and subclasses BaseInt and NullableInt. We would then replace Series[int] with Series[AllInt] everywhere, modify Series.__new__() to return either Series[BaseInt] or Series[NullableInt] based on dtype, and give Series.diff() overloads that return different results depending on whether you had Series[BaseInt] or Series[NullableInt].
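To make the idea concrete, here is a rough sketch of how such a hierarchy and the overloads might fit together. It is not pandas-stubs code: the Series class below is a toy stand-in, the _build helper is hypothetical, and None is used in place of pd.NA. Only the names AllInt, BaseInt and NullableInt come from the proposal above.

```python
from __future__ import annotations

from typing import Any, Generic, Literal, TypeVar, overload


class AllInt:
    """Marker base for every integer flavour."""


class BaseInt(AllInt):
    """Marker for NumPy-backed integers, where a missing value forces float."""


class NullableInt(AllInt):
    """Marker for masked integers such as Int64, where pd.NA is the missing value."""


T = TypeVar("T")


def _build(data, dtype):
    # Hypothetical runtime-only helper, left unannotated so it bypasses the overloads.
    return Series(data, dtype)


class Series(Generic[T]):
    """Toy stand-in for the stubbed Series (an assumption, not the real class)."""

    data: list[Any]
    dtype: Any

    # The dtype argument selects which marker type parameterises the result.
    @overload
    def __new__(cls, data: list[int], dtype: Literal["Int64"]) -> Series[NullableInt]: ...
    @overload
    def __new__(cls, data: list[int], dtype: type[int]) -> Series[BaseInt]: ...
    def __new__(cls, data: Any, dtype: Any) -> Any:
        self = super().__new__(cls)
        self.data = list(data)
        self.dtype = dtype
        return self

    # The self type selects the return type of diff().
    @overload
    def diff(self: Series[NullableInt]) -> Series[NullableInt]: ...
    @overload
    def diff(self: Series[BaseInt]) -> Series[float]: ...
    def diff(self) -> Any:
        steps = [b - a for a, b in zip(self.data, self.data[1:])]
        if self.dtype == "Int64":
            # None stands in for pd.NA in this toy model.
            return _build([None, *steps], "Int64")
        return _build([float("nan"), *map(float, steps)], float)


nullable = Series([1, 2, 3], "Int64").diff()  # intended static type: Series[NullableInt]
plain = Series([1, 2, 3], int).diff()         # intended static type: Series[float]
print(nullable.data, plain.data)
```

The sketch covers only one dtype spelling per branch; real stubs would presumably need to handle every nullable and non-nullable integer dtype alias, which is part of why the change is large.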

This would be a big change: we would have to modify a lot of PYI files and tests.

@Dr-Irv Dr-Irv added the NA - MaskedArrays Related to pd.NA and nullable extension arrays label Feb 10, 2025