Series.squeeze return type should be a Series or Scalar #1103

Open
Kornel opened this issue Jan 31, 2025 · 1 comment
Labels
API - Consistency Internal Consistency of API/Behavior

Kornel commented Jan 31, 2025

Describe the bug

The return type for DataFrame/Series.squeeze is currently specified as Scalar, which is incomplete. According to the official documentation (pandas.DataFrame.squeeze and pandas.Series.squeeze), the correct return type should be DataFrame, Series, or scalar.

Pandas documentation:

Admittedly, how Series.squeeze could ever return a DataFrame needs further investigation, but returning a Series is definitely possible ;)

mypy error:

error: Incompatible return value type (got "str | bytes | date | timedelta | datetime64[date | int | None] | timedelta64[timedelta | int | None] | int | float | complex", expected "Series[Any]")  [return-value]
Found 1 error in 1 file (checked 1 source file)

To Reproduce

Example test.py:

import pandas as pd

def foo() -> pd.Series:
    df = pd.DataFrame({
        "A": [1, 2, 3]
    })

    return df["A"].squeeze()


print(
    foo()
)

Run it, just to see the expected output (a series):

(.venv) ~/test-case $ python test.py
0    1
1    2
2    3
Name: A, dtype: int64

mypy expects a scalar:

(.venv) ~/test-case $ mypy test.py
test.py:8: error: Incompatible return value type (got "str | bytes | date | timedelta | datetime64[date | int | None] | timedelta64[timedelta | int | None] | int | float | complex", expected "Series[Any]")  [return-value]
Found 1 error in 1 file (checked 1 source file)
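
Until the stubs are updated, one possible workaround on the caller's side is to check the result at runtime, so that the declared return type holds whatever squeeze() yields. The following is a minimal sketch, assuming the same DataFrame as in test.py; the helper name squeeze_to_series is hypothetical and not part of pandas or pandas-stubs. Whether a given type checker accepts it cleanly under the current stubs is not verified here.

```python
import pandas as pd


def squeeze_to_series(obj: pd.Series) -> pd.Series:
    """Squeeze a Series, insisting that the result is still a Series.

    Hypothetical helper, not part of pandas or pandas-stubs.
    """
    result = obj.squeeze()
    # A one-element Series squeezes to a scalar, so check at runtime.
    if not isinstance(result, pd.Series):
        raise TypeError(f"expected a Series, got {type(result).__name__}")
    return result


df = pd.DataFrame({"A": [1, 2, 3]})
print(squeeze_to_series(df["A"]))
```

The check raises for a one-element Series, which is the case where squeeze() really does return a scalar.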

versions:

(.venv) ~/test-case $ mypy --version
mypy 1.14.1 (compiled: yes)
(.venv) ~/test-case $ pip freeze
mypy==1.14.1
mypy-extensions==1.0.0
numpy==2.2.2
pandas==2.2.3
pandas-stubs==2.2.3.241126
python-dateutil==2.9.0.post0
pytz==2025.1
six==1.17.0
types-pytz==2024.2.0.20241221
typing_extensions==4.12.2
tzdata==2025.1
(.venv) ~/test-case $ python --version
Python 3.11.10

Environment:

  • OS: macOS
  • OS version: 15.2
  • Python version: 3.11
  • Type checker: mypy 1.14.1
  • pandas-stubs version: 2.2.3.241126

Dr-Irv (Collaborator) commented Jan 31, 2025

Thanks for the report.

It appears that DataFrame.squeeze() can return a DataFrame, Series or scalar:

>>> df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
>>> df.squeeze()
   A  B
0  1  4
1  2  5
2  3  6
>>> df[["A"]].squeeze()
0    1
1    2
2    3
Name: A, dtype: int64
>>> ddf = pd.DataFrame({"a": [1]})
>>> ddf.squeeze()
np.int64(1)

And Series.squeeze() can return a Series or scalar:

>>> df["A"].squeeze()
0    1
1    2
2    3
Name: A, dtype: int64
>>> ddf["a"].squeeze()
np.int64(1)

So what needs to happen is the following:

  • Remove squeeze() from core/generic.pyi
  • Fix squeeze() in both core/frame.pyi and core/series.pyi to indicate that any of these results are possible.
  • Write appropriate tests.
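
The second bullet could look roughly like the sketch below. It uses simplified stand-in classes so that it runs on its own. The Scalar alias here is a shortened placeholder for the much wider union in pandas-stubs, and the real signatures in core/frame.pyi and core/series.pyi annotate the axis parameter in a way that is not reproduced here. The return annotations are the point, not the class bodies.

```python
from __future__ import annotations

from typing import Union, get_type_hints

# Simplified placeholder for the Scalar union used by pandas-stubs (assumption).
Scalar = Union[str, bytes, int, float, complex]


class Series:
    """Stand-in for pandas.Series, only to carry the annotation."""

    def squeeze(self, axis=None) -> Union[Series, Scalar]:
        # Many elements -> Series, exactly one element -> scalar.
        raise NotImplementedError


class DataFrame:
    """Stand-in for pandas.DataFrame, only to carry the annotation."""

    def squeeze(self, axis=None) -> Union[DataFrame, Series, Scalar]:
        # Nothing to squeeze -> DataFrame, one column or row -> Series, 1x1 -> scalar.
        raise NotImplementedError


print(get_type_hints(Series.squeeze)["return"])
print(get_type_hints(DataFrame.squeeze)["return"])
```

A union return type means callers have to narrow the result themselves, since the actual type depends on the shape of the data and cannot be known statically.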

PR with tests welcome.
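
As a starting point for the tests, the runtime behaviour shown above can be pinned down with plain isinstance checks. This is only a sketch: the pandas-stubs test suite has its own conventions for asserting static types, which are not reproduced here, and the function name test_squeeze_runtime_types is made up for illustration.

```python
import numpy as np
import pandas as pd


def test_squeeze_runtime_types() -> None:
    df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
    ddf = pd.DataFrame({"a": [1]})

    # DataFrame.squeeze(): DataFrame, Series, or scalar depending on shape.
    assert isinstance(df.squeeze(), pd.DataFrame)
    assert isinstance(df[["A"]].squeeze(), pd.Series)
    assert isinstance(ddf.squeeze(), np.integer)

    # Series.squeeze(): Series, or scalar for a single element.
    assert isinstance(df["A"].squeeze(), pd.Series)
    assert isinstance(ddf["a"].squeeze(), np.integer)


test_squeeze_runtime_types()
```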

@Dr-Irv Dr-Irv added the API - Consistency Internal Consistency of API/Behavior label Jan 31, 2025