Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC, fix: use mypy pre-commit in local environment #1966

Open
wants to merge 48 commits into
base: main
Choose a base branch
from

Conversation

EdAbati
Copy link
Collaborator

@EdAbati EdAbati commented Feb 8, 2025

What type of PR is this? (check all applicable)

  • πŸ’Ύ Refactor
  • ✨ Feature
  • πŸ› Bug Fix
  • πŸ”§ Optimization
  • πŸ“ Documentation
  • βœ… Test
  • 🐳 Other

Related issues

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below

I did some digging, looking for how other projects run typing and general best practices with mypy.
One thing people seem to agree with is that mirrors-mypy has its gotchas and may not be ideal for every project.
In our case we would need to add every backend dependencies as additional-dependencies. In this case we would have to maintain a script that keep those dependencies up to date.

One alternative would be to run mypy in the local environment where all the dependencies are already installed. This is what this PR is trying to do.

I have still some open questions, making this draft to test the CI too.
~I am expecting to see 300 errors from mypy πŸ‘€

Update: I propose to move to this solution where we:

  • keep using pre-commit for linting but move mypy to run in the local env -> this is to avoid having to add many additional-dependencies
  • Add Makefile to add 1 consistent command to run typing checks: make typing
  • Run mypy in a separate CI pipeline on the entire code

@MarcoGorelli
Copy link
Member

i like the look of this, thanks for doing it

it looks like this fails in pre-commit-ci, so we may need to find another way of running it in ci? (i'm not averse to removing pre-commit-ci)

@EdAbati
Copy link
Collaborator Author

EdAbati commented Feb 8, 2025

Thanks @MarcoGorelli ! ❀️

This could be a possible approach that doesn't change super much what we currently have:

  • we add mypy in the dev dependencies
  • mypy runs in pre-commit locally
  • we use pyproject.toml to configure mypy's behaviour
  • we only remove mypy from the pre-commit.ci (since the local env with dependencies will not be created there)
  • we add a new GH action for typing
  • the Makefile is to create a command that works the same way in Linux/macOS and Windows (make typing)

The errors are now showing up in the CI!
The dependencies we install in the Typing CI will influence which backends we will check for typing error. (just for this example I only added core)

Happy to hear your thoughts (also @FBruzzesi and @dangotbanned )

@MarcoGorelli
Copy link
Member

nice, happy to go with this - if we can get it green and then gradually address things, happy to get this in

@dangotbanned

This comment was marked as outdated.

@dangotbanned
Copy link
Member

#1966 (comment)

@MarcoGorelli sounds reasonable

Maybe this is a dumb question, but what is the issue with running mypy without pre-commit?
To me it seems the simplest way to get the same errors we'd see locally

@MarcoGorelli
Copy link
Member

that's fine as well. seeing as we're using pre-commit for other checks, i don't it hurt to have a make typing so that the workflow for anyone using pre-commit isn't affected

@EdAbati
Copy link
Collaborator Author

EdAbati commented Feb 9, 2025

Seriously though, what is the goal here?

My goal here was trying to find a better and more consistent way to run typing tests without disrupting too much the current workflow.
Here we keep mypy in pre-commit and we run mypy on all the relevant code in CI (or manually using make) without pre-commit

Actually if we removed pyarrow-stubs for a bit we'd get waaaay less errors πŸ‘€ b301356

Here I meant that it seems that most of typing errors (we are not already ignoring) come from the pyarrow related part of the code. Of course not suggesting to remove this stubs forever πŸ˜…
My plan/suggestion is to make the CI green with at least our core dependencies installed and then start reintroducing stubs and removing exclusions in follow-up PRs

Still testing a couple of things, hope to have this PR properly ready for review later today 🀞

@@ -33,7 +33,7 @@
from narwhals.utils import Version


class DuckDBExpr(CompliantExpr["duckdb.Expression"]):
class DuckDBExpr(CompliantExpr["duckdb.Expression"]): # type: ignore[type-var]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ignore should actually be solved. duckdb.Expression is not a CompatibleSeries

Is it ok for a follow-up?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup definitely

@@ -159,7 +159,7 @@ def select(
):
return self._from_native_frame(
self._native_frame.aggregate(
[val.alias(col) for col, val in new_columns_map.items()]
[val.alias(col) for col, val in new_columns_map.items()] # type: ignore[arg-type]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

based on the duckdb stubs, only str are allowed in .aggregate. πŸ€”

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks wrong, it should be str | list[Expression], right? want to make a PR to duckdb?

Comment on lines +55 to +56
"narwhals[tests]",
"narwhals[typing]",
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should separate the dev dependencies into tests and typing.

The latest pinned mypy is available for python >=3.9 and if we simply include it in dev our python 3.8 tests will fail.

@EdAbati EdAbati marked this pull request as ready for review February 9, 2025 21:36
@@ -58,7 +58,7 @@
from narwhals.typing import CompliantDataFrame
from narwhals.typing import CompliantLazyFrame

CLASSICAL_NUMPY_DTYPES = frozenset(
CLASSICAL_NUMPY_DTYPES: frozenset[np.dtype] = frozenset(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might be able to get away with this?

Possibly an issue for 3.8 though

Suggested change
CLASSICAL_NUMPY_DTYPES: frozenset[np.dtype] = frozenset(
CLASSICAL_NUMPY_DTYPES = frozenset["np.dtype"](

@dangotbanned

This comment was marked as resolved.

@@ -60,23 +60,23 @@ def second(self: Self) -> DuckDBExpr:
def millisecond(self: Self) -> DuckDBExpr:
return self._compliant_expr._from_call(
lambda _input: FunctionExpression("millisecond", _input)
- FunctionExpression("second", _input) * 1_000,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've got a feeling they want to nudge us towards using * ConstantExpression(1000), then mypy accepts that

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that was what I went for in (bf9e0d9)

Copy link
Member

@dangotbanned dangotbanned Feb 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MarcoGorelli to clarify, that was only on _duckdb.expr (not this module _duckdb.expr_dt)

Comment on lines +447 to +451
return (
FunctionExpression("var_pop", _input)
* n_samples
/ (n_samples - ConstantExpression(ddof))
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just fixing a pyright warning here.

If anyone has a meaningful name for the parts of this expression, I think it'd be more readable with an assignment first

Copy link
Member

@dangotbanned dangotbanned Feb 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like:

n_samples = n_samples = FunctionExpression("count", _input)
meaningful = (n_samples - ConstantExpression(ddof)) # don't use this name!
return FunctionExpression("var_pop", _input) * n_samples / meaningful

@dangotbanned
Copy link
Member

dangotbanned commented Feb 10, 2025

The last warning seems legit to me https://github.com/narwhals-dev/narwhals/actions/runs/13240255809/job/36953842381?pr=1966

Both uses of bin_count pass None as arguments to functions that don't accept None:

image

elif self._native_series.count() < 1:
if bins is not None:
data = {
"breakpoint": bins[1:],
"count": zeros(shape=len(bins) - 1),
}
else:
data = {
"breakpoint": linspace(0, 1, bin_count),
"count": zeros(shape=bin_count),
}

I don't see any tests covering Series.hist(..., bin_count=None):
https://github.com/narwhals-dev/narwhals/blob/4b1d778beeab20c3b5565dba629e320eb23804c8/tests/series_only/hist_test.py

The one mypy caught looks like it would cause an error at runtime:
https://github.com/numpy/numpy/blob/06f987b7b77453aae7fa11de15f728567fccd407/numpy/_core/function_base.py#L121-L125

@camriddell just wanted to run this by you following #1859 - feel like I must be missing something

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CI / typing: migrate away from pre-commit?
3 participants