Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create an Argument class and allow convertion of optdict into them #5584

Merged
merged 7 commits into from
Mar 29, 2022

Conversation

DanielNoord
Copy link
Collaborator

@DanielNoord DanielNoord commented Dec 21, 2021

  • Add yourself to CONTRIBUTORS if you are a new contributor.
  • Write a good description on what the PR does.

Type of Changes

Type
✨ New feature
🔨 Refactoring

Description

This is my first proposed step towards #5392.

We could build on this in a separate branch, as I intend not to make this interfere too much with any current code. That should keep merge conflicts low.

All of this is based on the documentation of argparse found here:
https://docs.python.org/3/library/argparse.html

The idea of this PR is to showcase how we can transform our current optdicts into a Argument class that we can then at some point start passing to the ArgumentsManager. I think that second part would be the next step, to make this in a MVP. After that we can start working on adding new checkers.
I chose the logging module as it is a fairly easy module without only two options.

Don't hold back! Let me know what you think of this. All criticism is valuable!

Note: Ideally these commits wouldn't need to be squashed in a final merge. That makes cherry-picking later on much easier (see 2.0 branch in astroid). So after final review I'll revisit the commit history.

@DanielNoord DanielNoord added Enhancement ✨ Improvement to a component Configuration Related to configuration labels Dec 21, 2021
@coveralls
Copy link

coveralls commented Dec 22, 2021

Pull Request Test Coverage Report for Build 2058346839

  • 90 of 92 (97.83%) changed or added relevant lines in 9 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.02%) to 94.173%

Changes Missing Coverage Covered Lines Changed/Added Lines %
pylint/checkers/base_checker.py 9 10 90.0%
pylint/config/arguments_manager.py 41 42 97.62%
Totals Coverage Status
Change from base Build 2057470749: 0.02%
Covered Lines: 15418
Relevant Lines: 16372

💛 - Coveralls

@DanielNoord
Copy link
Collaborator Author

If anybody could help me figure out why this fails on Windows that would be much appreciated!

@jacobtylerwalls
Copy link
Member

jacobtylerwalls commented Dec 22, 2021

I think it's that subroutines can't be pickled on Windows. Any way we can avoid pickling ArgumentParser?

edit: ArgumentParser implementation

@DanielNoord
Copy link
Collaborator Author

I'll try and come up with something!

@DanielNoord DanielNoord marked this pull request as draft December 22, 2021 07:46
@DanielNoord DanielNoord force-pushed the argparse-logging branch 3 times, most recently from c447d7b to b9b957a Compare December 22, 2021 08:59
@DanielNoord DanielNoord marked this pull request as ready for review December 22, 2021 09:40
@DanielNoord
Copy link
Collaborator Author

I have fixed the issue with argparse.ArgumentsParser pickling. It is annoying as it means we'll need to instantiate an ArgumentsParser and extract any relevant information during the init of the ArgumentsManager.
On the other hand, that does force us to think clearly about how we handle arguments parsing and might help us avoid any redundant calls 😄

@Pierre-Sassoulas Pierre-Sassoulas changed the title Create an Argument class and allow covertion of optdict into them Create an Argument class and allow convertion of optdict into them Dec 22, 2021
@DanielNoord DanielNoord marked this pull request as draft December 22, 2021 15:56
@DanielNoord
Copy link
Collaborator Author

Converting this to draft as I found issues with the current implementation. In retrospect I think it doesn't make sense to break this into smaller PRs, as we might run into problems after earlier parts have been merged.

Therefore, I'll keep working on this until we can completely use argparse to parse the options for the logging module. That should be a good test that at least basic functionality is guaranteed.

Copy link
Member

@Pierre-Sassoulas Pierre-Sassoulas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need new classes ? Or could we use argparse.Namespace directly ? optparse is more complex so if we try to do a "small refactor" we'll keep the optparse complexity in our implementation even when using argparse.

One thing I had in mind for this refactor was separating parsing option / checking. Maybe creating NamedTuple or DataClasses and gives them to the checker pre-parsed instead of parsing the option inside each checker (in order to not have breaking change we could parse the option if the object is not given ?).

@DanielNoord
Copy link
Collaborator Author

Do we really need new classes ? Or could we use argparse.Namespace directly ? optparse is more complex so if we try to do a "small refactor" we'll keep the optparse complexity in our implementation even when using argparse.

I think it is better to create new classes as it allows us to more easily work on this. If we start to inject into the pre-existing optparse based classes we will create merge conflicts very easily.

One thing I had in mind for this refactor was separating parsing option / checking. Maybe creating NamedTuple or DataClasses and gives them to the checker pre-parsed instead of parsing the option inside each checker (in order to not have breaking change we could parse the option if the object is not given ?).

I'm not sure what you mean here.

I think the flow would be 1) register options on parser object, 2) parse options with parser object, 3) set namespace object as config namespace of linter object.

@DanielNoord
Copy link
Collaborator Author

This might be more difficult than I thought. I had a pretty good idea of how to not require argparse.ArgumentParser as subclass of PyLinter, but we need some way to store the ArgumentParser. Otherwise we would need to register all options every time we want to change any of them.

The problem is that even Run gets pickled, so I don't know a good place to register an ArgumentParser and store it.. I'll keep thinking about this, but this is a very large limitation of argparse which I'm not sure we can overcome..

@Pierre-Sassoulas
Copy link
Member

  1. set namespace object as config namespace of linter object.

I agree with 1) and 2) but I don't understand this part.

we need some way to store the ArgumentParser. Otherwise we would need to register all options every time we want to change any of them.

Couldn't we store the result of the parsing, i.e. an argparse.Namespace ?

@DanielNoord
Copy link
Collaborator Author

I agree with 1) and 2) but I don't understand this part.

After the parsing is done (ideally in the Run init) the resulting namespace needs to be made accessible to the linter class. So it can be accessed there.

Couldn't we store the result of the parsing, i.e. an argparse.Namespace ?

There is a difference between the Namespace and the storage of arguments. The namespace only stores the current value whereas the parser objects also stores stuff like default value and description. I think it's important that information is not lost after initial parsing of arguments is done, so we'll preferably need to save both the parser and the namespace. Currently I don't see a way to store the parser, as it seems everything within Run is pickled..

@Pierre-Sassoulas
Copy link
Member

the resulting namespace needs to be made accessible to the linter class.

Ok I agree with that too.

I think it's important that information is not lost after initial parsing of arguments is done,

Why ? Where could we use the default value and the description after the parsing is done ?

@DanielNoord
Copy link
Collaborator Author

Why ? Where could we use the default value and the description after the parsing is done ?

We would need it to "reparse" for sub directories and for help messages. We would then need to recollect and reinstantiate all options for every subdirectory. That's not really good performance wise..,

@Pierre-Sassoulas
Copy link
Member

Can't we parse all the options for each directories first and store them ? (By the way the configuration by directory feature was removed from flake8 because it was bringing "90% of the bugs" we might want to reconsider that one)

@DanielNoord
Copy link
Collaborator Author

Can't we parse all the options for each directories first and store them ? (By the way the configuration by directory feature was removed from flake8 because it was bringing "90% of the bugs" we might want to reconsider that one)

That might be possible, but that would definitely come later.

However, I think I might have found a way to do it. We won't store information about the arguments this way, but I think we could design everything in such a way that that is not necessary.

@DanielNoord
Copy link
Collaborator Author

@Pierre-Sassoulas Would you consider using dill to pickle the PyLinter object for the multiprocessing handling? I'm basing this on this SO question which seems to suggest it should be able to fix our problem:
https://stackoverflow.com/questions/8804830/python-multiprocessing-picklingerror-cant-pickle-type-function

According to their README it has been download 200M+ times via Pypi and supports almost all python versions (Python >=2.7, !=3.0.*). Being able to pickle the input to multiprocessing.Pool would save all of the issues and allow us to make a more intuitive system.


Some additional arguments for why I'm exploring fixing the multiprocessing issue instead of working around: we can't be sure that all plugins and tools (such as prospector) create a Run class. In our own tests we don't do so, but after some additional thoughts I don't think we should do stuff in the Run __init__ that is needed for PyLinter to work. (That's counterintuitive and bad design).
Thus, my third (fourth, fifth?) attempt at this doesn't really work. I hoped that by doing the argument parsing in Run we could skip the pickling.

@Pierre-Sassoulas
Copy link
Member

Pierre-Sassoulas commented Dec 29, 2021

Look like dill do not have any dependencies of its own and its 3-clause BSD license seems compatible with pylint. We would more than double their download per day by adding them as a dependencies but it's already used a lot.

I don't think we should do stuff in the Run init that is needed for PyLinter to work. (That's counterintuitive and bad design).

Depends on what we want in PyLinter, right now it's doing almost everything. There's a problem with breaking change in downstream plugins to consider for sure, but let's put that aside for a minute If I understood what we're talking about and the current state of the code, PyLinter (each checker even) is parsing the options right now ? Parsing the options first before launching anything and especially the parallel run actually make more sense imo ? i.e. PyLinter could be the class that analyses a set of file given a set of parsed options and have less responsibilities. Alternatively we could include multiprocessing handling in PyLinter if we want PyLinter and not Run to be the entrypoint ?

@DanielNoord
Copy link
Collaborator Author

DanielNoord commented Dec 29, 2021

Look like dill do not have any dependencies of its own and it's 3-clause BSD license seems compatible with pylint. We would more than double their download per day by adding them as a dependencies but it's already used a lot.

I'll start work on a PR implementing it for our parallel run then, so we can continue discussion about actually doing this in a separate PR 😄

Depends on what we want in PyLinter, right now it's doing almost everything. There's a problem with breaking change in downstream plugins to consider for sure, but let's put that aside for a minute If I understood what we're talking about and the current state of the code, PyLinter (each checker even) is parsing the options right now ?

Since every checker is an OptionsProvider each plugin or checker added after initialising PyLinter can add new options that need to be read, "added to the parser" and then "parsed into the Namespace". The flow is as follows:

  1. Initialise Run, 2. Initialise PyLinter and its associated parser, 3. "Add" all options from the PyLinter class, 4. "Parse" PyLinter options, 5. "Add" all options from default checkers and plugins, 6. "Parse" all options from default checkers and plugins, 7. "Add" all options from extra plugins, 8. "Parse" all options from extra plugins.

Note load_defaults in register_checker here:
https://github.com/PyCQA/pylint/blob/e815843293adf100662fe4e183c34f3040529a15/pylint/lint/pylinter.py#L754-L763

The problem is thus that at step 5 and step 7 we need to re-initialise an ArgumentsParser if we can't store the parser as an attribute or subclass of PyLinter. Adding the parser to Run doesn't help as for step 7 we don't have access to Run anymore.
I have tested what would happen if we had to re-initialise an ArgumentsParser 100 times and it became a significant impact on runtime.

Parsing the options first before launching anything and especially the parallel run actually make more sense imo ? i.e. PyLinter could be the class that analyses a set of file given a set of parsed options and have less responsibilities. Alternatively we could include multiprocessing handling in PyLinter if we want PyLinter and not Run to be the entrypoint ?

We can't really do this as we need to parse the load-plugins option before knowing all the options we have to parse.

Copy link
Member

@Pierre-Sassoulas Pierre-Sassoulas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look good already. I only skimmed because I'm not an expert in argparse custom parser and even less so in optparse so it's a hard review for me.

.pre-commit-config.yaml Outdated Show resolved Hide resolved
pylint/config/utils.py Show resolved Hide resolved
@Pierre-Sassoulas Pierre-Sassoulas added this to the 2.14.0 milestone Mar 25, 2022
@DanielNoord
Copy link
Collaborator Author

@Pierre-Sassoulas Do you want others to review this as well before merging? @jacobtylerwalls possibly?

@Pierre-Sassoulas
Copy link
Member

More reviews is always a good thing if we can afford the luxury :)

Copy link
Member

@jacobtylerwalls jacobtylerwalls left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Terrific, thanks for moving this migration forward. A few questions, some for my own understanding.

pylint/config/config_initialization.py Show resolved Hide resolved
pylintrc Show resolved Hide resolved
tests/config/test_argparse_config.py Outdated Show resolved Hide resolved
tests/config/test_argparse_config.py Outdated Show resolved Hide resolved
pylint/config/argument.py Outdated Show resolved Hide resolved
pylint/config/arguments_manager.py Outdated Show resolved Hide resolved
pylint/config/arguments_manager.py Show resolved Hide resolved
pylint/config/arguments_manager.py Show resolved Hide resolved
@jacobtylerwalls
Copy link
Member

Thanks for answering my questions. Is now a good time to edit the branch history?

@DanielNoord
Copy link
Collaborator Author

I guess so. I'll do a rebase.

@DanielNoord
Copy link
Collaborator Author

DanielNoord commented Mar 29, 2022

@Pierre-Sassoulas Please rebase and merge if you're okay with this and the number of reviews 😄

Copy link
Member

@Pierre-Sassoulas Pierre-Sassoulas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great refactor @DanielNoord !

@Pierre-Sassoulas Pierre-Sassoulas merged commit 0bc45e9 into pylint-dev:main Mar 29, 2022
@DanielNoord DanielNoord deleted the argparse-logging branch March 29, 2022 12:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Configuration Related to configuration Enhancement ✨ Improvement to a component
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants