-
Notifications
You must be signed in to change notification settings - Fork 875
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Re-organizing VASP I/O code #3997
base: master
Are you sure you want to change the base?
Conversation
This reverts commit 636165b.
@@ -0,0 +1,107 @@ | |||
# ruff: noqa: PGH003 |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
Here are my comments:
I refer you to the scikit-learn module and class organization. Please use that as a guide on how to organize. Finally, I would point out that many of sklearn's module files are about the same length as sets.py (2800-3200 lines). They do not break up a module for maintenance - they break it up when there is a logic to the organization. |
The sklearn comparison is illustrative: even where there are large LOC, often there is only a single class per file, and they are quite readable. This is not the case for these specific files in pymatgen. I am not making axiomatic statements; I agree that a high number of LOC is not in itself an error. Long-time pymatgen developers (we are in good company) might know exactly where in the file to go to make an edit, but before you can get there, one has to read and understand the structure of the code. It is nice to know what classes are available, for example, without needing to read the entire file first. Point 3 I concur; perhaps it is better to split up the input sets according to functionality instead. |
Re 3, i think the most common use case is to have consistent settings across whatever mixture of workflows you run (relax, band gap, EOS, phonon, etc.). so after having run some number of relaxations, a user is unlikely to care about half a dozen other band gap workflows with inconsistent VASP settings to their relaxations. more likely, the only thing they want to know is if a band gap workflow from the same lineage (say MP) exists. hence splitting input sets by lineage makes sense imo. i guess it doesn't have to be one or the other. if you look at |
If splitting by source is preferred, I am ok with that. Note that I don't really think atomate2's organizational structure should be the reference here. In any case, if the module split is a private implementation, then it really doesn't matter since we can always reorganize as needed without breaking downstream codes. |
Summary
Re-opening changes initially introduced by @janosh in #3865 and later reverted by @shyuep (see thread).
I will be taking responsibility for this PR. This PR will require approval from both @janosh and @shyuep before merging.
Specifically,
__dir__()
to make sure all imports remain unaffected and that there are no backwards-incompatible changes.pymatgen.io.vasp.inputs
andpymatgen.io.vasp.outputs
into separate files.This change is well-motivated to improve maintainability.
I hope we can get this PR merged without too much difficulty. Thank you again to @janosh for starting this PR, to @Andrew-S-Rosen for the bug report due to the errors caused, and to @shyuep for implementing a rapid fix.