-
Notifications
You must be signed in to change notification settings - Fork 90
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix for #274 Make all Variable subclasses name valid python identifiers #307
base: master
Are you sure you want to change the base?
Conversation
…dentifier First step for solving #274. This covers all outcome and parameter classes. Still remaining are constants and potentially all Point subclasses.
from NamedObject to Variable, which means Constant name now also must be a valid python identifier
As expected, this brakes a lot of stuff. I will start fixing all errors this creates over the coming days. |
for more information, see https://pre-commit.ci
…kbench into valid_identifiers
for more information, see https://pre-commit.ci
Right, I just now understand the implications of this, after seeing all the examples with names with spaces in them which now aren't allowed. So this is definitely a 3.0 change, and we should add deprecation warnings in the next release (2.5). Since the scope in which backwards compatibility is broken is huge, I'm also not sure if we really should do this for all internal variable classes. |
All tests now pass, so this is ready for review. We have to decide also on how to handle 2.5 and 3. I agree that it would be good to keep this for a 3.0 release, but what to still do for 2.5? |
For 2.5, I would say replace all I will try to review asap. |
Ok, so we need a 3.0 branch into which this can go. I'll probably make a separate pull request for the deprecation warnings |
I think we can leave this PR open, merge the warnings in a separate PR, release 2.5, and then directly after that merge this PR to master. If we need to do 2.5.x bugfixes we can do those on a maintenance branch. |
I was thinking, can we provide a minimal, best-effort attempt to convert invalid identifiers to valid ones? A very simple option would be to convert spaces to underscores. Or, we could just convert any invalid character to an underscore. import re
def to_valid_identifier(s):
# Replace invalid characters with an underscore
s = re.sub(r'[^0-9a-zA-Z_]', '_', s)
# Ensure the first character is not a digit
if s[0].isdigit():
s = "_" + s
return s I think in that case we still want to throw an warning, like: UserWarning: f"Invalid characters found in {variable_name}. Converting all characters to underscores: {converted_variable_name} is now a valid Python identifier." |
I am hesitant about this: when in doubt, refuse the temptation to guess. Also, by default parameter names map to something in the underlying model. So if you change the name, you also need to set the variable_name kwarg to the original name. |
Yeah I understand. How would it be as an option that's off by default? I'm just trying to think about ways for users with huge Vensim models to not require major modifications. Otherwise we could suggest a valid name in the error message: if not name.isidentifier():
raise ValueError(f"'{name}' is not a valid Python identifier. Changing it to '{to_valid_identifier(name)'} would make it one.") |
It would be nice to have a convenient way of dealing with this, but I need to figure out how to go about it. The idea of automagically changing parameter names violates a classic Python principle. So, for me, that is a no-go. An alternative would be adding a helper function to 2.5 to convert models to the new behavior. This helper function would try to fix all parameters and spit out not just a fixed model but also the revised code so you can replace the existing uncertainty/constant/outcome definitions. The problem is that this function is easy for 95% of all use cases: take the parameter name, split it on whitespace, and join it with an underscore. Set the parameter name to this new name and set the variable name to the original name. You would print this out with There are, however, several issues. First, variable_name may be already used. This is easy: update the name only. The problem is with the remaining cases where, for example, alternative distributions are being used. Or when some of the other optional kwargs have been used to instantiate the parameter instance. These are currently not well covered by Suggesting correct names in the error message won't help much. The fix will be trivially obvious, but is just painful to do. Copy-pasting between the console and the model definition is even more painful than just spending 10 minutes fixing all names. |
Good points. I would just add the warnings to 2.5, then users know a change is coming, and then we have some more time to think about how to handle it exactly. |
Having a fix function would need to happen in 2.5 because, from 3 onward, you get a straight-up exception (at least atm). |
Fair, but I wouldn't let it be a blocker for 2.5.0. We can always backport to a 2.5 release branch and do a 2.5.1 path release with it. |
With 2.5 released we can merge this to the master branch (targeted for 3.0) right? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know what everything in here exactly does, but I don't see really weird things.
I'll fix the conflicts hopefully later today and merge it |
No description provided.