Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce run.clone() #141

Open
wants to merge 11 commits into
base: main
Choose a base branch
from
Open

Introduce run.clone() #141

wants to merge 11 commits into from

Conversation

glatterf42
Copy link
Member

@glatterf42 glatterf42 commented Nov 29, 2024

Closes #133 :)
This is the next part of cleaning up #101, but it also improves the code that was originally introduced there, though I'm not entirely happy with all changes.

The goal was to enable run.clone() in the core layer. For that, clone() had to move outside of the core RunRepository since Run is not related to that. The most logical place to put it seemed backend.runs.clone() to me, so I did that, but it also warrants adding clone() to the REST API.
With clone(), we're creating a new run, which sounds like a POST request to me. However, we already intercept all those requests to /runs/, so I had to find a new way. Rather than going for another request method, I opted to redirect all POST requests to /runs/clone/, which is not otherwise in use, to clone() in the DB layer. Please let me know if this is appropriate.
The actual logic for clone() worked well in the core layer, where it could make use of the run.iamc object encapsulating some validation and formatting operations. I tried to move these helper functions to the DB layer, but some require access to a specific run, making them unsuitable to be used in the DB RunRepository. Thus, I duplicated their logic within the clone() function, which is likely not the best solution. Please let me know which alternative we can employ.
Lastly, I found two oddities referring to type hints, both marked as TODO inline:

  • core.indexset.data returns a list[float] | list[int] | list[str] now, which is technically correct: there can only be one type in any indexset.data list. However, type checkers then don't know if they should allow something like indexset.add("foo") since they can't guarantee that .data is a list[str] this time. I'm not sure how, but it would be great to annotate .data ... dynamically I guess, so that it's recognized correctly and specifically.
  • Some iamc functions like iamc.datapoints.bulk_upsert() are annotated inconsistently. In the abstract layer, they may proclaim to accept a pd.DataFrame, whereas in the DB layer, they require a pandera.DataFrame of a specific schema. This leads to some type: ignore that we could probably remove, but I'm not sure how we would want to do that.
    We could of course adapt the type hint to accept both kinds of dataframes, but we probably want to ensure that the data are validated even when the DB layer is called directly, so we probably don't want to let simple pd.DataFrames slip. Just using pandera.DataFrames might be tricky, too, as e.g. pandera.DataFrame["run__id"] tells me I can't index a pandera.DataFrame. I'm happy to hear suggestions here :)

@glatterf42 glatterf42 added the enhancement New feature or request label Nov 29, 2024
@glatterf42 glatterf42 self-assigned this Nov 29, 2024
Copy link

codecov bot commented Nov 29, 2024

Codecov Report

Attention: Patch coverage is 90.90909% with 10 lines in your changes missing coverage. Please review.

Project coverage is 87.1%. Comparing base (c7ea1f3) to head (c06ff3f).

Files with missing lines Patch % Lines
ixmp4/data/db/iamc/utils.py 82.0% 7 Missing ⚠️
ixmp4/data/abstract/run.py 50.0% 1 Missing ⚠️
ixmp4/data/db/iamc/timeseries/repository.py 50.0% 1 Missing ⚠️
ixmp4/data/db/timeseries.py 50.0% 1 Missing ⚠️
Additional details and impacted files
@@          Coverage Diff          @@
##            main    #141   +/-   ##
=====================================
  Coverage   87.0%   87.1%           
=====================================
  Files        230     231    +1     
  Lines       8170    8216   +46     
=====================================
+ Hits        7112    7159   +47     
+ Misses      1058    1057    -1     
Files with missing lines Coverage Δ
ixmp4/core/iamc/data.py 100.0% <100.0%> (+8.4%) ⬆️
ixmp4/core/optimization/indexset.py 91.5% <100.0%> (ø)
ixmp4/core/run.py 98.0% <100.0%> (ø)
ixmp4/data/api/base.py 88.4% <100.0%> (ø)
ixmp4/data/api/run.py 98.0% <100.0%> (+0.1%) ⬆️
ixmp4/data/db/meta/repository.py 96.2% <100.0%> (-0.1%) ⬇️
ixmp4/data/db/model/repository.py 100.0% <100.0%> (ø)
ixmp4/data/db/run/repository.py 95.8% <100.0%> (+1.3%) ⬆️
ixmp4/data/types.py 100.0% <100.0%> (ø)
ixmp4/server/rest/run.py 100.0% <100.0%> (ø)
... and 4 more

@meksor
Copy link
Contributor

meksor commented Jan 7, 2025

I would not add an endpoint for this, especially as it does not actually add any functionality (by definition a run that is cloned could already be created via the REST API) and I hope that we wont be cloning thousands of runs on a regular basis.
The tests are very hard to read again, can we do something to make them easier to parse?

@meksor
Copy link
Contributor

meksor commented Jan 7, 2025

Just to clarify: Ill accept the PR also with the current idea, just still reading through it...

@danielhuppmann
Copy link
Member

Two quick responses to @meksor:

I would not add an endpoint for this, especially as it does not actually add any functionality (by definition a run that is cloned could already be created via the REST API)

The benefit of a dedicated endpoint would be that the data would be copied within/close to the database, right? Avoiding that the data is sent back and forth via the RestAPI.

Also, having a dedicated clone() method avoids forgetting some parts of the Run (iamc-data, optimization-items, other stuff to be added in the future)

I hope that we wont be cloning thousands of runs on a regular basis.

Sorry, cloning runs happens all the time in MESSAGE modelling - every time a user does a variation of an existing scenario, you start by cloning the base scenario and make some modifications.

@meksor
Copy link
Contributor

meksor commented Jan 7, 2025

OK, convinced!

Copy link
Contributor

@meksor meksor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, as said, i think the two tests added here could benefit from comments/whitespace/abstraction to make them easier to read.
A hard requirement is a test that checks if clone respects the permission check!
(It looks like it does since it uses create, but its best to bake this into a test.)
You can add a test in test_auth.py

@glatterf42
Copy link
Member Author

Thanks for your review :)
I'm not familiar with all the intricacies of the auth system, so please let me know if the test I added is already sufficient. Without the if platform_info == self.gated: clause, run.clone() fails on the public and private platforms due to missing permissions.

As for the readability: I tried adding some blank lines and comments, but I'm actually just adding one test here (not counting the auth test, which does not require the same setup), so I'm not sure how to abstract things away here short of a refactoring regarding how we call the methods of adding data to optimization items or listing them. Please let me know what you think.

assert_cloned_run(run, clone_with_solution, kept_solution=True)
else:
with pytest.raises(Forbidden):
_ = run.clone()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to just add to the test_filters and test_guards tests above

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that should be done now :)

Base automatically changed from enh/run-get-by-id to main January 24, 2025 15:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Enable Run.clone()
3 participants