rfc (api): Revisit the API of put and get #1269
Comments
So
would not store anything in skore? It feels confusing that some things are saved and some aren't |
With the
version I don't know how a user would add metadata about their object (see #889) Also, what key/ID would we give the report inside skore so that the user can find it again? |
An assignment with project.artifacts.roc_plot = project.artifacts.estimator_report.metrics.roc.plot() |
The name of the attribute that you used during the assignment. |
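To make the answer above concrete, here is a minimal sketch of how the attribute name used in an assignment could become the storage key. The `Artifacts` class and its `keys` method are hypothetical names for illustration, not skore's actual API, and the in-memory dict stands in for whatever backend would really be used.

```python
class Artifacts:
    """Hypothetical container: assigning an attribute stores an item under that name."""

    def __init__(self):
        # Bypass our own __setattr__ so the backing dict is a regular attribute.
        object.__setattr__(self, "_items", {})

    def __setattr__(self, name, value):
        # The attribute name used in the assignment becomes the storage key.
        self._items[name] = value

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for stored items.
        items = object.__getattribute__(self, "_items")
        try:
            return items[name]
        except KeyError:
            raise AttributeError(name) from None

    def keys(self):
        return list(self._items)


artifacts = Artifacts()
artifacts.roc_plot = "a plot object"  # stand-in for a real plot
print(artifacts.keys())
```

The user never passes a key explicitly: the name `roc_plot` is recovered from the assignment itself.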
I'm not sure either, each stored item could be a
project.artifacts.estimator_report = CrossValidateReport(...)
project.artifacts.estimator_report.metadata = ...
It would mean that under the hood, it would be something explicitly equivalent to:
project.artifacts.estimator_report = {
    "item": CrossValidateReport(...),
    "metadata": ...,
}
but overloading the |
This is a bit strange, because calling We can make metadata live alongside it, such as |
I agree with you. My thought would be more to maybe overload the
This is an alternative that is cleaner and would require less workaround, I like it. |
I don't see how it's technically possible to return The last one is called on the result of the first one, so we have to store metadata directly in the |
Just a comment: with an API like this, we should be explicit that users can't modify their objects in place without reassigning them.
project.artifacts.my_dict_object = {"a": {"b": 0}}
project.artifacts.my_dict_object["a"]["b"] = 1
print(project.artifacts.my_dict_object)  # -> {"a": {"b": 0}}
I don't think we want to deal with that. |
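A minimal sketch of why the mutation would be lost, assuming the store serializes on assignment and deserializes on every access. The `PersistingArtifacts` class is hypothetical and uses `pickle` with an in-memory dict purely for illustration; how skore actually persists items is not specified in this thread.

```python
import pickle


class PersistingArtifacts:
    """Hypothetical store: serializes on assignment, deserializes on access."""

    def __init__(self):
        object.__setattr__(self, "_blobs", {})

    def __setattr__(self, name, value):
        # A snapshot of the object is stored, not a reference to it.
        self._blobs[name] = pickle.dumps(value)

    def __getattr__(self, name):
        blobs = object.__getattribute__(self, "_blobs")
        try:
            # Each access builds a fresh copy from the stored snapshot.
            return pickle.loads(blobs[name])
        except KeyError:
            raise AttributeError(name) from None


artifacts = PersistingArtifacts()
artifacts.my_dict_object = {"a": {"b": 0}}

# This mutates a temporary copy, so the stored snapshot is untouched.
artifacts.my_dict_object["a"]["b"] = 1
unchanged = artifacts.my_dict_object

# Persisting a change requires an explicit reassignment.
updated = artifacts.my_dict_object
updated["a"]["b"] = 1
artifacts.my_dict_object = updated
```

Under these assumptions, `unchanged` still holds the original value, and only the reassignment at the end updates what is stored.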
There might always be a way in Python:
from collections import UserDict


class DataItem(UserDict):
    def __init__(self):
        super().__init__()
        self.data = {'data': None, 'metadata': {}}

    def __setitem__(self, key, value):
        if key in ('data', 'metadata'):
            self.data[key] = value
        else:
            self.data['data'][key] = value

    def __getitem__(self, key):
        if key in ('data', 'metadata'):
            return self.data[key]
        return self.data['data'][key]

    def __setattr__(self, name, value):
        if name == 'data' and isinstance(value, dict):
            # This handles UserDict's internal 'data' attribute
            super().__setattr__(name, value)
        elif name in ('data', 'metadata'):
            self.data[name] = value
        else:
            self.data['data'] = value

    def __getattr__(self, name):
        if name in ('data', 'metadata'):
            return self.data[name]
        # Delegate attribute access to the stored object
        return getattr(self.data['data'], name)

    def __str__(self):
        return str(self.data['data'])

    def __repr__(self):
        return repr(self.data['data'])


class Item(UserDict):
    def __init__(self):
        super().__init__()
        self.data = {}

    def __getattr__(self, name):
        if name not in self.data:
            self.data[name] = DataItem()
        return self.data[name]

    def __setattr__(self, name, value):
        if name == 'data':
            super().__setattr__(name, value)
        else:
            if name not in self.data:
                self.data[name] = DataItem()
            self.data[name].data = value


class A:
    def __init__(self):
        self.artifacts = Item()


class B:
    def func(self):
        print("hello world")


a = A()
a.artifacts.example_1 = B()
print(a.artifacts.example_1.data)  # {'data': <__main__.B object at 0x106313410>, 'metadata': {}}
print(a.artifacts.example_1)  # <__main__.B object at 0x106313410>
a.artifacts.example_1.metadata = {"created": "2025-02-03"}
print(a.artifacts.example_1.metadata)  # {"created": "2025-02-03"}
print(a.artifacts.example_1)  # <__main__.B object at 0x106313410>
print(a.artifacts.example_1.data)  # {'data': <__main__.B object at 0x1062fc440>, 'metadata': {'created': '2025-02-03'}}
a.artifacts.example_1.func()  # "hello world"
When opening the "details", I think I much prefer your explicit solution with separate |
@glemaitre You are changing the type of the returned value. For instance, I can't do:
a = A()
a.artifacts.my_int = 1
# [...]
my_int = a.artifacts.my_int
my_int + 1
So I persist: I don't know how to do this in a clean way without overloading the original object before it is returned. |
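One way to sidestep the type problem, sketched under the assumption that metadata can live in a side table next to the item rather than on a wrapper around it. The `ArtifactsWithMetadata` class and its `metadata_for` method are hypothetical names, not something proposed verbatim in this thread.

```python
class ArtifactsWithMetadata:
    """Hypothetical store: items are returned as-is, metadata lives in a side table."""

    def __init__(self):
        object.__setattr__(self, "_items", {})
        object.__setattr__(self, "_metadata", {})

    def __setattr__(self, name, value):
        self._items[name] = value
        # Keep existing metadata when an item is reassigned.
        self._metadata.setdefault(name, {})

    def __getattr__(self, name):
        items = object.__getattribute__(self, "_items")
        try:
            # The original object comes back unwrapped, so its type is preserved.
            return items[name]
        except KeyError:
            raise AttributeError(name) from None

    def metadata_for(self, name):
        # Explicit accessor instead of overloading the stored object.
        return self._metadata[name]


artifacts = ArtifactsWithMetadata()
artifacts.my_int = 1
artifacts.metadata_for("my_int")["created"] = "2025-02-03"

my_int = artifacts.my_int
result = my_int + 1  # works because my_int is a plain int
```

The trade-off is that metadata is no longer reachable through the item itself, and a stored item cannot be named `metadata_for`.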
Indeed, I just tried to hide as much as possible from the user. So the next step is to overload all operators to delegate (not sure what will be the next case :)). But all this complexity tells me that |
This is fundamentally a no-go haha. Yes, I think we should focus on |
In the past, we had many discussions where @GaelVaroquaux or I complained about the need to use .put and to specify the name of an item to store it (and get it).
In #949, I mentioned that I wanted my CrossValidationReport to be accessible from project so that this operation happens. However, one limitation, as mentioned by @auguste-probabl, is that native Python types cannot benefit from this API. There was also discussion that the same class could be used in different ways, which could be confusing.
In an IRL discussion with @adrinjalali, an API that seemed much better would be to have a dict/Bunch UX to deal with this use case:
would be replaced by:
or for a report
would be replaced by:
To access the latest version of an item:
would work.
All of those would work if we consider that skore.open implicitly sets the "run" for which we store and access the items. We would still need .get if one wants to access an item from a different run, where we come back to the discussion on having skore.get_metadata or similar to be able to query the database.
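As a rough illustration of the dict/Bunch UX described above, here is a sketch in which a view is bound to one run, supports both item and attribute access, and returns the latest version of an item. Everything here (`RunScopedItems`, `get_item`, the `history` mapping, the run names) is hypothetical and only meant to show the shape of the idea, not skore's API.

```python
from collections import defaultdict


class RunScopedItems:
    """Hypothetical dict/Bunch-like view bound to a single run."""

    def __init__(self, history, run):
        object.__setattr__(self, "_history", history)
        object.__setattr__(self, "_run", run)

    def __setitem__(self, key, value):
        # Every assignment appends a new version for (run, key).
        self._history[(self._run, key)].append(value)

    def __getitem__(self, key):
        versions = self._history.get((self._run, key))
        if not versions:
            raise KeyError(key)
        return versions[-1]  # latest version

    def __setattr__(self, name, value):
        self[name] = value

    def __getattr__(self, name):
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


def get_item(history, key, run):
    """Hypothetical explicit getter for an item stored under another run."""
    return RunScopedItems(history, run)[key]


history = defaultdict(list)

previous = RunScopedItems(history, run="run-1")
previous.score = 0.8

current = RunScopedItems(history, run="run-2")
current.score = 0.9
current["score"] = 0.95  # item access is equivalent to attribute access

latest = current.score
from_previous_run = get_item(history, "score", run="run-1")
```

In this sketch, the run is fixed when the view is created, which mirrors the idea of the run being set implicitly when the project is opened; reaching into another run stays an explicit call.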