Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add on_set / on_get callbacks to fields #57

Open
lafrech opened this issue Oct 6, 2016 · 4 comments
Open

Add on_set / on_get callbacks to fields #57

lafrech opened this issue Oct 6, 2016 · 4 comments

Comments

@lafrech
Copy link
Collaborator

lafrech commented Oct 6, 2016

Following discussing here.

I wrote:

I still think it would be nice to have a way to call the callback at get or set time.

class User(Document):
    birthday = DateTimeField(required=True, on_set=self._compute_age)  # either here
    age = IntField(required=True, on_get=self._compute_age)  # or there

    def _compute_age(self):
        self.age = (datetime.utcnow() - self.birthday).days / 365

One nice thing with umongo is that validation occurs at set time, so the object is always valid. You don't have to wait until commit (or explicit validation) to check that. get/set callbacks would allow those dependencies between fields to be enforced anytime just as well.

@touilleMan answered:

My 0.2$ on this: Given an umongo document keeps internally the data in mongo world representation, it would be a bit cumbersome to add a check to do dynamic evaluation when retrieving a field (like you propose with on_get)
On the other hand on_set seems much easier given data always goes through BaseField.deserialize_from_mongo method. We could add a simple check there to call the function if defined.
However this will create small troubles if multiple fields are updated and have their on_set param pointing on the same callback. User would like to have the callback called once with the final value, instead it will be called twice, first time with half the data up to date...

I didn't think about the update use case. It seems this feature is more complicated than I expected.

I'd like to add a new requirement. on_[g|s]et should be lists of callbacks rather than callbacks, to allow several callbacks to be set, with a deterministic order.

class User(Document):
    birthday = DateTimeField(required=True, on_set=[self._compute_age, another_callback])  # either here
    age = IntField(required=True, on_get=[self._compute_age, yet_another_callback])  # or there

    def _compute_age(self):
        self.age = (datetime.utcnow() - self.birthday).days / 365
@touilleMan
Copy link
Member

Do you think the new marshmallow's pre/post processors integration (see #60 (comment)) could be used as a workaround for this ?

@lafrech
Copy link
Collaborator Author

lafrech commented Oct 18, 2016

Not sure. The on_[g|s]et callbacks could be executed after an attribute is set.

class User(Document):
    birthday = DateTimeField(on_set=self._compute_age)  # either here
    age = IntField()

    def _compute_age(self):
        self.age = (datetime.utcnow() - self.birthday).days / 365  # Who cares about precision?

chuck = User()
chuck.age = None

chuck.birthday = datetime.datetime(year=1940, month=3, day=10)
chuck.age = 76

I don't think you can do this with post_load, can you? AFAIU, post_load is schema-level, not field-level, it is only called when loading the whole schema (maybe with partial), but not when setting an attribute.

@touilleMan
Copy link
Member

touilleMan commented Oct 18, 2016

You're right, marshmallow's pre/post processors is not the silver bullet we were waiting for !

I'm thinking of a new way to implement this feature:

  • on_set should be simple by adding a check in Dataproxy.load, Dataproxy.set and Dataproxy.set_by_mongo_name
  • on_get can be implemented by storing the field data in dataproxy as a lazy object that need evaluation

example:

>>> def capitalize(value):
... print('capitalize !')
... return value.to_upper()
... 
>>> def double(value):
... print('double !')
... return value + ' ' + value
... 
>>> class User(Document):
...     name = fields.StrField(on_get=[capitalize, double])
...     lastname = fields.StrField(on_set=[capitalize, double])
... 
>>> user = User(name='john', lastname='doe')
capitalize !
double !
>>> user.lastname
'DOE DOE'
>>> user.name
capitalize !
double !
'JOHN JOHN'
>>> lazy_obj = user._data._data['name']
>>> lazy_obj
<LazyObj(fn=double, value=LazyObj(fn=capitalize, value='john'))>
>>> lazy_obj.eval()
capitalize !
double !
'JOHN JOHN'

But I'm still unsure how useful it would be for on_get... do you see a usecase for this ?
I can think of a way to automatically unserialize data (for example returning a dict when having a string of json in database). However this feels error-prone (in my example you may want to modified dict but this will change nothing within the database...)

Usecase for on_set is pretty obvious (hashing&salting password as you proposed) though.

@lafrech
Copy link
Collaborator Author

lafrech commented Oct 18, 2016

But I'm still unsure how useful it would be for on_get... do you see a usecase for this ?

I also see more use for on_set. I guess on_get came naturally for symmetry reasons. In practice, it could be useful for a field that depends on a lot of other fields with a simple computation. So rather than modifying the field each time one of those other fields is modified, it could be modified on the fly when accessed. This could be better in terms of both performance and code simplicity. But it doesn't have to be done in the same move anyway, it is independent from on_set.

on_set should be simple by adding a check in Dataproxy.load, Dataproxy.set and Dataproxy.set_by_mongo_name

And Dataproxy.update, Dataproxy.delete. I'm not sure about Dataproxy.from_mongoand Dataproxy._add_missing_fields.

Questions:

  • Should the callbacks be called when loading from Mongo or only when setting values from object world?
  • Should there be a mechanism to avoid calling several times the same callback in a load / update operation, but rather do all updates first then call all needed callbacks only once? Tempting, but maybe not that easy, and potentially problematic. The callbacks are not necessarily idempotent like in my age computation example. If the callback counts or adds/appends stuff, the designer might want to have them executed systematically.
  • From an implementation perspective, rather than modifying all the methods cited above, maybe there is an alternative where Dataproxy._data would be a special kind of dict with an overriden __setitem__. Just a tought. I didn't check feasibility.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants