Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filter for removing self does not work on classes #75

Open
IvanaGyro opened this issue Aug 5, 2019 · 9 comments
Open

Filter for removing self does not work on classes #75

IvanaGyro opened this issue Aug 5, 2019 · 9 comments

Comments

@IvanaGyro
Copy link
Contributor

Currently, klepto does not restrict the types of the return value from the cached functions. If an instance of the class can be cached, there is no reason that the caches, as the decorators, can not apply to the class. Below is the code I think it should work.

from klepto import archives, keymaps, no_cache
@no_cache(cache=archives.dict_archive(), keymap=keymaps.picklemap(), ignore=('self',))
class MyClass:
    def __init__(self, val):
        self.val = val

c = MyClass(1)
d = MyClass(1)
c is d  # True

However the code above will raise a TypeError:
TypeError: __call__() got multiple values for argument 'self'

The relative code in the source as the following.
_inspect.py:

    # if ignore self, remove self instead of NULL it
    if inspect.isfunction(func):
        try: # this is a pretty good filter that: user_args[0] is self
            _bound = getattr(user_args[0], func.__name__)
            _self = getattr(_bound, 'im_self', None)
            if _self is None: _self = getattr(_bound, '__self__')
            assert _self == user_args[0]
        except:
            _bound = None
        if _bound and explicitly_named[0] in ignored:
            user_args = user_args[1:]                # remove 'self' instance
            user_kwds.pop(explicitly_named[0], None) #XXX: unnecessary?
            explicitly_named = explicitly_named[1:]  # remove 'self' name
            #XXX: hopefully, this doesn't mess up arg counting and other stuff

There are two points need to be fixed in the code above.

  1. Filter out self if func is not a funcrtion.
  2. Remove 'self' instance as default. If self is retained the TypeError must be raised, so it does not need let the developers decide whether self should be retained or not.
@mmckerns
Copy link
Member

mmckerns commented Aug 6, 2019

I think my intent was that the decorator would go on the class method, not on the whole class.
Like in this example... (actually, it's demonstrated in the file a few times).

If you have a case for something that would work with a class decorator, I'm super happy to see something proposed. I don't handle that case.

@IvanaGyro
Copy link
Contributor Author

I am working on the project using some machine learning algorithms. I created a class. When the instances of the class are created, the instances train the specific models corresponding to their arguments and pickle the result into the local storage. If the decorator can be applied to classes, then I can achieve the propose without writing my own __new__ and without manually handling archives(e.g. dir_archive, file_archive,.etc.

@IvanaGyro
Copy link
Contributor Author

IvanaGyro commented Aug 6, 2019

However, I found that supporting decorating classes will be a huge job because pickle cannot pickle the object of which class cannot be directly imported by the class name.

@mmckerns
Copy link
Member

mmckerns commented Aug 6, 2019

klepto can use dill instead of pickle. Is what you are testing serializable by dill?

I'm not sure what you want to do with this issue. Is it a feature request, or a just question, or no longer valid and can be closed?

Are you looking to store the class instances in a database? I know you can use the dict interface of a klepto.archive to store a class instance. Maybe you could provide some code (or pseudo-code) to help crystalize for me what you are looking to do.

@IvanaGyro
Copy link
Contributor Author

The following is the code demonstrating what I want to do.

trained.py

from klepto import no_cache, archives, keymaps
from other import OtherBase

@no_cache(cache=archives.dir_archive('test'), keymap=keymaps.picklemap())
class TrainedModel(OtherBase):
    def __init__(self, weights, date):
        self.weights = weights
        self.data = get_data_from_db(date)
        self.training_takes_long_time()

main.py

from trained.py import TrainedModel

model = TrainedModel({'a': 1, 'b': 2}, '20190808')
print(model.predict('a'))

If the caches (e.g. no_cache) cannot apply to classes, then I have to implement a stupid method .

trained.py

from klepto import no_cache, archives, keymaps
from other import OtherBase

class TrainedModel(OtherBase):
    def __init__(self, weights, date):
        self.weights = weights
        self.data = get_data_from_db(date)
        self.fit(weights, date)

    @no_cache(cache=archives.dir_archive('test'), keymap=keymaps.picklemap())
    def fit(self, weights, date):
        # `weights`, `date` are not used
        self.training_takes_long_time()
        return self # just for cache

@IvanaGyro
Copy link
Contributor Author

I have seen a part of source code of dill. It seems that dill also cannot unpickle the instance whose class cannot be directly retrieved by the name of the class. I have writen a decorator to make caches can be apply to classes. If this feature will be added into klepto, this code may be useful.

def _wrap_cache(cache):
    """Factory for making original caches support classes

    Because `pickle` cannot unpickle the object of which class cannot be 
    imported directly by the full name of the class, see the first refrence
    below for the source code of cpython, it will fail when unpickling the 
    classes wrapped by the original caches. To make the caches support classes,
    the original classes are returned by the new caches, as the wrappers, but
    `__new__` and `__init__` of the classes are hooked, and the wrapped function
    is set as an attribute named `__cache_instances_factory__` of the class.

    Returns:
        Original cache class which `__call__` is be replaced.

    Notices:
        1. Not support the classes whose metaclass are not `type`.
        2. Not influent the subclasses.

    Related References:
        https://github.com/python/cpython/blob/0378d98678f3617fd44d9a6266e7c17ebce62755/Lib/pickle.py#L1063
        https://github.com/uqfoundation/klepto/issues/75
        https://docs.python.org/3.7/library/pickle.html#pickling-class-instances
    """
    # `functools` and `inspect` will be deleted at the end of this module.
    functools_wraps = functools.wraps
    inspect_isclass = inspect.isclass
    inspect_getfullargspec = inspect.getfullargspec

    # When `cache.__call__` is called in `wrapper(self, user_function)`,
    # `cache.__call__` is replaced, so it need to be cached here.
    original_call = cache.__call__
    @functools.wraps(cache.__call__)
    def wrapper(self, user_function):
        decorated_function = original_call(self, user_function)
        if not inspect_isclass(user_function):
            return decorated_function
        else:
            user_class = user_function # alias
            user_class.__cache_instances_factory__ = decorated_function

            # Use the original `__getnewargs_ex__` or `__getnewargs__` if they
            # are exists.
            if not hasattr(user_class, '__getnewargs_ex__') and \
                not hasattr(user_class, '__getnewargs__'):
                # XXX: If the class is wrapped by `xx_cache` more than once,
                #   this will get the correct argspec. Should the situation
                #   of wrapping the class more than once be supported?
                argspec = inspect_getfullargspec(user_class.__new__)
                def __getnewargs_ex__(self):
                    # `__newargs__` and `__newkwargs__` are set in decorated
                    # `__new__`.
                    if argspec.varargs is not None:
                        args = self.__newargs__
                    else:
                        args = self.__newargs__[:len(argspec.args)-1]
                    if argspec.varkw is not None:
                        kwargs = self.__newkwargs__
                    else:
                        kwonlyargs = set(argspec.kwonlyargs)
                        kwonlyargs.add('__to_original_m061p3__')
                        kwargs = {k: self.__newkwargs__[k] 
                            for k in kwonlyargs if k in self.__newkwargs__}
                    return args, kwargs
                user_class.__getnewargs_ex__ = __getnewargs_ex__
            
            # fix inheritance mechanism
            if '__new__' in user_class.__dict__ or \
                not hasattr(user_class, '__original_new__'):
                user_class.__original_new__ = original_new = user_class.__new__
            else:
                # give the subclass the original `__new__`
                original_new = user_class.__original_new__
            @functools_wraps(user_class.__new__)
            def wrap_new(cls, *args, **kwargs):
                # If __to_original_m061p3__ is in `kwargs`, it means that this
                # function call comes from `__cache_instances_factory__` or that
                # `__cache_instances_factory__` should be bypassed.
                # only pass to the factory if the variable, `cls`, is itself
                if '__to_original_m061p3__' not in kwargs and cls is user_class:
                    kwargs['__to_original_m061p3__'] = True
                    return cls.__cache_instances_factory__(*args, **kwargs)
                else:
                    kwargs.pop('__to_original_m061p3__', None)
                    if original_new is object.__new__:
                        new_obj = original_new(cls)
                    else:
                        new_obj = original_new(cls, *args, **kwargs)
                    # For `__getnewargs_ex__`.
                    # When unpickling the object, `__cache_instances_factory__`
                    # should be bypassed.
                    kwargs['__to_original_m061p3__'] = True
                    new_obj.__newargs__, new_obj.__newkwargs__ = args, kwargs
                    return new_obj
            user_class.__new__ = wrap_new 

            original_init = user_class.__init__
            @functools_wraps(user_class.__init__)
            def wrap_init(self, *args, **kwargs):
                if '__to_original_m061p3__' in kwargs:
                    kwargs.pop('__to_original_m061p3__')
                    original_init(self, *args, **kwargs)
                else:
                    # If the instance is not created by 
                    # `__cache_instances_factory__`, then do nothing.
                    #
                    # If the warpped class is a direct subclass of `object`
                    # and of which metaclass is `type`, the proccess of
                    # creating an instance as the following.
                    # 
                    # decorated __new__ without `__to_original_m061p3__` ->
                    # __cache_instances_factory__ ->
                    # decorated __new__ with `__to_original_m061p3__` ->
                    # original __new__ without `__to_original_m061p3__` ->
                    # decorated __init__ without `__to_original_m061p3__` ->
                    # decorated __init__ with `__to_original_m061p3__`
                    #
                    # For more details about how instances are constructed,
                    # refer to:
                    # https://blog.ionelmc.ro/2015/02/09/understanding-python-metaclasses/#putting-it-all-together
                    pass
            user_class.__init__ = wrap_init
            return user_class
    cache.__call__ = wrapper
    return cache

@mmckerns
Copy link
Member

My intuition here has been to decorate __init__, and that's indeed what I have done in similar usage patterns. However, I'm not against using a class decorator. Since you have provided some code to apply the klepto caches as a class decorator, why don't you submit a PR?

@IvanaGyro
Copy link
Contributor Author

No problem, I will submit a PR when I am not as busy as an ant on the fryer.

@IvanaGyro
Copy link
Contributor Author

No problem, I will submit a PR when I am not as busy as an ant on the fryer.

I need some days to keep the compatibility of python 2.7.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants