Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using cachedmethod with key=partial(hashkey, 'pep') is wrong #326

Open
NoamNol opened this issue Nov 3, 2024 · 5 comments
Open

Using cachedmethod with key=partial(hashkey, 'pep') is wrong #326

NoamNol opened this issue Nov 3, 2024 · 5 comments
Assignees
Labels

Comments

@NoamNol
Copy link

NoamNol commented Nov 3, 2024

In the docs there is this example with key=partial(hashkey, 'pep'):

class CachedReferences:

    def __init__(self, cachesize):
        self.cache = LRUCache(maxsize=cachesize)

    @cachedmethod(lambda self: self.cache, key=partial(hashkey, 'pep'))
    def get_pep(self, num):
        """Retrieve text of a Python Enhancement Proposal"""
        url = 'http://www.python.org/dev/peps/pep-%04d/' % num
        with urllib.request.urlopen(url) as s:
            return s.read()

    @cachedmethod(lambda self: self.cache, key=partial(hashkey, 'rfc'))
    def get_rfc(self, num):
        """Retrieve text of an IETF Request for Comments"""
        url = 'https://tools.ietf.org/rfc/rfc%d.txt' % num
        with urllib.request.urlopen(url) as s:
            return s.read()

I think it's wrong to use partial(hashkey, 'pep') with cachedmethod, because the self argument is passed to the key function.
Usually self is ignored by the default methodkey function, but with partial(hashkey, 'pep') it's not ignored.

Also we can't replace it with key=partial(methodkey, 'pep') because 'pep' will be the ignored argument instead of self.

@NoamNol NoamNol added the bug label Nov 3, 2024
@tkem
Copy link
Owner

tkem commented Nov 5, 2024

Usually self is ignored by the default methodkey function, but with partial(hashkey, 'pep') it's not ignored.

You're right, with hashkey the self argument is not ignored, leading to an unnecessary key item:

>>> print(list(docs.cache.keys()))
[('pep', <__main__.CachedReferences object at 0x7f5f2bbcead0>, 1), ('rfc', <__main__.CachedReferences object at 0x7f5f2bbcead0>, 1)]

However, AFAICS the behavior is as intended and not generally wrong, just a little less efficient than it could be.

@NoamNol
Copy link
Author

NoamNol commented Nov 5, 2024

@tkem thank you for the reply.

The main problem is the inconsistency between this and the first example in the docs where self is ignored, leading to unexpected results when my class has implemented a strict __hash__ method that returns a different hash value for every minor change in the class fields.

When using @cachedmethod(lambda self: self.cache, key=partial(hashkey, 'pep')) I didn't expect the cache key to be related to the class’s strict hash value, I only wanted the cache key to be calculated based on the parameters of the method and the name of the method.

So when the inconsistency is the first problem, the second problem is the lack of an example of how to use cachedmethod on multiple methods in the class ignoring self

@tkem
Copy link
Owner

tkem commented Nov 5, 2024

@NoamNol Thanks for your response!
Agreed, I just wanted to make sure I was not overlooking anything - if you agree that the example's behavior is basically correct, i.e. sharing a cache between methods leads to the expected results, I'd like to change this issue from "bug" to "enhancement" and think about how to make the example more consistent by using cachedmethod for the next regular release.
Suggestions are welcome, of course ;-)

@tkem
Copy link
Owner

tkem commented Nov 5, 2024

One simple solution would be to use a keyword argument to differentiate, i.e.

    @cachedmethod(lambda self: self.cache, key=partial(methodkey, type='pep'))
    def get_pep(self, num):
       ...
    @cachedmethod(lambda self: self.cache, key=partial(methodkey, type='rfc'))
    def get_rfc(self, num):
       ...       

This would lead to the following keys, where the _HashedTuple is used to separate keyword args in the resulting key tuple, which is an implementation detail we probably needn't discuss:

[(1, <class 'cachetools.keys._HashedTuple'>, 'type', 'pep'), (1, <class 'cachetools.keys._HashedTuple'>, 'type', 'rfc')]

@NoamNol
Copy link
Author

NoamNol commented Jan 8, 2025

@tkem I wouldn’t say the behavior is basically correct. After all, most users wouldn’t intend to calculate self in the key, leading to an ineffective cache that, in some cases (where the class is mutable and implements __hash__) never really uses the cache.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants