Add base class for caches in _cache.py #82

Open
petebachant opened this issue Nov 25, 2019 · 8 comments

@petebachant

There's a lot of similarity between the various cache classes. Is there any reason not to create a base class and derive the various cache classes from it? Some methods that appear to be copied and pasted across classes include:

  • archive
  • key
  • lookup
  • __get_cache
  • __get_mask
  • __get__
  • __reduce__
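As a rough illustration of the proposed refactor, a shared base class could own the duplicated methods while subclasses override only what differs. This is a hypothetical sketch, not klepto's actual API: the names `_BaseCache`, `InfCache`, `LRUCache` and the simplified `key`/`lookup` bodies are invented for the example.

```python
class _BaseCache(dict):
    """Hypothetical shared base for the cache classes (sketch only)."""

    def __init__(self, *args, maxsize=None, **kwds):
        self.maxsize = maxsize
        dict.__init__(self, *args, **kwds)

    def key(self, *args, **kwds):
        """Generate a hashable key from the call arguments (simplified)."""
        return args + tuple(sorted(kwds.items()))

    def lookup(self, *args, **kwds):
        """Look up a cached value by the key generated for the given call."""
        return self[self.key(*args, **kwds)]


class InfCache(_BaseCache):
    """Unbounded cache: inherits everything from the base unchanged."""


class LRUCache(_BaseCache):
    """Bounded cache: would override only the eviction-related methods."""
```

Duplicated methods like `archive`, `key`, and `lookup` would then live in one place, and each subclass would carry only its own eviction policy.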
@IvanaGyro
Contributor

safe.py and _cache.py are also almost the same.

@mmckerns
Member

Yes, the cache classes need to be refactored.

@matanox

matanox commented Feb 1, 2024

Joining in. Is it a good idea to enable the caching classes (current, new, or refactored) to optionally cache everything to files? Or is that already supported? I spent some time in the docs and source but couldn't locate that option.

If not, let me explain the rationale:

For functions with long computation times, one often wants to reuse their results across runs, at least in research workflows. Abstracting away the caching of their output to files is a major enabler of continuity in dynamically evolving experiment workflows.

Currently I'm using my own caching function, which either computes a result or loads it from a file; its default behavior is best described as "load from file if already available". However, klepto does a good job of matching function calls to their set of bound argument values, enabling caching per distinct set of bound arguments. It even allows specifying arguments to ignore when keying, so that arguments that aren't supposed to affect the return value don't generate a different cache key.

Having "load from file if already available" built in would be very enabling, if it isn't already.

Or I'll just use https://klepto.readthedocs.io/en/latest/#klepto.keygen in my own function, if my workflow isn't as generically useful to other people.


Maybe this is a little like the "purge cache to archive at maxsize" mentioned in the documentation of e.g. inf_cache, but I'm not sure how caches and archives interrelate in the klepto user-facing API.
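The "load from file if already available" pattern described above can be sketched with the standard library alone. This is a minimal stand-in for the commenter's own approach, not klepto's API: `file_memoize` and its keying scheme are invented for the example, and it uses one pickle file per distinct argument set.

```python
import hashlib
import os
import pickle


def file_memoize(cache_dir=".cache"):
    """Sketch of 'load from file if already available' memoization.

    Each distinct set of call arguments maps to one pickle file on disk,
    so results persist across runs of the program.
    """
    os.makedirs(cache_dir, exist_ok=True)

    def decorator(fn):
        def wrapper(*args, **kwds):
            # Key the call on the function name and its bound arguments.
            raw = pickle.dumps((fn.__name__, args, sorted(kwds.items())))
            path = os.path.join(cache_dir, hashlib.sha256(raw).hexdigest())
            if os.path.exists(path):            # load from file if available
                with open(path, "rb") as f:
                    return pickle.load(f)
            result = fn(*args, **kwds)          # otherwise compute...
            with open(path, "wb") as f:         # ...and persist for next run
                pickle.dump(result, f)
            return result
        return wrapper
    return decorator
```

Unlike klepto, this sketch has no ignore/rounding controls on the key and will miss the cache when unhashable-but-equal arguments pickle differently; it only illustrates the workflow being asked about.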

@matanox

matanox commented Feb 1, 2024

I am mentioning it here as it may interrelate with refactoring the various classes, beyond the conciseness and facade-simplification angle with which this thread seems to have started.

@mmckerns
Member

mmckerns commented Feb 1, 2024

@matanster: you can select from any of the archive backends (a file, or files, included), and they can either be coupled to an in-memory cache that can be synchronized with the backend, or you can use the archive as the cache directly.
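Those two modes can be illustrated with a plain-pickle stand-in (not klepto's archive API; the class `FileBackedCache` and its `dump`/`load` methods are invented here to mirror the cache/archive split described above):

```python
import pickle


class FileBackedCache(dict):
    """Sketch: an in-memory dict cache coupled to a file backend."""

    def __init__(self, path):
        super().__init__()
        self.path = path

    def dump(self):
        """Synchronize the in-memory cache out to the file backend."""
        with open(self.path, "wb") as f:
            pickle.dump(dict(self), f)

    def load(self):
        """Pull any previously archived entries into the in-memory cache."""
        try:
            with open(self.path, "rb") as f:
                self.update(pickle.load(f))
        except FileNotFoundError:
            pass  # nothing archived yet
```

Using the archive as the cache directly would instead mean reading and writing the backend on every access, trading speed for always-persisted state.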

@matanox

matanox commented Feb 1, 2024

Thanks and sorry for the long message before.

Currently, I think that for the rest of us, code examples are the main thing missing for getting started with a library like klepto. The docstring of inf_cache is very clear and coherent, and I gather that all the high-level cache objects (inf_cache, lru_cache, etc.) use the class cache, which inherits from Python's dict, as a lean proxy to a storage backend (a.k.a. a klepto archive) if provided one upon initialization.

For a non-dummy archive that persists the data somewhere, we need to pass one when initializing the high-level cache object, for example:

my_cache = inf_cache(cache=file_archive('cache.pkl'))

That's all.

So that's how we choose a cache strategy (from among inf_cache, lru_cache etc.) and a persistence backend from the list of archive types enumerated in the readme.

The cache strategy, i.e. the cascade providing the function's result from either (1) a cached value in memory, (2) an archive backend such as a file archive or directory archive, or (3) a recomputation, is easily gleaned from the high-level classes' code, for example in the case of inf_cache:

klepto/klepto/_cache.py

Lines 308 to 324 in e8fc4f6

try:
    # get cache entry
    result = cache[key]
    stats[HIT] += 1
except KeyError:
    # if not in cache, look in archive
    if cache.archived():
        cache.load(key)
    try:
        result = cache[key]
        stats[LOAD] += 1
    except KeyError:
        # if not found, then compute
        result = user_function(*args, **kwds)
        cache[key] = result
        stats[MISS] += 1
return result
and is very self-explanatory.
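For readers who want to run the cascade in isolation, here is a self-contained mini-version using only the standard library. It is a simplification of the quoted snippet, not klepto's code: `archive` is a plain dict standing in for a klepto archive, and membership is tested directly instead of calling `cache.archived()`/`cache.load(key)`.

```python
# stat counters: in-memory hit, recomputation, archive load
HIT, MISS, LOAD = 0, 1, 2


def cached_call(user_function, cache, archive, stats, key, *args, **kwds):
    """Replicate the hit -> archive-load -> recompute cascade."""
    try:
        result = cache[key]                   # (1) in-memory hit
        stats[HIT] += 1
    except KeyError:
        if key in archive:                    # (2) fall back to the archive
            cache[key] = archive[key]
            result = cache[key]
            stats[LOAD] += 1
        else:                                 # (3) recompute and cache
            result = user_function(*args, **kwds)
            cache[key] = result
            stats[MISS] += 1
    return result
```

Calling it twice with the same key shows the first call counted as a LOAD (or MISS) and the second as a HIT, matching the stats kept by the real decorator.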

As user code invokes the cached function's __call__ with a set of arguments, a hash key is generated that hashes the function and its arguments, subject to any ignore, rounding, or other constraints provided by the caller. Internally, that key then indexes the function's result in the dict that cache is, and the archive is consulted per the caching strategy of the high-level class.

cached_fn = my_cache(my_fn)
cached_fn(...)

If you like, I can update the readme with these few lines of code and a very concise form of the above explanation, to provide a usage example there.

@matanox

matanox commented Feb 1, 2024

Maybe you could rename cache to cache_dict, or otherwise streamline the internal naming to convey the structure more readily rather than mildly obscuring it, in case that helps with more fluent analysis of stack traces and more code contributions from the community.

As the tests are passing for me (Python 3.10), I can probably make the readme convey usage more readily, toward making klepto slightly more popular as its own package with just a handful of small changes, and take a step toward improving the internal structure. If either of those makes a difference for you, let me know and I'll allocate some time.

That is, to the extent that you trust the test suite to provide reasonable safety for a small refactor, given that I only have a Linux machine to test on.

If, however, pathos directly calls or references the internals beneath the high-level classes like inf_cache, that would be beyond my resources.

@mmckerns
Member

mmckerns commented Feb 2, 2024

Feel free to submit a PR... Unfortunately, I believe most of the examples in this tutorial are pulled from klepto.tests and don't show up elsewhere. The tests are decent examples themselves, in some cases, but aren't repeated in the documentation.
