Add base class for caches in _cache.py #82

Open
petebachant opened this issue Nov 25, 2019 · 8 comments

@petebachant

There's a lot of similarity between the various cache classes. Is there any reason not to create a base class and derive the various cache classes from it? Some methods that appear to be copied and pasted across classes include:

  • archive
  • key
  • lookup
  • __get_cache
  • __get_mask
  • __get__
  • __reduce__
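As a rough illustration of the proposed refactor, a shared base class could own the duplicated methods while subclasses override only what differs. This is a hypothetical sketch, not klepto's actual API: the names `_BaseCache`, `InfCache`, `LRUCache` and the simplified `key`/`lookup` bodies are invented for the example.

```python
class _BaseCache(dict):
    """Hypothetical shared base for the cache classes (sketch only)."""

    def __init__(self, *args, maxsize=None, **kwds):
        self.maxsize = maxsize
        dict.__init__(self, *args, **kwds)

    def key(self, *args, **kwds):
        """Generate a hashable key from the call arguments (simplified)."""
        return args + tuple(sorted(kwds.items()))

    def lookup(self, *args, **kwds):
        """Look up a cached value by the key generated for the given call."""
        return self[self.key(*args, **kwds)]


class InfCache(_BaseCache):
    """Unbounded cache: inherits everything from the base unchanged."""


class LRUCache(_BaseCache):
    """Bounded cache: would override only the eviction-related methods."""
```

Duplicated methods like `archive`, `key`, and `lookup` would then live in one place, and each subclass would carry only its own eviction policy.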
@IvanaGyro
Contributor

safe.py and _cache.py are also almost the same.

@mmckerns
Member

Yes, the cache classes need to be refactored.

@matanox

matanox commented Feb 1, 2024

Joining in. Is it a good idea to enable the caching classes (current, new, or refactored) to optionally cache everything to files? Or is that already supported? I spent some time in the docs and source but couldn't locate that option.

If not, let me explain the rationale:

For functions with long computation times, one often wants to reuse their results across runs, at least in research workflows. Abstracting away the caching of their output to files is a major enabler of continuity in dynamically evolving experiment workflows.

Currently I'm using my own caching function, which either computes a result or loads it from a file; its default behavior is best described as "load from file if already available". However, klepto does a good job of matching function calls to their set of bound argument values, enabling caching per distinct set of bound arguments. It even allows specifying arguments to ignore when keying, so that arguments that aren't supposed to affect the return value don't generate a different cache key.

Having "load from file if already available" built in would be very enabling, if it isn't already.

Or I'll just use https://klepto.readthedocs.io/en/latest/#klepto.keygen in my own function, if my workflow isn't as generically useful to other people.


Maybe this is a little like the "purge cache to archive at maxsize" mentioned in the documentation of e.g. inf_cache, but I'm not sure how caches and archives interrelate in the klepto user-facing API.
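The "load from file if already available" pattern described above can be sketched with the standard library alone. This is a minimal stand-in for the commenter's own approach, not klepto's API: `file_memoize` and its keying scheme are invented for the example, and it uses one pickle file per distinct argument set.

```python
import hashlib
import os
import pickle


def file_memoize(cache_dir=".cache"):
    """Sketch of 'load from file if already available' memoization.

    Each distinct set of call arguments maps to one pickle file on disk,
    so results persist across runs of the program.
    """
    os.makedirs(cache_dir, exist_ok=True)

    def decorator(fn):
        def wrapper(*args, **kwds):
            # Key the call on the function name and its bound arguments.
            raw = pickle.dumps((fn.__name__, args, sorted(kwds.items())))
            path = os.path.join(cache_dir, hashlib.sha256(raw).hexdigest())
            if os.path.exists(path):            # load from file if available
                with open(path, "rb") as f:
                    return pickle.load(f)
            result = fn(*args, **kwds)          # otherwise compute...
            with open(path, "wb") as f:         # ...and persist for next run
                pickle.dump(result, f)
            return result
        return wrapper
    return decorator
```

Unlike klepto, this sketch has no ignore/rounding controls on the key and will miss the cache when unhashable-but-equal arguments pickle differently; it only illustrates the workflow being asked about.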

@matanox

matanox commented Feb 1, 2024

I am mentioning it here as it may interrelate with refactoring the various classes, beyond the conciseness and facade-simplification angle with which this thread seems to have started.

@mmckerns
Member

mmckerns commented Feb 1, 2024

@matanster: you can select from any of the archive backends (a file, or files, included), and they can either be coupled to an in-memory cache that can be synchronized with the backend, or you can use the archive as the cache directly.
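Those two modes can be illustrated with a plain-pickle stand-in (not klepto's archive API; the class `FileBackedCache` and its `dump`/`load` methods are invented here to mirror the cache/archive split described above):

```python
import pickle


class FileBackedCache(dict):
    """Sketch: an in-memory dict cache coupled to a file backend."""

    def __init__(self, path):
        super().__init__()
        self.path = path

    def dump(self):
        """Synchronize the in-memory cache out to the file backend."""
        with open(self.path, "wb") as f:
            pickle.dump(dict(self), f)

    def load(self):
        """Pull any previously archived entries into the in-memory cache."""
        try:
            with open(self.path, "rb") as f:
                self.update(pickle.load(f))
        except FileNotFoundError:
            pass  # nothing archived yet
```

Using the archive as the cache directly would instead mean reading and writing the backend on every access, trading speed for always-persisted state.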

@matanox

matanox commented Feb 1, 2024

Thanks and sorry for the long message before.

Currently, I think that for the rest of us, code examples are the main thing missing for getting started with a library like klepto. The docstring of inf_cache is very clear and coherent, and I gather that all the high-level cache objects (inf_cache, lru_cache, etc.) use the class cache, which inherits from Python's dict, as a lean proxy to a storage backend (a.k.a. a klepto archive) if provided one upon initialization.

For a non-dummy archive that persists the data somewhere, we need to pass one when initializing the high-level cache object, for example:

my_cache = inf_cache(cache=file_archive('cache.pkl'))

That's all.

So that's how we choose a cache strategy (from among inf_cache, lru_cache etc.) and a persistence backend from the list of archive types enumerated in the readme.

The cache strategy, i.e. the cascade providing the function's result from either (1) a cached value in memory, (2) an archive backend such as a file archive or directory archive, or (3) a recomputation, is easily gleaned from the high-level classes' code, for example in the case of inf_cache:

klepto/klepto/_cache.py

Lines 308 to 324 in e8fc4f6

try:
    # get cache entry
    result = cache[key]
    stats[HIT] += 1
except KeyError:
    # if not in cache, look in archive
    if cache.archived():
        cache.load(key)
    try:
        result = cache[key]
        stats[LOAD] += 1
    except KeyError:
        # if not found, then compute
        result = user_function(*args, **kwds)
        cache[key] = result
        stats[MISS] += 1
return result
and is very self-explanatory.
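For readers who want to run the cascade in isolation, here is a self-contained mini-version using only the standard library. It is a simplification of the quoted snippet, not klepto's code: `archive` is a plain dict standing in for a klepto archive, and membership is tested directly instead of calling `cache.archived()`/`cache.load(key)`.

```python
# stat counters: in-memory hit, recomputation, archive load
HIT, MISS, LOAD = 0, 1, 2


def cached_call(user_function, cache, archive, stats, key, *args, **kwds):
    """Replicate the hit -> archive-load -> recompute cascade."""
    try:
        result = cache[key]                   # (1) in-memory hit
        stats[HIT] += 1
    except KeyError:
        if key in archive:                    # (2) fall back to the archive
            cache[key] = archive[key]
            result = cache[key]
            stats[LOAD] += 1
        else:                                 # (3) recompute and cache
            result = user_function(*args, **kwds)
            cache[key] = result
            stats[MISS] += 1
    return result
```

Calling it twice with the same key shows the first call counted as a LOAD (or MISS) and the second as a HIT, matching the stats kept by the real decorator.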

As user code invokes the cached function's __call__ with a set of arguments, a hash key is generated that hashes the function and its arguments, subject to any ignore, rounding, or other constraints provided by the caller. Internally, that key then indexes the function's result in the dict that cache is, and the archive is consulted per the caching strategy of the high-level class.

cached_fn = my_cache(my_fn)
cached_fn(...)

If you like, I can update the readme with these few lines of code and a very concise form of the above explanation, to provide a usage example there.

@matanox

matanox commented Feb 1, 2024

Maybe you could rename cache to cache_dict, or otherwise streamline the internal naming to convey the structure more readily rather than mildly obscuring it, in case that helps with more fluent analysis of stack traces and more code contributions from the community.

As the tests are passing for me (Python 3.10), I can probably make the readme convey usage more readily, toward making klepto slightly more popular as its own package with just a handful of small changes, and take a step toward improving the internal structure. If either of those makes a difference for you, let me know and I'll allocate some time.

That is, to the extent that you trust the test suite to provide reasonable safety for a small refactor, given that I only have a Linux machine to test on.

If, however, pathos directly calls or references the internals beneath the high-level classes like inf_cache, that would be beyond my resources.

@mmckerns
Member

mmckerns commented Feb 2, 2024

Feel free to submit a PR... Unfortunately, I believe most of the examples in this tutorial are pulled from klepto.tests and don't show up elsewhere. The tests are decent examples themselves, in some cases, but aren't repeated in the documentation.
