archiving impure functions ? #83

Open
asmodehn opened this issue Dec 4, 2019 · 2 comments

@asmodehn

asmodehn commented Dec 4, 2019

Hi, I've been following dill and pathos for a long while, and recently decided to get my hands dirty with klepto for a project I am working on...

I am interested in storing the inputs and outputs of Python functions (mostly in JSON or SQL for now, something easily readable), but not as "caching" per se, since caching is only correct when the routine is a pure function (it should return the same output for the same input, and taking no arguments should mean 'constant').
I'd rather store all inputs/outputs together with a timestamp and, when some constraint arises (maximum archive size reached, for example), trigger a conversion to a pure function, but only if it looks like one (independent of time, or a pure function of time...). Otherwise I'd just erase old history and keep a log of recent calls.
A sort of semantic compression, if you will.
For example, later on a function could be compressed into a mapping/dict (of cached results) if only a small set of args has ever been used and we need to recover resources...
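
As a rough illustration of that idea, here is a minimal sketch in plain Python, without klepto. The names `CallLog`, `looks_pure`, and `compress` are hypothetical, arguments and results are assumed to be hashable, and the purity check is only a heuristic: it reports that no recorded call contradicts purity, not that the function is actually pure.

```python
import time
from collections import defaultdict


class CallLog:
    """Hypothetical in-memory log of (timestamp, args, result) records."""

    def __init__(self):
        self.records = []

    def record(self, fun):
        def wrapper(*args):
            result = fun(*args)  # always call the function; never replay
            self.records.append((time.time(), args, result))
            return result
        return wrapper

    def looks_pure(self):
        """True if each argument tuple has only ever produced one result."""
        seen = defaultdict(set)
        for _stamp, args, result in self.records:
            seen[args].add(result)
        return all(len(results) == 1 for results in seen.values())

    def compress(self):
        """Collapse the history into an args -> result dict, dropping timestamps."""
        if not self.looks_pure():
            raise ValueError("recorded calls contradict purity")
        return {args: result for _stamp, args, result in self.records}
```

When `looks_pure()` is false, the history could instead be truncated to the most recent calls, as described above.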

So, as a first step, I am looking at using klepto for archiving ins/outs of my routines, but without the caching (only archiving, no 'replay' attempt), and adding an implicit time argument...

The quick and dirty code I was able to get working to do that is:

import random
import time
import klepto

def record(fun):

    def timewrapper(stamp=time.time(), *args):
        return stamp, fun(*args)  # **kwargs not supported by klepto._inspect

    wrapper = klepto.no_cache(cache=klepto.archives.file_archive(name='kleptotry.kpt', cached=True, serialized=True))
    #wrapper = klepto.no_cache(cache=klepto.archives.sqltable_archive(name='sqlite:///kleptotry.db', cached=True, serialized=True))
    # sqltable archive doesn't seem to work...
    return wrapper(timewrapper)

@record
def randtry():
    return random.randint(1, 42)


if __name__ == '__main__':
    print(randtry())

Since the documentation is a little sparse, I was wondering if you had any tips about what I should expect to work in klepto for my use case, and what will not work because klepto was never designed for it.
I would also like to know where I could hook up my custom code, especially for changing the behaviour of a routine (at runtime, or via an external tool that manipulates the archive?)...

Thanks a lot for the help!

@IvanaGyro
Contributor

In your wrapper function,

def timewrapper(stamp=time.time(), *args):
    return stamp, fun(*args)

time.time() is not executed on every call. Default argument values are evaluated only once, when the def statement runs (at function definition time), so every call reuses the same stamp. See the demo below.

import time

def foo(stamp=time.time()):
    print(stamp)

foo()  # output: 1575514093.1254961
time.sleep(2)
foo()  # output: 1575514093.1254961 (same value, despite the sleep)

If you don't need to retrieve the records, you can just pass the wrapped function to no_cache.

import random
import time
import klepto

def add_stamp(f):
    def wrapper(stamp=None, *args):
        # None is a sentinel: the clock is read on each call,
        # not once when the function is defined
        if stamp is None:
            stamp = time.time()
        return stamp, f(*args)
    return wrapper

@klepto.no_cache(
    cache=klepto.archives.file_archive(
        name='kleptotry.kpt',
        cached=True,
        serialized=True
    )
)
@add_stamp
def randtry():
    return random.randint(1, 42)
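
One caveat with taking stamp as the first positional parameter: a positional argument meant for the wrapped function would be consumed as the stamp. A possible variant, sketched here independently of klepto, makes stamp keyword-only. Whether klepto's key generation handles keyword-only parameters is not verified here, so treat that as an assumption to test before combining the two.

```python
import functools
import time


def add_stamp(f):
    @functools.wraps(f)  # keep the wrapped function's name and docstring
    def wrapper(*args, stamp=None):
        # Read the clock on every call, not once at definition time.
        if stamp is None:
            stamp = time.time()
        return stamp, f(*args)
    return wrapper


@add_stamp
def add(a, b):
    # Hypothetical example function; positional args pass through untouched.
    return a + b
```

Calling `add(1, 2)` returns a `(stamp, 3)` tuple with a fresh stamp each time, while `add(1, 2, stamp=5.0)` pins the stamp explicitly.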

@asmodehn
Author

asmodehn commented Dec 5, 2019

Indeed, time.time() as a default argument is evaluated only once, which is not the behaviour I intended. Thanks for pointing it out. Much nicer code than mine, by the way.

To be more specific, one issue I have with no_cache is that if there is no add_stamp and the function takes no arguments, then it is assumed to be a constant function, and the stored value is returned. Commenting out the @add_stamp line in your code sample and calling randtry() multiple times is enough to exhibit that behaviour...
Is that expected of no_cache?

From the code, it looks like no_cache still behaves like a cache, returning the archived result for a specific key from the args, instead of just calling the function and archiving/logging the return, which is what I am looking for...
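
For comparison, the "call every time and only log the return" behaviour I have in mind can be sketched in plain Python, without klepto. The decorator name `archive_calls` and the one-JSON-object-per-line file layout are assumptions made for illustration, and arguments and results must be JSON-serializable.

```python
import functools
import json
import time


def archive_calls(path):
    """Hypothetical decorator: append one JSON line per call to `path`."""
    def decorator(fun):
        @functools.wraps(fun)
        def wrapper(*args, **kwargs):
            result = fun(*args, **kwargs)  # always executed, no archive lookup
            entry = {
                "time": time.time(),
                "function": fun.__name__,
                "args": list(args),
                "kwargs": kwargs,
                "result": result,
            }
            # Append-only: earlier records are never read back or replaced.
            with open(path, "a", encoding="utf-8") as fh:
                fh.write(json.dumps(entry) + "\n")
            return result
        return wrapper
    return decorator
```

With this shape, a no-argument function such as randtry() is called again on every invocation, and the file simply grows by one record per call.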
