dir_archive implementation issues #54

Open
aathan opened this issue Jun 17, 2017 · 1 comment

aathan commented Jun 17, 2017

If dir_archive is used with flattened keys, for example, the str() encoding of those tuples yields directory names containing parentheses. These do not load correctly when the archive is written non-pickled (i.e., as python objects), because the import statement that is exec()ed then contains invalid characters. There also seem to be some hacks related to reloading the archive from disk, in particular:

  1. _getkey hard-codes [2:] rather than basing the slice on the value of PREFIX
  2. the interactions between _getdir, _getkey and _lookup make various assumptions which, I believe, make it hard for _fname to meaningfully turn the text representation of keys into "good" filenames
  3. _lookup in particular does not distinguish between calls where the key parameter comes from a directory name and calls where it really is a key (the sequence is _keydict() --> _getkey() --> _lookup() --> _getdir()), which implies that the dir and key encodings are assumed to be equivalent.
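
To make the first point and the parentheses problem concrete, here is a minimal standalone sketch. It does not use klepto's code: the helper names dirname_for and key_text_from are hypothetical, and the PREFIX value "K_" is an assumption (a two-character prefix would explain the hard-coded [2:]).

```python
# Standalone sketch; not klepto's implementation.
PREFIX = "K_"  # assumed value; any prefix works with the helpers below


def dirname_for(key):
    """Hypothetical helper: build a directory name from str(key)."""
    return PREFIX + str(key)


def key_text_from(dirname):
    """Hypothetical helper: strip the prefix using len(PREFIX), not [2:]."""
    if not dirname.startswith(PREFIX):
        raise ValueError("not an archive directory: %r" % dirname)
    return dirname[len(PREFIX):]


name = dirname_for(("a", 1))
# The tuple's str() form carries parentheses, quotes, a comma and a space,
# so the name cannot appear as a module name in an import statement.
print(name, name.isidentifier())
print(key_text_from(name))
```

Slicing with len(PREFIX) keeps the round trip correct if the prefix is ever changed, which a literal [2:] would not.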

I've fixed some problems in a branch, e.g., by adding a parameter to _lookup(...,isdir=False), which allows me to implement a filename encoding in _fname that eliminates problematic characters. This yields relatively clear-text directory names and python object storage that works, i.e., a disk cache that is easily understood by human eyes. This makes the cache useful as a backing store for, for example, function values used to replay behavior in testing frameworks: run the program once with nothing in the dir_archive, then run it again from a full dir_archive to regression test the parts that rely on the functions that got cached.
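
As an illustration of the kind of encoding meant here, the following is a sketch under stated assumptions and not the code from the branch. The names encode_name, decode_name and key_text are hypothetical, PREFIX = "K_" is assumed, and the escape scheme (each character outside ASCII letters and digits becomes _<hex>_) is just one reversible choice among many.

```python
import re
import string

PREFIX = "K_"  # assumed prefix value
_SAFE = set(string.ascii_letters + string.digits)


def encode_name(text):
    """Escape every character outside [A-Za-z0-9] as _<hex codepoint>_."""
    return "".join(ch if ch in _SAFE else "_%x_" % ord(ch) for ch in text)


def decode_name(name):
    """Invert encode_name by expanding each _<hex>_ escape."""
    return re.sub(r"_([0-9a-f]+)_", lambda m: chr(int(m.group(1), 16)), name)


def key_text(name, isdir=False):
    """Hypothetical stand-in for the isdir switch: decode only names
    that came from a directory listing, pass real keys through."""
    if isdir:
        return decode_name(name[len(PREFIX):])
    return name


key = ("a", 1)
dirname = PREFIX + encode_name(str(key))
print(dirname, dirname.isidentifier())
print(key_text(dirname, isdir=True))
```

Because the underscore itself is escaped, every underscore in an encoded name belongs to an escape, so decoding is unambiguous. The price is that heavily punctuated keys become less readable; a real implementation might special-case a few common characters.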

I can submit a pull request, but I see 0 pull requests here, so I'm wondering if you're accepting community input.

... I'm also wondering why _hasinput() doesn't use os.path.isfile().
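
For reference, a small sketch of why the distinction can matter. It assumes the current check is a plain existence test, which is not confirmed here; the probe helper and the file names are hypothetical.

```python
import os
import tempfile


def probe(path):
    """Report what the two standard-library checks say about a path."""
    return {"exists": os.path.exists(path), "isfile": os.path.isfile(path)}


with tempfile.TemporaryDirectory() as root:
    file_path = os.path.join(root, "input_file")
    dir_path = os.path.join(root, "input_dir")
    with open(file_path, "w") as fh:
        fh.write("cached input")
    os.mkdir(dir_path)

    file_result = probe(file_path)
    dir_result = probe(dir_path)        # a directory exists but is not a file
    missing_result = probe(os.path.join(root, "missing"))

print(file_result, dir_result, missing_result)
```

os.path.exists() accepts a directory with the expected name, while os.path.isfile() does not, so the stricter check would reject a stray directory where a stored input file is expected.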

mmckerns (Member) commented Jul 5, 2017

@aathan: I absolutely do welcome PRs. I just have not had any on klepto as of yet. Please feel free to be the first. I tend to like to break big PRs up into smaller multiple PRs, with one idea per PR... that way they are easier to review and understand the impact of. Anyway, it sounds like you've made some good potential changes. I admittedly have some hacks in klepto, and some things that I am unsatisfied with. I feel the package is a good start, but needs some TLC to fix some of the little issues, such as those you mention.
