More powerful linking of scratch spaces to versions #23

Open
oxinabox opened this issue Mar 9, 2021 · 3 comments
Comments

@oxinabox

oxinabox commented Mar 9, 2021

I don't think any of this is important, but I'm noting it down.

Consider the instructions for making scratch spaces that are keyed to the version.
This is kind of limited.

Let's say my package does some data fetching and processing and then exposes the result to the user.
It uses the scratch space to hold the processed data, so it doesn't redownload it.

Now let's say different versions of my package do different processing.
So you don't want to use a file that was processed by v1 of the package with v6, because e.g. we have changed from representing time as a DateTime to using a ZonedDateTime.
So it should redownload the file.
Keying a subfolder to the version gets us some of that,
but there are some problems.
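
For concreteness, the version-keyed subfolder described above could look roughly like this. It is a minimal sketch: `version_key` and the `processed_data` prefix are hypothetical names, and the call into Scratch.jl is shown only as a comment so the snippet runs without the package installed.

```julia
# Hypothetical helper: build a scratch key from a package version.
# Patch releases share a key; a major or minor bump gets a fresh one.
version_key(v::VersionNumber; prefix::AbstractString="processed_data") =
    "$(prefix)-v$(v.major).$(v.minor)"

# Inside a real package the key would be handed to Scratch.jl, e.g.
#     using Scratch
#     data_dir = @get_scratch!(version_key(pkg_version))
# where `pkg_version` is read from the package's Project.toml.

key_v1 = version_key(v"1.2.3")
key_v6 = version_key(v"6.0.0")
```

Because the key changes with the major/minor version, v6 never sees files written by v1, but nothing here cleans up the old folder.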

Deleting

This could be quite a lot of data stored in the v1 subfolder,
and it would be good if it was deleted when v1 of the package is uninstalled.
For this we kind of need the package manager to know about the folder.

clear_scratchspaces!(pkg) is a decent approximation for a lot of use cases
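
A finer-grained workaround than clearing everything is for the package itself to prune stale version folders when it loads. The sketch below assumes the hypothetical `processed_data-v<major>.<minor>` naming and uses a temporary directory as a stand-in for the package's scratch root; `prune_stale_versions!` is not part of Scratch.jl, and a real package would probably want to delete through Scratch.jl's `delete_scratch!` rather than a raw `rm`.

```julia
# Hypothetical helper: delete every "<prefix>*" directory under `root`
# except the one named `keep`. Returns the names that were removed.
function prune_stale_versions!(root::AbstractString, keep::AbstractString;
                               prefix::AbstractString="processed_data-v")
    removed = String[]
    for name in readdir(root)
        path = joinpath(root, name)
        if isdir(path) && startswith(name, prefix) && name != keep
            rm(path; recursive=true, force=true)
            push!(removed, name)
        end
    end
    return removed
end

# Demo against a temporary directory standing in for the scratch root.
root = mktempdir()
foreach(k -> mkpath(joinpath(root, k)),
        ["processed_data-v1.0", "processed_data-v6.0", "unrelated"])
removed = prune_stale_versions!(root, "processed_data-v6.0")
```

This only helps while some version of the package is still installed and gets loaded; it does not cover the uninstall case, which is why package-manager awareness would still be nicer.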

Compatibility

Consider a more advanced version of the above.
What if the time representation changed somewhere between v1 and v6, but some files don't have any time data, so they actually remain compatible?

The way we (Invenia) have solved this before is by giving separate version numbers to the Data and the Program, and then having the Program declare which versions of the data it is happy to use via semver_spec.
You can see one use of this in JLSO, which declares which file versions it can read and which it can write.
We have a more intense version of this internally with some datadeps.

So if the scratch (sub-)space and the package have separate versions, the package just declares which data versions it can work with and then finds the newest existing one that meets the semver_spec it declared.
You can do this at the function level.
(This is pretty much what our internal project on datadeps does.)
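
A rough sketch of that lookup, assuming the data versions found on disk are already parsed into `VersionNumber`s. `READABLE_DATA`, `newest_compatible` and the example versions are hypothetical; `Pkg.Types.semver_spec` is an internal Pkg function, so its location may change between Julia versions.

```julia
using Pkg  # stdlib; `Pkg.Types.semver_spec` is internal, not public API

# Hypothetical: the data versions this release of the program can read.
const READABLE_DATA = Pkg.Types.semver_spec("1.2, 2")

# Return the newest available data version the program accepts, or `nothing`.
function newest_compatible(available, spec)
    ok = filter(v -> v in spec, available)
    return isempty(ok) ? nothing : maximum(ok)
end

on_disk = [v"1.0.0", v"1.3.0", v"2.1.0", v"3.0.0"]
chosen = newest_compatible(on_disk, READABLE_DATA)  # v"2.1.0"
```

If nothing on disk satisfies the spec, the function returns `nothing` and the package would fall back to downloading and processing the data again.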

But it would be cooler if the package manager was aware of it, so that it could tie into automatic deletion.

@cossio

cossio commented Mar 10, 2022

Wouldn't the second more advanced example be solved if you just wrap the data in a thinner package of its own?
Then the processing package depends on the data package, and you get all the semver compat stuff from Pkg automatically.

A related question: At the moment, if I create a version-specific scratch-space (https://github.com/JuliaPackaging/Scratch.jl#can-i-create-a-scratch-space-that-is-not-shared-across-versions-of-my-package), will it be garbage collected automatically when that version of the package is removed (even though there is another version of the package installed through an update, say, with its own version-specific scratch that should not be removed)?

@oxinabox

Yeah, I have lately been thinking about automatically generating jll-like packages just for managing data.

@willow-ahrens

I would also like this feature. I have a compiler that I'm writing a caching mechanism for, and it doesn't really make sense to keep the compiled code for v0.4 when we are now using v0.5 of the compiler.
