Standardize usage of machine-wide caches #3930

Open
lihaoyi opened this issue Nov 9, 2024 · 1 comment
@lihaoyi
Member

lihaoyi commented Nov 9, 2024

In the past we only had Coursier managing its own machine-global cache for Maven Central, but the list of things that need such a cache is starting to grow:

  • Custom Java version support needs somewhere to put the JVMs
  • Android support needs somewhere to put the Android SDK and other heavyweight tools
  • The example PythonModule needs somewhere to cache its PIP downloads
  • The example TypescriptModule could use somewhere to cache its NPM downloads
  • Once the filesystem-independent out/ folder layout lands, we could begin sharing out/ folder results between projects, and that would also need some standard place to put the artifacts

The basic idea is that when someone calls clean, they usually don't want to start all the way from scratch, re-downloading every jar from Maven Central and re-downloading their JVM. These external downloads are typically fetched once and cached forever, and people only very rarely want to clean them, compared with how often they need to clean local build outputs.

We should try to standardize how these "global" cached downloads are handled so people can just plug into the standard, rather than creatively coming up with their own solutions that end up half-baked or inconsistent.
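To make the idea concrete, here is a minimal sketch of what a shared resolver for such a location could look like. It is not an existing Mill API: the `GlobalCache` object, the `EXAMPLE_GLOBAL_CACHE` override variable, and the `~/.cache/example-build-tool` default are all hypothetical names chosen for illustration.

```scala
import java.nio.file.{Path, Paths}

// Hypothetical helper, not part of Mill: one place that decides where
// machine-wide cached downloads live, so each integration only picks a
// subfolder name instead of inventing its own location.
object GlobalCache {
  // Assumed override variable; the name is made up for this sketch.
  val EnvVar = "EXAMPLE_GLOBAL_CACHE"

  // Root of the machine-wide cache: the override if set, else a folder
  // under the user's home directory (the default path is an assumption).
  def root(
      env: Map[String, String] = sys.env,
      home: String = System.getProperty("user.home")
  ): Path =
    env.get(EnvVar)
      .map(p => Paths.get(p))
      .getOrElse(Paths.get(home, ".cache", "example-build-tool"))

  // Per-ecosystem subfolder, e.g. "jvm", "android-sdk", "pip", "npm".
  def dirFor(kind: String, cacheRoot: Path): Path = cacheRoot.resolve(kind)
}
```

With something like this, a JVM downloader would ask for `GlobalCache.dirFor("jvm", GlobalCache.root())` and a PIP cache for `GlobalCache.dirFor("pip", GlobalCache.root())`, and neither location would be touched by clean.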

@0xnm
Contributor

0xnm commented Nov 12, 2024

My 2c on this: such cache structures vary a lot between ecosystems, because each ecosystem dictates its own structure.

For example, in the case of Maven dependencies, the standard file cache layout includes the artifact version in the artifact's full path, so it is easy to keep different versions of an artifact outside the project, somewhere on the machine.
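As a rough illustration of why versions can coexist, the sketch below computes the relative path used by the standard Maven repository layout. The `MavenLayout` object and its parameter names are made up for this example, and a given cache may add its own prefix in front of this path.

```scala
// Illustrative only: builds the relative path of an artifact in the
// standard Maven repository layout. Dots in the group become directories,
// followed by the artifact id, the version, and the file name.
object MavenLayout {
  def relativePath(
      group: String,
      artifact: String,
      version: String,
      ext: String = "jar"
  ): String =
    s"${group.replace('.', '/')}/$artifact/$version/$artifact-$version.$ext"
}
```

Because the version is a path segment, `relativePath("org.example", "foo", "1.0.0")` and `relativePath("org.example", "foo", "2.0.0")` land in different directories and never overwrite each other.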

If we look at NPM dependencies, for example, node_modules doesn't support hosting different versions of the same artifact by default (unless package aliasing is used? or maybe workspaces can help?). In that case such a cache is unlikely to be machine-wide and becomes more project-wide (especially given that some dependencies are built in, e.g. test reporting in Kotlin/JS, and their versions can differ between Mill installations). Now, say we have a project with several JS modules: it wouldn't be wise to download the same NPM dependency once per module, so it probably makes sense to have a folder at the root of the project. But there is no:

  • built-in API to access the root module (or the root out folder). T.dest is bound to the task, so it doesn't help here. Update: it seems T.workspace can access the root, so it would look like:
val dir = T.workspace / "out" / "js"
  • way to ensure that only one job downloads a particular dependency, instead of several doing so in parallel. For example, if Module A and Module B both declare a dependency foo, the downloads shouldn't run in parallel: if Module A is already downloading foo, Module B should wait and consume the result instead of starting its own download job (is this already the case for Maven dependency downloads?).
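The "wait and consume the result" behaviour from the second point could be sketched roughly as below. This is not how Mill or Coursier handle it; `DownloadOnce` and `fetch` are hypothetical names. The sketch only deduplicates within a single JVM process, and coordinating separate processes would need something extra, such as file locks.

```scala
import java.util.concurrent.{CompletableFuture, ConcurrentHashMap}

// Hypothetical sketch: the first caller for a key runs the download,
// and every other caller for the same key waits for that result.
object DownloadOnce {
  private val inFlight =
    new ConcurrentHashMap[String, CompletableFuture[String]]()

  // `download` maps a dependency key to the location of the fetched
  // artifact; both are plain strings to keep the sketch self-contained.
  def fetch(key: String)(download: String => String): String = {
    val mine = new CompletableFuture[String]()
    val existing = inFlight.putIfAbsent(key, mine)
    if (existing != null) {
      // Someone else owns this download (running or finished): wait for it.
      existing.join()
    } else {
      try mine.complete(download(key))
      catch {
        case e: Throwable =>
          // Forget the failed attempt so a later call can retry,
          // and propagate the failure to anyone already waiting.
          inFlight.remove(key, mine)
          mine.completeExceptionally(e)
      }
      // Returns the result, or throws CompletionException on failure.
      mine.join()
    }
  }
}
```

If Module A and Module B both call `DownloadOnce.fetch("foo")(...)` at the same time, only one of them runs the download function and the other blocks until the result is available.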
