Addition of generic pre build and post build hooks. #9892
Example output using the …
Related: #7394. I think this would be a welcome feature, especially if it's written with the above issue in mind so it can be extended to other actions.
I just did a test using the hooks. With an empty S3 cache:

```shell
cabal clean
rm -rf ~/.cabal/store/ghc-9.6.4/
time PATH=$PATH:$HOME/.cabal/iog-hooks/ cabal build all
```

```
real    29m53.885s
user    105m12.177s
sys     11m5.029s
```

With a warm S3 cache (ie immediately after the above):

```shell
cabal clean
rm -rf ~/.cabal/store/ghc-9.6.4/
time PATH=$PATH:$HOME/.cabal/iog-hooks/ cabal build all
```

```
real    13m20.817s
user    37m24.021s
sys     6m58.173s
```

Without the S3 cache, but just using …
Is there an advantage to doing this using a hook rather than populating the caches based on …? It seems that using an approach like …
Does this issue propose to fix this?
References:
There is another issue I've encountered: packages having absolute paths built into them, which means they are non-transferrable to other machines where the build location, store location or some other part of the environment differs. I have some examples of such packages:
Brainstorming a bit: I can see this potentially evolving to the point where, in CI or even on development machines, we don't build any of the dependencies because we can point to a caching build server. If we reach this point, the advantage is that for large projects, components that depend on already-cached dependencies can start building earlier while the remote build server is busy populating the cache for yet-unbuilt dependencies. It could be a paid-for cloud service as well, maybe something the Haskell Foundation could run? The advantage of this would be minimal idle time on the build client.
There are a few advantages to hooks that you do not get from reconstructing the store outside of cabal. As @erikd showed, the patch is currently also only ~20 lines of extra code in cabal's codebase, and it permits all kinds of uses for the hooks. You could run some time profiling with them, or anything else you'd like to do around builds.
ABI is a major issue. Luckily, we are getting close to making it irrelevant for us.
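The time-profiling idea mentioned above can be sketched with a trivial hook. This is a hypothetical example, not part of the PoC patch: it assumes (my assumption, not the patch's documented interface) that cabal invokes the hook with the package unit id as its first argument, and the log location is made up.

```shell
#!/usr/bin/env bash
# Hypothetical postBuildHook that records when each package finished
# building. The "$1 is the unit id" convention is an assumption.
set -eu

LOGFILE="${BUILD_LOG:-/tmp/cabal-build-times.log}"

log_build () {
    # One line per build: ISO-8601 UTC timestamp, then the unit id.
    printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" >> "$LOGFILE"
}

log_build "${1:-demo-pkg-1.0-inplace}"
```

Paired with a matching preBuildHook writing to the same log, this would give per-package build durations for free.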
One last note: these are generic hooks. Caching may be one use for them. But hooks into cabal have been requested (and discussed outside of issue trackers) multiple times.
Can you elaborate on the …
I wanted to understand if pre/post build hooks are the only ones we need. For example, do we need a no-build hook for when the package was already built?
@newhoggy caching is clearly only one instance of what could be done with these hooks. (And we likely want cabal to have more hooks!)
You can just not back up any Paths_ modules in the tarball you build. When you restore the cached objects into the build folder, GHC will just build the ones that are missing. After that, cabal will continue with copy/register/install.
For this PoC, the pre/post hooks seem to be enough. However, you'd likely want many more hooks in cabal, hooking into different phases, for experimentation and extensions.
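The Paths_ remark can be made concrete on the cache-packing side. A minimal sketch, assuming a tarball-per-package cache; the function name and directory layout here are mine, not the PoC's:

```shell
#!/usr/bin/env bash
# Sketch: archive a package's build artifacts but exclude generated
# Paths_* modules, since they bake in machine-specific paths; GHC will
# simply rebuild the missing modules after the cache is restored.
set -eu

pack_build_dir () {
    local build_dir="$1" out_tar="$2"
    tar -C "$build_dir" --exclude='Paths_*' -czf "$out_tar" .
}

# Demo against a throwaway directory shaped like a build dir.
demo=$(mktemp -d)
touch "$demo/Main.o" "$demo/Paths_demo.o" "$demo/Paths_demo.hi"
pack_build_dir "$demo" /tmp/demo-pkg-cache.tar.gz
```

The restore side is then just `tar -xzf` into the build folder, after which GHC regenerates the excluded modules.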
Exactly. This has been requested/discussed since the dawn of time. Shell hooks are as old as Unix, and are the other major design pattern besides "pipes". I'm not sure it even warrants a discussion... and we surely shouldn't be discussing specific use cases. There are infinitely many. The question is more about:
I think last time it was discussed, it wasn't really clear where to insert hooks, because cabal had no clean architecture that strictly resembles the configure/build/install phases that traditional package managers have. So this would be the only concern I see: which parts are stable enough architecturally that they won't break hooks in the future? Or do we need a refactor first?
Another place I'd consider a cabal hook is SCM support (cf. #9883) — ideally we'd just drop a script somewhere rather than needing to add it into cabal-install and make a release. (Also goes for the discussion currently going on on Matrix about possibly having …)
One obvious concern with this is portability, right? At the moment, most of the stuff that cabal does is limited to a known set of things that cabal handles and can be tested. Users can't currently inject random non-portable stuff, but these kinds of hooks would make it very easy to do that. (Of course, you can do this today with …)
I'm not sure I understand this statement. User hooks are not in scope to be tested by cabal. It's irrelevant what users do with them. If a user manages to write a package that only works with some user hooks supplied, then that's simply a broken package.

What @geekosaur means, IMO, is a set of shell scripts that can act as pluggable SCM implementations. Cabal could e.g. ship them (or inline them) and allow users to overwrite them if they want. This is very common in how source-distro package managers are implemented. There's a clear separation between what is shipped and what the user wrote/configured. But this is somewhat digressing from the original proposal.

A user shell hook is just a shell script that's executed. It is not a replacement for any internal cabal phase.
I meant that it's easier for users to accidentally make packages that are non-portable, since now they have to be sure those shell scripts will run in all situations where a downstream user is compiling their package. |
I don't see how. Can you give an example? The shell hook doesn't change any internal data structures of cabal. The worst it could do is generate files that are needed for compilation, in which case it is a broken package. I don't see why users would think shell hooks are the right place to do this.
Because users will find a way to abuse anything you give them? </sysadmin>
If someone is relying on some random hooks existing for some downstream user of their package, they are arguably doing it wrong. You do not ship hooks with your package. Hooks are end-user customizations, nothing that package authors should consider. This is not BuildHooks. As an emacs package author, you also do not rely on, or test all possible combinations of, hooks an end user might put into their config. Adding hooks to your cabal install means you (the end user) get a modified cabal. That's on you. Always.
Yes, the hook version is 30% slower than populating the cache based on …
I believe so, yes. The cache uses …
Not suggesting this per-se, but would it be possible to "fake" this particular set of hooks by wrapping ghc itself?
Yes. Conceptually you could. Though it's harder to say why you were called if you wrap GHC. You could try to use some heuristics to determine the caller; hooks are more explicit in that scenario.
The problem with wrapping GHC is as follows. Consider two projects, one where you do want to use hooks and one where you don't. How does a wrapped GHC tell the difference? With the hooks version the difference is inherent. |
I find myself wondering how that would interact with other things that wrap ghc (such as HLS) as well. |
Well, you pass the ghc per-project with the --with-ghc flag, which can go into a project file, just like a build hook can, I imagine. I'm open to the hooks concept in general, but it seems like in this use case the "real" ask is "a ghc with high-grade distributed caching as part of the build process", so it feels more natural to me to 'fake' that directly.
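For what it's worth, the "heuristics to determine the caller" problem mentioned above is the fiddly part of a wrapper. Here is a hypothetical helper such a wrapper might use to separate cabal's version/configuration probes from real compile invocations; the flag list is illustrative, not exhaustive:

```shell
#!/usr/bin/env bash
# Hypothetical classifier for a ghc wrapper (passed via --with-ghc).
# Probe calls should pass straight through to the real compiler;
# compile calls are where cache fetch/upload logic would wrap it.

classify_ghc_call () {
    case "$*" in
        *--numeric-version*|*--info*|*--print-libdir*|*--supported-languages*)
            echo probe ;;
        *)
            echo compile ;;
    esac
}
```

A wrapper would then `exec` the real ghc unchanged for probe calls and run its caching logic only around compile calls — exactly the implicitness the hook approach avoids.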
The bigger issue with …
IMHO the best approach would be to better separate the planning and building phases, so that you could swap the build part with your own (or part of it, or add hooks, or whatever). I think there are some considerations to make [1], but we should not bikeshed it to death.

[1] e.g. I assume this is build as in ./Setup.hs build; compiling Setup.hs is a separate thing.
@andreabedini Yes, currently kicking the tyres on the PoC. The complete …
PR is #9899. It's only a draft, and still needs documentation and a changelog entry etc.
My main interest in hooks is interfacing with GHCup. Which phase of cabal is early enough that:
This will allow users to emulate stack behavior, where running …
These specific hooks I am proposing are not a solution to the problem you want to solve. |
I have been playing around with hooks located in …. Hooks should be project-specific and not global for the user. For instance, if I have some work projects and some personal projects, I may want to have different hooks for each, or have hooks for one and not for the other. To me it makes more sense for …
After using this for a couple of days, this project-local version is VASTLY better than the user-global version. I have updated the PR: #9899
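The project-local vs. user-global distinction could resolve the way many tools handle configuration: look next to the project first, fall back to the user directory. A sketch under assumed names — `.cabal-hooks` is made up here; the PR, not this sketch, settles the real location:

```shell
#!/usr/bin/env bash
# Hypothetical hook lookup: project-local hooks shadow user-global ones.
# ".cabal-hooks" is an invented directory name for illustration.

find_hook () {
    local name="$1" project_root="$2"
    if [ -x "$project_root/.cabal-hooks/$name" ]; then
        echo "$project_root/.cabal-hooks/$name"
    elif [ -x "$HOME/.cabal/iog-hooks/$name" ]; then
        echo "$HOME/.cabal/iog-hooks/$name"
    else
        return 1    # no hook configured: cabal proceeds as usual
    fi
}
```

With this shape, work and personal projects can each carry their own hooks, and a project with no hooks directory simply builds normally.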
I've had some discussion with @erikd around security. I'll let him know he should leave a comment here. |
Thanks @angerman! |
@angerman and I have had some discussion about how to make hooks secure. We think it can be made secure enough to be acceptable. My current idea is that we add a file …
This …
Just one UX note on this, which got lost:
Cabal would inform the user about the existing hook in the repo / the changed hash of the hook, and ask if the user wants to trust it interactively, if we are in an interactive invocation (y/N).
show would just "cat" the hook to stdout, and ask the question again.
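The hash-based trust scheme described above boils down to one small check. A minimal sketch, assuming (my assumptions) a plain-text allowlist file with one SHA-256 hash per line:

```shell
#!/usr/bin/env bash
# Hypothetical trust check: a hook only runs if its SHA-256 appears in a
# user-maintained allowlist; any edit to the hook invalidates the trust.

hook_is_trusted () {
    local hook="$1" allowlist="$2" have
    have=$(sha256sum "$hook" | cut -d' ' -f1)
    grep -qx "$have" "$allowlist"
}
```

On a mismatch, an interactive cabal would display the hook and prompt (y/N) as described; accepting would append the new hash to the allowlist.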
Describe the feature request
The ability to call shell scripts just before and just after each package is built.
At IOG, our use case for these is for build caching, particularly in CI. Since these pre and post hooks are just shell scripts, there are probably other uses for them.
Additional context
At IOG we have a number of very large Haskell projects with deep dependency trees that can take a long time to build in CI.
The obvious answer to long build times is caching of build products. A previous attempt at this caching was made, but that solution was not really very satisfactory, because the cache was keyed on `${CPU}-${OS}-${GHC_VERSION}-${hash-of-dependencies}`. The first three are obvious. The problem is `hash-of-dependencies`: if a single high-level dependency changes, there will be no cache hit and everything will be built from scratch. As it turns out, this is actually the most common scenario.

A better caching solution is one where the caching is done on individual dependencies rather than on all the dependencies as a huge blob. Caching individual dependencies means that when a high-level dependency changes, there is a very high likelihood that all the lower-level dependencies will still be found in the cache.
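To make the contrast concrete, a per-package key can reuse the same prefix but swap the whole-dependency-blob hash for the package's own identity. A sketch — the exact key shape is my assumption; cabal's unit ids already encode a package's transitive dependencies in the nix-style store:

```shell
#!/usr/bin/env bash
# Hypothetical per-package cache key: same CPU/OS/GHC prefix as the old
# scheme, but keyed on one unit id instead of a hash over every dependency.

cache_key () {
    local unit_id="$1"
    printf '%s-%s-%s-%s\n' \
        "$(uname -m)" "$(uname -s)" "${GHC_VERSION:-ghc-9.6.4}" "$unit_id"
}
```

With a key like this, changing a high-level package changes only that package's unit id; the keys of everything below it stay stable and keep hitting the cache.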
My initial implementation of this package-level caching was done as a simple wrapper around `cabal` that used `rsync` to fetch and save the cache over `ssh` to another machine. This proved highly effective and I was able to populate the cache from one machine and use it from another (both machines running Debian Linux).

However, @angerman came up with an even better solution that required adding the ability to run shell scripts before and after the build of each individual package. Using this feature (we have rough patches against cabal `HEAD` and version `3.10.3.0`) we are able to use our own Amazon S3 storage for our cache. We do not propose to make this S3 storage public (obvious potential security problems), but any organization like ours, or any individual, could use their own S3 storage. I also have a working pair of pre and post build hooks that use `ssh` to a different machine as the storage backend.

The naive patch against `HEAD` (error handling could be improved, maybe the hook names could be made configurable) is:

An example of our `preBuildHook` script (kept in `~/.cabal/iog-hooks`, S3 credentials pulled from `s3-credentials.bash`, the `aws` executable is from the `awscli` package) is as follows:

To use the cache I run `cabal` as:
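Since the actual IOG `preBuildHook` script is elided above, here is only a rough, hypothetical sketch of its fetch side. The bucket name, argument convention, and the hit/miss exit-code protocol are all assumptions; the `AWS_CMD` indirection exists purely so the `aws` call can be stubbed out:

```shell
#!/usr/bin/env bash
# Hypothetical preBuildHook core (NOT the elided IOG script). On a cache
# hit it unpacks the prebuilt package into the store; a non-zero return
# means "miss, build normally" (the hook protocol here is an assumption).

AWS_CMD="${AWS_CMD:-aws}"

fetch_cached_build () {
    local unit_id="$1"
    local bucket="${CACHE_BUCKET:-s3://example-build-cache}"   # made-up name
    local tmp="/tmp/$unit_id.tar.gz"
    if "$AWS_CMD" s3 cp "$bucket/$unit_id.tar.gz" "$tmp"; then
        tar -xzf "$tmp" -C "${CABAL_STORE:-$HOME/.cabal/store}"
        return 0
    fi
    return 1
}
```

The matching `postBuildHook` would simply `tar` up the freshly built package and `aws s3 cp` it back to the bucket.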
as:The text was updated successfully, but these errors were encountered: