Building only dependencies fragile wrt symlinking strategy #385
Comments
AFAICT all the symlinking stuff is not worth pursuing, and tarring the target dir ( My intuition tells me that if it was possible to implement target deduplication via extracting layered archives, things would really work well. I've been responsible for CI for a large Rust project that builds > 10GB of
Hi! It might be that we're compiling different kinds of Rust projects. In the ones that I work with, there is perhaps a relatively large dependency tree (think 400 crates) but a relatively small amount of project code (say, 10-20k lines). Think: a small CRUD app which has a gRPC API, accesses a database, does some serde and uses some async. What this means is that compiling the dependencies takes quite a bit of time if you're doing it from scratch, while the incremental cost can be pretty low. Put simply, without Crane the typical dev recompile flow is measured in single-digit seconds.

My users are comparing Crane usage to raw Cargo, so the expectation on 'I have made a small change' is a relatively cheap build. Obviously, the normal dev cycle is for people to run 'cargo test' or similar, but my overall aim is that people should want to run 'nix flake check' as their last step before pushing, and that it should typically run quickly.

The cost of zstd-encoding and copying the output is considerable even on small projects, especially compared to the incremental cost of compilation in the typical cargo workflow. It also leads to considerable storage bloat - consider the crane example which runs nextest, doc, build, and clippy simultaneously. With the symlinking strategy, if there are 2GB of compiled artifacts, the build can be made to use about 2GB of storage. With the untarring strategy, it will use 8GB.

This is all to say: as a comparison to recompiling from scratch, zstd and tarring is obviously better - but as a comparison to iterative cargo compilation it is perhaps weaker.

Your comment does however give me an idea - I wonder if a tar of symlinks might represent the best of both worlds in terms of performance? It would likely be much faster to extract than traversing the filesystem, due to sequential disk access and the reduced cost of forking.
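For concreteness, here is a rough sketch of what that "tar of symlinks" could look like - purely hypothetical, with made-up store paths, and not anything crane does today:

```bash
#!/usr/bin/env bash
# Hypothetical "tar of symlinks" sketch: instead of archiving artifact bytes,
# archive a tree of symlinks pointing back into the deps-only output in the
# Nix store. The archive stays tiny and extraction is one sequential read
# plus cheap symlink creation. All paths below are illustrative.
set -euo pipefail

deps_out=/nix/store/xxxx-my-crate-deps/target   # assumed deps-only output layout
staging=$(mktemp -d)

# Build a mirror of the target dir that contains only symlinks.
(cd "$deps_out" && find . -type f -print0) | while IFS= read -r -d '' f; do
  mkdir -p "$staging/$(dirname "$f")"
  ln -s "$deps_out/$f" "$staging/$f"
done

# tar stores the link targets, not the file contents, so this is small.
tar --zstd -C "$staging" -cf target-links.tar.zst .

# Later, in the downstream build:
mkdir -p target && tar --zstd -C target -xf target-links.tar.zst
```

The archive size is independent of how large the compiled artifacts are, which is what would make it the "best of both worlds" if it holds up in practice.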
Your users should keep using raw cargo in a dev shell during dev work anyway. They don't have to compare with raw cargo, because they will be using raw cargo. Having devs run nix commands during dev work is one of the biggest mistakes one can make when using Nix flakes.

That's just a waste of time and disk space. My users do just lint or just final-check, neither of which invokes nix; they only use the inputs provided by the dev shell to do what needs to be done. Since barely anyone ever touches Nix code, if things work in a dev shell, 99.9% of the time they will work in Nix.

With tarring (and most importantly zstd) 12GB of artifacts compresses to like 2GB of target.zstd.tar. The build directory itself is discarded after the Nix derivation is compiled, so it largely doesn't matter how much space it uses while Rust is building. Compression and extraction are fast (I can barely notice them). What matters most is how much the end artifacts take up in the /nix/store. In addition, if the incremental layers were implemented, tarring would become even better. I'm confident that layers of incremental compressed archives is all around the best strategy. In theory symlinking could be more performant, but in practice it's an uphill battle, where cargo will break it in weird ways, because we'll be messing with its internals that are not guaranteed to be stable, and no one tests or even cares about this use case.
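To make the strategy being argued for concrete, here is a minimal sketch of the tar + zstd flow (illustrative paths and variable names only, not crane's actual implementation):

```bash
# Deps-only derivation: pack the whole target dir; ~12GB of artifacts can
# compress to roughly 2GB with zstd.
tar -C target -cf - . | zstd -T0 -o "$out/target.tar.zst"

# Downstream derivation: restore the artifacts, then build the workspace on
# top of them. "$cargoArtifacts" is an assumed reference to the deps output.
mkdir -p target
zstd -dc "$cargoArtifacts/target.tar.zst" | tar -C target -xf -
cargo build --release --offline
```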
I think there are some assumptions you made there which don't hold in many cases.

Regarding the extraction time - I've observed that CI nodes are typically rather pressed for disk bandwidth. For example, a typical Azure host will have 100-200MB/sec of disk write throughput, meaning that a 12GB extraction will take perhaps 1 minute - not insignificant, and especially problematic if it happens 4 times: 4 minutes of waiting is something worth avoiding. This is obviously system dependent, but that order of magnitude remains common. This is to say that just because it is fast for you does not mean it is fast for all.

On the tempdir point - if you are running 4 tasks in parallel, that's how you get to the 12GB of live disk space usage. This also hurts the OS's ability to cache well; it will tend to thrash the page cache.

One other thing - nix store optimisation will tend to ensure that rebuilds, though time consuming, don't take up more space in the store, whereas tarring up results will. This means that after running the build 5 times with no GC, you will be using more disk space. For those of us who run S3-backed caches, the garbage collection problem is more acute.

I have another proposal for a solution which would likely be practical, btw - use RUSTC_WRAPPER to automatically clear out build output symlinks.
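To illustrate that proposal, a hypothetical RUSTC_WRAPPER could look roughly like the following - the flag parsing and file-name matching are assumptions on my part, not crane code:

```bash
#!/usr/bin/env bash
# Hypothetical RUSTC_WRAPPER sketch: when cargo decides to rebuild a crate
# whose previous outputs are symlinks into the read-only /nix/store, drop the
# stale symlinks first so rustc can write fresh, regular files instead of
# failing on the store path.
set -euo pipefail

rustc_bin=$1; shift   # cargo invokes: $RUSTC_WRAPPER <rustc> <args...>

crate_name=""
out_dir=""
args=("$@")
for ((i = 0; i < ${#args[@]} - 1; i++)); do
  case ${args[i]} in
    --crate-name) crate_name=${args[i+1]} ;;
    --out-dir)    out_dir=${args[i+1]} ;;
  esac
done

# Remove symlinked outputs belonging to this crate (e.g. libfoo-<hash>.rlib);
# rustc will regenerate them as writable files in the build directory.
if [[ -n $crate_name && -d $out_dir ]]; then
  find "$out_dir" -maxdepth 1 -type l -name "*${crate_name}*" -delete
fi

exec "$rustc_bin" "$@"
```

Exporting RUSTC_WRAPPER to point at such a script in the downstream derivation would let cargo regenerate a stale crate's outputs instead of trying to write through a symlink into the store.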
I wonder if Nix is using some kind of a ramdisk for the build (maybe it's configurable?). So the disk IO is not the 12GB of uncompressed data, but 2GB of compressed data. Things are just flying too fast, and your ballpark calculations are reasonable. This has its own problems (running out of memory). Just did a quick check: one screenshot is just the dependencies (smaller), the other one is the whole workspace extracted for the test derivation. Running on Jetbuild for Github Actions.
Tarring the difference will ensure near-zero redundancy, and I'm 100% certain it can be done in a very performant way. This is currently unimplemented, and even with the redundancy, it's been working great. I don't deny that the symlink approach would be faster, especially in theory. But as someone who has spent days debugging all sorts of issues in our Nix CI pipeline for various reasons (mostly my mistakes, but some were crane bugs, some just general complexity), and at least 8 hours recently on mysterious rebuilds in newer versions of the Rust toolchain, I value the brutal simplicity and robustness of tarring over marginal performance improvements. Not to mention that even right now the symlinking is partial, IIRC, because reasons. I would be happy to have a perfectly working "tar the symlinks" model as it seems perfect. I would use it, but I guarantee it will break sometimes in mysterious ways, because
I believe what you're observing is GNU tar's behaviour of not fsyncing when extracting. Effectively, this means that you have a ramdisk so long as you have free memory - but when you run out, it slows down to the speed of your disk. Linking back to the point in my previous comment, the workflow you are describing will work great so long as resources are plentiful (lots of RAM, fast disk), but drop either or both of those (due to e.g. being on some colo box) and the effects are outsized.

I've had another go at implementing symlinking based on my comments from last night, and it appears to work well - much faster than the present strategy and robust to crate rebuilds. The PR is at #386! I'd appreciate it if you could give it a shot to see what performance is like for you. The sense I get is that it uses rustc in a way that's robust to changes - it tells rustc to emit data into another directory and overwrites symlinks, rather than relying on rustc doing the right thing.
@j-baker I commented on your PR. It's probably an improvement, even if not in my favorite approach. :D BTW, it occurred to me that one reason why… Which reminded me that in your approach you can only symlink cargo artifacts, while incremental deduplication could dedup anything. In our project we (very unfortunately) have quite a few C dependencies and I wonder how much
I think you're right here, but also I don't think this is going to be an issue. The system you're building on needs resources corresponding to the project size, and most of these files will be read over and over during the build, so the fact that they are warm in memory is a plus. If you want a fast CI, you want your
In a way, extracting the zstd archive might have saved time over symlinking again, by just reading everything into the in-memory disk cache with one big sequential read, while doing only 20% of the IO.
Since I tried the improved symlinking in the current revision, I'd like to post some metrics I can already capture:

So copying just the dependencies takes 6s (vs 4s from the zstd version). But most importantly, you see the wall of text in the post-fixup phase:

That's additional overhead. Furthermore, the disk space use is much, much higher:

3.8G vs 818M with zstd. And that's only the dependencies! Now, when the workspace inherits the deps-only artifacts:

3s vs 4s with zstd. So far: no real speedup on inheriting, slower artifact saving, and larger disk usage (and we ignore the overhead of the rustc wrapper). Then the current version fails, but I don't think these metrics will change, so there really need to be some sweet gains in the second stage to make up for the losses in the first one.
flake.nix: add workaround for ipetkov/crane#385
Hi! Recently crane changed to symlinking rlib and rmeta files when reading dependencies. For the random reader: Crane generates the dependencies in one nix build, and then uses them in a later one. Effectively, you generate your cargo /target dir in one build step, and then you 'put it into place' in the next build step. This has many practical advantages.
Obviously, this is awesome - way less time spent copying voluminous object files. Unfortunately, it leads to a common failure mode - if a dependency is out of date somehow and cargo decides it needs to rebuild it, this rebuild will fail - it tries to write to the target of the symlink and this fails, as it's a nix store path.
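A tiny shell illustration of that failure mode (with made-up paths; on a typical Nix setup the store is mounted read-only):

```bash
# A compiled artifact that is really a symlink into the store.
mkdir -p target/debug/deps
ln -s /nix/store/xxxx-deps/target/debug/deps/libfoo-123.rlib \
      target/debug/deps/libfoo-123.rlib

# What a rebuild effectively tries to do: write through the symlink.
echo new-contents > target/debug/deps/libfoo-123.rlib
# -> bash: target/debug/deps/libfoo-123.rlib: Read-only file system
```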
I've seen this caused for 2 reasons, one of them being a build script depending on CMAKE_PREFIX_PATH not having changed. Somehow, it is changing between the root build and the 'clippy' build. The only difference I can tell is that clippy is added to the nativeBuildInputs, so I'm wondering if clippy itself causes the cmake prefix path to change. Either way, the build explodes.

This can be worked around by setting doNotLinkInheritedArtifacts = true in most cases, but not all. In particular, I struggled to get a build which used cxx to build, because cxx created absolute symlinks which were dangling after the copy into the Nix store, meaning that cp itself then fails. This is now fixed on master of that project.

The takeaway for me is that this change makes Crane significantly more brittle when building Rust crates, and the error which is emitted can be rather hard to debug.
A sample error would be:
which, if you know how crane works, makes sense - but it's not very introspectable for a random dev joining the project, because the symlinking behaviour isn't that obvious. I'm trying to make it so that new engineers don't need to know how crane works in order to build their code - rather, they just need to know that it does.
I've got a few proposals for ways out:
The opencv issue
Here, crane is leading to a diff in CMAKE_PREFIX_PATH between building the deps and the other checks. This genuinely just seems like a bug in crane - arguably Crane should ensure that, insofar as the environment can cause cargo to decide to rebuild, it is consistent between all builds. In my head, this is likely being caused by Crane amending the nativeBuildInputs being passed in. I'm wondering if it might be more robust to add all the relevant inputs to nativeBuildInputs every time, rather than amending the minimal set. However, it's possible I've misdiagnosed this specific issue - the person I'm working with has not yet provided me with the CMAKE_PREFIX_PATH of the 'build deps' and 'run clippy' checks.
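One way to verify that diagnosis would be to capture the environment seen by each derivation and diff it - a debugging sketch with assumed output names, not crane functionality:

```bash
# Added to the deps-only derivation's build phase (assumes $out exists or is created):
mkdir -p "$out"
env | grep -E '^(CMAKE_|PKG_CONFIG_|NIX_CFLAGS)' | sort > "$out/env-deps.txt"

# Added to the clippy derivation's build phase:
mkdir -p "$out"
env | grep -E '^(CMAKE_|PKG_CONFIG_|NIX_CFLAGS)' | sort > "$out/env-clippy.txt"

# Outside the sandbox, assuming the two were built with -o result-deps / -o result-clippy:
diff ./result-deps/env-deps.txt ./result-clippy/env-clippy.txt
```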
The more general issues
One option would be to probe for reflink support by doing cp --reflink=always file1 file2 between the two directories, and upon success, doing cp -r --reflink=always as opposed to the symlinking. A further blocker to this might be that Nix builds in temp directories on a different filesystem to the Nix store - I am not sure about the specifics. I believe this would likely be OK on btrfs but may prove more of an issue on xfs. In any case, this is an optimisation rather than a total fix. However, if you could make this work on Linux, it might prove better on Mac.

Just spitballing here - I don't have any particular agenda - I've just personally found this confusing, and as the person doing most of the 'new nix stuff' at my company, this leads to others finding it way more confusing.
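A sketch of how that reflink probe could look in a build script - $cargoArtifacts and the probe file are assumptions of mine, and reflinks only work when the source and destination sit on the same reflink-capable filesystem:

```bash
# Pick any existing artifact to probe with (illustrative; assumes ./target
# does not exist yet in the build directory).
probe_src=$(find "$cargoArtifacts/target" -type f | head -n1)

if cp --reflink=always "$probe_src" ./.reflink-probe 2>/dev/null; then
  # Copy-on-write copies: artifacts are regular, writable files that share
  # data blocks with the store, so rebuilds no longer hit read-only paths.
  cp -r --reflink=always "$cargoArtifacts/target" ./target
else
  echo "reflinks unsupported here; falling back to the existing strategy"
fi
rm -f ./.reflink-probe
```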