Really annoying apparent deadlock during nix-store -r in Nix 1.12 #1573
What I've found out so far: I have a thread calling Lines 271 to 279 in aca4f7d
The other thing I've found by poking around the stack trace and memory is that the download thread appears to be stuck on an infinite poll of our signalling FD: Lines 424 to 433 in aca4f7d
Meaning that the |
The 'never signaling' theory should be verifiable by checking if |
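To make the "never signalling" theory concrete, here is a minimal sketch of the general wakeup-pipe pattern, not Nix's actual code (which is C++ and goes through curl_multi_wait). The names wait_for_wakeup and demo are hypothetical. It shows that a poll on a signalling FD with no timeout blocks forever if the other side never writes, while a bounded timeout turns the same situation into a detectable failure:

```python
import os
import select
import threading


def wait_for_wakeup(read_fd, timeout_ms=None):
    """Poll a wakeup pipe. Returns True if signalled, False on timeout.

    With timeout_ms=None the poll waits forever, which is the hang
    described above if nobody ever writes to the pipe.
    """
    poller = select.poll()
    poller.register(read_fd, select.POLLIN)
    events = poller.poll(timeout_ms)
    if not events:
        return False
    os.read(read_fd, 4096)  # drain the pipe so the next poll blocks again
    return True


def demo():
    """Hypothetical demo: one missed signal, one delivered signal."""
    read_fd, write_fd = os.pipe()
    try:
        # Nobody writes: a bounded poll times out instead of hanging.
        missed = wait_for_wakeup(read_fd, timeout_ms=100)

        # Another thread writes one byte shortly afterwards.
        timer = threading.Timer(0.05, os.write, args=(write_fd, b"x"))
        timer.start()
        signalled = wait_for_wakeup(read_fd, timeout_ms=2000)
        timer.join()
        return missed, signalled
    finally:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    print(demo())
```

If the theory holds, the frozen process corresponds to the first case with no timeout at all: the poll is healthy, but the write that should wake it never happens.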
I actually realized that I was stuck on 1fe1976 with this symptom, and when I updated to c94f3d5, it appears (hard to be sure given how nondeterministic this is) that all the hangs turned into timeout errors from CURL:
|
Since switching to c94f3d5 I indeed haven't had any deadlocks, but am getting a lot more of those timeout errors. I'm wondering if something got restructured in the threading code, and maybe in the past the timeouts from curl led to a deadlock, whereas they now cause a normal failure? Anyway, here's a bunch of Nix logs of the deadlock with increased verbosity from back in 1fe1976:
As you see in the gdb logs from the original gist, this can happen in
All of the frozen processes whose logs I include above seem to be in the same situation as the ones in the gist: a call to
I'm including these logs because I don't think the issue was actually fixed, and although it manifests itself differently now in c94f3d5, the underlying cause is the same. Will add more information as I learn it.
Confirmed that I'm still getting this even on a much more recent |
I marked this as stale due to inactivity.
I'm starting to see this more often since Nix 2.12.
Managed to get a bit of strace before hanging:
|
So the last action was forking a process and reading from it. @domenkozar any idea what that process might be? It'd also be good to know what the other fds 22, 26 and 17 are. That will be earlier in the trace.
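If the process is still alive on Linux, one way to find out what those fds are without scrolling back through the trace is to read the symlinks under /proc/<pid>/fd. A small sketch, assuming a Linux /proc filesystem and permission to inspect the process; describe_fds is a hypothetical helper name:

```python
import os


def describe_fds(pid="self"):
    """Map each open fd number of a process to its /proc symlink target.

    Targets look like a file path, 'pipe:[inode]' or 'socket:[inode]'.
    Assumes Linux with /proc mounted.
    """
    fd_dir = f"/proc/{pid}/fd"
    targets = {}
    for name in os.listdir(fd_dir):
        try:
            targets[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The fd was closed between listing and reading the link.
            continue
    return targets


if __name__ == "__main__":
    for fd, target in sorted(describe_fds().items()):
        print(fd, target)
```

Passing the PID of the hung process instead of "self" would show whether 22, 26 and 17 are pipes, sockets or regular files.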
Possibly fixed by #6469
I'm on a Linux box running Nix unstable at 1.12pre5511_c94f3d55 and am seeing a hard-to-reproduce/minimize freeze during nix-store -r. The multi-thread stack trace is at this gist and I'm trying to figure out what's confusing it. The Nix on that box is still frozen if you have ideas for other things to try while it's frozen before I kill it.
Were there any recent changes in the concurrency code? I was on a nixUnstable a couple months ago that I never observed this problem on, but perhaps I was just lucky?
Edit: I accidentally sent it a signal which caused it to fail with "error: download of ‘https://nix-cache.s3.amazonaws.com/ar3x3yk22khfrsk88nqicm11rdkzc018.narinfo’ was interrupted", which presumably shows (based on the curl_multi_wait in the 4th thread's stack trace) that it was somehow stuck waiting on that download. So unfortunately no more debugging on this particular instance of the failure. Back to trying to get it to happen again...
cc @edolstra