Race condition causes "Reached target .*Multi-User System.*"
#3758
Cluster creation fails after just 3s, though the logs eventually show the correct string:
Something funny is going on here. The error stack indicates that we are hitting this line, which looks like it should only be reached when the 30s log timeout is hit.
Adding to the exceptional weirdness: this fails 100% of the time when invoked directly, but fails only ~25% of the time when invoked within strace:
To validate my hypothesis, I wrote a wrapper script for
Cluster creation magically succeeds.
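One way a "log line not found" error could surface after only a few seconds, rather than after the full 30s timeout, is if the log stream ends early and the end of the stream is reported the same way as a timeout. The sketch below is an illustration in Python, not kind's actual Go code; `wait_for_log_line`, `read_lines`, and both exception names are hypothetical. It keeps the two outcomes distinct so that an early failure is easier to diagnose.

```python
import re
import time


class LogStreamEnded(Exception):
    """The log source stopped producing output before the pattern matched."""


class LogWaitTimedOut(Exception):
    """The deadline passed while the log source was still producing lines."""


def wait_for_log_line(read_lines, pattern, timeout_s=30.0, clock=time.monotonic):
    """Scan lines from read_lines() until one matches pattern.

    read_lines is a hypothetical callable that returns an iterator of log
    lines, for example a wrapper around a streaming log reader.
    """
    regex = re.compile(pattern)
    deadline = clock() + timeout_s
    for line in read_lines():
        if regex.search(line):
            return line
        # The deadline is only checked as lines arrive; a real implementation
        # would also need a way to interrupt a read that blocks forever.
        if clock() >= deadline:
            raise LogWaitTimedOut(f"no match for {pattern!r} within {timeout_s}s")
    # The iterator was exhausted: the stream closed early. This is not a timeout.
    raise LogStreamEnded(f"log stream ended before {pattern!r} matched")
```

With this split, a failure after 3s would be reported as `LogStreamEnded` instead of looking like an expired 30s timeout.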
The attached log has 8 worker nodes. Can you paste the exact configuration and steps that you are running?
This reproduces with just
No, this system has cgroupsv2.
This is missing a lot of important details about your environment that may be useful for troubleshooting what is happening. Can you update the description with the requested output from the issue template? Environment:
Also, since you can get this to repro with just a basic
Thanks for the extra details! I didn't realize it wasn't on the questions template either. :]
Logs from a standalone
Is it possible there is a race condition between
Switched link to permalink so we get the exact current code here: kind/pkg/cluster/internal/providers/docker/provision.go Lines 410 to 417 in 10e058c
Maybe
Sounds like If the latter, then we may have to add some check that the container is running before fetching logs ... but that's been assumed to be true from
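The check suggested above (confirming that the container is running before fetching its logs) could look roughly like the following. This is a sketch in Python rather than kind's Go code; `container_is_running` and `wait_until_running` are hypothetical names, and the `run`, `clock`, and `sleep` parameters exist only so the check can be exercised without Docker. It relies on `docker inspect -f '{{.State.Running}}'`, which prints `true` or `false` for an existing container.

```python
import subprocess
import time


def container_is_running(name, run=subprocess.run):
    """Return True if `docker inspect` reports the container as running."""
    result = run(
        ["docker", "inspect", "-f", "{{.State.Running}}", name],
        capture_output=True,
        text=True,
    )
    # A non-zero exit code usually means the container does not exist (yet).
    return result.returncode == 0 and result.stdout.strip() == "true"


def wait_until_running(name, timeout_s=10.0, interval_s=0.2, run=subprocess.run,
                       clock=time.monotonic, sleep=time.sleep):
    """Poll until the container is running or the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        if container_is_running(name, run=run):
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Calling `wait_until_running(name)` before reading logs would rule out the case where the logs are fetched from a container that has not started yet.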
FYI, I am also hitting this bug with the following use case:
And the command line which causes the error:
This problem does not occur when the kind cluster has only one node (i.e. the control plane). I hope this helps.
Let me ask @AkihiroSuda if this rings a bell for him. This is a really weird problem.
I think we need to capture the We aren't capturing the error when
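The comment above is truncated in this copy, so it is unclear which command's error is not being captured. As a general illustration only, a command's exit status and stderr can be captured and attached to the error that gets reported, so that a failure in the underlying command is not mistaken for a missing log line. The helper `run_and_capture` and the exception `CommandFailed` are hypothetical names, and this is Python rather than kind's Go code.

```python
import subprocess


class CommandFailed(Exception):
    """Raised when a command exits non-zero; carries its stderr for debugging."""

    def __init__(self, argv, returncode, stderr):
        super().__init__(f"{argv!r} exited with {returncode}: {stderr.strip()}")
        self.argv = argv
        self.returncode = returncode
        self.stderr = stderr


def run_and_capture(argv):
    """Run argv and return its stdout, raising CommandFailed on a non-zero exit."""
    result = subprocess.run(argv, capture_output=True, text=True)
    if result.returncode != 0:
        raise CommandFailed(argv, result.returncode, result.stderr)
    return result.stdout
```

Logging the `CommandFailed` message in an environment that reproduces the problem would show whether the command itself failed or simply produced no matching output.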
Maybe this may help?
And the docker daemon logs
Thanks but I don't think that tells us what happened unfortunately, most likely we'll have to patch the code in an environment that can reproduce this to log the output of the
On my side I have found my error, it was an issue in my kind configuration file. This was preventing the api-server from starting. It seems that if the api-server fails to start (i.e.
I am unable to create a Kind cluster using kind create. I get an error:
When I run kind export logs, I plainly see this line in the journal.log:
I don't see anything else in the log bundle to indicate an error:
kind_logs.tar.gz
Appendix: things I eliminated as possible root causes from similar issues
kind.
max_user_instances or max_user_watches. I have increased these to extremely large limits.
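For reference, the limits mentioned above can be read back to confirm that an increase took effect. This sketch assumes they are the Linux inotify settings exposed under /proc/sys/fs/inotify (the text here does not spell out the full sysctl names); `read_inotify_limits` is a hypothetical helper.

```python
from pathlib import Path


def read_inotify_limits(base="/proc/sys/fs/inotify"):
    """Return the current inotify limits as a dict of name -> int.

    Entries that cannot be read (for example on a non-Linux host) are skipped.
    """
    limits = {}
    for name in ("max_user_instances", "max_user_watches"):
        path = Path(base) / name
        try:
            limits[name] = int(path.read_text().strip())
        except (OSError, ValueError):
            # Missing file, permission problem, or unexpected content.
            continue
    return limits
```

On a Linux host, `read_inotify_limits()` returns whichever of the two values are readable; an empty dict means neither file could be read.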