[Bug]: Could not connect to Ryuk #7300

Closed
ujhazib opened this issue Jul 6, 2023 · 8 comments
Labels
resolution/waiting-for-info Waiting for more information of the issue author or another 3rd party. type/bug

Comments

@ujhazib

ujhazib commented Jul 6, 2023

Module

PostgreSQL

Testcontainers version

1.18.3

Using the latest Testcontainers version?

Yes

Host OS

Linux (we now have both a failing CentOS Linux 7 host and a failing Debian GNU/Linux 11 host)

Host Arch

x86_64

Docker version

Client: Docker Engine - Community
 Version:           24.0.2
 API version:       1.43
 Go version:        go1.20.4
 Git commit:        cb74dfc
 Built:             Thu May 25 21:52:17 2023
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          24.0.2
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.4
  Git commit:       659604f
  Built:            Thu May 25 21:52:17 2023
  OS/Arch:          linux/amd64
  Experimental:     true
 containerd:
  Version:          1.6.21
  GitCommit:        3dce8eb055cbb6872793272b4f20ed16117344f8
 runc:
  Version:          1.1.7
  GitCommit:        v1.1.7-0-g860f061
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

What happened?

When running some integration tests of our product inside a gradle:7.6.1-jdk17 container, they fail with a connection timeout on Ryuk. See the relevant log output section.

Relevant log output

08:16:41.381 [Test worker] INFO  org.testcontainers.utility.ImageNameSubstitutor - Image name substitution will be performed by: DefaultImageNameSubstitutor (composite of 'ConfigurationFileImageNameSubstitutor' and 'PrefixingImageNameSubstitutor')
08:16:42.484 [Test worker] INFO  org.testcontainers.dockerclient.DockerClientProviderStrategy - Found Docker environment with local Unix socket (unix:///var/run/docker.sock)
08:16:42.518 [Test worker] INFO  org.testcontainers.DockerClientFactory - Docker host IP address is 172.17.0.1
08:16:42.551 [Test worker] INFO  org.testcontainers.DockerClientFactory - Connected to docker: 
  Server Version: 24.0.2
  API Version: 1.43
  Operating System: CentOS Linux 7 (Core)
  Total Memory: 15883 MB
08:16:42.589 [Test worker] INFO  tc.testcontainers/ryuk:0.5.1 - Creating container for image: testcontainers/ryuk:0.5.1
08:16:42.929 [Test worker] INFO  tc.testcontainers/ryuk:0.5.1 - Container testcontainers/ryuk:0.5.1 is starting: 6e1ec6400afc2ec00e8f716d8dcfd1630642cab26796d77f8a7145c9809f1486
08:16:44.045 [Test worker] INFO  tc.testcontainers/ryuk:0.5.1 - Container testcontainers/ryuk:0.5.1 started in PT1.481417995S
08:16:49.054 [testcontainers-ryuk] WARN  org.testcontainers.utility.RyukResourceReaper - Can not connect to Ryuk at 172.17.0.1:32791
java.net.SocketTimeoutException: Connect timed out
	at java.base/sun.nio.ch.NioSocketImpl.timedFinishConnect(NioSocketImpl.java:546)
	at java.base/sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:597)
	at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:327)
	at java.base/java.net.Socket.connect(Socket.java:633)
	at org.testcontainers.utility.RyukResourceReaper.lambda$null$0(RyukResourceReaper.java:92)
	at org.rnorth.ducttape.ratelimits.RateLimiter.doWhenReady(RateLimiter.java:27)
	at org.testcontainers.utility.RyukResourceReaper.lambda$maybeStart$1(RyukResourceReaper.java:88)
	at java.base/java.lang.Thread.run(Thread.java:833)
... <previous exception repeats dozens of times before it shuts down>

Additional Information

Volume mapping /var/run/docker.sock:/var/run/docker.sock is in place

This usually works, and we see this issue only on rare occasions. To be precise, I think I saw it only twice in a year, but now two servers have started to fail with the same issue.

We were on Testcontainers v1.18.0 when it started, and I updated to 1.18.3 because of the resolution comment in this issue:
https://github.com/testcontainers/testcontainers-java/issues/7036#issuecomment-1568674653

That didn't work. I also tried the host override with various values, such as host.docker.internal, which didn't work either. While the test was still running, I checked Docker and saw Ryuk running properly there; I could exec a shell inside it and check whether it was responding.

I don't have any repro idea or steps. We had this issue far more often on WSL2, but I'm hoping that Testcontainers v1.18.3 solves at least that.

I have tried numerous prune commands on Docker to remove dangling stuff; of course, that didn't help.

@eddumelendez
Member

Can you see docker.sock available at /var/run/docker.sock on your host machine? Are you using Docker Desktop for Linux?

@eddumelendez eddumelendez added the resolution/waiting-for-info Waiting for more information of the issue author or another 3rd party. label Jul 6, 2023
@ujhazib
Author

ujhazib commented Jul 6, 2023

Yes, I see docker.sock and it is OK. We are not using Docker Desktop; see the Docker version section above. It is Docker Engine Community edition.
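
Because the tests run inside a container with the socket bind-mounted, it can help to confirm from inside that container that the socket accepts connections, not just that the file is visible. The following is a minimal sketch, assuming JDK 16 or newer (needed for Unix domain socket channels). The class name `DockerSocketCheck` is hypothetical and is not part of Testcontainers.

```java
import java.io.IOException;
import java.net.StandardProtocolFamily;
import java.net.UnixDomainSocketAddress;
import java.nio.channels.SocketChannel;
import java.nio.file.Path;

// Hypothetical helper, not part of Testcontainers.
final class DockerSocketCheck {

    // Returns true if a Unix domain socket connection to the given path succeeds.
    static boolean canConnect(Path socketPath) {
        try (SocketChannel channel = SocketChannel.open(StandardProtocolFamily.UNIX)) {
            channel.connect(UnixDomainSocketAddress.of(socketPath));
            return true;
        } catch (IOException | RuntimeException e) {
            // Missing file, permission problem, nothing listening,
            // or a platform without Unix domain socket support.
            return false;
        }
    }

    public static void main(String[] args) {
        // Default path taken from the volume mapping described in this issue.
        Path path = Path.of(args.length > 0 ? args[0] : "/var/run/docker.sock");
        System.out.println(path + " connectable: " + canConnect(path));
    }
}
```

A `true` result only shows that the Docker daemon is reachable through the socket. It says nothing about whether ports published by other containers, such as Ryuk's, are reachable over the network.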

@eddumelendez
Member

I think you are doing something like this https://java.testcontainers.org/supported_docker_environment/continuous_integration/dind_patterns/

I’ll move this to a discussion later today. I’m guessing this is related to the environment itself and is not an issue in Testcontainers itself.

@ujhazib
Author

ujhazib commented Jul 6, 2023

Thanks. I am also checking whether it is an issue with the env itself, but it was working earlier and other servers are OK. These seem to be the exact same symptoms we had on WSL2, which was a Testcontainers issue, so that is why I opened the ticket.

@eduardolbueno

Are you really sure that is the cause of the failure? It also happens to me a random number of times, but it never causes the tests to actually fail, since Ryuk eventually connects. In my case it logs a different Java exception, though (Connection refused).

2023-07-05T18:59:45.893-03:00  WARN 73409 --- [containers-ryuk] o.t.utility.RyukResourceReaper           : Can not connect to Ryuk at localhost:33098

java.net.ConnectException: Connection refused
	at java.base/sun.nio.ch.Net.pollConnect(Native Method)
	at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:672)
	at java.base/sun.nio.ch.NioSocketImpl.timedFinishConnect(NioSocketImpl.java:542)
	at java.base/sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:597)
	at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:327)
	at java.base/java.net.Socket.connect(Socket.java:633)
	at org.testcontainers.utility.RyukResourceReaper.lambda$null$0(RyukResourceReaper.java:92)
	at org.rnorth.ducttape.ratelimits.RateLimiter.doWhenReady(RateLimiter.java:27)
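
The two exceptions in this thread point at different situations. A `SocketTimeoutException` usually means the connection attempt got no answer at all (for example, packets being dropped), while a `ConnectException` with "Connection refused" usually means the host answered but nothing was accepting connections on that port yet. The probe below is a minimal sketch for telling the two apart. `RyukProbe` is a hypothetical name, the default host and port are just the values from the first log in this issue, and the probe has to run from the same place as the tests (here, inside the Gradle container) to be meaningful.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical diagnostic helper, not part of Testcontainers.
final class RyukProbe {

    enum Outcome { CONNECTED, TIMED_OUT, REFUSED, OTHER_ERROR }

    // Attempts a single TCP connection and classifies the result.
    static Outcome probe(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return Outcome.CONNECTED;
        } catch (SocketTimeoutException e) {
            // No answer within the timeout, e.g. packets silently dropped.
            return Outcome.TIMED_OUT;
        } catch (ConnectException e) {
            // Typically "Connection refused": host reachable, port not accepting.
            return Outcome.REFUSED;
        } catch (IOException e) {
            return Outcome.OTHER_ERROR;
        }
    }

    public static void main(String[] args) {
        // Defaults are the address from the log above; pass your own values.
        String host = args.length > 0 ? args[0] : "172.17.0.1";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 32791;
        System.out.println(host + ":" + port + " -> " + probe(host, port, 5000));
    }
}
```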

@ujhazib
Author

ujhazib commented Jul 6, 2023

Looks like it was indeed a setting problem. See --icc option here:

https://docs.docker.com/engine/reference/commandline/network_create/#bridge-driver-options

It was working before, so I'm not sure whether somebody changed something on our server and we got this now, or whether this is flaky behavior that was there all along and just worked for a while. Thanks for looking into it anyway.
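
For anyone landing here with the same symptom, the linked bridge driver options page lists `com.docker.network.bridge.enable_icc` as the network option corresponding to `--icc`. Below is a minimal sketch that reads this option through the `docker` CLI and reports whether inter-container communication is enabled. It assumes `docker` is on the `PATH`, that the network in use is the default `bridge` network, and that a missing value means the default (enabled). `IccCheck` is a hypothetical name.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic helper; shells out to the docker CLI.
final class IccCheck {

    static final String ICC_OPTION = "com.docker.network.bridge.enable_icc";

    // Interprets the raw option value. Assumption: a missing or blank value
    // means the option was not set, which is treated as enabled.
    static boolean iccEnabled(String rawValue) {
        return rawValue == null || rawValue.isBlank() || Boolean.parseBoolean(rawValue.trim());
    }

    // Runs: docker network inspect <network> --format '{{index .Options "<option>"}}'
    static String readOption(String network) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(
                "docker", "network", "inspect", network,
                "--format", "{{index .Options \"" + ICC_OPTION + "\"}}")
            .redirectErrorStream(true)
            .start();
        String output = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8).trim();
        if (process.waitFor() != 0) {
            throw new IOException("docker network inspect failed: " + output);
        }
        return output;
    }

    public static void main(String[] args) throws Exception {
        String network = args.length > 0 ? args[0] : "bridge";
        String state = iccEnabled(readOption(network)) ? "enabled" : "DISABLED";
        System.out.println(network + ": inter-container communication " + state);
    }
}
```

If this reports `DISABLED`, containers on that bridge cannot talk to each other directly, which would be consistent with connection attempts timing out rather than being refused.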

@ujhazib ujhazib closed this as completed Jul 6, 2023
@gianluca1606

gianluca1606 commented Aug 3, 2023

Are you really sure that is the cause of the failure? It also happens to me a random number of times, but it never causes the tests to actually fail, since Ryuk eventually connects. In my case it logs a different Java exception, though (Connection refused).

2023-07-05T18:59:45.893-03:00  WARN 73409 --- [containers-ryuk] o.t.utility.RyukResourceReaper           : Can not connect to Ryuk at localhost:33098

java.net.ConnectException: Connection refused
	at java.base/sun.nio.ch.Net.pollConnect(Native Method)
	at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:672)
	at java.base/sun.nio.ch.NioSocketImpl.timedFinishConnect(NioSocketImpl.java:542)
	at java.base/sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:597)
	at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:327)
	at java.base/java.net.Socket.connect(Socket.java:633)
	at org.testcontainers.utility.RyukResourceReaper.lambda$null$0(RyukResourceReaper.java:92)
	at org.rnorth.ducttape.ratelimits.RateLimiter.doWhenReady(RateLimiter.java:27)

I am new to Testcontainers and this also happens to me locally. After a few tries the connection is established and I see:

[ restartedMain] o.t.utility.RyukResourceReaper : Ryuk started - will monitor and terminate Testcontainers containers on JVM exit

I am using Rancher Desktop with dockerd (M1 Mac), and I also have Docker available at /var/run/docker.sock.

Is this something I should worry about?

On the pipelines everything works fine without any errors.
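
The "fails a few times, then connects" pattern described here is what a bounded retry loop looks like from the outside: each failed attempt is logged as a warning, and only running out of time is an actual failure. The following is a generic sketch of that idea, not the code Testcontainers uses. `BoundedRetry` and its parameters are hypothetical, and the attempt in `main` is simulated.

```java
import java.util.function.BooleanSupplier;

// Generic illustration of retry-until-deadline; not Testcontainers' implementation.
final class BoundedRetry {

    // Repeats the attempt until it succeeds or the deadline passes.
    // Returns the number of attempts used, or -1 if it never succeeded.
    static int retryUntil(BooleanSupplier attempt, long timeoutMillis, long pauseMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        int attempts = 0;
        while (true) {
            attempts++;
            if (attempt.getAsBoolean()) {
                return attempts;
            }
            if (System.nanoTime() >= deadline) {
                return -1;
            }
            try {
                Thread.sleep(pauseMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and give up.
                Thread.currentThread().interrupt();
                return -1;
            }
        }
    }

    public static void main(String[] args) {
        // Simulated flaky connection: fails twice, succeeds on the third try.
        int[] calls = {0};
        int used = retryUntil(() -> ++calls[0] >= 3, 5_000, 100);
        System.out.println(used > 0 ? "connected after " + used + " attempts" : "gave up");
    }
}
```

Under this model, a few warnings followed by a successful start are harmless, whereas warnings that continue until startup is abandoned, as in the original report, indicate a real connectivity problem.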

@eddumelendez
Member

There is a Rancher Desktop section that has been added about how to set up Testcontainers.
