Not all containers of a service are listed in the /etc/opts/hosts file #5
Comments
Are you using the Single Host or Multi Host orchestration, and which version of Docker? I have noticed that in the Multi Host solution, some services occasionally become available late, and I have to rerun the commands to get them all up. Any alternative suggestion to
I am using Multi Host, and the Docker version is 1.16.0. I did it in Java/Python, but to keep your project as it is, it would be better to use another shell script to populate the same file.
I noticed similar issues while running MPI jobs. Some of the worker nodes occasionally go missing from /etc/opts/hosts. This doesn't cause problems for a short MPI job, but some longer jobs hang forever. Any ideas for recovering the hanging jobs?
This might be a similar issue to #4 and netstat. I've produced a solution using I'll make a pull request.
I have created a service with 16 containers and am running an MPI task from the master node. I noticed that not all the service containers are taking load. I then opened the /etc/opts/hosts file, which is supposed to list all service containers, and found that most of the time 2-3 containers are missing from it.
I have figured out that this is an issue with the "netstat -t" command inside get_hosts, which cannot resolve all container names and hence returns fewer addresses most of the time.
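For anyone who wants to confirm the symptom before launching a job, here is a minimal sketch of a check that counts the entries in the hosts file and retries until the expected number shows up. It is not part of the project: the function names, the `refresh` hook, and the assumption that the file holds one host per line are all hypothetical.

```python
import time
from pathlib import Path


def read_hosts(path):
    """Return the unique, non-empty entries of a hosts list, in file order.

    Assumes one host per line; lines starting with '#' are skipped.
    A missing file is treated as an empty list.
    """
    p = Path(path)
    if not p.exists():
        return []
    hosts = []
    for line in p.read_text().splitlines():
        entry = line.strip()
        if entry and not entry.startswith("#") and entry not in hosts:
            hosts.append(entry)
    return hosts


def wait_for_hosts(path, expected, attempts=5, delay=2.0, refresh=None):
    """Re-read the list until it has `expected` entries or attempts run out.

    `refresh` is an optional, hypothetical hook: a callable that regenerates
    the file, for example by re-running the host discovery step.
    """
    hosts = []
    for attempt in range(attempts):
        if refresh is not None:
            refresh()
        hosts = read_hosts(path)
        if len(hosts) >= expected:
            return hosts
        if attempt < attempts - 1:
            time.sleep(delay)
    raise RuntimeError(
        f"only {len(hosts)} of {expected} hosts listed in {path}"
    )
```

With the setup described above, calling `wait_for_hosts("/etc/opts/hosts", 16)` before starting the MPI task would raise an error instead of silently running on fewer containers. It only detects the short list; it does not fix the name resolution problem itself.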