This repository has been archived by the owner on Mar 28, 2018. It is now read-only.

Fix docker swarm tests in docker 17 #956

Merged
merged 4 commits into from
Jun 13, 2017

Conversation

jcvenegas
Contributor

This PR uses the newly added function check_swarm_replicas in the swarm tests.
It removes the infinite loop that could hang the test.

The function check_swarm_replicas waits until all the replicas
of a docker swarm service are ready. It also uses a timeout:
if the replicas are not ready when the timeout expires, the function fails.

Fixes: #938

@jcvenegas
Contributor Author

@chavafg could you enable the swarm tests for docker 17 for this PR?

@jcvenegas jcvenegas force-pushed the dont-run-swarm-docker-17 branch from e57eb8e to 7ff0e65 Compare June 7, 2017 20:55
@jcvenegas
Contributor Author

@jodh-intel nice to see checkcommits is happy with my long name \o/
Detected TravisCI Environment
Found 2 commits between commit 5463eaff19b77906700d14cc4733820fc7b8f4a4 and branch master
Checking commit 974184f
Checking commit e57eb8e
All commit checks passed.

@chavafg
Contributor

chavafg commented Jun 7, 2017

qa-failed

Rejected with PullApprove

@chavafg
Contributor

chavafg commented Jun 7, 2017

qa-passed

Approved with PullApprove

This patch adds a function to be used in tests related to swarm.

The function check_swarm_replicas waits until all the replicas
of a docker swarm service are ready. It also uses a timeout:
if the replicas are not ready when the timeout expires, the function fails.

The function uses:

```
docker service ps SERVICE
```
instead of:

```
docker ps --filter status=running --filter ancestor=IMAGE
```

This is needed because docker 17 does not filter containers that
are part of a swarm service.

Signed-off-by: Jose Carlos Venegas Munoz <[email protected]>
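
A minimal sketch of the wait-with-timeout idea described in the commit message above. This is not the helper merged in this PR: the names `count_running_replicas` and `wait_for_replicas` are hypothetical, and the sketch assumes the installed `docker service ps` accepts `--format` with the `.CurrentState` placeholder.

```shell
# Hypothetical sketch, not the check_swarm_replicas helper from this PR.
DOCKER_EXE="${DOCKER_EXE:-docker}"

# Print how many tasks of a service report a current state of "Running".
# Assumption: `docker service ps` supports --format '{{.CurrentState}}'.
count_running_replicas() {
    local service="$1"
    "$DOCKER_EXE" service ps --format '{{.CurrentState}}' "$service" \
        | grep -c '^Running' || true
}

# Poll once per second until the expected number of replicas is running.
# Returns 0 on success and 1 if the timeout (in seconds) expires first.
wait_for_replicas() {
    local service="$1" expected="$2" timeout_s="${3:-120}"
    local elapsed=0 running
    while [ "$elapsed" -lt "$timeout_s" ]; do
        running="$(count_running_replicas "$service")"
        echo "INFO: replicas running: ${running}/${expected}" >&2
        if [ "$running" -ge "$expected" ]; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}
```

A test could then call something like `wait_for_replicas testswarm 4 120` and fail when it returns non-zero, so a service that never becomes ready ends the test instead of hanging it.
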
@jodh-intel
Contributor

Should that commit message be changed like this?:

s/tests: down wait/tests: don't wait/ ?


info (){
msg="$*"
echo "INFO: $msg" >&2
Contributor

Nit: do you want this to get to stderr or stdout?

Contributor Author

I prefer to send info to stderr just in case someone writes a function and wants to use the output of a command it runs, without the info log getting mixed into that output. But I can change it if that does not sound like a good approach.
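
A small sketch of the point being made here: when a function's stdout is captured with command substitution, log lines written to stderr stay out of the captured value. `pick_service_name` is a hypothetical helper used only for illustration.

```shell
info() {
    local msg="$*"
    echo "INFO: $msg" >&2
}

# Hypothetical helper: logs through info, prints its result on stdout.
pick_service_name() {
    info "choosing a service name"
    echo "testswarm1"
}

# Only stdout is captured; the INFO line still reaches the terminal on stderr.
name="$(pick_service_name)"
echo "captured: $name"
```

If `info` wrote to stdout instead, `name` would also contain the `INFO:` line.
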


info "create service testswarm1"
$DOCKER_EXE service create \
--name testswarm1 \
Contributor

Could you use a variable here for testswarm1 and testswarm2, which can be passed to docker and used in the info call?

Contributor Author

agree! Fixing


info "create service testdns"
$DOCKER_EXE service create \
--name testdns \
Contributor

Another variable opportunity for testdns.

--name testdns \
--replicas $number_of_replicas \
--publish 8084:80 \
mcastelino/nettools sleep 60000
Contributor

I'd consider adding a comment saying that this is designed to sleep for the duration of the test.

Rather than use these very long sleep commands, I tend to use tail -f /dev/null for a 'sleep forever' blocking function. About equally 'obscure', but at least it doesn't have some semi-random hardwired big number in it :-)

Contributor Author

@jodh-intel @grahamwhaley sure. Looking at the code, it seems that is the intention, but I also prefer a blocking container; it will be destroyed at the end anyway.
I would like @GabyCT's input; I think she wrote these tests.

Contributor Author

I tried changing the command to /bin/bash, but the container started to fail. I would like to keep it unmodified for now.

If you were trying to replace the sleep with /bin/bash, that will likely fail, as /bin/bash will have no 'commands' to process and so quits instantly. Replacing 'sleep 60000' with 'tail -f /dev/null' may just work; worst case, you may have to invoke a shell and pass 'tail -f /dev/null' as an argument to it ('/bin/bash -c "tail -f /dev/null"', for instance). Anyhow, now probably anecdotal.
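
The difference can be checked locally without docker. This is a sketch assuming GNU coreutils `timeout`, which exits with status 124 when it has to stop a command that is still running.

```shell
# tail -f /dev/null never finishes on its own, so timeout has to stop it.
rc=0
timeout 1 tail -f /dev/null || rc=$?
echo "tail -f /dev/null: exit status $rc"

# A shell whose stdin is empty has nothing to run and returns immediately.
bash_rc=0
timeout 5 bash </dev/null || bash_rc=$?
echo "bash with empty stdin: exit status $bash_rc"
```

The first command blocks until the time limit; the second exits at once, which matches the behaviour seen when /bin/bash replaced the sleep.
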

@@ -40,16 +45,20 @@ setup() {
break;
fi
done
info "running $DOCKER_EXE swarm init ${swarm_interface_arg}"
$DOCKER_EXE swarm init ${swarm_interface_arg}
Contributor

You could use a pattern like this to avoid having to maintain both the docker command and the info string:

cmd="$DOCKER_EXE swarm init ${swarm_interface_arg}"
info "running '$cmd'"
eval "$cmd"
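
A possible variation on this pattern, sketched with a hypothetical `run_logged` wrapper: it logs the command and then runs it from its argument list, so no `eval` is needed and arguments containing spaces stay intact.

```shell
info() { echo "INFO: $*" >&2; }

# Hypothetical helper: log a command, then run it with its arguments unchanged.
run_logged() {
    info "running '$*'"
    "$@"
}

# A harmless command stands in for the docker call in this example.
out="$(run_logged printf '%s\n' "two words")"
echo "$out"
```

In the test this could be called as `run_logged $DOCKER_EXE swarm init ${swarm_interface_arg}`, which keeps the command and its log message in one place.
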

@GabyCT
Contributor

GabyCT commented Jun 8, 2017

@jcvenegas I tested them using centos 7 with docker version 17.05.0-ce, build 89658be and with cc-oci-runtime version: 2.1.10 commit: b82bbc2 and I noticed that the mtu.bats test is hanging

@jcvenegas
Contributor Author

@GabyCT this patch fixes a hang in docker 17 for the mtu, dns and swarm tests. The fix only removes the hang from the setup function (the test was not running; that was an issue with the test itself). Is only the mtu test having issues? If so, that should be a separate issue. Could you please give more information about which part of the test is having issues? If this PR does not introduce any other regression, could you please create a different issue for that?

@GabyCT
Contributor

GabyCT commented Jun 8, 2017

@jcvenegas , the replicas are always in the state of created but they are never in the state of up or running

@GabyCT
Contributor

GabyCT commented Jun 8, 2017

$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
d2d1595bd972 nginx:latest "/bin/bash -c 'hos..." 4 seconds ago Created testswarm.1.ma3jp2jeo0ffna2wiypeeq7vs

@jcvenegas
Contributor Author

That is not good :(. I tested on Clear Linux and it is working, and the CI seems to have passed as well. I double-checked the function check_swarm_replicas: it should not hang unless one of the docker commands hangs (fixing that would involve a bigger PR to add a timeout to all docker commands in our tests).

@jcvenegas jcvenegas force-pushed the dont-run-swarm-docker-17 branch from 7ff0e65 to 5c585ad Compare June 8, 2017 15:52
@jodh-intel
Contributor

(late reply) @jcvenegas - \o/. Glad checkcommits is treating you better this time! 😄

@jcvenegas
Contributor Author

@jodh-intel changes applied

@GabyCT
Contributor

GabyCT commented Jun 8, 2017

@jcvenegas thanks for the discussion we had. This lgtm, as it is doing what it is supposed to do, even if we still have issues with swarm, especially on centos.

@GabyCT
Contributor

GabyCT commented Jun 8, 2017

@jcvenegas, I created issue #959 about CentOS, thanks.

@chavafg
Contributor

chavafg commented Jun 8, 2017

qa-passed

@jodh-intel
Contributor

jodh-intel commented Jun 8, 2017

Looks like both pullapprove and checkcommits have found a couple of minor issues. Once those are resolved,

lgtm.

Approved with PullApprove

@jcvenegas jcvenegas force-pushed the dont-run-swarm-docker-17 branch from 5c585ad to f4d6ef7 Compare June 8, 2017 16:48
@jcvenegas
Contributor Author

Thank you for reviewing the issue, @GabyCT.

This patch uses the function check_swarm_replicas in the swarm tests.
It removes the infinite loop that could hang the test.

Fixes: intel#938

Signed-off-by: Jose Carlos Venegas Munoz <[email protected]>
@chavafg
Contributor

chavafg commented Jun 8, 2017

qa-failed

Rejected with PullApprove

@jcvenegas jcvenegas force-pushed the dont-run-swarm-docker-17 branch from f4d6ef7 to 9c34a01 Compare June 8, 2017 18:05
@chavafg
Contributor

chavafg commented Jun 8, 2017

The timeout function seems to be working as expected, but one test does not finish correctly:

not ok 2 check that the replicas' names are different 
# (in test file swarm.bats, line 93) 
# `REPLICAS[$i]="$(curl $url 2> /dev/null)"' failed 
# Error response from daemon: This node is not a swarm manager. Use "docker swarm init" or "docker swarm join" to connect this node to swarm and try again. 
# Swarm initialized: current node (wxsynbmcbwqt5uiwi1o6lkot0) is now a manager. 
#  
# To add a worker to this swarm, run the following command: 
#  
# docker swarm join \ 
# --token SWMTKN-1-5vm4azrqaliu4sqnxdt1yz41w4zqfnehplh0o10lydvi71kbnm-2h68qegyksx9umdo0qy548mnp \ 
# 172.17.0.5:2377 
#  
# To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions. 
#  
# us8mnv7shfx14gpch8dxp9fbe 
# Since --detach=false was not specified, tasks will be created in the background. 
# In a future release, --detach=false will become the default. 
# ID NAME MODE REPLICAS IMAGE PORTS 
# us8mnv7shfx1 testswarm replicated 0/4 nginx:latest *:8080->80/tcp 
# INFO: wait for 120 
# INFO: try 1 
# ID NAME MODE REPLICAS IMAGE PORTS 
# us8mnv7shfx1 testswarm replicated 0/4 nginx:latest *:8080->80/tcp 
# ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS 
# 8zsfpywsoo9x testswarm.1 nginx:latest fedora-cc-ci-vm.localdomain Running Preparing less than a second ago  
# j0p9cxp15y40 testswarm.2 nginx:latest fedora-cc-ci-vm.localdomain Running Preparing less than a second ago  
# 4megrnnw155u testswarm.3 nginx:latest fedora-cc-ci-vm.localdomain Running Accepted less than a second ago  
# fukvd534lhdl testswarm.4 nginx:latest fedora-cc-ci-vm.localdomain Running Preparing less than a second ago  
# INFO: replicas running : 0/4 
# INFO: try 2 
# ID NAME MODE REPLICAS IMAGE PORTS 
# us8mnv7shfx1 testswarm replicated 0/4 nginx:latest *:8080->80/tcp 
# ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS 
# 8zsfpywsoo9x testswarm.1 nginx:latest fedora-cc-ci-vm.localdomain Running Starting 1 second ago  
# j0p9cxp15y40 testswarm.2 nginx:latest fedora-cc-ci-vm.localdomain Running Starting 1 second ago  
# 4megrnnw155u testswarm.3 nginx:latest fedora-cc-ci-vm.localdomain Running Starting 1 second ago  
# fukvd534lhdl testswarm.4 nginx:latest fedora-cc-ci-vm.localdomain Running Starting 1 second ago  
# INFO: replicas running : 0/4 
# INFO: try 3 
# ID NAME MODE REPLICAS IMAGE PORTS 
# us8mnv7shfx1 testswarm replicated 1/4 nginx:latest *:8080->80/tcp 
# ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS 
# 8zsfpywsoo9x testswarm.1 nginx:latest fedora-cc-ci-vm.localdomain Running Starting 2 seconds ago  
# j0p9cxp15y40 testswarm.2 nginx:latest fedora-cc-ci-vm.localdomain Running Running less than a second ago  
# 4megrnnw155u testswarm.3 nginx:latest fedora-cc-ci-vm.localdomain Running Starting 2 seconds ago  
# fukvd534lhdl testswarm.4 nginx:latest fedora-cc-ci-vm.localdomain Running Starting 2 seconds ago  
# INFO: replicas running : 0/4 
# INFO: try 4 
# ID NAME MODE REPLICAS IMAGE PORTS 
# us8mnv7shfx1 testswarm replicated 2/4 nginx:latest *:8080->80/tcp 
# ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS 
# 8zsfpywsoo9x testswarm.1 nginx:latest fedora-cc-ci-vm.localdomain Running Starting 3 seconds ago  
# j0p9cxp15y40 testswarm.2 nginx:latest fedora-cc-ci-vm.localdomain Running Running 1 second ago  
# 4megrnnw155u testswarm.3 nginx:latest fedora-cc-ci-vm.localdomain Running Running less than a second ago  
# fukvd534lhdl testswarm.4 nginx:latest fedora-cc-ci-vm.localdomain Running Starting 3 seconds ago  
# INFO: replicas running : 0/4 
# INFO: try 5 
# ID NAME MODE REPLICAS IMAGE PORTS 
# us8mnv7shfx1 testswarm replicated 4/4 nginx:latest *:8080->80/tcp 
# ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS 
# 8zsfpywsoo9x testswarm.1 nginx:latest fedora-cc-ci-vm.localdomain Running Running less than a second ago  
# j0p9cxp15y40 testswarm.2 nginx:latest fedora-cc-ci-vm.localdomain Running Running 2 seconds ago  
# 4megrnnw155u testswarm.3 nginx:latest fedora-cc-ci-vm.localdomain Running Running 1 second ago  
# fukvd534lhdl testswarm.4 nginx:latest fedora-cc-ci-vm.localdomain Running Running less than a second ago  
# INFO: replicas running : 1/4 
# INFO: try 6 
# ID NAME MODE REPLICAS IMAGE PORTS 
# us8mnv7shfx1 testswarm replicated 4/4 nginx:latest *:8080->80/tcp 
# ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS 
# 8zsfpywsoo9x testswarm.1 nginx:latest fedora-cc-ci-vm.localdomain Running Running 1 second ago  
# j0p9cxp15y40 testswarm.2 nginx:latest fedora-cc-ci-vm.localdomain Running Running 3 seconds ago  
# 4megrnnw155u testswarm.3 nginx:latest fedora-cc-ci-vm.localdomain Running Running 2 seconds ago  
# fukvd534lhdl testswarm.4 nginx:latest fedora-cc-ci-vm.localdomain Running Running 1 second ago  
# INFO: replicas running : 2/4 
# INFO: try 7 
# ID NAME MODE REPLICAS IMAGE PORTS 
# us8mnv7shfx1 testswarm replicated 4/4 nginx:latest *:8080->80/tcp 
# ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS 
# 8zsfpywsoo9x testswarm.1 nginx:latest fedora-cc-ci-vm.localdomain Running Running 2 seconds ago  
# j0p9cxp15y40 testswarm.2 nginx:latest fedora-cc-ci-vm.localdomain Running Running 4 seconds ago  
# 4megrnnw155u testswarm.3 nginx:latest fedora-cc-ci-vm.localdomain Running Running 3 seconds ago  
# fukvd534lhdl testswarm.4 nginx:latest fedora-cc-ci-vm.localdomain Running Running 2 seconds ago  
# INFO: replicas running : 4/4 
# testswarm 
# ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS 
# wxsynbmcbwqt5uiwi1o6lkot0 * fedora-cc-ci-vm.localdomain Ready Active Leader 
# Node left the swarm. 
# Error response from daemon: This node is not a swarm manager. Use "docker swarm init" or "docker swarm join" to connect this node to swarm and try again. 

@jcvenegas
Contributor Author

Let's create an issue for that. I can skip that test for now when docker 17 is used.

@chavafg
Contributor

chavafg commented Jun 8, 2017

qa-failed

@chavafg
Contributor

chavafg commented Jun 8, 2017

opened #969 to track the swarm failure on Fedora.

@jcvenegas
Contributor Author

@chavafg test skipped:
f9b300f
Please test again.

This patch removes some hardcoded variables from the swarm tests.

Signed-off-by: Jose Carlos Venegas Munoz <[email protected]>
After re-enabling the swarm tests for docker 17, the
test "2 check that the replicas' names are different"
started to fail (note that it never ran on docker 17 before).

Signed-off-by: Jose Carlos Venegas Munoz <[email protected]>
@chavafg
Contributor

chavafg commented Jun 9, 2017

qa-passed

Approved with PullApprove

@jcvenegas
Contributor Author

@chavafg @gorozco1 can we merge this PR now?

@gorozco1
Contributor

gorozco1 commented Jun 13, 2017

lgtm

Approved with PullApprove

@chavafg
Contributor

chavafg commented Jun 13, 2017

lgtm

Approved with PullApprove

@chavafg chavafg merged commit c071416 into intel:master Jun 13, 2017