This repository has been archived by the owner on Feb 20, 2020. It is now read-only.

agent : restart go service fails #46

Open
davidlwillson opened this issue Nov 28, 2016 · 7 comments


@davidlwillson

On a healthy cluster, running site.yml fails in the following way.

RUNNING HANDLER [agent : restart all agents] ***********************************
changed: [10.0.2.30]
changed: [10.0.2.32]
changed: [10.0.2.31]

RUNNING HANDLER [agent : restart go service] ***********************************
failed: [10.0.2.30] (item=go_services.stdout_lines) => {"failed": true, "item": "go_services.stdout_lines", "msg": "Could not find the requested service \"'go_services.stdout_lines'\": "}
failed: [10.0.2.32] (item=go_services.stdout_lines) => {"failed": true, "item": "go_services.stdout_lines", "msg": "Could not find the requested service \"'go_services.stdout_lines'\": "}
failed: [10.0.2.31] (item=go_services.stdout_lines) => {"failed": true, "item": "go_services.stdout_lines", "msg": "Could not find the requested service \"'go_services.stdout_lines'\": "}
	to retry, use: --limit @/home/dwilso004c/git/spectra-ops/issues/2016-11-04-US827369-Learn_GoCD/ansible-gocd/site.retry

PLAY RECAP *********************************************************************
10.0.2.29                  : ok=24   changed=3    unreachable=0    failed=0   
10.0.2.30                  : ok=30   changed=5    unreachable=0    failed=1   
10.0.2.31                  : ok=30   changed=5    unreachable=0    failed=1   
10.0.2.32                  : ok=30   changed=5    unreachable=0    failed=1   

As an aside, it seems the hosts must have real names. I was never able to get the cluster to build when the hosts only had IP addresses to talk to each other.

@davidlwillson
Author

davidlwillson commented Nov 28, 2016

The error message is improved, but not eliminated, by changing the value of with_items from go_services.stdout_lines to "{{ go_services.stdout_lines }}":

[dwilso004c@localhost ansible-gocd]$ git diff
diff --git a/roles/agent/handlers/main.yml b/roles/agent/handlers/main.yml
index a63e84e..2c2f273 100644
--- a/roles/agent/handlers/main.yml
+++ b/roles/agent/handlers/main.yml
@@ -14,7 +14,7 @@
 
 - name: restart go service
   service: "name={{ item | basename }} state=restarted"
-  with_items: go_services.stdout_lines
+  with_items: "{{ go_services.stdout_lines }}"
   become: yes
 
 - name: restart go-server
[dwilso004c@localhost ansible-gocd]$ 
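For reference, this is the full corrected handler task after the diff above. (On Ansible 2.5+ the equivalent loop: keyword also works; that alternative is my suggestion, not something taken from this repo.)

```yaml
- name: restart go service
  # Bare variable names in with_items stopped being accepted; the value
  # must be a templated expression.
  service: "name={{ item | basename }} state=restarted"
  with_items: "{{ go_services.stdout_lines }}"
  become: yes
```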

@davidlwillson
Author

Now, my output is:

RUNNING HANDLER [agent : restart all agents] ***********************************
changed: [10.0.2.32]

RUNNING HANDLER [agent : restart go service] ***********************************
failed: [10.0.2.32] (item=/etc/default/go-agent1) => {"failed": true, "item": "/etc/default/go-agent1", "msg": "Unable to restart service go-agent1: Job for go-agent1.service failed because the control process exited with error code. See \"systemctl status go-agent1.service\" and \"journalctl -xe\" for details.\n"}
	to retry, use: --limit @/home/dwilso004c/git/spectra-ops/issues/2016-11-04-US827369-Learn_GoCD/ansible-gocd/site.retry

@davidlwillson
Author

To reproduce this output, just stop and disable the go-agent1 service and re-run the site playbook.

@michaelbannister
Contributor

I'm also having trouble after recently pulling the latest version of this role from Galaxy.
I wonder if it might have something to do with how the 'service' module in the latest version of Ansible handles services defined System V style (init.d scripts) on a target system running systemd.
I'm also having trouble with the ssh-agent step failing to load my defined GOCD_SSH_PRIVATE_KEY; I'm not sure if that's related.
I'd be willing to have a go at creating a systemd service file definition, if anyone thinks that might be useful or help solve this problem.
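For what it's worth, here is a minimal sketch of what such a native unit might look like. The paths, the user name go, and the agent.sh entry point are assumptions based on a stock GoCD package layout, not taken from this repo:

```ini
# /etc/systemd/system/go-agent1.service -- hypothetical sketch, paths assumed
[Unit]
Description=GoCD Agent 1
After=network.target

[Service]
Type=simple
User=go
# Replaces the "/etc/default/go-agent1" sourcing done by the init script
EnvironmentFile=/etc/default/go-agent1
WorkingDirectory=/var/lib/go-agent
ExecStart=/usr/share/go-agent/agent.sh
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

A native unit would also sidestep the check_proc issue below entirely, since systemd tracks the process itself instead of grepping for it.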

@davidlwillson
Author

Actually, it appears that we're deploying a broken service definition.

[ ~]$ sudo systemctl restart go-agent1
Job for go-agent1.service failed. See 'systemctl status go-agent1.service' and 'journalctl -xn' for details.
[ ~]$ sudo systemctl restart go-agent2
Job for go-agent2.service failed. See 'systemctl status go-agent2.service' and 'journalctl -xn' for details.
[ ~]$ sudo systemctl status go-agent1
go-agent1.service - LSB: Go Agent1
   Loaded: loaded (/etc/rc.d/init.d/go-agent1)
   Active: failed (Result: exit-code) since Tue 2016-12-20 00:11:59 UTC; 18s ago
  Process: 19404 ExecStart=/etc/rc.d/init.d/go-agent1 start (code=exited, status=255)

Dec 20 00:11:44 spectra-ch2-a8p.sys.comcast.net systemd[1]: Starting LSB: Go Agent1...
Dec 20 00:11:44 spectra-ch2-a8p.sys.comcast.net su[19407]: (to go) root on none
Dec 20 00:11:44 spectra-ch2-a8p.sys.comcast.net su[19407]: pam_unix(su:session): session opened for user go by (uid=0)
Dec 20 00:11:44 spectra-ch2-a8p.sys.comcast.net go-agent1[19404]: [Tue Dec 20 00:11:44 UTC 2016] using default settings from /etc/def...gent1
Dec 20 00:11:59 spectra-ch2-a8p.sys.comcast.net go-agent1[19404]: Error starting Go Agent1.
Dec 20 00:11:59 spectra-ch2-a8p.sys.comcast.net systemd[1]: go-agent1.service: control process exited, code=exited status=255
Dec 20 00:11:59 spectra-ch2-a8p.sys.comcast.net systemd[1]: Failed to start LSB: Go Agent1.
Dec 20 00:11:59 spectra-ch2-a8p.sys.comcast.net systemd[1]: Unit go-agent1.service entered failed state.
Hint: Some lines were ellipsized, use -l to show in full.
...

@michaelbannister
Contributor

michaelbannister commented Dec 23, 2016

The go-agent-service script has this check_proc function, which looks like it's perhaps expecting an older version of GoCD:

check_proc() {
    pgrep -u go -f go-agent{{ item }}-running
}

Compare to the current go-agent.init script from GoCD:

check_proc() {
    pgrep -u go -f /usr/share/${SERVICE_NAME}/agent-bootstrapper.jar >/dev/null
}
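The difference matters because pgrep -f matches the pattern against each process's full command line: if nothing on the box carries the literal marker go-agent1-running in its command line (as with newer GoCD packages), the old check always fails. A quick illustration, with a throwaway sleep process standing in for the agent:

```shell
#!/bin/sh
# pgrep -f matches against the FULL command line, not just the process name.
# Start a throwaway process whose command line carries a known marker.
sleep 30 &
pid=$!

# Found: "sleep 30" appears verbatim in that process's command line.
pgrep -f "sleep 30" >/dev/null && echo "marker present"

# Not found: no process advertises this string, so pgrep exits non-zero --
# which is exactly how the old check_proc fails against newer GoCD agents.
pgrep -f "go-agent1-running" >/dev/null || echo "marker absent"

kill "$pid"
```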

@cnatan

cnatan commented Jan 6, 2017

This check_proc function always fails because it looks for "go-agent{{ item }}-running" in the process's full command line. I've sent a pull request to fix it; check it out.

Pull request: #51
