Import Cluster fails on "node X doesn't have network details populated" with Ansible 2.8 #1084

Open
GowthamShanmugam opened this issue Apr 24, 2019 · 11 comments

Comments

@GowthamShanmugam
Contributor

GowthamShanmugam commented Apr 24, 2019

With the Beta 1 build of Ansible 2.8, it's not possible to import a Gluster Trusted
Storage Pool into Tendrl, as the Cluster Import task fails with the error:

Node doesn't have network details populated

GowthamShanmugam added a commit to GowthamShanmugam/commons that referenced this issue Apr 24, 2019
…d" with Ansible 2.8

bugzilla: 1702412
tendrl-bug-id: Tendrl#1084

Signed-off-by: GowthamShanmugasundaram <[email protected]>
@SalsaBr

SalsaBr commented Jun 17, 2019

I ran into the same symptom and could not recover from the error, either by downgrading to Ansible 2.7 or by trying to unmanage the cluster. The cluster is stuck in an error state.

@GowthamShanmugam
Contributor Author

Does the /tmp directory have execute permission? If not, please remount /tmp with the exec option: https://askubuntu.com/questions/311438/how-to-make-tmp-executable
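
One way to check this on each node is to look for the noexec option on the /tmp entry in /proc/mounts. The following is a minimal sketch, not part of Tendrl; the function name `is_mounted_noexec` is hypothetical, and it assumes the standard /proc/mounts field layout (device, mount point, filesystem type, options).

```python
def is_mounted_noexec(mounts_text, mount_point="/tmp"):
    """Return True if mount_point is listed in mounts_text with the noexec option.

    mounts_text is expected in /proc/mounts format:
        device mount_point fstype options dump pass
    Returns False when mount_point is not a separate mount at all.
    """
    noexec = False
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[1] != mount_point:
            continue
        # A later entry for the same mount point overrides an earlier one.
        noexec = "noexec" in fields[3].split(",")
    return noexec


# Usage on a node (assumption: Linux with /proc mounted):
#     with open("/proc/mounts") as f:
#         print(is_mounted_noexec(f.read()))
```

If this returns True for /tmp, the mount would need the exec option before retrying the import. If /tmp is not a separate mount, the function returns False and the options of the root filesystem apply instead.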

@SalsaBr

SalsaBr commented Jun 18, 2019

Yes, it does. On all nodes

@GowthamShanmugam
Contributor Author

Please unmanage the cluster and wait until all the nodes are detected by the Tendrl server. Start the import once all the nodes are listed with their FQDNs.

@SalsaBr

SalsaBr commented Jun 21, 2019

The unmanage function is not working either. Nothing happens and I can't check on its progress.

@GowthamShanmugam
Contributor Author

Oh OK, you already mentioned that unmanage is not working, sorry :). Please check the log file /var/log/messages. I think each sync may write the error to the log.

@GowthamShanmugam
Contributor Author

If nothing works, I can help you solve this problem in a remote call.

@SalsaBr

SalsaBr commented Jun 24, 2019

I tried to clear everything by deleting my etcd partition, then installed Tendrl again, and this is what I get now:

Child jobs failed are [u'02239281-7b28-4cec-8d11-7c9f6b82fca1']

Failed atom: tendrl.objects.Cluster.atoms.ImportCluster on flow: Import existing Gluster Cluster

Failure in Job fdd906f4-4089-4e0b-9680-4f88aea55963 Flow tendrl.flows.ImportCluster with error:
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/tendrl/commons/jobs/init.py", line 240, in process_job
    the_flow.run()
  File "/usr/lib/python2.7/site-packages/tendrl/commons/flows/import_cluster/init.py", line 131, in run
    exc_traceback)
FlowExecutionFailedError: ['Traceback (most recent call last):\n',
  ' File "/usr/lib/python2.7/site-packages/tendrl/commons/flows/import_cluster/init.py", line 98, in run\n super(ImportCluster, self).run()\n',
  ' File "/usr/lib/python2.7/site-packages/tendrl/commons/flows/init.py", line 186, in run\n (atom_fqn, self._defs['help'])\n',
  'AtomExecutionFailedError: Atom Execution failed. Error: Error executing atom: tendrl.objects.Cluster.atoms.ImportCluster on flow: Import existing Gluster Cluster\n']

Failure in Job 02239281-7b28-4cec-8d11-7c9f6b82fca1 Flow tendrl.flows.ImportCluster with error:
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/tendrl/commons/jobs/init.py", line 240, in process_job
    the_flow.run()
  File "/usr/lib/python2.7/site-packages/tendrl/commons/flows/import_cluster/init.py", line 131, in run
    exc_traceback)
FlowExecutionFailedError: ['Traceback (most recent call last):\n',
  ' File "/usr/lib/python2.7/site-packages/tendrl/commons/flows/import_cluster/init.py", line 98, in run\n super(ImportCluster, self).run()\n',
  ' File "/usr/lib/python2.7/site-packages/tendrl/commons/flows/init.py", line 213, in run\n ret_val = self._execute_atom(atom_fqn)\n',
  ' File "/usr/lib/python2.7/site-packages/tendrl/commons/flows/init.py", line 252, in _execute_atom\n parameters=self.parameters\n',
  ' File "/usr/lib/python2.7/site-packages/tendrl/commons/objects/cluster/atoms/configure_monitoring/init.py", line 110, in run\n "interface": self.get_node_interface(NS.node_context.fqdn),\n',
  ' File "/usr/lib/python2.7/site-packages/tendrl/commons/objects/cluster/atoms/configure_monitoring/init.py", line 80, in get_node_interface\n ip = socket.gethostbyname(fqdn)\n',
  'TypeError: must be string, not None\n']
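
The last frame of the second traceback shows `socket.gethostbyname(fqdn)` being called with `fqdn` set to `None`, which is what produces the `TypeError`. The sketch below illustrates how such a call could be guarded so that a missing FQDN surfaces as a clear error instead. It is not Tendrl's code: `resolve_node_ip` and its `resolver` parameter are hypothetical names used only for illustration.

```python
import socket


def resolve_node_ip(fqdn, resolver=socket.gethostbyname):
    """Resolve fqdn to an IPv4 address.

    Fails early with a descriptive error when the FQDN is missing, rather
    than letting the resolver raise a TypeError on None.
    The resolver argument is injectable so the guard can be exercised
    without doing a real DNS lookup.
    """
    if not fqdn:
        raise ValueError(
            "node FQDN is not populated; cannot resolve its IP address"
        )
    return resolver(fqdn)
```

With a guard like this, an unpopulated node context would be reported as a missing FQDN, which points more directly at the sync problem than the raw `TypeError` does.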

@GowthamShanmugam
Contributor Author

It clearly shows that the FQDN of the node is not populated, so there is some problem with the node_context details sync. What versions of the tendrl rpms and ansible are installed?
rpm -qa | grep tendrl
rpm -qa | grep ansible
Run these commands on the server as well as on the storage nodes.

@SalsaBr

SalsaBr commented Jun 27, 2019

Node-01:
tendrl-collectd-selinux-1.5.4-2.el7.centos.noarch
tendrl-selinux-1.5.4-2.el7.centos.noarch
tendrl-commons-1.6.3-11.el7.noarch
tendrl-gluster-integration-1.6.3-10.el7.noarch
tendrl-node-agent-1.6.3-9.el7.noarch

ansible-2.5.3-1.el7.noarch

Node-02:
tendrl-collectd-selinux-1.5.4-2.el7.centos.noarch
tendrl-selinux-1.5.4-2.el7.centos.noarch
tendrl-node-agent-1.6.3-9.el7.noarch
tendrl-gluster-integration-1.6.3-10.el7.noarch
tendrl-commons-1.6.3-11.el7.noarch

centos-release-ansible26-1-3.el7.centos.noarch
ansible-2.8.0-2.el7.noarch

Node-03:
tendrl-collectd-selinux-1.5.4-2.el7.centos.noarch
tendrl-gluster-integration-1.6.3-10.el7.noarch
tendrl-selinux-1.5.4-2.el7.centos.noarch
tendrl-node-agent-1.6.3-9.el7.noarch
tendrl-commons-1.6.3-11.el7.noarch

ansible-2.8.0-2.el7.noarch

node-remote:
tendrl-notifier-1.6.3-4.el7.noarch
tendrl-ansible-1.6.3-2.el7.centos.noarch
tendrl-monitoring-integration-1.6.3-11.el7.noarch
tendrl-grafana-selinux-1.5.4-2.el7.centos.noarch
tendrl-collectd-selinux-1.5.4-2.el7.centos.noarch
tendrl-gluster-integration-1.6.3-10.el7.noarch
tendrl-selinux-1.5.4-2.el7.centos.noarch
tendrl-commons-1.6.3-11.el7.noarch
tendrl-api-1.6.3-7.el7.noarch
tendrl-api-httpd-1.6.3-7.el7.noarch
tendrl-node-agent-1.6.3-9.el7.noarch
tendrl-ui-1.6.3-10.el7.noarch
tendrl-grafana-plugins-1.6.3-11.el7.noarch

tendrl-ansible-1.6.3-2.el7.centos.noarch
ansible-2.8.0-2.el7.noarch
centos-release-ansible26-1-3.el7.centos.noarch

Note that the running version may differ from the installed package. Every node reports ansible 2.7.0, for example:
ansible --version
ansible 2.7.0
config file = /etc/ansible/ansible.cfg
configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
ansible python module location = /usr/lib/python2.7/site-packages/ansible
executable location = /usr/bin/ansible
python version = 2.7.5 (default, Apr 9 2019, 14:30:50) [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)]
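
To compare the running version against the installed rpm, the first line of the `ansible --version` output can be parsed. This is a minimal sketch that assumes the first line has the form `ansible X.Y.Z`, as in the output above; other Ansible releases may format this line differently, in which case the function returns `None`. The name `running_ansible_version` is hypothetical.

```python
import re


def running_ansible_version(version_output):
    """Return the version string from the first line of `ansible --version`.

    Expects a first line such as 'ansible 2.7.0'. Returns None when the
    output is empty or the first line does not match that form.
    """
    lines = version_output.splitlines()
    if not lines:
        return None
    match = re.match(r"^ansible\s+(\d+(?:\.\d+)*)", lines[0])
    return match.group(1) if match else None
```

The result can then be compared with the version embedded in the rpm name reported by `rpm -qa | grep ansible`.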

Another symptom:
Tendrl mentions 4 hosts discovered in the cluster, but when I try to view these hosts I get a table with only 3 hosts. The Tendrl server, which is a geo-rep host for the cluster, is missing. This may be related, as the 3 hosts shown have correct names and IPs.

@GowthamShanmugam
Contributor Author

Ah! I see the problem: the upstream release does not yet include the Ansible 2.8 fix; it is still only in the master repo.

Here, except for node1, all the nodes have Ansible 2.8. The version should be lower than 2.8 and higher than 2.5 on every node (including the tendrl-server).

After downgrading, restart the tendrl-node-agent service on the nodes as well as on the server.

Note:
Don't install any gluster packages on the tendrl-server.
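
The version constraint above can be checked against the `rpm -qa | grep ansible` output with a short script. This is a sketch under assumptions: it treats the supported range as 2.5 up to but not including 2.8 (so node1's 2.5.3 counts as supported), and the function names `parse_rpm_version` and `is_supported_ansible` are hypothetical. Only package names that start with `ansible-` are parsed, so entries such as tendrl-ansible or centos-release-ansible26 are rejected.

```python
import re


def parse_rpm_version(package):
    """Extract (major, minor, patch) from a name like 'ansible-2.8.0-2.el7.noarch'."""
    match = re.match(r"^ansible-(\d+)\.(\d+)\.(\d+)-", package)
    if match is None:
        raise ValueError("not an ansible package name: %r" % package)
    return tuple(int(part) for part in match.groups())


def is_supported_ansible(version, minimum=(2, 5), below=(2, 8)):
    """Return True if version is at least `minimum` and lower than `below`.

    Only the (major, minor) pair is compared. The default bounds are an
    assumption based on the range stated in this thread.
    """
    return minimum <= tuple(version[:2]) < below


# Package names taken from the rpm listings earlier in this thread.
for package in ("ansible-2.5.3-1.el7.noarch", "ansible-2.8.0-2.el7.noarch"):
    version = parse_rpm_version(package)
    print(package, "supported:", is_supported_ansible(version))
```

Running the same check on every node and on the tendrl-server would show which hosts still need a downgrade before the tendrl-node-agent service is restarted.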
