
Scaling the cluster with --limit and without access_ip or ip doesn't work #11587

Closed
sirkrypt0 opened this issue Sep 29, 2024 · 0 comments · Fixed by #11598
Labels
kind/bug Categorizes issue or PR as related to a bug.


What happened?

I have a cluster with three nodes: two control plane nodes and one worker. I want to scale the cluster by adding another worker. My inventory only specifies ansible_host; neither access_ip nor ip is set.
I ran the scale playbook with --limit to scale my cluster, but it failed with:

TASK [kubernetes_sigs.kubespray.kubernetes/node : Nginx-proxy | Write nginx-proxy configuration] ***
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: ansible.errors.AnsibleUndefinedVariable: 'dict object' has no attribute 'k8s-02'. 'dict object' has no attribute 'k8s-02'
fatal: [k8s-04]: FAILED! => {"changed": false, "msg": "AnsibleUndefinedVariable: 'dict object' has no attribute 'k8s-02'. 'dict object' has no attribute 'k8s-02'"}

This happens because the nginx load balancer template iterates over groups['kube_control_plane'] and tries to set an upstream for every control plane node:

{% for host in groups['kube_control_plane'] -%}
server {{ hostvars[host]['access_ip'] | default(hostvars[host]['ip'] | default(fallback_ips[host])) }}:{{ kube_apiserver_port }};
{% endfor -%}

If neither access_ip nor ip is defined, as in my case, the template falls back to fallback_ips. As the error message indicates, the second control plane node k8s-02 has no entry in fallback_ips.
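
To make the failure concrete, here is a minimal standalone sketch using the jinja2 library directly (not Kubespray code; StrictUndefined stands in for Ansible's strict handling of undefined variables). Rendering fails exactly when fallback_ips lacks an entry for a control plane node:

from jinja2 import Environment, StrictUndefined

template_src = (
    "{% for host in groups['kube_control_plane'] -%}\n"
    "server {{ hostvars[host]['access_ip'] | default(hostvars[host]['ip']"
    " | default(fallback_ips[host])) }}:{{ kube_apiserver_port }};\n"
    "{% endfor -%}"
)

context = {
    "groups": {"kube_control_plane": ["k8s-01", "k8s-02"]},
    # Neither access_ip nor ip is defined on any host, as in my inventory.
    "hostvars": {"k8s-01": {}, "k8s-02": {}},
    # After #11370, with --limit only the first control plane node gets a fallback IP.
    "fallback_ips": {"k8s-01": "10.0.0.11"},
    "kube_apiserver_port": 6443,
}

env = Environment(undefined=StrictUndefined)
# Raises jinja2.exceptions.UndefinedError on k8s-02, matching the
# AnsibleUndefinedVariable above: 'dict object' has no attribute 'k8s-02'.
print(env.from_string(template_src).render(context))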

I think this issue was only recently introduced by #11370, which restricted the gathering of fallback_ips to just the first control plane node when --limit is specified.

What did you expect to happen?

Scaling should work, just as creating the cluster did. Code-wise, a fallback IP should be gathered for every node the playbook needs to render its templates, even when --limit is specified (see the sketch below).
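
For illustration only, a sketch of the expected behavior (plain Python with a hypothetical helper, not Kubespray's actual code or the eventual fix): fallback IPs gathered for every host the templates reference, i.e. all control plane nodes plus the hosts selected by --limit.

def build_fallback_ips(control_plane, play_hosts, default_ipv4):
    # Dedupe while keeping order: control plane nodes first, then play hosts.
    needed = dict.fromkeys(control_plane + play_hosts)
    return {host: default_ipv4[host] for host in needed}

# Scaling with --limit k8s-04: the play only targets k8s-04, but the
# nginx-proxy template still needs the control plane nodes' addresses.
fallback_ips = build_fallback_ips(
    control_plane=["k8s-01", "k8s-02"],
    play_hosts=["k8s-04"],
    default_ipv4={"k8s-01": "10.0.0.11", "k8s-02": "10.0.0.12", "k8s-04": "10.0.0.14"},
)
assert set(fallback_ips) == {"k8s-01", "k8s-02", "k8s-04"}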

How can we reproduce it (as minimally and precisely as possible)?

Create a cluster using an inventory like the following. Things to note:

  • Only ansible_host is set; no access_ip or ip
  • Two control plane nodes
all:
  hosts:
    k8s-01:
      ansible_host: 10.0.0.11
    k8s-02:
      ansible_host: 10.0.0.12
    k8s-03:
      ansible_host: 10.0.0.13
    # k8s-04:
    #   ansible_host: 10.0.0.14
  children:
    kube_control_plane:
      hosts:
        k8s-01:
        k8s-02:
    kube_node:
      hosts:
        k8s-01:
        k8s-02:
        k8s-03:
        # k8s-04:
    etcd:
      hosts:
        k8s-01:
        k8s-02:
        k8s-03:
    k8s_cluster:
      children:
        kube_control_plane:
        kube_node:

Then, uncomment the 4th node and run the scale playbook with --limit k8s-04.

OS

Linux 6.6.47-1-MANJARO x86_64
NAME="Manjaro Linux"
PRETTY_NAME="Manjaro Linux"
ID=manjaro
ID_LIKE=arch
BUILD_ID=rolling

Version of Ansible

ansible [core 2.16.11]

Version of Python

3.12.5

Version of Kubespray (commit)

v2.26.0

Network plugin used

calico

Full inventory with variables

See the sample inventory above; all other variables are the defaults from the sample inventory.

Command used to invoke ansible

ansible-playbook --become --ask-become-pass -i inventory/hosts.yaml kubernetes_sigs.kubespray.scale --limit=k8s-04

Output of ansible run

TASK [kubernetes_sigs.kubespray.kubernetes/node : Nginx-proxy | Write nginx-proxy configuration] ***
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: ansible.errors.AnsibleUndefinedVariable: 'dict object' has no attribute 'k8s-02'. 'dict object' has no attribute 'k8s-02'
fatal: [k8s-04]: FAILED! => {"changed": false, "msg": "AnsibleUndefinedVariable: 'dict object' has no attribute 'k8s-02'. 'dict object' has no attribute 'k8s-02'"}

Anything else we need to know

No response
