Conversation
Note: I'll rebase / reword the commits once this is approved. Also: we need to make a SUSE-based haproxy image; this currently uses the haproxy image from Docker Hub.
cmd.run:
  - name: rm -rf /var/lib/etcd/*
  - prereq:
    - service: etcd
I intentionally took this out - this is destructive, and can lead to a situation where your etcd cluster is destroyed if the initial orchestration run happens multiple times.
Yea, that scares me. Maybe it should be split off and proposed separately to master?
It is already known that we are currently not prepared for running the orchestration several times (in fact I think there is no way that orchestration can be triggered again once the cluster has been created). We should study how to solve these problems in the (near) future...
We are currently tracking this issue in this card
requests:
  memory: 128Mi
limits:
  memory: 128Mi
this will need testing to know if these values are too big or too small - I just made a guess to start with.
name: haproxy-cfg
volumes:
- hostPath:
    path: /etc/haproxy
just one question, is /etc/haproxy a RW path?
Only Salt should be writing to that - its intention is to contain the haproxy configuration only.
I understand that; what I don't understand is whether, under MicroOS, it can write to the path at all, or if it's RO. Sorry, I might be missing something.
/etc should be RW in MicroOS, so this shouldn't be a problem...
@PI-Victor ahh, sorry, I misunderstood the why on your question :)
I do have a TODO here - HUP / restart haproxy when the configuration changes. Will likely need to use ...
salt/haproxy/haproxy.cfg.jinja (Outdated)
  {% set addr = addrlist|first -%}
  {% endif -%}
  server master-{{ minion_id }} {{ addr }}:6443 check
{% endfor -%}
Something to think about in the future: what happens if I add a new node and I want it to become a master? Will this work? In the past we've been using confd to handle this, but maybe this is something that salt can handle as well?
If a new master is added (or removed), salt needs to be run across all hosts and this file will be regenerated. Rob noted a TODO of actually triggering the haproxy config to be reloaded.
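For context, a minimal sketch of what the surrounding loop in haproxy.cfg.jinja might look like; the mine call, the 'roles:kube-master' grain target and the 'network.ip_addrs' function here are assumptions for illustration, not the exact code in this PR:

{%- for minion_id, addrlist in salt['mine.get']('roles:kube-master', 'network.ip_addrs', expr_form='grain').items() %}
  {#- addrlist is the list of IPs each master publishes via the Salt mine; pick one #}
  {%- set addr = addrlist|first %}
  server master-{{ minion_id }} {{ addr }}:6443 check
{%- endfor %}

Because the server lines are rendered from the mine data, re-running salt after adding or removing a master is what regenerates the file, as described above.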
@@ -0,0 +1,28 @@
global
    log /dev/log local0
    log /dev/log local1 notice
Where do these logs end up?
They don't, really. haproxy cannot log to stdout. The "solutions" to this are not really solutions: dockerfile/haproxy#3
salt/haproxy/init.sls (Outdated)
- pkgs:
  - kubernetes-node
- require:
  - file: /etc/zypp/repos.d/containers.repo
Is this still necessary? It feels odd to be asking for kubernetes-node to be installed here
is it not needed for kubelet?
yeah, I probably have these backwards. Oops. Will reverse.
salt/haproxy/haproxy.cfg.jinja (Outdated)
    timeout server 50000

listen kubernetes-master
    bind 127.0.0.1:6444
Ideally, we should listen on the standard kube-apiserver port here so any pre-existing configs don't need a port update once this lands, and so the default port is more obvious to people reasoning about it later.
On the masters, we should probably also bind to 0.0.0.0, so end users will enter through the same LB.
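A rough sketch of what that could look like in haproxy.cfg.jinja, with the bind address and port coming from the state's context / pillar (bind_ip and api:lb_ssl_port are names that appear elsewhere in this PR; the mode and balance settings are illustrative assumptions):

listen kubernetes-master
    bind {{ bind_ip }}:{{ salt['pillar.get']('api:lb_ssl_port', '6443') }}
    mode tcp
    balance roundrobin
    # one "server master-<minion_id> <addr>:6443 check" line is rendered per master below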
Looks good - a few inline comments
salt/haproxy/haproxy.cfg.jinja (Outdated)
  {% else -%}
  {% set addr = addrlist|first -%}
  {% endif -%}
  server master-{{ minion_id }} {{ addr }}:6443 check
We should make the connection go to <minion-id>.infra.caasp.local - that means fewer HAProxy changes / reloads if an IP changes.
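That suggestion would amount to rendering the server line with the node's internal DNS name rather than its IP, assuming pillar['internal_infra_domain'] resolves to infra.caasp.local in this setup:

server master-{{ minion_id }} {{ minion_id }}.{{ pillar['internal_infra_domain'] }}:6443 check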
salt/kubeconfig/kubeconfig.jinja (Outdated)
  apiVersion: v1
  clusters:
  - cluster:
      certificate-authority: {{ pillar['paths']['ca_dir'] }}/{{ pillar['paths']['ca_filename'] }}
-     server: {{ api_server_url }}
+     server: https://api.{{ pillar['internal_infra_domain'] }}:6444
We should probably leave this file as is, and update the pillar values
This isn't really coming from the pillar.
{% set api_ssl_port = salt['pillar.get']('api:ssl_port', '6443') -%}
{% set api_server_url = 'https://' + api_server + ':' + api_ssl_port -%}
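One possible way to reconcile both points, sketched here as an assumption rather than the final code, is to keep building api_server_url in the template but derive the host and port from the load balancer settings:

{% set api_server = 'api.' + pillar['internal_infra_domain'] -%}
{% set api_lb_port = salt['pillar.get']('api:lb_ssl_port', '6444') -%}
{% set api_server_url = 'https://' + api_server + ':' + api_lb_port -%}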
salt/kubelet/kubelet.jinja (Outdated)
@@ -15,7 +11,7 @@ KUBELET_PORT="--port=10250"
  KUBELET_HOSTNAME="--hostname-override={{ grains['id'] }}.{{ pillar['internal_infra_domain'] }}"

  # location of the api-server
- KUBELET_API_SERVER="--api-servers={{ api_server_url }}"
+ KUBELET_API_SERVER="--api-servers=https://api.{{ pillar['internal_infra_domain'] }}:6444"
Same as salt/kubeconfig/kubeconfig.jinja - we should leave this file as is, and update pillars.
salt/kubernetes-master/config.jinja (Outdated)
# note: uses haproxy and TLS for HA and resiliency
KUBE_MASTER="--master=https://api.{{ pillar['internal_infra_domain'] }}:6444"
We should look at using the same logic as the other places where we use {{ api_server_url }}.
All feedback has been addressed, please review the latest commit.
  memory: 128Mi
limits:
  memory: 128Mi
image: haproxy:1.7.5
I think this image should be changed to something based on SLE.
Agreed; however, we don't have one right now and will treat that separately (this PR isn't targeted at master, it's targeted at a WIP/temporary HA branch).
I understand.
/etc/haproxy/haproxy.cfg:
  file.managed:
    - context:
        bind_ip: "0.0.0.0"
I guess this will make haproxy listen on all interfaces, right? In that case, what is the reason?
This is targeted at the HA branch, where multiple masters will be allowed and end users need a mechanism to be balanced over the pool of servers. The mechanism they use to be balanced should be no different from the mechanism we use for workers etc., hence haproxy binds to 0.0.0.0:6443 in place of kube-apiserver.
Ok, I see, thanks for the clarification.
salt/kubernetes-master/init.sls (Outdated)
@@ -65,7 +77,8 @@ kube-apiserver:
      - match: state
      - connstate: NEW
      - dports:
-       - {{ api_ssl_port }}
+       - {{ salt['pillar.get']('api:ssl_port', '6443') }}
+       - {{ salt['pillar.get']('api:lb_ssl_port', '6443') }}
Is it the same port for lb_ssl_port and ssl_port?
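If they are meant to differ, the pillar could make that explicit; the split below is only an illustration of one possible assignment (haproxy keeping 6443 in front, kube-apiserver moving behind it), not necessarily what this PR settles on:

api:
  ssl_port: '6444'     # port kube-apiserver itself binds to
  lb_ssl_port: '6443'  # port the haproxy load balancer exposes in front of it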
salt/kubernetes-master/init.sls (Outdated)
@@ -99,6 +116,9 @@ kube-scheduler:
      - watch:
        - file: /etc/kubernetes/config
        - file: kube-scheduler
+       - file: /etc/pki/minion.crt
+       - file: /etc/pki/minion.key
+       - file: {{ pillar['paths']['ca_dir'] }}/{{ pillar['paths']['ca_filename'] }}
Please align with the previous lines.
salt/kubernetes-master/init.sls (Outdated)
@@ -114,6 +134,9 @@ kube-controller-manager:
      - watch:
        - file: /etc/kubernetes/config
        - file: kube-controller-manager
+       - file: /etc/pki/minion.crt
+       - file: /etc/pki/minion.key
+       - file: {{ pillar['paths']['ca_dir'] }}/{{ pillar['paths']['ca_filename'] }}
ditto
From my understanding of the code, I think you are running haproxy in the masters for client connections. Is that right?
That's correct. Every kube-apiserver API call takes the same route, regardless of whether it's coming from an end user, a worker, or a master.
Don't you think that is overkill? Is this just for the case that the local api server is dead while the local haproxy is alive?
Not at all - that honestly wasn't even a consideration. It allows for significantly easier reasoning about how a request will be handled. Being consistent means we have e.g. the same timeouts, the same proxy behaviours, the same logging applied. When this is deployed by a customer, at any kind of scale, this will be important.
Don't you think this is a quite improbable case?
It's not for that case, but still - I've seen much stranger things in production...
Looking great, thank you!
salt/kubernetes-master/init.sls (Outdated)
- subjectAltName: "{{ ", ".join(extra_names + ip_addresses) }}"

{% set api_ssl_port = salt['pillar.get']('api:ssl_port', '6443') %}
- subjectAltName: "{{ ", ".join(ip_addresses) }}"
extra_names and ip_addresses were meant to correspond to DNS: and IP: respectively in our SAN certificates. Fine from my point of view if we want to merge them.
salt/kubernetes-minion/init.sls (Outdated)
{% endfor %}
{% endfor %}
# add some extra names the API server could have
{% do ip_addresses.append("DNS: " + grains['fqdn']) %}
We are mixing names and IP addresses here again. Maybe we can rename the ip_addresses var to something like subject_alt_names (everywhere, if we are going to mix DNS and IP entries on our SAN cert request). It's a minor thing, but when debugging it can lead to mistakes...
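A sketch of that rename, keeping IP and DNS entries in a single, clearly named list (the variable name and the extra api.<domain> entry are suggestions, not code from this PR):

{% set subject_alt_names = [] %}
{% for ip in ip_addresses %}
{% do subject_alt_names.append("IP: " + ip) %}
{% endfor %}
{% do subject_alt_names.append("DNS: " + grains['fqdn']) %}
{% do subject_alt_names.append("DNS: api." + pillar['internal_infra_domain']) %}
- subjectAltName: "{{ ", ".join(subject_alt_names) }}"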
Okay, I've addressed all the feedback and added the HUP on haproxy when the configuration changes. I'm pretty sure I'm done with this one, unless someone has more feedback? @kiall @grahamhayes @ereslibre
Overall it looks good to me (only a small comment)
for i in $(docker ps -a | grep haproxy-k8-api | awk '{print $1}')
do
  docker kill -HUP $i
done
I think you wouldn't need this loop if you use the onlyif parameter...
Ahh okay, that's good info for next time, thanks. This is the largest amount of work I've done in a SaltStack codebase ever.
Maybe an onchanges can help to only execute this HUP signal sending when the relevant config file has changed. A watch should also help to make it reactive.
@ereslibre There is already a watch for file: /etc/haproxy/haproxy.cfg, so this state cmd.runs when the file being watched changes. But it would be better if we could do that onlyif haproxy is running (instead of looping). As @ereslibre also mentioned, maybe cmd.run and onchanges would be a better solution...
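A sketch of that combination as a Salt state, assuming the container is named haproxy-k8-api as in the grep above (the state id and the exact docker invocation are illustrative, not the final code):

hup-haproxy:
  cmd.run:
    # send SIGHUP to the running haproxy container so it reloads its configuration
    - name: docker kill --signal=HUP $(docker ps -q --filter name=haproxy-k8-api)
    # only attempt it when a matching container is actually running
    - onlyif: docker ps -q --filter name=haproxy-k8-api | grep -q .
    # and only when the rendered config actually changed
    - onchanges:
      - file: /etc/haproxy/haproxy.cfg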
Thanks for the approval, @inercia - I'll go ahead and rebase this all into one commit.
Add haproxy to all nodes, configure it to point to the nodes designated as master, and set the local kubernetes services to use it as the api server endpoint
Writing the api server hostname as localhost on the minions so that we don't need to add an IP SAN for 127.0.0.1 to the kube-apiserver certificates
Moving kubeconfig to its own entity
Moving kubelet to its own entity
Setting the proper subjectAltNames on the certs on the kube-minions
Making the kube-master use https via haproxy
Running kubelet on the master nodes
Watch the TLS certificates and root cert, and restart affected services if that changes
Send a HUP to haproxy when the configuration changes. Uses docker to perform the HUP, since Kubernetes does not currently support sending a signal to a running pod. The docker approach is safer than a standard pkill, because we can target our specific haproxy instance without indiscriminately sending a HUP to all haproxy instances running on a node (like, within a user workload)
Awesome job. Thank you!
This adds haproxy as a frontend to the kube-apiserver on all nodes
Runs the kubelet on all nodes
Fixes warnings with the TLS certificates on the etcd nodes
Restarts applicable services when the TLS certificates change
Tested off the master branch of velum, and the ha branch (so, a single master or multiple master nodes)