
[HA] - Add haproxy #57

Merged - 1 commit merged into ha on May 8, 2017

Conversation

robdaemon
Contributor

This adds haproxy as a frontend to the kube-apiserver on all nodes

Runs the kubelet on all nodes

Fixes warnings with the TLS certificates on the etcd nodes

Restarts applicable services when the TLS certificates change

Tested against the master branch of Velum and against the ha branch (i.e. with a single master and with multiple master nodes)

@robdaemon robdaemon requested a review from grahamhayes May 2, 2017 22:54
@robdaemon
Contributor Author

Note: I'll rebase / reword the commits when this is approved

Also: we need to build a SUSE-based haproxy image; this currently uses the haproxy image from Docker Hub.

cmd.run:
  - name: rm -rf /var/lib/etcd/*
  - prereq:
    - service: etcd
Contributor Author

I intentionally took this out - this is destructive, and can lead to a situation where your etcd cluster is destroyed if the initial orchestration run happens multiple times.

Contributor

Yeah, that scares me. Maybe it should be split off and proposed separately to master?

Contributor

It is already known that we are currently not prepared for running the orchestration several times (in fact I think there is no way that orchestration can be triggered again once the cluster has been created). We should study how to solve these problems in the (near) future...

Contributor

We are currently tracking this issue in this card

requests:
  memory: 128Mi
limits:
  memory: 128Mi
Contributor Author

This will need testing to know whether these values are too big or too small - I just made a guess to start with.

@robdaemon robdaemon requested a review from kiall May 2, 2017 23:08
@robdaemon robdaemon changed the title from "Add haproxy" to "[HA] - Add haproxy" May 2, 2017
name: haproxy-cfg
volumes:
- hostPath:
path: /etc/haproxy
Contributor

Just one question: is /etc/haproxy a RW path?

Contributor Author

Only Salt should be writing to that - its intention is to contain the haproxy configuration only.

Contributor

I understand that; what I don't understand is whether, under MicroOS, it can write to that path at all, or whether it's RO. Sorry, I might be missing something.

Contributor

/etc should be RW in MicroOS, so this shouldn't be a problem...

Contributor Author

@PI-Victor ahh, sorry, I misunderstood the why on your question :)
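For reference, a minimal sketch of how the hostPath might be consumed by the haproxy pod - the pod name and the read-only mount are illustrative assumptions, not taken verbatim from this PR; only the haproxy-cfg volume, the /etc/haproxy path and the haproxy:1.7.5 image appear in the diff:

apiVersion: v1
kind: Pod
metadata:
  name: haproxy
spec:
  containers:
    - name: haproxy
      image: haproxy:1.7.5
      volumeMounts:
        - name: haproxy-cfg
          mountPath: /etc/haproxy   # written by Salt on the host, only read here
          readOnly: true
  volumes:
    - name: haproxy-cfg
      hostPath:
        path: /etc/haproxy

Salt owns the file on the (RW) host path; the container itself never needs to write to it.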

@robdaemon
Contributor Author

robdaemon commented May 3, 2017

I do have a TODO here - HUP / restart haproxy when the configuration changes. Will likely need to use docker kill -s HUP <haproxy container> since Kubernetes does not have a primitive for sending signals at this time. See kubernetes/kubernetes#24957

{% set addr = addrlist|first -%}
{% endif -%}
server master-{{ minion_id }} {{ addr }}:6443 check
{% endfor -%}
Contributor

Something to think about in the future: what happens if I add a new node and I want it to become a master - will this work? In the past we've been using confd to handle this, but maybe this is something that Salt can handle as well?

Contributor

If a new master is added (or removed), Salt needs to be run across all hosts and this file will be regenerated. Rob noted a TODO of actually triggering the haproxy config to be reloaded.

@@ -0,0 +1,28 @@
global
log /dev/log local0
log /dev/log local1 notice
Contributor

Where do these logs end up?

Contributor Author

They don't, really. haproxy cannot log to stdout. The "solutions" to this are not really solutions: dockerfile/haproxy#3

- pkgs:
  - kubernetes-node
- require:
  - file: /etc/zypp/repos.d/containers.repo
Contributor

Is this still necessary? It feels odd to be asking for kubernetes-node to be installed here

Contributor

is it not needed for kubelet ?

Contributor Author

yeah, I probably have these backwards. Oops. Will reverse.

timeout server 50000

listen kubernetes-master
bind 127.0.0.1:6444
Contributor

Ideally, we should listen on the standard kube-apiserver port here so any pre-existing configs don't need a port update once this lands, and the default port is more obvious for people when thinking it through later.

On the masters, we should probably also bind to 0.0.0.0, so end users will enter through the same LB.
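As a rough sketch of that suggestion in the templated config - assuming bind_ip keeps being passed in via the file.managed context as elsewhere in this PR, and that kube-apiserver itself moves off 6443 on the masters so haproxy can take the standard port:

listen kubernetes-master
    bind {{ bind_ip }}:6443
    mode tcp
    balance roundrobin

This is only illustrative; the actual port layout is still being discussed below.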

Contributor

@grahamhayes grahamhayes left a comment

Looks good - a few inline comments

{% else -%}
{% set addr = addrlist|first -%}
{% endif -%}
server master-{{ minion_id }} {{ addr }}:6443 check
Contributor

We should make the connection go to <minion-id>.infra.caasp.local - that means fewer HAProxy changes / reloads if an IP changes.
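In other words - assuming the minion_id loop variable stays as is and that internal_infra_domain resolves to the infra.caasp.local zone mentioned above - the server line would become something like:

    server master-{{ minion_id }} {{ minion_id }}.{{ pillar['internal_infra_domain'] }}:6443 check

which keeps the rendered config stable even if a master's IP address changes.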

apiVersion: v1
clusters:
- cluster:
certificate-authority: {{ pillar['paths']['ca_dir'] }}/{{ pillar['paths']['ca_filename'] }}
server: {{ api_server_url }}
server: https://api.{{ pillar['internal_infra_domain'] }}:6444
Contributor

We should probably leave this file as is, and update the pillar values

Contributor Author

This isn't really coming from the pillar.

{% set api_ssl_port = salt['pillar.get']('api:ssl_port', '6443') -%}
{% set api_server_url = 'https://' + api_server + ':' + api_ssl_port -%}

@@ -15,7 +11,7 @@ KUBELET_PORT="--port=10250"
KUBELET_HOSTNAME="--hostname-override={{ grains['id'] }}.{{ pillar['internal_infra_domain'] }}"

# location of the api-server
KUBELET_API_SERVER="--api-servers={{ api_server_url }}"
KUBELET_API_SERVER="--api-servers=https://api.{{ pillar['internal_infra_domain'] }}:6444"
Contributor

same as salt/kubeconfig/kubeconfig.jinja - we should leave this file as is, and update pillars.
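A hedged sketch of that idea - the pillar key and the 6444 default below are illustrative guesses, not values confirmed in this PR - would be to compute the URL once from pillar data and leave the template line untouched:

{% set api_server = 'api.' + pillar['internal_infra_domain'] -%}
{% set api_ssl_port = salt['pillar.get']('api:lb_ssl_port', '6444') -%}
{% set api_server_url = 'https://' + api_server + ':' + api_ssl_port -%}

# location of the api-server (unchanged)
KUBELET_API_SERVER="--api-servers={{ api_server_url }}"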

# note: uses haproxy and TLS for HA and resiliency
KUBE_MASTER="--master=https://api.{{ pillar['internal_infra_domain'] }}:6444"
Contributor

We should use the same logic as in the other places where we use {{ api_server_url }}.

@robdaemon
Contributor Author

all feedback has been addressed, please review the latest commit.

memory: 128Mi
limits:
memory: 128Mi
image: haproxy:1.7.5
Contributor

I think this image should be changed to something based on SLE.

Contributor

Agree, however we don't have one right now and will treat that separately (this PR isn't targeted at master, it's targeted at a WIP/temporary HA branch).

Contributor

I understand.

/etc/haproxy/haproxy.cfg:
  file.managed:
    - context:
        bind_ip: "0.0.0.0"
Contributor

I guess this will make haproxy listen on all interfaces, right? In that case, what is the reason?

Contributor

This is targeted at the HA branch, where multiple masters will be allowed and end users need a mechanism to balance over the pool of servers. The mechanism they use to be balanced should be no different from the mechanism we use for workers etc., hence haproxy binds to 0.0.0.0:6443 in place of kube-apiserver.

Contributor

Ok, I see, thanks for the clarification.

@@ -65,7 +77,8 @@ kube-apiserver:
- match: state
- connstate: NEW
- dports:
- {{ api_ssl_port }}
- {{ salt['pillar.get']('api:ssl_port', '6443') }}
- {{ salt['pillar.get']('api:lb_ssl_port', '6443') }}
Contributor

@inercia inercia May 4, 2017

Is the port the same for lb_ssl_port and ssl_port?

@@ -99,6 +116,9 @@ kube-scheduler:
- watch:
- file: /etc/kubernetes/config
- file: kube-scheduler
- file: /etc/pki/minion.crt
- file: /etc/pki/minion.key
- file: {{ pillar['paths']['ca_dir'] }}/{{ pillar['paths']['ca_filename'] }}
Contributor

Please align with the previous lines.

@@ -114,6 +134,9 @@ kube-controller-manager:
- watch:
- file: /etc/kubernetes/config
- file: kube-controller-manager
- file: /etc/pki/minion.crt
- file: /etc/pki/minion.key
- file: {{ pillar['paths']['ca_dir'] }}/{{ pillar['paths']['ca_filename'] }}
Contributor

ditto

@inercia
Contributor

inercia commented May 4, 2017

From my understanding of the code, I think you are running haproxy on the masters for client connections. Is that right? Don't you think that is overkill? Is this just for the case where the local API server is dead while the local haproxy is alive? Don't you think this is a quite improbable case?

Contributor

@kiall kiall left a comment

From my understanding of the code, I think you are running haproxy in the masters for client connections. Is that right?

That's correct. Every kube-apiserver API call takes the same route, regardless of whether it's coming from an end user, a worker, or a master.

Don't you think that is an overkill? Is this just for the case that the local api server is dead while the local haproxy is alive?

Not at all - that honestly wasn't even a consideration. It allows for significantly easier reasoning about how a request will be handled. Being consistent means we have e.g. the same timeouts, the same proxy behaviours, the same logging applied. When this is deployed by a customer, at any kind of scale, this will be important.

Don't you think this is a quite improbable case?

It's not for that case, but still - I've seen much stranger things in production...

Contributor

@ereslibre ereslibre left a comment

Looking great, thank you!

- subjectAltName: "{{ ", ".join(extra_names + ip_addresses) }}"

{% set api_ssl_port = salt['pillar.get']('api:ssl_port', '6443') %}
- subjectAltName: "{{ ", ".join(ip_addresses) }}"
Contributor

extra_names and ip_addresses were meant to correspond to DNS: and IP: respectively in our SAN certificates. Fine from my point of view if we want to merge them.

{% endfor %}
{% endfor %}
# add some extra names the API server could have
{% do ip_addresses.append("DNS: " + grains['fqdn']) %}
Contributor

We are mixing names and IP addresses here again. Maybe we can rename the ip_addresses variable to something like subject_alt_names (everywhere, if we are going to mix DNS and IP entries in our SAN cert request). It's a minor thing, but it can lead to mistakes when debugging...
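A small sketch of the proposed rename - the specific grains used here are only an example of mixing DNS and IP entries, not necessarily the ones this state collects:

{% set subject_alt_names = [] -%}
{% do subject_alt_names.append('DNS: ' + grains['fqdn']) -%}
{% do subject_alt_names.append('IP: ' + grains['ipv4'][0]) -%}
- subjectAltName: "{{ ", ".join(subject_alt_names) }}"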

@robdaemon
Contributor Author

Okay, I've addressed all the feedback and added the HUP on haproxy when the configuration changes. I'm pretty sure I'm done with this one, unless someone has more feedback? @kiall @grahamhayes @ereslibre ?

Contributor

@inercia inercia left a comment

Overall it looks good to me (only a small comment)

for i in $(docker ps -a | grep haproxy-k8-api | awk '{print $1}')
do
  docker kill -s HUP "$i"
done
Contributor

I think you wouldn't need this loop if you use the onlyif parameter...

Contributor Author

Ahh okay, that's good info for next time, thanks. This is the largest amount of work I've done in a SaltStack codebase ever.

Contributor

Maybe an onchanges requisite can help to only send the HUP signal when the relevant config file has changed. A watch should also help to make it reactive.

Contributor

@ereslibre There is already a watch for file: /etc/haproxy/haproxy.cfg, so this cmd.run state runs when the watched file changes. But it would be better if we could do that onlyif haproxy is running (instead of looping). As @ereslibre also mentioned, maybe cmd.run with onchanges would be a better solution...
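To make that concrete, a rough sketch of the suggested shape - the state ID is made up, the container name filter and config path follow this PR, and it should be treated as illustrative rather than the final state:

haproxy-reload:
  cmd.run:
    - name: docker kill -s HUP $(docker ps -q -f name=haproxy-k8-api)
    - onlyif: docker ps -q -f name=haproxy-k8-api | grep -q .
    - onchanges:
      - file: /etc/haproxy/haproxy.cfg

With onchanges the command only runs when the managed haproxy.cfg actually changes, and onlyif skips it when no matching container is running.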

@robdaemon
Contributor Author

Thanks for the approval, @inercia - I'll go ahead and rebase this all into one commit.

Add haproxy to all nodes, configure it to point to the nodes designated as master, and set the local Kubernetes services to use it as the API server endpoint.

Write the API server hostname as localhost on the minions so that we don't need to add an IP SAN for 127.0.0.1 to the kube-apiserver certificates.

Move kubeconfig to its own entity.

Move kubelet to its own entity.

Set the proper subjectAltNames on the certs on the kube-minions.

Make the kube-master use HTTPS via haproxy.

Run kubelet on the master nodes.

Watch the TLS certificates and root cert, and restart the affected services if they change.

Send a HUP to haproxy when the configuration changes.

Docker is used to perform the HUP, since Kubernetes does not currently support sending a signal to a running pod.

The docker approach is safer than a plain pkill, because we can target our specific haproxy instance without indiscriminately sending a HUP to all haproxy instances running on a node (e.g. one running inside a user workload).
Contributor

@ereslibre ereslibre left a comment

Awesome job. Thank you!

@kiall kiall merged commit 470c2a0 into ha May 8, 2017
@kiall kiall deleted the rar_haproxy branch May 8, 2017 16:21