Conversation
Note: I'll rebase / reword the commits once this is approved. Also: we need to make a SUSE-based haproxy image; this currently uses the haproxy image from Docker Hub.
cmd.run:
  - name: rm -rf /var/lib/etcd/*
  - prereq:
    - service: etcd
I intentionally took this out - this is destructive, and can lead to a situation where your etcd cluster is destroyed if the initial orchestration run happens multiple times.
Yea, that scares me. Maybe it should be split off and proposed separately to master?
It is already known that we are currently not prepared for running the orchestration several times (in fact I think there is no way that orchestration can be triggered again once the cluster has been created). We should study how to solve these problems in the (near) future...
We are currently tracking this issue in this card
requests:
  memory: 128Mi
limits:
  memory: 128Mi
this will need testing to know if these values are too big or too small - I just made a guess to start with.
name: haproxy-cfg
volumes:
- hostPath:
    path: /etc/haproxy
just one question, is /etc/haproxy a RW path?
Only Salt should be writing to that - its intention is to contain the haproxy configuration only.
I understand that; what I don't understand is whether, under MicroOS, it can write to the path at all, or if it's RO. Sorry, I might be missing something.
/etc should be RW in MicroOS, so this shouldn't be a problem...
@PI-Victor ahh, sorry, I misunderstood the why on your question :)
I do have a TODO here - HUP / restart haproxy when the configuration changes. Will likely need to use ...
salt/haproxy/haproxy.cfg.jinja (Outdated)
  {% set addr = addrlist|first -%}
  {% endif -%}
  server master-{{ minion_id }} {{ addr }}:6443 check
{% endfor -%}
Something to think about in the future: what happens if I add a new node and I want it to become a master? Will this work? In the past we've been using confd to handle this, but maybe this is something that salt can handle as well?
If a new master is added (or removed), salt needs to be run across all hosts and this file will be regenerated. Rob noted a TODO of actually triggering the haproxy config to be reloaded.
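For context, a minimal sketch of what the surrounding loop in haproxy.cfg.jinja might look like; the mine call, the 'roles:kube-master' grain target and the 'network.ip_addrs' function here are assumptions for illustration, not the exact code in this PR:

{%- for minion_id, addrlist in salt['mine.get']('roles:kube-master', 'network.ip_addrs', expr_form='grain').items() %}
  {#- addrlist is the list of IPs each master publishes via the Salt mine; pick one #}
  {%- set addr = addrlist|first %}
  server master-{{ minion_id }} {{ addr }}:6443 check
{%- endfor %}

Because the server lines are rendered from the mine data, re-running salt after adding or removing a master is what regenerates the file, as described above.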
@@ -0,0 +1,28 @@
global
    log /dev/log local0
    log /dev/log local1 notice
Where do these logs end up?
They don't, really. haproxy cannot log to stdout. The "solutions" to this are not really solutions: dockerfile/haproxy#3
salt/haproxy/init.sls (Outdated)
- pkgs:
  - kubernetes-node
- require:
  - file: /etc/zypp/repos.d/containers.repo
Is this still necessary? It feels odd to be asking for kubernetes-node to be installed here
is it not needed for kubelet?
yeah, I probably have these backwards. Oops. Will reverse.
salt/haproxy/haproxy.cfg.jinja (Outdated)
    timeout server 50000

listen kubernetes-master
    bind 127.0.0.1:6444
Ideally, we should listen on the standard kube-apiserver port here so any pre-existing configs don't need a port update once this lands, and so the default port is more obvious to people reasoning about it later.
On the masters, we should probably also bind to 0.0.0.0, so end users will enter through the same LB.
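A rough sketch of what that could look like in haproxy.cfg.jinja, with the bind address and port coming from the state's context / pillar (bind_ip and api:lb_ssl_port are names that appear elsewhere in this PR; the mode and balance settings are illustrative assumptions):

listen kubernetes-master
    bind {{ bind_ip }}:{{ salt['pillar.get']('api:lb_ssl_port', '6443') }}
    mode tcp
    balance roundrobin
    # one "server master-<minion_id> <addr>:6443 check" line is rendered per master below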
Looks good - a few inline comments
salt/haproxy/haproxy.cfg.jinja (Outdated)
  {% else -%}
  {% set addr = addrlist|first -%}
  {% endif -%}
  server master-{{ minion_id }} {{ addr }}:6443 check
We should make the connection go to <minion-id>.infra.caasp.local - that means fewer HAProxy changes / reloads if an IP changes.
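That suggestion would amount to rendering the server line with the node's internal DNS name rather than its IP, assuming pillar['internal_infra_domain'] resolves to infra.caasp.local in this setup:

server master-{{ minion_id }} {{ minion_id }}.{{ pillar['internal_infra_domain'] }}:6443 check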
salt/kubeconfig/kubeconfig.jinja (Outdated)
  apiVersion: v1
  clusters:
  - cluster:
      certificate-authority: {{ pillar['paths']['ca_dir'] }}/{{ pillar['paths']['ca_filename'] }}
-     server: {{ api_server_url }}
+     server: https://api.{{ pillar['internal_infra_domain'] }}:6444
We should probably leave this file as is, and update the pillar values
This isn't really coming from the pillar.
{% set api_ssl_port = salt['pillar.get']('api:ssl_port', '6443') -%}
{% set api_server_url = 'https://' + api_server + ':' + api_ssl_port -%}
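One possible way to reconcile both points, sketched here as an assumption rather than the final code, is to keep building api_server_url in the template but derive the host and port from the load balancer settings:

{% set api_server = 'api.' + pillar['internal_infra_domain'] -%}
{% set api_lb_port = salt['pillar.get']('api:lb_ssl_port', '6444') -%}
{% set api_server_url = 'https://' + api_server + ':' + api_lb_port -%}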
salt/kubelet/kubelet.jinja (Outdated)
@@ -15,7 +11,7 @@ KUBELET_PORT="--port=10250"
  KUBELET_HOSTNAME="--hostname-override={{ grains['id'] }}.{{ pillar['internal_infra_domain'] }}"

  # location of the api-server
- KUBELET_API_SERVER="--api-servers={{ api_server_url }}"
+ KUBELET_API_SERVER="--api-servers=https://api.{{ pillar['internal_infra_domain'] }}:6444"
Same as salt/kubeconfig/kubeconfig.jinja - we should leave this file as is, and update pillars.
salt/kubernetes-master/config.jinja (Outdated)
# note: uses haproxy and TLS for HA and resiliency
KUBE_MASTER="--master=https://api.{{ pillar['internal_infra_domain'] }}:6444"
We should look at using the same logic as the other places where we use {{ api_server_url }}.
All feedback has been addressed, please review the latest commit.
  memory: 128Mi
limits:
  memory: 128Mi
image: haproxy:1.7.5
I think this image should be changed to something based on SLE.
Agreed; however, we don't have one right now and will treat that separately (this PR isn't targeted at master, it's targeted at a WIP/temporary HA branch).
I understand.
/etc/haproxy/haproxy.cfg:
  file.managed:
    - context:
        bind_ip: "0.0.0.0"
I guess this will make haproxy listen on all interfaces, right? In that case, what is the reason?
This is targeted at the HA branch, where multiple masters will be allowed and end users need a mechanism to be balanced over the pool of servers. The mechanism they use to be balanced should be no different from the mechanism we use for workers etc., hence haproxy binds to 0.0.0.0:6443 in place of kube-apiserver.
Ok, I see, thanks for the clarification.
salt/kubernetes-master/init.sls (Outdated)
@@ -65,7 +77,8 @@ kube-apiserver:
      - match: state
      - connstate: NEW
      - dports:
-       - {{ api_ssl_port }}
+       - {{ salt['pillar.get']('api:ssl_port', '6443') }}
+       - {{ salt['pillar.get']('api:lb_ssl_port', '6443') }}
Is it the same port for lb_ssl_port and ssl_port?
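If they are meant to differ, the pillar could make that explicit; the split below is only an illustration of one possible assignment (haproxy keeping 6443 in front, kube-apiserver moving behind it), not necessarily what this PR settles on:

api:
  ssl_port: '6444'     # port kube-apiserver itself binds to
  lb_ssl_port: '6443'  # port the haproxy load balancer exposes in front of it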
salt/kubernetes-master/init.sls (Outdated)
@@ -99,6 +116,9 @@ kube-scheduler:
      - watch:
        - file: /etc/kubernetes/config
        - file: kube-scheduler
+       - file: /etc/pki/minion.crt
+       - file: /etc/pki/minion.key
+       - file: {{ pillar['paths']['ca_dir'] }}/{{ pillar['paths']['ca_filename'] }}
Please align with the previous lines.
salt/kubernetes-master/init.sls (Outdated)
@@ -114,6 +134,9 @@ kube-controller-manager:
      - watch:
        - file: /etc/kubernetes/config
        - file: kube-controller-manager
+       - file: /etc/pki/minion.crt
+       - file: /etc/pki/minion.key
+       - file: {{ pillar['paths']['ca_dir'] }}/{{ pillar['paths']['ca_filename'] }}
ditto
From my understanding of the code, I think you are running haproxy in the masters for client connections. Is that right?
That's correct. Every kube-apiserver API call takes the same route, regardless of whether it's coming from an end user, a worker, or a master.
Don't you think that is overkill? Is this just for the case that the local api server is dead while the local haproxy is alive?
Not at all - that honestly wasn't even a consideration. It allows for significantly easier reasoning about how a request will be handled. Being consistent means we have e.g. the same timeouts, the same proxy behaviours, the same logging applied. When this is deployed by a customer, at any kind of scale, this will be important.
Don't you think this is a quite improbable case?
It's not for that case, but still - I've seen much stranger things in production...
Looking great, thank you!
salt/kubernetes-master/init.sls (Outdated)
- subjectAltName: "{{ ", ".join(extra_names + ip_addresses) }}"

{% set api_ssl_port = salt['pillar.get']('api:ssl_port', '6443') %}
- subjectAltName: "{{ ", ".join(ip_addresses) }}"
extra_names and ip_addresses were meant to correspond to DNS: and IP: respectively in our SAN certificates. Fine from my point of view if we want to merge them.
salt/kubernetes-minion/init.sls (Outdated)
{% endfor %}
{% endfor %}
# add some extra names the API server could have
{% do ip_addresses.append("DNS: " + grains['fqdn']) %}
We are mixing names and IP addresses here again. Maybe we can rename the ip_addresses var to something like subject_alt_names (everywhere, if we are going to mix DNS and IP entries on our SAN cert request). It's a minor thing, but when debugging it can lead to mistakes...
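A sketch of that rename, keeping IP and DNS entries in a single, clearly named list (the variable name and the extra api.<domain> entry are suggestions, not code from this PR):

{% set subject_alt_names = [] %}
{% for ip in ip_addresses %}
{% do subject_alt_names.append("IP: " + ip) %}
{% endfor %}
{% do subject_alt_names.append("DNS: " + grains['fqdn']) %}
{% do subject_alt_names.append("DNS: api." + pillar['internal_infra_domain']) %}
- subjectAltName: "{{ ", ".join(subject_alt_names) }}"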
Okay, I've addressed all the feedback and added the HUP on haproxy when the configuration changes. I'm pretty sure I'm done with this one, unless someone has more feedback? @kiall @grahamhayes @ereslibre
Overall it looks good to me (only a small comment)
for i in $(docker ps -a | grep haproxy-k8-api | awk '{print $1}')
do
  docker kill -HUP $i
done
I think you wouldn't need this loop if you use the onlyif parameter...
Ahh okay, that's good info for next time, thanks. This is the largest amount of work I've done in a SaltStack codebase ever.
Maybe an onchanges can help to only execute this HUP signal sending when the relevant config file has changed. A watch should also help to make it reactive.
@ereslibre There is already a watch for file: /etc/haproxy/haproxy.cfg, so this state cmd.runs when the file being watched changes. But it would be better if we could do that onlyif haproxy is running (instead of looping). As @ereslibre also mentioned, maybe cmd.run and onchanges would be a better solution...
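A sketch of that combination as a Salt state, assuming the container is named haproxy-k8-api as in the grep above (the state id and the exact docker invocation are illustrative, not the final code):

hup-haproxy:
  cmd.run:
    # send SIGHUP to the running haproxy container so it reloads its configuration
    - name: docker kill --signal=HUP $(docker ps -q --filter name=haproxy-k8-api)
    # only attempt it when a matching container is actually running
    - onlyif: docker ps -q --filter name=haproxy-k8-api | grep -q .
    # and only when the rendered config actually changed
    - onchanges:
      - file: /etc/haproxy/haproxy.cfg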
Thanks for the approval, @inercia - I'll go ahead and rebase this all into one commit.
Add haproxy to all nodes, configure it to point to the nodes designated as master, and set the local kubernetes services to use it as the api server endpoint
Writing the api server hostname as localhost on the minions so that we don't need to add an IP SAN for 127.0.0.1 to the kube-apiserver certificates
Moving kubeconfig to its own entity
Moving kubelet to its own entity
Setting the proper subjectAltNames on the certs on the kube-minions
Making the kube-master use https via haproxy
Running kubelet on the master nodes
Watch the TLS certificates and root cert, and restart affected services if that changes
Send a HUP to haproxy when the configuration changes. Uses docker to perform the HUP, since Kubernetes does not currently support sending a signal to a running pod. The docker approach is safer than a standard pkill, because we can target our specific haproxy instance without indiscriminately sending a HUP to all haproxy instances running on a node (like, within a user workload)
Awesome job. Thank you!
This adds haproxy as a frontend to the kube-apiserver on all nodes
Runs the kubelet on all nodes
Fixes warnings with the TLS certificates on the etcd nodes
Restarts applicable services when the TLS certificates change
Tested off the master branch of velum, and the ha branch (so, a single master or multiple master nodes)