A Terraform-first approach to installing an OpenShift 4 cluster under KVM / libvirt, with HAProxy as the load balancer.
Notable highlights of this approach:
1. Usage
2. Details
3. Automatic Certificate Signing Request Approvals
4. Optimizations
5. Acknowledgements
- Install libvirt (see the example below)
- OpenShift Installer
- Pull Secret
- Python 3
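For example, on a Fedora-family host the libvirt prerequisite might be satisfied as follows (package and group names are an assumption and vary by distribution):
# Fedora example; adjust package names for your distribution
sudo dnf install -y @virtualization
sudo systemctl enable --now libvirtd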
In /etc/NetworkManager/dnsmasq.d/, create a file for the DNS configuration. This allows the host system to resolve addresses for the cluster via the libvirt dnsmasq instance listening on the specified IP address. For example:
cat <<EOF | sudo tee /etc/NetworkManager/dnsmasq.d/ocp.dnsmasq.conf
server=/ocp.example.io/192.168.200.1
EOF
In /etc/NetworkManager/conf.d/, create a file to instruct NetworkManager to consult dnsmasq for DNS resolution:
cat <<EOF | sudo tee /etc/NetworkManager/conf.d/nm.global.conf
[main]
dns=dnsmasq
EOF
Restart NetworkManager:
sudo systemctl restart NetworkManager
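Once the cluster network has later been provisioned by Terraform, you can spot-check that the host forwards cluster lookups to the libvirt dnsmasq instance (the names below assume the example domain configured above and the standard *.apps default ingress wildcard):
# Should return addresses from the 192.168.200.0/24 cluster network
dig +short loadbalancer.ocp.example.io
dig +short anything.apps.ocp.example.io   # wildcard default ingress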
In terraform.tfvars:
openshift_installer = "<path-to-openshift-installer>"
pull_secret_file = "<path-to-pull-secret-file>"
Note: Paths may be absolute or relative.
Every cluster has simple, basic information that uniquely identifies it.
In terraform.tfvars:
cluster_name = "mycluster" # (1)
base_domain = "example.io" # (2)
cluster_network_cidr_address = "192.168.200.0/24" # (3)
1. Cluster name as a simple host name
2. Cluster base DNS domain
3. Cluster network expressed in CIDR
A cluster consists of a set number of nodes for:
- Control plane (e.g., masters)
- Compute plane (e.g., workers)
- Infrastructure plane
Note: Infrastructure nodes were formally removed in OpenShift 4.0 and have never made a true formal reappearance from an installer perspective. However, an infrastructure node is merely a re-labeled compute (worker) node; re-labeling is a post-installation configuration step.
In terraform.tfvars:
master_count = 3 # (1)
worker_count = 2 # (2)
infra_count = 0 # (3)
1. Desired number of nodes for the control plane
2. Desired number of compute/worker nodes
3. Desired number of infrastructure nodes (Warning: currently ignored)
Each node type can be customized for CPU, memory and disk.
Control plane (e.g., master nodes):
master_cpu = 4 # (1)
master_memory = 16 # (2)
master_disk_size = 120 # (3)
1. Number of CPUs
2. Memory in GB
3. Disk size in GB
Compute plane (e.g., worker/compute nodes):
worker_cpu = 2 # (1)
worker_memory = 8 # (2)
worker_disk_size = 120 # (3)
1. Number of CPUs
2. Memory in GB
3. Disk size in GB
Infrastructure plane (e.g., infra nodes):
infra_cpu = 2 # (1)
infra_memory = 8 # (2)
infra_disk_size = 120 # (3)
1. Number of CPUs
2. Memory in GB
3. Disk size in GB
Then provision the cluster:
terraform init
terraform plan
terraform apply -auto-approve
The HAProxy statistics page is enabled and can be reached at: http://loadbalancer.<cluster-name>.<base-domain-name>:8404/stats
We automatically perform the following:
- Generation of install-config.yaml for the cluster
- Cluster SSH key pair generation
- Load balancer node SSH key pair generation
All cluster assets are generated to ${gen_dir}/cluster and include:
- Kubernetes authentication details: auth/kubeadmin-password and auth/kubeconfig
- Ignition files (manifests are generated as well by definition, but they are consumed by the installer; the Ignition files are the end product)
- SSH key pair: keys/
Load balancer SSH key pair assets are generated to ${gen_dir}/keys
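Once the installer has finished, the generated kubeconfig can be used to access the cluster. A quick sanity check might look like this (substitute your actual generation directory for ${gen_dir}):
export KUBECONFIG=${gen_dir}/cluster/auth/kubeconfig
oc get nodes
oc get clusterversion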
We dynamically generate the infrastructure inventory data that is used to provision the infrastructure. This is done in the machines-info module and merely uses a Terraform local-exec provisioner with a custom Python script. We chose Python for its inherent ease of interoperating with JSON. As with all other Terraform resources, we then simply create the necessary dependencies. The true salient point here is the use of jsondecode, which was introduced in Terraform v0.12:
link:machines-info/main.tf[role=include]
Refer to: machines-info/main.tf
We use a procedural approach when generating the MAC addresses for the nodes. These are used to set up DHCP reservations on the libvirt network for every node, which assures that we know the addresses ahead of time instead of having to come back and query libvirt for the IP address assigned to each node while the infrastructure is being provisioned. The idea is to deterministically generate MAC addresses based on a cluster ID and node role for each node.
Effectively, this results in DHCP static host address reservations mapping each cluster node's deterministic MAC address to its deterministic IP address counterpart, yet in a dynamic manner.
The scheme is succinct and elegant, with the IP addresses for each node type (e.g., role) being incremented along with their corresponding MAC addresses (a sketch follows the list and references below):
- We are given a MAC OUI, in this case for KVM / libvirt: 52:54:00, encapsulated via the Terraform variable cluster_network_mac_oui
- The load balancer is always placed at the IP network address + 2
- The bootstrap node is always placed at the IP network address + 10
- The control plane nodes (e.g., masters) are always placed at the IP network address + 20
- The compute plane nodes (e.g., workers) are always placed at the IP network address + 30
- The infrastructure nodes are always placed at the IP network address + 40
Refer to: machines-info/generate-hosts.py
Reference: MAC Address
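As a sketch of the scheme only (this is not the project's generate-hosts.py, and the exact MAC octet layout is an assumption; only the role offsets come from the description above):
#!/usr/bin/env bash
# Illustrative sketch -- the last MAC octet mirrors the node's IP host offset,
# so both addresses can be derived from the role and a node index alone.
OUI="52:54:00"                 # cluster_network_mac_oui for KVM / libvirt
NET="192.168.200"              # cluster network prefix from the example CIDR
declare -A ROLE_OFFSET=( [loadbalancer]=2 [bootstrap]=10 [master]=20 [worker]=30 [infra]=40 )

mac_and_ip() {                 # usage: mac_and_ip <role> <index>
  local role=$1 index=${2:-0}
  local host=$(( ROLE_OFFSET[$role] + index ))
  printf '%s:00:00:%02x  %s.%d\n' "$OUI" "$host" "$NET" "$host"
}

mac_and_ip loadbalancer 0      # 52:54:00:00:00:02  192.168.200.2
mac_and_ip master 2            # 52:54:00:00:00:16  192.168.200.22
mac_and_ip worker 1            # 52:54:00:00:00:1f  192.168.200.31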
We leverage Terraform’s ability to ingest JSON and use it as a source of variables. We exploit this to generate the
necessary cluster node assets prior to provisioning them. This is where Terraform’s jsondecode
truly saves us:
Reference: Terraform JSON Decode
Here we simply ingest our own generated JSON from Dynamic Inventory Generation as a regular Terraform directive with the relevant dependency.
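If you want to see what jsondecode produces, it can be explored interactively with terraform console (the JSON file path below is purely hypothetical):
# Evaluate the expression against any JSON document
echo 'jsondecode(file("generated/machines-info.json"))' | terraform console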
We take advantage of libvirt Network Namespaces to set up wildcard DNS for default ingress so you don't have to. This allows us to apply an XSL transform to libvirt's generated XML for the network before the libvirt network is provisioned. To accomplish this, we apply the XSL transform, via Terraform's built-in templating capability, using the previously generated inventory that was ingested as JSON (which we can then use effectively as variables in most areas of Terraform). The magic looks like this:
link:network/main.tf[role=include]
Refer to: network/main.tf
In the XSL we merely copy every element until we get to the parts we need:
- DHCP reservations for every node type in the cluster, including the load balancer and bootstrap nodes.
- Ensuring dnsmasq is configured with our default ingress VIP pointing to the load balancer node.
To accomplish this, however, we also need to ensure that the stylesheet enables the dnsmasq extensions in the resultant XML:
xmlns:dnsmasq="http://libvirt.org/schemas/network/dnsmasq/1.0"
link:network/templates/network.xsl[role=include]
Refer to: network/templates/network.xsl
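To see the effect of the transform, the rendered network definition can be inspected once it has been provisioned (the network name is a placeholder; list networks with virsh net-list):
virsh net-list --all
virsh net-dumpxml <cluster-network-name> | grep -E 'dnsmasq:option|<host '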
A dedicated node is automatically provisioned for HAProxy. Once all nodes have been provisioned, we wait for the load balancer node to become available. Once it is available, we install HAProxy on the node, configure it with our cluster information for the public API, private API, and default ingress (both HTTP and HTTPS), and start HAProxy.
We accomplish this by generating an Ansible inventory file in Terraform and finally having Terraform kick off Ansible and let it "take it from there" for setting up the load balancer. We pass along to Ansible the generated JSON file originally created as part of Dynamic Inventory Generation; Ansible is more than happy to accept a JSON file containing variables.
All of this is accomplished in the ansible
Terraform module.
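Conceptually, the hand-off amounts to something like the following (inventory, playbook and variable file names are illustrative, not the module's exact invocation):
# Run the load balancer playbook against the generated inventory,
# passing the generated machine info JSON as Ansible variables
ansible-playbook -i <generated-inventory> <load-balancer-playbook>.yml -e @<generated-machines-info>.json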
Certificate Signing Requests (CSRs) are automatically approved. This is accomplished by having Terraform execute an Ansible playbook whose sole purpose is to monitor the state of the bootstrap node. It specifically waits for the bootstrap node to first transition from not available to available and then back to unavailable. This is done by polling the bootstrap node's public API port directly (bypassing the HAProxy load balancer). Once the bootstrap node has pulsed in this fashion, Terraform is orchestrated to run a custom Python script in the csr-approver module, which merely establishes a standard Kubernetes watch for certificate signing requests and approves them as they arrive.
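For reference, the manual equivalent of what the csr-approver automates is the standard OpenShift command:
# Approve any pending CSRs by hand
oc get csr -o name | xargs -r oc adm certificate approve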
Most Linux distributions at this point have KSM (Kernel Samepage Merging) automatically enabled, which can yield significant memory savings. However, for those distributions that do not, the following may prove useful:
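For example, on RHEL/Fedora-family hosts that ship the KSM units (an assumption; package and unit names vary by distribution):
# Enable the KSM services (provided by e.g. the ksmtuned package)
sudo systemctl enable --now ksm ksmtuned
# Or toggle KSM directly through sysfs
echo 1 | sudo tee /sys/kernel/mm/ksm/run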
- David Dreggors, Red Hat
  - For the idea, inspiration and discussion of MAC address generation for nodes
  - For a very nice, working and sustainable HAProxy configuration, for which we made a Jinja 2 template: ansible/playbooks/templates/haproxy/haproxy.cfg