diff --git a/.github/workflows/docs.yml b/.github/workflows/docs.yml index 0081683..c90414e 100644 --- a/.github/workflows/docs.yml +++ b/.github/workflows/docs.yml @@ -3,7 +3,7 @@ name: docs on: push: branches: - - 'lab-4.10' + - 'lab-4.12To4.14' env: SITE_DIR: "gh-pages" jobs: diff --git a/dev-review/4.10/API-Compatibility.html b/dev-review/4.10/API-Compatibility.html deleted file mode 100644 index f18f93e..0000000 --- a/dev-review/4.10/API-Compatibility.html +++ /dev/null @@ -1,174 +0,0 @@ - - - - - - OCP API Compatibility Policy :: LAB - Telco OCP Upgrade Lab - - - - - - -
OCP API Compatibility Policy

-
-

CNF API Compatibility

-
-
-

OpenShift and Kubernetes are strong because they use APIs for so many functions. This also means that the APIs will change over time as components are updated. Therefore, it is important to verify that an API call is still compatible with the cluster so that you will receive the desired response. Please review the documented section called “Understanding API Tiers”.

-
-
-

The most important thing to understand when considering which Z-release to upgrade to inside of a new Y-release is what patches need to be in the new Z-release. In other words if you are currently at OCP 4.11.28 you will need to make sure to upgrade to a Z-release of 4.12 that has all of the patches in it that were applied to 4.11.28, otherwise you will break the built-in compatibility of Kubernetes.

-
-
-
-
-

Kubernetes Version Skew

-
-
-

Support of specific API versions needs to be maintained by each cluster operator. With new releases of operators come new APIs, so the changes or skews in APIs need to be maintained. To a certain extent the APIs can be compatible across several releases of an operator. The list of components and the releases that remain compatible with each other can be found at: https://kubernetes.io/releases/version-skew-policy
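As a quick sanity check (standard oc commands, nothing specific to this lab), you can compare the control plane version against the kubelet version reported on each node to confirm they remain within the documented skew:

# oc version
# oc get nodes

The VERSION column from "oc get nodes" shows the kubelet (Kubernetes) version on each node, which you can compare against the server version printed by "oc version".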

-
-
-

The easiest way to verify that your application functionality will still work is to make sure that you follow the Kubernetes version skew policy linked above.

-
-
-
-
-

OpenShift Upgrade Path

-
-
-

Please also note that not all releases of OCP can be upgraded to any arbitrary Z-release, even if they contain all of the required patches. The OpenShift upgrade process mandates that if fix "A" is present in a specific X.Y.Z release of OCP, then fix "A" MUST also be present in the X.Y+1.Z release that OCP is upgraded TO.

-
-
-

As a consequence, the chosen destination version of 4.12.z defines the maximum versions of OCP 4.11.z, OCP 4.10.z and OCP 4.9.z that can reach it: not every 4.9.z version will permit an upgrade to a given version of OCP 4.12.z, and a given version of OCP 4.12.z will impose a maximum version of OCP 4.9.z. This is due to how fixes are backported into older releases of OCP.

-
-
-

You can use the upgrade graph tool to determine if the path is valid for your z-release. You should also always verify with your Sales Engineer or Technical Account Manager at Red Hat to make sure the upgrade path is valid for Telco implementations.

-
-
-
Figure 1. K8s Version Skew
-
-
-

Applying MCPs

-
-

First you can run “oc get mcp” to show your current list of MCPs:

-
-
-
-
# oc get mcp
-
-
-
-

List out all of your nodes:

-
-
-
-
# oc get no
-
-
-
-

Determine, from the above suggestions, how you would like to separate out your worker nodes into machine config pools (MCPs). In this example we will just use 1 node in each MCP.
We first need to label the nodes so that they can be put into MCPs. We will do this with the following command:

-
-
-
-
oc label node euschannel-worker-0.test.corp node-role.kubernetes.io/mcp-1=
-
-
-
-

This will show up when you run the “oc get node” command:

-
-
-
-
# oc get no
-
-
-
-

Now you need to create yaml files that will apply the labels as MCPs. Here is one example:

-
-
-
-
apiVersion: machineconfiguration.openshift.io/v1
-
-
-
-

For each of these, just run “oc apply -f {filename.yaml}”:

-
-
-
-
# oc apply -f test-mcp-2.yaml
-
-
-
-

Now you can run “oc get mcp” again and your new MCPs will show. Please note that you will still see the original worker and master MCPs that are part of the cluster.

-
-
-
-
# oc get mcp
-
-
-

CNF Upgrade Preparation

-
-
-
-

The life of a POD is an important topic to understand. This section will describe several topics that are important to keeping your CNF PODs healthy and allowing the cluster to properly schedule them during an upgrade.

-
-
-
-
-

CNF Requirements Document

-
-
-

Before you go any further, please read through the CNF requirements document. In this section a few of the most important points will be discussed but the CNF Requirements Document has additional detail and other important topics.

-
-
-
-
-

POD Disruption Budget

-
-
-

Each set of PODs in a deployment can be given a specific minimum number of PODs that should be running in order to keep from disrupting the functionality of the CNF; this is called the POD disruption budget (PDB). However, this budget can be improperly configured.
For example, if you have 4 PODs in a deployment and your PDB is set to 4, this means that you are telling the scheduler that you NEED 4 PODs running at all times. Therefore, in this scenario ZERO PODs can come down.

-
-
-
-PDB full -
-
Figure 1. Deployment with no PDB
-
-
-

To fix this, the PDB can be set to 2, allowing 2 of the 4 PODs to be scheduled as down, which would then let the worker nodes where those PODs are located be rebooted.

-
-
-
-PDB down 2 -
-
Figure 2. Deployment with PDB
-
-
-
-
-

POD Anti-affinity

-
-
-

True high availability requires a duplication of a process to be running on separate hardware, thus making sure that an application will continue to run if one piece of hardware goes down. OpenShift can easily make that happen since processes are automatically duplicated in separate PODs within a deployment. However, those PODs need to have anti-affinity set on them so that they are NOT running on the same hardware. It so happens that anti-affinity also helps during upgrades because it makes sure that PODs are on different worker nodes, therefore allowing enough PODs to come down even after considering their PDB.

-
-
-
-
-

Liveness / Readiness Probes

-
-
-

OpenShift and Kubernetes have some built in features that not everyone takes advantage of called liveness and readiness probes. These are very important when POD deployments are dependent upon keeping state for their application. This document won’t go into detail regarding these probes but please review the OpenShift documentation on how to implement their use.

-
-
-
-

OCP Upgrade Preparation

-
-

Firmware compatibility

-
-
-

All hardware vendors will advise that it is always best to be on their latest certified version of firmware for their hardware. In the telco world this comes with a trust-but-verify approach due to the high throughput nature of telco CNFs. Therefore, it is important to have a regimented group who can test the current and latest firmware from any vendor to make sure that all components will work with both. It is not always recommended to upgrade firmware in conjunction with an OCP upgrade; however, if it is possible to test the latest release of firmware, that will improve the odds that you won’t run into issues down the road.
Upgrading firmware is a very important debate because the process can be very intrusive and has the potential to cause a node to require manual intervention before it will come back online. On the other hand, it may be imperative to upgrade the firmware due to security fixes, new required functionality or compatibility with the new release of OCP components. Therefore, it is up to everyone to verify with their hardware vendors, verify compatibility with OCP components and perform tests in their lab before moving forward.

-
-
-
-
-

Layered product compatibility

-
-
-

It is important to make sure all layered products will run on the new version of OCP that you will be moving to. This very much includes all Operators.

-
-
-

Verify the list of Operators currently installed on your cluster. For example:

-
-
-
-
# oc get csv -A
NAMESPACE                              NAME                                 DISPLAY          VERSION   REPLACES                             PHASE
chapter2                               gitlab-operator-kubernetes.v0.17.2   GitLab           0.17.2    gitlab-operator-kubernetes.v0.17.1   Succeeded
openshift-operator-lifecycle-manager   packageserver                        Package Server   0.19.0                                         Succeeded
-
-
-
-
-
-

Prepare MCPs

-
-
-

Prepare your Machine Config Pool (MCP) labels by grouping your nodes, depending on the number of nodes in your cluster. MCPs should be split up into 8 to 10 nodes per group; however, there is no hard-and-fast rule as to how many nodes need to be in each MCP. The purpose of these MCPs is to group nodes together so that a group of nodes can be controlled independently of the rest. Additional information and examples can be found here, under the canary rollout documentation.
These MCPs will be used to un-pause a set of nodes during the upgrade process, thus allowing them to be upgraded and rebooted at a determined time instead of at the pleasure of the scheduler. Please review the upgrade process flow section, below, for more details on the pause/un-pause process.

-
-
-
-5Rack MCP -
-
Figure 1. Worker node MCPs in a 5 rack cluster
-
-
-

The division and size of these MCPs can vary depending on many factors. In general the standard division is between 8 and 10 nodes per MCP to allow the operations team to control how many nodes are taken down at a time.

-
-
-
-LBorHT MCP -
-
Figure 2. Separate MCPs inside of a group of Load Balancer or purpose built nodes
-
-
-

In larger clusters there is quite often a need to separate out several nodes for purposes like Load Balancing or other high throughput purposes, which usually have different machine sets to configure SR-IOV. In these cases we do not want to upgrade all of these nodes without getting a chance to test during the upgrade. Therefore, we need to separate them out into at least 3 different MCPs and unpause them individually.

-
-
-
-Worker MCP -
-
Figure 3. Small cluster worker MCPs
-
-
-

Smaller cluster example with 1 rack

-
-
-

The process and pace at which you un-pause the MCPs is determined by your CNFs and their configuration. Please review the sections on PDB and anti-affinity for CNFs. If your CNF can properly handle scheduling within an OpenShift cluster you can un-pause several MCPs at a time and set the MaxUnavailable to as high as 50%. This will allow as many as half of the nodes in your MCPs to restart and upgrade. This will reduce the amount of time that is needed for a specific maintenance window and allow your cluster to upgrade quickly. Hopefully you can see how keeping your PDB and anti-affinity correctly configured will help in the long run.


Welcome to the Red Hat Lab - Telco OCP Upgrades

-
-

Red Hat Lab - Telco OCP Upgrades

-
-
-

This hands-on lab …​

-
-
- - - - - -
This is not an official Red Hat Training. Please contact your Red Hat representative if you want more information about training or guidelines for this topic / product.
-
-
-
-
-

Contributors

-
Name            Role
Rob Fisher      Lab Developer / Maintainer
                Lab Reviewer

Introduction

-
-
-
-

Welcome to this Telco OCP Upgrades Lab.

-
-
-

Why are upgrades important? This is the question that all Telco platform administrators are faced with answering.

-
-
-

There is the simple response: because there is a new release out there.

-
-
-

The CNF (cloud-native network function) based response: The CNF requires additional functionality from the platform, therefore we need to upgrade OpenShift to get the new functionality.

-
-
-

The pragmatic response: All software needs to be patched because of bugs and potential security vulnerabilities that have been found since the installation of the cluster.

-
-
-

With OpenShift and K8s (Kubernetes) we are adding in yet another: The platform is only supported for a specific period of time and new releases are coming out every 4 months.

-
-
-

Platform administrators are also asked: why not just wait to upgrade until we are forced to by support timelines?

-
-
-

If we put all of these together we find that OpenShift and K8s have made this less painful for Telco companies because we can skip every other release. OpenShift has long term support (EUS - extended update support) on all even releases and upgrade paths between these EUS releases. This document will discuss how to best upgrade from one EUS to the next and how to plan for that upgrade.

-
-
-

We want you to experience and learn about the steps and procedures that are needed to perform an OpenShift in-service upgrade. Listening to and reading about an upgrade can only give you so much experience. Now we can apply the discussion to an experience that will walk you through the steps.

-
-
-

This document is intended to discuss OpenShift LifeCycle management for Telecommunication Network Function Core clusters. The specific size of the cluster has a few differences which are called out specifically at times in this document but this is meant to cover most clusters from 3 nodes to the largest cluster certified by the telco scale team. This includes some scenarios for mixed workload clusters.

-
-
-

This document will discuss the following upgrade scenarios:

-
-
-
  • Z-Stream
  • Y-Stream
  • EUS to EUS
-
-

Each of these scenarios has different considerations which are called out as needed.

-
-
-
-
-

Who is this lab aimed at?

-
-
-

The lab is aimed at technical professionals working with OpenShift who are interested in any of these areas:

-
-
-
  • LifeCycle Management of OpenShift
  • Telecommunications Core 5G Baremetal Cluster Upgrades
  • Disconnected Environments
-
-
-
-
-

Lab Software Versions

-
-
-

The lab is based on the following software versions.

-
-
-
  • OpenShift Container Platform v4.12 upgrading to v4.14

1. LifeCycle Management

-
-
-

1.1. Scope

-
-

This document is intended to discuss OpenShift LifeCycle management for Telecommunication Network Function Core clusters. The specific size of the cluster has a few differences which are called out specifically at times in this document but this is meant to cover most clusters from 10 nodes to the largest cluster certified by the telco scale team. This includes some scenarios for mixed workload clusters.

-
-
-

This document will discuss the following upgrade scenarios:

-
-
-
  • Z-Stream
  • Y-Stream
  • EUS to EUS
  • Y+3 (i.e.: 4.9 to 4.12)

Each of these has different considerations which are called out specifically as needed in this document. There is also a dedicated section which will review what is called a Blue/Green migration upgrade. This solution is useful in situations where the recommended in-service upgrade for Y+3 is not enough for a cluster or the upgrade will encompass more than Y+3.
-
-
-
-

1.2. OCP API Compatibility Policy

-
-

The most important thing to understand when considering which Z-release to upgrade to inside of a new Y-release is what patches need to be in the new Z-release. In other words, if you are currently at OCP 4.11.28 you will need to make sure to upgrade to a Z-release of 4.12 that has all of the patches in it that were applied to 4.11.28, otherwise you will break the built-in compatibility of Kubernetes.
This is called the Kubernetes version skew policy and can be found at: https://kubernetes.io/releases/version-skew-policy
This can also be seen in the following graphic:

-
-
-
Figure 1. K8s Version Skew
-
-
-

Please also note that not all releases of OCP can be upgraded to any arbitrary Z-release even if they contain all of the required patches. You can use the upgrade graph tool (https://access.redhat.com/labs/ocpupgradegraph/update_path) to determine if the path is valid for your z-release. You should also always verify with your Sales Engineer or Technical Account Manager at Red Hat to make sure the upgrade path is valid for Telco implementations.

-
-
-
-

1.3. CNF Upgrade Preparation

-
-

The life of a POD is an important topic to understand. This section will describe several topics that are important to keeping your CNF PODs healthy and allowing the cluster to properly schedule them during an upgrade.

-
-
-

1.3.1. CNF Requirements Document

-
-

Before you go any further, please read through the CNF requirements document. In this section a few of the most important points will be discussed but the CNF Requirements Document has additional detail and other important topics.

-
-
-
-

1.3.2. POD Disruption Budget

-
-

Each set of PODs in a deployment can be given a specific minimum number of PODs that should be running in order to keep from disrupting the functionality of the CNF; this is called the POD disruption budget (PDB). However, this budget can be improperly configured.
For example, if you have 4 PODs in a deployment and your PDB is set to 4, this means that you are telling the scheduler that you NEED 4 PODs running at all times. Therefore, in this scenario ZERO PODs can come down.

-
-
-
-PDB full -
-
Figure 2. Deployment with no PDB
-
-
-

To fix this, the PDB can be set to 2, allowing 2 of the 4 PODs to be scheduled as down, which would then let the worker nodes where those PODs are located be rebooted.

-
-
-
Figure 3. Deployment with PDB
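For illustration only, here is a minimal sketch of such a PodDisruptionBudget; the name, namespace and label selector are hypothetical and would need to match your own deployment of 4 PODs:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-cnf-pdb        # hypothetical name
  namespace: example-cnf       # hypothetical namespace
spec:
  minAvailable: 2              # 2 of the 4 PODs must stay up, so 2 can be drained at a time
  selector:
    matchLabels:
      app: example-cnf         # must match the labels on the deployment's PODs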
-
-
-
-

1.3.3. POD Anti-affinity

-
-

True high availability requires a duplication of a process to be running on separate hardware, thus making sure that an application will continue to run if one piece of hardware goes down. OpenShift can easily make that happen since processes are automatically duplicated in separate PODs within a deployment. However, those PODs need to have anti-affinity set on them so that they are NOT running on the same hardware. It so happens that anti-affinity also helps during upgrades because it makes sure that PODs are on different worker nodes, therefore allowing enough PODs to come down even after considering their PDB.
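As a hedged sketch of what this can look like in a Deployment's POD template (the app label example-cnf is an assumption, not taken from this lab), a required anti-affinity rule keyed on the node hostname keeps replicas on separate worker nodes:

      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: example-cnf                  # hypothetical label shared by the replicas
            topologyKey: kubernetes.io/hostname   # never place two matching PODs on the same node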

-
-
-
-

1.3.4. Liveness / Readiness Probes

-
-

OpenShift and Kubernetes have some built in features that not everyone takes advantage of called liveness and readiness probes. These are very important when POD deployments are dependent upon keeping state for their application. This document won’t go into detail regarding these probes but please review the OpenShift documentation on how to implement their use.
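A small, illustrative example of what such probes can look like on a container (the port and endpoint paths are assumptions for a hypothetical CNF):

        livenessProbe:              # restart the container if it stops responding
          httpGet:
            path: /healthz          # hypothetical health endpoint
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 10
        readinessProbe:             # keep traffic away until the POD reports ready
          httpGet:
            path: /ready            # hypothetical readiness endpoint
            port: 8080
          periodSeconds: 5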

-
-
-
-
-

1.4. OCP Upgrade Preparation

-
-

1.4.1. Firmware compatibility

-
-

All hardware vendors will advise that it is always best to be on their latest certified version of firmware for their hardware. In the telco world this comes with a trust-but-verify approach due to the high throughput nature of telco CNFs. Therefore, it is important to have a regimented group who can test the current and latest firmware from any vendor to make sure that all components will work with both. It is not always recommended to upgrade firmware in conjunction with an OCP upgrade; however, if it is possible to test the latest release of firmware, that will improve the odds that you won’t run into issues down the road.
Upgrading firmware is a very important debate because the process can be very intrusive and has the potential to cause a node to require manual intervention before it will come back online. On the other hand, it may be imperative to upgrade the firmware due to security fixes, new required functionality or compatibility with the new release of OCP components. Therefore, it is up to everyone to verify with their hardware vendors, verify compatibility with OCP components and perform tests in their lab before moving forward.

-
-
-
-

1.4.2. Layered product compatibility

-
-

It is important to make sure all layered products will run on the new version of OCP that you will be moving to. This very much includes all Operators.

-
-
-

Verify the list of Operators currently installed on your cluster. For example:

-
-
-
-
# oc get csv -A
NAMESPACE                              NAME                                 DISPLAY          VERSION   REPLACES                             PHASE
chapter2                               gitlab-operator-kubernetes.v0.17.2   GitLab           0.17.2    gitlab-operator-kubernetes.v0.17.1   Succeeded
openshift-operator-lifecycle-manager   packageserver                        Package Server   0.19.0                                         Succeeded
-
-
-
-
-

1.4.3. Prepare MCPs

-
-

Prepare your Machine Config Pool (MCP) labels by grouping your nodes, depending on the number of nodes in your cluster. MCPs should be split up into 8 to 10 nodes per group; however, there is no hard-and-fast rule as to how many nodes need to be in each MCP. The purpose of these MCPs is to group nodes together so that a group of nodes can be controlled independently of the rest. Additional information and examples can be found here, under the canary rollout documentation.
These MCPs will be used to un-pause a set of nodes during the upgrade process, thus allowing them to be upgraded and rebooted at a determined time instead of at the pleasure of the scheduler. Please review the upgrade process flow section, below, for more details on the pause/un-pause process.

-
-
-
-5Rack MCP -
-
Figure 4. Worker node MCPs in a 5 rack cluster
-
-
-

The division and size of these MCPs can vary depending on many factors. In general the standard division is between 8 and 10 nodes per MCP to allow the operations team to control how many nodes are taken down at a time.

-
-
-
-LBorHT MCP -
-
Figure 5. Separate MCPs inside of a group of Load Balancer or purpose built nodes
-
-
-

In larger clusters there is quite often a need to separate out several nodes for purposes like Load Balancing or other high throughput purposes, which usually have different machine sets to configure SR-IOV. In these cases we do not want to upgrade all of these nodes without getting a chance to test during the upgrade. Therefore, we need to separate them out into at least 3 different MCPs and unpause them individually.

-
-
-
-Worker MCP -
-
Figure 6. Small cluster worker MCPs
-
-
-

Smaller cluster example with 1 rack

-
-
-

The process and pace at which you un-pause the MCPs is determined by your CNFs and their configuration. Please review the sections on PDB and anti-affinity for CNFs. If your CNF can properly handle scheduling within an OpenShift cluster you can un-pause several MCPs at a time and set the MaxUnavailable to as high as 50%. This will allow as many as half of the nodes in your MCPs to restart and upgrade. This will reduce the amount of time that is needed for a specific maintenance window and allow your cluster to upgrade quickly. Hopefully you can see how keeping your PDB and anti-affinity correctly configured will help in the long run.
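As a hedged example (the pool name mcp-1 is an assumption based on the labeling steps below), raising maxUnavailable on a single MCP could look like this:

# oc patch mcp/mcp-1 --type merge --patch '{"spec":{"maxUnavailable":"50%"}}'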

-
-
-
1.4.3.1. Applying MCPs
-
-

First you can run “oc get mcp” to show your current list of MCPs:

-
- --- - - - - - -
# oc get mcp
-
-

List out all of your nodes:

-
- --- - - - - - -
# oc get no
-
-

Determine, from the above suggestions, how you would like to separate out your worker nodes into machine config pools (MCPs). In this example we will just use 1 node in each MCP.
We first need to label the nodes so that they can be put into MCPs. We will do this with the following command:

-
- --- - - - - - -
oc label node euschannel-worker-0.test.corp node-role.kubernetes.io/mcp-1=
-
-

This will show up when you run the “oc get node” command:

-
- --- - - - - - -
# oc get no
-
-

Now you need to create yaml files that will apply the labels as MCPs. Here is one example:

-
- --- - - - - - -
apiVersion: machineconfiguration.openshift.io/v1
-
-
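The single apiVersion line above is only the beginning of such a file. A fuller sketch, assuming the hypothetical mcp-1 label used earlier in this section, could look like this:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: mcp-1
spec:
  machineConfigSelector:
    matchExpressions:
      - key: machineconfiguration.openshift.io/role
        operator: In
        values: [worker, mcp-1]               # pick up both the worker and mcp-1 MachineConfigs
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/mcp-1: ""       # the label applied with "oc label node" above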

For each of these, just run “oc apply -f {filename.yaml}”:

-
- --- - - - - - -
# oc apply -f test-mcp-2.yaml
-
-

Now you can run “oc get mcp” again and your new MCPs will show. Please note that you will still see the original worker and master MCPs that are part of the cluster.

-
- --- - - - - - -
# oc get mcp
-
-
-
-

1.4.4. Environment considerations

-
-

In Telecommunications environments most of the clusters are kept in an “air gapped” network. Therefore, you will need to start by updating your offline image repository. When choosing which images to include, please review the OCP API Compatibility Policy section to make sure the cluster will be able to upgrade to the new version of OCP.
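As a rough illustration only (the registry host, repository and release version here are placeholders; follow the official disconnected mirroring documentation for the full procedure), mirroring a release payload into an offline registry can look like:

# oc adm release mirror --from=quay.io/openshift-release-dev/ocp-release:4.12.30-x86_64 --to=registry.example.com:5000/ocp4/openshift4 --to-release-image=registry.example.com:5000/ocp4/openshift4:4.12.30-x86_64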

-
-
-
-

1.4.5. Platform preparation

-
-

This section should be used as a basic set of checks and verifications to make sure that your cluster is ready for an upgrade. You will more than likely need to add to this list depending on your environment and deployment.

-
-
-
1.4.5.1. Basic cluster checks
-
-

First you will need to verify that there are no issues within the cluster that will stop the upgrade. A very easy first check is to run:

-
-
-
-
# oc get pod -A | egrep -vi 'complete|running'

(Nothing should be returned from this command)
-
-
-
-

Next verify that all nodes within the cluster are available:

-
-
-
-
# oc get node
NAME                 STATUS   ROLES          AGE   VERSION
master-0.fish.corp   Ready    master         92d   v1.24.6+deccab3
master-1.fish.corp   Ready    master         92d   v1.24.6+deccab3
master-2.fish.corp   Ready    master         92d   v1.24.6+deccab3
worker-0.fish.corp   Ready    worker,mcp-1   92d   v1.24.6+deccab3
worker-1.fish.corp   Ready    worker,mcp-2   92d   v1.24.6+deccab3
-
-
-
-

Now verify that all cluster operators are ready:

-
-
-
-
# oc get co
-NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
-authentication                             4.11.28   True        False         False      6d8h
-baremetal                                  4.11.28   True        False         False      92d
-cloud-controller-manager                   4.11.28   True        False         False      92d
-cloud-credential                           4.11.28   True        False         False      92d
-cluster-autoscaler                         4.11.28   True        False         False      92d
-config-operator                            4.11.28   True        False         False      92d
-console                                    4.11.28   True        False         False      8d
-csi-snapshot-controller                    4.11.28   True        False         False      92d
-dns                                        4.11.28   True        False         False      92d
-etcd                                       4.11.28   True        False         False      92d
-image-registry                             4.11.28   True        False         False      8d
-ingress                                    4.11.28   True        False         False      61m
-insights                                   4.11.28   True        False         False      92d
-kube-apiserver                             4.11.28   True        False         False      92d
-kube-controller-manager                    4.11.28   True        False         False      92d
-kube-scheduler                             4.11.28   True        False         False      92d
-kube-storage-version-migrator              4.11.28   True        False         False      8d
-machine-api                                4.11.28   True        False         False      92d
-machine-approver                           4.11.28   True        False         False      92d
-machine-config                             4.11.28   True        False         False      8d
-marketplace                                4.11.28   True        False         False      92d
-monitoring                                 4.11.28   True        False         False      85d
-network                                    4.11.28   True        False         False      92d
-node-tuning                                4.11.28   True        False         False      8d
-openshift-apiserver                        4.11.28   True        False         False      92d
-openshift-controller-manager               4.11.28   True        False         False      3d23h
-openshift-samples                          4.11.28   True        False         False      8d
-operator-lifecycle-manager                 4.11.28   True        False         False      92d
-operator-lifecycle-manager-catalog         4.11.28   True        False         False      92d
-operator-lifecycle-manager-packageserver   4.11.28   True        False         False      92d
-service-ca                                 4.11.28   True        False         False      92d
-storage                                    4.11.28   True        False         False      92d
-
-
-
-
-
-
-

1.5. Upgrade Process Flow

-
-

1.5.1. Overview

-
-

In an effort to specifically sound like a broken record, the preparation phase of the upgrade process is probably the most important! If everything to this point in the documentation has been followed then, barring any unforeseen issues, like hardware, the rest of this document should all just be step by step.

-
-
-
1.5.1.1. Step 1: Determine your target release
-
-

Utilize the Red Hat update path tool and/or the cincinnati graph repository to determine which release you will be moving to.

-
-
-
-
1.5.1.2. Step 2: Change your channel (if needed)
-
-

For a review of all channels you can refer to the channel release documentation.
Determine what channel you are currently pointed to:

-
-
-
-
[root@m640-blade1 ~]# oc get clusterversion -o=jsonpath='{.items[*].spec}' | jq
{
  "channel": "eus-4.10",
  "clusterID": "81ffff37-1234-4b78-1234-1c12347687c6"
}
-
-
-
-

Change your channel to point to the new channel:

-
-
-
-
[root@m640-blade1 ~]# oc adm upgrade channel eus-4.12
[root@m640-blade1 ~]# oc get clusterversion -o=jsonpath='{.items[*].spec}' | jq
{
  "channel": "eus-4.12",
  "clusterID": "81ffff37-1234-4b78-1234-1c12347687c6"
}
-
-
-
- - - - - -
This is moving from EUS to EUS. It would be similar if you were going Y-stream to Y-stream; it would just be stable-4.10 moving to stable-4.11. However, if it was a change in Z-stream you do not need to change the channel, because the Z-release will be within that same Y-stream channel.
-
-
-
-
1.5.1.3. Step 3: Pause your worker node MCPs
-
-

Here is a very simple example that can be used on a small cluster with only one worker node MCP:

-
-
-
-
[root@m640-blade1 ~]# oc patch mcp/worker --type merge --patch '{"spec":{"paused":true}}'
-
[root@m640-blade1 ~]# oc describe mcp | egrep 'Name:|Paused' | egrep -v 'rendered|[0-9][0-9]-'
Name:         master
  Paused:     false
Name:         mcp-1
  Paused:     true
Name:         mcp-2
  Paused:     true
Name:         worker
  Paused:     false
-
-
-
-

For larger clusters with multiple MCPs, please refer to the canary rollout documentation.

-
-
-
-
1.5.1.4. Step 4: Double check your cluster health
-
-

Some suggested checks at this time are:

-
-
-
  • Cluster operators
  • Node status
  • Look for failed pods
-
-
-
-
[root@m640-blade1 ~]# oc get co
-NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
-authentication                             4.11.28   True        False         False      10d
-baremetal                                  4.11.28   True        False         False      117d
-cloud-controller-manager                   4.11.28   True        False         False      117d
-cloud-credential                           4.11.28   True        False         False      118d
-cluster-autoscaler                         4.11.28   True        False         False      117d
-config-operator                            4.11.28   True        False         False      117d
-console                                    4.11.28   True        False         False      3d22h
-csi-snapshot-controller                    4.11.28   True        False         False      17d
-dns                                        4.11.28   True        False         False      117d
-etcd                                       4.11.28   True        False         False      117d
-image-registry                             4.11.28   True        False         False      17d
-ingress                                    4.11.28   True        False         False      17d
-insights                                   4.11.28   True        False         False      117d
-kube-apiserver                             4.11.28   True        False         False      117d
-kube-controller-manager                    4.11.28   True        False         False      117d
-kube-scheduler                             4.11.28   True        False         False      117d
-kube-storage-version-migrator              4.11.28   True        False         False      17d
-machine-api                                4.11.28   True        False         False      117d
-machine-approver                           4.11.28   True        False         False      117d
-machine-config                             4.11.28   True        False         False      17d
-marketplace                                4.11.28   True        False         False      117d
-monitoring                                 4.11.28   True        False         False      17d
-network                                    4.11.28   True        False         False      117d
-node-tuning                                4.11.28   True        False         False      33d
-openshift-apiserver                        4.11.28   True        False         False      117d
-openshift-controller-manager               4.11.28   True        False         False      28d
-openshift-samples                          4.11.28   True        False         False      33d
-operator-lifecycle-manager                 4.11.28   True        False         False      117d
-operator-lifecycle-manager-catalog         4.11.28   True        False         False      117d
-operator-lifecycle-manager-packageserver   4.11.28   True        False         False      17d
-service-ca                                 4.11.28   True        False         False      117d
-storage                                    4.11.28   True        False         False      117d
-
-[root@m640-blade1 ~]# oc get no
-NAME                           STATUS   ROLES         AGE    VERSION
-rftest1-master-0.test.corp   Ready    master        118d   v1.24.6+deccab3
-rftest1-master-1.test.corp   Ready    master        118d   v1.24.6+deccab3
-rftest1-master-2.test.corp   Ready    master        118d   v1.24.6+deccab3
-rftest1-worker-0.test.corp   Ready    mcp,worker    117d   v1.24.6+deccab3
-rftest1-worker-1.test.corp   Ready    mcp1,worker   117d   v1.24.6+deccab3
-
-[root@m640-blade1 ~]# oc get po -A | egrep -iv 'running|complete'
-(Note: this should return NOTHING)
-
-
-
-
-
1.5.1.5. Step 5: Begin the cluster upgrade
- --- - - - - - -

[root@m640-blade1 ~]# oc adm upgrade --to-latest

-

Requesting update to 4.9.57

-
-
-
1.5.1.6. Step 6: Monitor the upgrade
-
-
-
[root@m640-blade1 ~]# watch "oc get clusterversion; echo; oc get co | head -1; oc get co | grep 4.9.57; oc get co | grep 4.8.55; echo; oc get no; echo; oc get po -A | egrep -iv 'running|complete'"
-
-
-NAME	  VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
-version   4.8.55    True        True          31m     Working towards 4.9.57: 206 of 738 done (27% complete), waiting on openshift-apiserver
-
-NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
-cloud-controller-manager                   4.9.57    True        False         False	  5m32s
-config-operator                            4.9.57    True        False         False	  68m
-etcd                                       4.9.57    True        False         False	  67m
-kube-apiserver                             4.9.57    True        False         False	  63m
-kube-controller-manager                    4.9.57    True        False         False	  67m
-kube-scheduler                             4.9.57    True        False         False	  66m
-machine-api                                4.9.57    True        False         False	  68m
-openshift-apiserver                        4.9.57    True        False         False	  64m
-authentication                             4.8.55    True        False         False	  49m
-baremetal                                  4.8.55    True        False         False	  68m
-cloud-credential                           4.8.55    True        False         False	  70m
-cluster-autoscaler                         4.8.55    True        False         False	  67m
-console                                    4.8.55    True        False         False	  53m
-csi-snapshot-controller                    4.8.55    True        False         False	  53m
-dns                                        4.8.55    True        False         False	  67m
-image-registry                             4.8.55    True        False         False	  14m
-ingress                                    4.8.55    True        False         False	  61m
-insights                                   4.8.55    True        False         False	  62m
-kube-storage-version-migrator              4.8.55    True        False         False	  14m
-machine-approver                           4.8.55    True        False         False	  68m
-machine-config                             4.8.55    True        False         False	  67m
-marketplace                                4.8.55    True        False         False	  67m
-monitoring                                 4.8.55    True        False         False	  60m
-network                                    4.8.55    True        False         False	  69m
-node-tuning                                4.8.55    True        False         False	  67m
-openshift-controller-manager               4.8.55    True        False         False	  67m
-openshift-samples                          4.8.55    True        False         False	  64m
-operator-lifecycle-manager                 4.8.55    True        False         False	  68m
-operator-lifecycle-manager-catalog         4.8.55    True        False         False	  68m
-operator-lifecycle-manager-packageserver   4.8.55    True        False         False	  53m
-service-ca                                 4.8.55    True        False         False	  68m
-storage                                    4.8.55    True        False         False	  68m
-
-NAME                              STATUS   ROLES    AGE   VERSION
-euschannel-master-0.test.corp   Ready    master   70m   v1.21.14+a17bdb3
-euschannel-master-1.test.corp   Ready    master   70m   v1.21.14+a17bdb3
-euschannel-master-2.test.corp   Ready    master   70m   v1.21.14+a17bdb3
-euschannel-worker-0.test.corp   Ready    worker   62m   v1.21.14+a17bdb3
-euschannel-worker-1.test.corp   Ready    worker   62m   v1.21.14+a17bdb3
-
-NAMESPACE                                          NAME                                                         READY   STATUS      RESTARTS      AGE
-
-
-
-
-
1.5.1.7. Step 6b: EUS to EUS
-
-

When upgrading from EUS to EUS release, refer to the section below for details.

-
-
-
-
1.5.1.8. Step 7: Un-Pause the worker MCP(s)
-
-

Once the Control Plane PODs and nodes have completed their upgrade, you can begin the work on the worker node upgrades.
Refer to the section above discussing recommendations for when to un-pause specific MCPs.
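A hedged example, mirroring the pause command from Step 3 with a hypothetical pool name of mcp-1:

[root@m640-blade1 ~]# oc patch mcp/mcp-1 --type merge --patch '{"spec":{"paused":false}}'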
-Refer to the section above discussing recommendations for when to un-pause specific MCPs

-
-
-
-
-

1.5.2. Z-Stream

-
-

The Z-stream updates to a cluster should become business as usual at some point, with as minimal an impact as possible. There are many reasons for needing a Z-stream update, usually coming in the form of security updates and bug fixes.
Fortunately, during a Z-stream update there is minimal change to the control plane and therefore there is less of an impact to the cluster. Unfortunately, there is still a high likelihood that all nodes within the cluster will need to be rebooted.
Z-stream updates are a good time to focus on adding in firmware updates as they have less cluster impact. Firmware updates also might be required shortly after Y-stream or EUS updates as hardware vendors may have released bug fixes for problems that were found with new releases of RHCOS. There might also be new functionality that is released in some NIC firmware as new releases of RHCOS come out. Please work closely with your hardware vendors to keep up to date with your firmware when possible.

-
-
-
-

1.5.3. EUS to EUS

-
-

EUS (Extended update support) is an even numbered release of OpenShift that will have extended support for up to 24 months. This will allow Red Hat customers and partners to stay on a specific release much longer than the standard 4 months between OpenShift or Kubernetes releases or the 18 months of maintenance support. This will reduce the number of upgrades that most Telcos will need to perform as they will be able to jump 2 minor releases with each upgrade.

-
-
-

In an upgrade like this, each application will need to be verified against the API deprecation list to make sure all of your CNFs will still function once you upgrade to the new version of OCP.
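Note that before the control plane can move past the intermediate Y-release, the administrator typically has to acknowledge the removed Kubernetes APIs. A hedged sketch of that acknowledgment follows; the exact data key depends on the source and target versions, so treat this one as illustrative:

[root@m640-blade1 ~]# oc -n openshift-config patch cm admin-acks --type merge --patch '{"data":{"ack-4.12-kube-1.26-api-removals-in-4.13":"true"}}'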

-
-
-

This jump in releases does come at a small price, due to the need to have the control plane upgraded to the intermediate Y-stream before getting to the next even release. Here is a quick walk through of the process the cluster will go through during the EUS to EUS upgrade:

-
-
-
-Before upgrade ctrl plane -
-
Figure 7. Control Plane before upgrade starts
-
-
-

This example shows a rack of servers in a cluster on the left with a blown up explanation of 6 of these servers on the right.

-
-
-
-Upgrade ctrl plane Yp1 -
-
Figure 8. Control Plane after first step of the upgrade
-
-
-

This figure shows the first step of the upgrade process, where the control plane is upgraded to Y+1 along with all of the operator PODs in the cluster (even those running on worker nodes). The last part of this step is to reboot the control plane nodes.

-
-
-
-Upgrade ctrl plane new EUS -
-
Figure 9. Control Plane after upgrade to the next EUS release
-
-
-

The second step in this upgrade is to upgrade to Y+2. Just the same as the previous step: incrementing the cluster operators, then the application operators, and last of all rebooting and updating RHCOS on the control plane nodes.

-
-
-

At this point in the upgrade the cluster will have the cluster control plane and all operators running on the newer EUS release or Y+2. The worker nodes will still be on the older EUS release. Since OCP and K8s control planes are backward compatible up to Y-2, there should be no problems in communications or control of PODs running on the worker nodes.

-
-
-

The last part of the upgrade is to reboot and update all of the worker nodes within the cluster. Here is an example of how this could occur within one specific MCP of worker nodes. We will set the maxUnavailable to 20% and allow 2 out of the 10 worker nodes to reboot and upgrade at a time.

-
-
-
-upgrade beginning worker mcp -
-
Figure 10. Showing list of worker nodes in MCP1 before upgrading them
-
-
-
-upgrade worker mcp 0 1 -
-
Figure 11. Upgrade Worker-0 & Worker-1
-
-
-
-upgrade worker mcp 2 3 -
-
Figure 12. Upgrade Worker-2 & Worker-3
-
-
-
-upgrade worker mcp 4 5 -
-
Figure 13. Upgrade Worker-4 & Worker-5
-
-
-
-upgrade worker mcp 6 7 -
-
Figure 14. Upgrade Worker-6 & Worker-7
-
-
-
-upgrade worker mcp 8 9 -
-
Figure 15. Upgrade Worker-8 & Worker-9
-
-
-
-upgrade worker mcp complete -
-
Figure 16. Complete upgrade of MCP2
-
-
-

Each MCP does not have to be updated independently; that part is completely up to the operations team and may also relate to several factors in the CNF(s) that are running on the cluster.

-
-
-

Once all of the worker node MCPs have been updated you will then have a fully upgraded cluster.

-
-
-