Merge pull request #8 from rfisher001/lab-4.12To4.14
Added the last set of documents
rfisher001 authored May 16, 2024
2 parents e6b6012 + 3f67d83 commit cc87b07
Showing 10 changed files with 887 additions and 569 deletions.
34 changes: 33 additions & 1 deletion documentation/modules/ROOT/nav.adoc
@@ -29,5 +29,37 @@
**** xref:OCP-upgrade-prep.adoc#labeling-nodes[Labeling nodes]
**** xref:OCP-upgrade-prep.adoc#applying-mcps-according-to-label[Applying MCPs according to label]
**** xref:OCP-upgrade-prep.adoc#monitor-mcps[Monitor MCP formation]
** xref:OCP-upgrade-prep.adoc#enviro-considerations[Environment considerations]
** xref:OCP-upgrade-prep.adoc#platform-prep[Platform preparation]
*** xref:OCP-upgrade-prep.adoc#basic-cluster-checks[Basic cluster checks]
* xref:Applying-MCPs.adoc[Applying MCPs]
* xref:Upgrade-process.adoc[OCP Upgrade Process Flow]
** xref:Upgrade-process.adoc#overview[Overview]
** xref:Upgrade-process.adoc#step-1[Step 1: Determine your target release]
** xref:Upgrade-process.adoc#step-2[Step 2: Change your channel]
*** xref:Upgrade-process.adoc#z-stream-upgrade[Z-Stream Upgrade]
*** xref:Upgrade-process.adoc#eus-eus-upgrade[EUS to EUS Upgrade]
*** xref:Upgrade-process.adoc#early-eus-upgrade-testing[Early testing of EUS to EUS upgrade]
*** xref:Upgrade-process.adoc#y-stream-upgrade[Y-Stream Upgrade]
* xref:Upgrade-process-step-3.adoc[OCP Upgrade Process Flow - Continued - Step 3]
** xref:Upgrade-process-step-3.adoc#step-3-pause-mcp[Step 3: Pause your worker node MCPs]
** xref:Upgrade-process-step-3.adoc#step-4-backup-etcd[Step 4: Backup etcd]
** xref:Upgrade-process-step-3.adoc#step-5-health-check[Step 5: Double check your cluster health]
* xref:Upgrade-process-step-6.adoc[OCP Upgrade Process Flow - Continued - Step 6]
** xref:Upgrade-process-step-6.adoc#step-6-admin-acknowledge[Step 6: Acknowledge the upgrade]
** xref:Upgrade-process-step-6.adoc#step-7-begin-upgrade[Step 7: Begin the cluster upgrade]
** xref:Upgrade-process-step-6.adoc#step-8-monitor[Step 8: Monitor the upgrade]
* xref:Upgrade-process-step-9.adoc[OCP Upgrade Process Flow - Continued - Step 9]
** xref:Upgrade-process-step-9.adoc#step-9-upgrade-operators[Step 9: Upgrade OLM Operators]
** xref:Upgrade-process-step-9.adoc#if-then-goto[If-Then GO TO]
** xref:Upgrade-process-step-9.adoc#step-10-y-stream[Step 10: Second Y-stream update]
*** xref:Upgrade-process-step-9.adoc#admin-ack[Admin Acknowledge]
*** xref:Upgrade-process-step-9.adoc#start-y-stream-ctrl-pln-upgrade[Start Y-stream Control Plane Upgrade]
** xref:Upgrade-process-step-9.adoc#step-12-upgrade-operators[Step 12: Upgrade All of the OLM Operators]
* xref:Upgrade-process-step-13.adoc[OCP Upgrade Process Flow - Continued - Step 13]
** xref:Upgrade-process-step-13.adoc#step-13-un-pause-worker[Step 13: Un-Pause the worker MCP(s)]
** xref:Upgrade-process-step-13.adoc#step-14-verify-health[Step 14: Verify Health of Cluster]
3 changes: 0 additions & 3 deletions documentation/modules/ROOT/pages/API-Compatibility.adoc
@@ -12,9 +12,6 @@ The most important thing to understand when considering which Z-release to upgrade
== Kubernetes Version Skew
Each cluster operator must maintain support for specific API versions. New releases of operators bring new APIs, so the changes, or skews, between API versions need to be managed. To a certain extent, the APIs can remain compatible across several releases of an operator. The Kubernetes version skew policy, which lists the components and their compatible releases, is at: https://kubernetes.io/releases/version-skew-policy
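
To see the skew between your control plane and worker kubelets, you can read the minor version out of the node list. This is a sketch on sample data; the node names and versions below are illustrative, and on a live cluster you would pipe `oc get no --no-headers` into the same `awk` filter:

```shell
# Print each node's Kubernetes minor version from `oc get no`-style output.
# The sample variable stands in for live output from a cluster.
sample='ctrl-plane-0   Ready   control-plane,master   39d   v1.25.10+28ed2d7
worker-0       Ready   worker                 39d   v1.23.12+8a6bfe4'
# Field 5 is the kubelet version; split on "." so v[2] is the minor version.
echo "$sample" | awk '{split($5, v, "."); print $1, "minor:", v[2]}'
```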

The easiest way to verify that your application functionality will still work is to make sure that you follow


[#ocp-upgrade-path]
== OpenShift Upgrade Path
Can I choose any Z-release in the new EUS or Y-stream version? No.
90 changes: 76 additions & 14 deletions documentation/modules/ROOT/pages/OCP-upgrade-prep.adoc
@@ -37,7 +37,7 @@ openshift-operator-lifecycle-manager packageserver Pack

[#OLM-Operator-compatibility]
=== OLM Operator compatibility
There is a set of Red Hat Operators that are not part of the cluster operators; these are known as the OLM-installed Operators. To determine the compatibility of these OLM-installed Operators, there is a web-based tool that shows which versions of OCP are compatible with specific releases of an Operator. https://access.redhat.com/labs/ocpouic/?upgrade_path=4.12%20to%204.14[This tool] tells you whether you need to upgrade an Operator after each Y-stream upgrade or whether you can wait until you have fully upgraded to the next EUS release.
Step 9 in the “Upgrade Process Flow” section provides additional information about what you need to do if an Operator must be upgraded after performing the first Y-stream control plane upgrade.

NOTE: Some Operators are compatible with several releases of OCP. So, you may not need to upgrade until you complete the cluster upgrade. This is shown in Step 13 of the Upgrade Process Flow.
@@ -117,11 +117,11 @@ NOTE: Review what is listed in the “ROLES” column; this will get updated as
----
# oc get no
NAME STATUS ROLES AGE VERSION
ctrl-plane-0 Ready control-plane,master 39d v1.25.10+28ed2d7
ctrl-plane-1 Ready control-plane,master 39d v1.25.10+28ed2d7
ctrl-plane-2 Ready control-plane,master 39d v1.25.10+28ed2d7
worker-0 Ready worker 39d v1.25.10+28ed2d7
worker-1 Ready worker 39d v1.25.10+28ed2d7
----

Determine, from the above suggestions, how you would like to separate your worker nodes into machine config pools (MCPs).
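
If you have many worker nodes, generating the label commands in a loop keeps the split consistent. This is a hypothetical helper, not part of the documented procedure; it alternates example node names between two pools, mcp-1 and mcp-2, and only prints the commands so you can review them before running:

```shell
# Hypothetical helper: alternate worker nodes between two MCP labels so the
# pools end up the same size. Node names here are examples.
workers="worker-0 worker-1 worker-2 worker-3"
i=1
for w in $workers; do
  echo "oc label node $w node-role.kubernetes.io/mcp-$i="
  i=$(( i % 2 + 1 ))   # alternate between 1 and 2
done
```

Piping the output to `sh` would apply the labels; reviewing first avoids mislabeling nodes that must stay in a specific pool.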
@@ -143,8 +143,8 @@ We first need to label the nodes so that they can be put into MCPs. We will do t

[source, bash]
----
oc label node worker-0 node-role.kubernetes.io/mcp-1=
oc label node worker-1 node-role.kubernetes.io/mcp-2=
----

NOTE: The labels will show up when you run the “oc get node” command:
@@ -153,11 +153,11 @@ NOTE: The labels will show up when you run the “oc get node” command:
----
# oc get no
NAME STATUS ROLES AGE VERSION
ctrl-plane-0 Ready control-plane,master 39d v1.25.10+28ed2d7
ctrl-plane-1 Ready control-plane,master 39d v1.25.10+28ed2d7
ctrl-plane-2 Ready control-plane,master 39d v1.25.10+28ed2d7
worker-0 Ready mcp-1,worker 39d v1.25.10+28ed2d7
worker-1 Ready mcp-2,worker 39d v1.25.10+28ed2d7
----

[#applying-mcps-according-to-label]
@@ -237,4 +237,66 @@ master rendered-master-b…e83 True False False 3
mcp-1 rendered-mcp-1-2…c4f True False False 1 1 1 0 7m33s
mcp-2 rendered-mcp-2-2…c4f True False False 1 1 1 0 51s
worker rendered-worker-2…c4f True False False 0 0 0 0 25d
----

[#enviro-considerations]
== Environment considerations

In telecommunications environments, most clusters are kept in an “air gapped” or disconnected network, so you will need to update your offline image repository. When choosing which images to include, review the OCP API Compatibility Policy section to make sure the cluster will be able to upgrade to the new version of OCP. Setting up and managing an offline image repository is currently out of scope but will be added at a later date.

[#platform-prep]
== Platform preparation

This section should be used as a basic set of checks and verifications to make sure that your cluster is ready for an upgrade.

[#basic-cluster-checks]
=== Basic cluster checks

First, verify that there are no failed pods in the cluster that would stop the upgrade. A very easy first check is to run:

[source, bash]
----
[cnf@utility ~]$ oc get po -A | egrep -vi 'complete|running'
NAMESPACE NAME READY STATUS RESTARTS AGE
[cnf@utility ~]$
----

NOTE: You may need to run this twice if there are pods in a Pending state, as they may simply be rescheduling due to normal operating conditions of the cluster.

If there are problems with any pods, review the troubleshooting documentation to determine the cause.
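
To see how the `egrep` filter from the check above behaves, you can run it on captured sample output. The pod names below are made up for illustration; on a live cluster you would pipe `oc get po -A` instead of the sample variable:

```shell
# Demonstrate the pod-health filter on sample output. The header line and any
# pod not in a Running or Completed state survive the case-insensitive filter.
sample='NAMESPACE   NAME       READY   STATUS             RESTARTS   AGE
demo        good-pod   1/1     Running            0          5m
demo        done-job   0/1     Completed          0          9m
demo        bad-pod    0/1     CrashLoopBackOff   12         40m'
echo "$sample" | egrep -vi 'complete|running'
```

Only the header and the CrashLoopBackOff pod remain, which is exactly what makes an empty (header-only) result a quick "all healthy" signal.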

Next verify that all nodes within the cluster are available:
[source, bash]
----
[jcl@utility ~]$ oc get no
NAME STATUS ROLES AGE VERSION
ctrl-plane-0 Ready control-plane,master 32d v1.25.14+a52e8df
ctrl-plane-1 Ready control-plane,master 32d v1.25.14+a52e8df
ctrl-plane-2 Ready control-plane,master 32d v1.25.14+a52e8df
worker-0 Ready mcp-1,worker 32d v1.25.14+a52e8df
worker-1 Ready mcp-2,worker 32d v1.25.14+a52e8df
----

Verify that all bare-metal nodes are fully provisioned and ready in the cluster. In the following example all nodes are provisioned; a node that hit a provisioning error would show a message in the ERROR column:
[source, bash]
----
[cnf@utility ~]$ oc get bmh -n openshift-machine-api
NAME STATE CONSUMER ONLINE ERROR AGE
ctrl-plane-0 unmanaged cnf-58879-master-0 true 33d
ctrl-plane-1 unmanaged cnf-58879-master-1 true 33d
ctrl-plane-2 unmanaged cnf-58879-master-2 true 33d
worker-0 unmanaged cnf-58879-worker-0-45879 true 33d
worker-1 unmanaged cnf-58879-worker-0-dszsh true 33d
----

Now verify that all cluster operators are ready:
[source, bash]
----
[cnf@utility ~]$ oc get co
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
authentication 4.12.45 True False False 17h
baremetal 4.12.45 True False False 32d
...
service-ca 4.12.45 True False False 32d
storage 4.12.45 True False False 32d
----
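
As a quick sanity filter on the cluster-operator output, the AVAILABLE and DEGRADED columns can be screened with `awk`. The sample rows below are illustrative (not from a real cluster); on a live cluster you would pipe `oc get co` into the same filter:

```shell
# Print any cluster operator that is unavailable or degraded.
# Field 3 is AVAILABLE and field 5 is DEGRADED; NR > 1 skips the header.
sample='NAME             VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication   4.12.45   True        False         False      17h
storage          4.12.45   False       True          False      5m'
echo "$sample" | awk 'NR > 1 && ($3 != "True" || $5 == "True")'
```

An empty result means every operator is available and none are degraded, which is the state you want before starting the upgrade.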
136 changes: 136 additions & 0 deletions documentation/modules/ROOT/pages/Upgrade-process-step-13.adoc
@@ -0,0 +1,136 @@
= OCP Upgrade Process Flow - Continued
include::_attributes.adoc[]
:profile: core-lcm-lab

[#step-13-un-pause-worker]
== Step 13: Un-Pause the worker MCP(s)
Now you have gotten to the fun, but sometimes long, part of the upgrade process. Each worker node in the cluster needs to reboot to upgrade to the new EUS, Y-stream, or Z-stream version.

You will need to determine how many MCPs to upgrade at a time, depending on how many CNF pods can be taken down at a time and on how your pod disruption budgets (PDBs) and anti-affinity settings are configured.
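
A PDB that currently allows zero disruptions will stall the drain of any node hosting its pods. This sketch screens for that condition on sample data; the rows below are a condensed, illustrative stand-in for `oc get pdb -A` output (real headers contain spaces), which you would pipe into the same filter on a live cluster:

```shell
# Print the header plus any PDB whose ALLOWED-DISRUPTIONS is currently 0;
# draining a node that hosts such a pod will hang until the PDB is satisfied.
sample='NAMESPACE   NAME    MIN-AVAILABLE   MAX-UNAVAILABLE   ALLOWED-DISRUPTIONS   AGE
cnf-app     pdb-a   2               N/A               0                     10d
cnf-app     pdb-b   N/A             1                 1                     10d'
echo "$sample" | awk 'NR == 1 || $5 == 0'
```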

Here is a quick check of the MCPs, along with the nodes and their MCP labels:

[source, bash]
----
[cnf@utility ~]$ oc get mcp
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
master rendered-master-c9a52144456dbff9c9af9c5a37d1b614 True False False 3 3 3 0 36d
mcp-1 rendered-mcp-1-07fe50b9ad51fae43ed212e84e1dcc8e False False False 1 0 0 0 47h
mcp-2 rendered-mcp-2-07fe50b9ad51fae43ed212e84e1dcc8e False False False 1 0 0 0 47h
worker rendered-worker-f1ab7b9a768e1b0ac9290a18817f60f0 True False False 0 0 0 0 36d
[cnf@utility ~]$ oc get no
NAME STATUS ROLES AGE VERSION
ctrl-plane-0 Ready control-plane,master 36d v1.27.10+28ed2d7
ctrl-plane-1 Ready control-plane,master 36d v1.27.10+28ed2d7
ctrl-plane-2 Ready control-plane,master 36d v1.27.10+28ed2d7
worker-0 Ready mcp-1,worker 36d v1.25.14+a52e8df
worker-1 Ready mcp-2,worker 36d v1.25.14+a52e8df
[cnf@utility ~]$ oc get mcp -o json | jq -r '["MCP","Paused"], ["---","------"], (.items[] | [(.metadata.name), (.spec.paused)]) | @tsv' | grep -v worker
MCP Paused
--- ------
master false
mcp-1 true
mcp-2 true
----
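
The `jq` filter used above can be tried locally against a hand-written JSON sample shaped like `oc get mcp -o json` output (requires `jq`; the JSON below is illustrative, not from a real cluster):

```shell
# Run the pause-status filter on a minimal sample of `oc get mcp -o json`.
sample='{"items":[{"metadata":{"name":"master"},"spec":{"paused":false}},
         {"metadata":{"name":"mcp-1"},"spec":{"paused":true}}]}'
echo "$sample" | jq -r '["MCP","Paused"], ["---","------"], (.items[] | [(.metadata.name), (.spec.paused)]) | @tsv'
```

The filter emits a header row, a separator row, and then one tab-separated row per MCP with its name and `spec.paused` value, which is why the live command gives an at-a-glance pause table.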

Unpause an MCP with:
[source, bash]
----
[jcl@utility ~]$ oc patch mcp/mcp-1 --type merge --patch '{"spec":{"paused":false}}'

machineconfigpool.machineconfiguration.openshift.io/mcp-1 patched

[jcl@utility ~]$ oc get mcp -o json | jq -r '["MCP","Paused"], ["---","------"], (.items[] | [(.metadata.name), (.spec.paused)]) | @tsv' | grep -v worker

MCP Paused
--- ------
master false
mcp-1 false
mcp-2 true
----

As each MCP completes its update, you can unpause the next MCP. Node status during the rollout looks like this:
[source, bash]
----
[jcl@utility ~]$ oc get no
NAME STATUS ROLES AGE VERSION
ctrl-plane-0 Ready control-plane,master 36d v1.27.10+28ed2d7
ctrl-plane-1 Ready control-plane,master 36d v1.27.10+28ed2d7
ctrl-plane-2 Ready control-plane,master 36d v1.27.10+28ed2d7
worker-0 Ready mcp-1,worker 36d v1.27.10+28ed2d7
worker-1 NotReady,SchedulingDisabled mcp-2,worker 36d v1.25.14+a52e8df
----

[#step-14-verify-health]
== Step 14: Verify Health of Cluster

Here is a set of commands that you should run after upgrading the cluster to verify everything is back up and running properly:

* oc get clusterversion +
This should return the new cluster version, and the “PROGRESSING” column should show “False”
* oc get node +
All nodes in the cluster should have a status of “Ready” and should all be at the same version
* oc get mcp -o json | jq -r '["MCP","Paused"], ["---","------"], (.items[] | [(.metadata.name), (.spec.paused)]) | @tsv' | grep -v worker +
This should show “false” in the Paused column for all MCPs
* oc get co +
All cluster operators should show AVAILABLE = True, PROGRESSING = False, and DEGRADED = False
* oc get po -A | egrep -iv 'complete|running' +
This should return empty, but you may see a few pods still moving around right after the upgrade; you may need to watch this for a while to make sure everything is clear.

[source, bash]
----
[jcl@utility ~]$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.14.11 True False 3d21h Cluster version is 4.14.11
[jcl@utility ~]$ oc get no
NAME STATUS ROLES AGE VERSION
ctrl-plane-0 Ready control-plane,master 39d v1.27.10+28ed2d7
ctrl-plane-1 Ready control-plane,master 39d v1.27.10+28ed2d7
ctrl-plane-2 Ready control-plane,master 39d v1.27.10+28ed2d7
worker-0 Ready mcp-1,worker 39d v1.27.10+28ed2d7
worker-1 Ready mcp-2,worker 39d v1.27.10+28ed2d7
[jcl@utility ~]$ oc get mcp -o json | jq -r '["MCP","Paused"], ["---","------"], (.items[] | [(.metadata.name), (.spec.paused)]) | @tsv' | grep -v worker
MCP Paused
--- ------
master false
mcp-1 false
mcp-2 false
[jcl@utility ~]$ oc get co
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
authentication 4.14.11 True False False 7d13h
baremetal 4.14.11 True False False 39d
cloud-controller-manager 4.14.11 True False False 39d
cloud-credential 4.14.11 True False False 39d
cluster-autoscaler 4.14.11 True False False 39d
config-operator 4.14.11 True False False 39d
console 4.14.11 True False False 38d
control-plane-machine-set 4.14.11 True False False 39d
csi-snapshot-controller 4.14.11 True False False 39d
dns 4.14.11 True False False 39d
etcd 4.14.11 True False False 39d
image-registry 4.14.11 True False False 39d
ingress 4.14.11 True False False 38d
insights 4.14.11 True False False 39d
kube-apiserver 4.14.11 True False False 39d
kube-controller-manager 4.14.11 True False False 39d
kube-scheduler 4.14.11 True False False 39d
kube-storage-version-migrator 4.14.11 True False False 3d18h
machine-api 4.14.11 True False False 39d
machine-approver 4.14.11 True False False 39d
machine-config 4.14.11 True False False 39d
marketplace 4.14.11 True False False 39d
monitoring 4.14.11 True False False 38d
network 4.14.11 True False False 39d
node-tuning 4.14.11 True False False 3d22h
openshift-apiserver 4.14.11 True False False 7d13h
openshift-controller-manager 4.14.11 True False False 39d
openshift-samples 4.14.11 True False False 3d22h
operator-lifecycle-manager 4.14.11 True False False 39d
operator-lifecycle-manager-catalog 4.14.11 True False False 39d
operator-lifecycle-manager-packageserver 4.14.11 True False False 39d
service-ca 4.14.11 True False False 39d
storage 4.14.11 True False False 39d
[jcl@utility ~]$ oc get po -A | egrep -iv 'complete|running'
NAMESPACE NAME READY STATUS RESTARTS AGE
----