Merge pull request #8 from rfisher001/lab-4.12To4.14
Added the last set of documents
rfisher001 authored May 16, 2024
2 parents e6b6012 + 3f67d83 commit cc87b07
Showing 10 changed files with 887 additions and 569 deletions.
34 changes: 33 additions & 1 deletion documentation/modules/ROOT/nav.adoc
@@ -29,5 +29,37 @@
**** xref:OCP-upgrade-prep.adoc#labeling-nodes[Labeling nodes]
**** xref:OCP-upgrade-prep.adoc#applying-mcps-according-to-label[Applying MCPs according to label]
**** xref:OCP-upgrade-prep.adoc#monitor-mcps[Monitor MCP formation]
** xref:OCP-upgrade-prep.adoc#enviro-considerations[Environment considerations]
** xref:OCP-upgrade-prep.adoc#platform-prep[Platform preparation]
*** xref:OCP-upgrade-prep.adoc#basic-cluster-checks[Basic cluster checks]
* xref:Applying-MCPs.adoc[Applying MCPs]
* xref:Upgrade-process.adoc[OCP Upgrade Process Flow]
** xref:Upgrade-process.adoc#overview[Overview]
** xref:Upgrade-process.adoc#step-1[Step 1: Determine your target release]
** xref:Upgrade-process.adoc#step-2[Step 2: Change your channel]
*** xref:Upgrade-process.adoc#z-stream-upgrade[Z-Stream Upgrade]
*** xref:Upgrade-process.adoc#eus-eus-upgrade[EUS to EUS Upgrade]
*** xref:Upgrade-process.adoc#early-eus-upgrade-testing[Early testing of EUS to EUS upgrade]
*** xref:Upgrade-process.adoc#y-stream-upgrade[Y-Stream Upgrade]
* xref:Upgrade-process-step-3.adoc[OCP Upgrade Process Flow - Continued - Step 3]
** xref:Upgrade-process-step-3.adoc#step-3-pause-mcp[Step 3: Pause your worker node MCPs]
** xref:Upgrade-process-step-3.adoc#step-4-backup-etcd[Step 4: Backup etcd]
** xref:Upgrade-process-step-3.adoc#step-5-health-check[Step 5: Double check your cluster health]
* xref:Upgrade-process-step-6.adoc[OCP Upgrade Process Flow - Continued - Step 6]
** xref:Upgrade-process-step-6.adoc#step-6-admin-acknowledge[Step 6: Acknowledge the upgrade]
** xref:Upgrade-process-step-6.adoc#step-7-begin-upgrade[Step 7: Begin the cluster upgrade]
** xref:Upgrade-process-step-6.adoc#step-8-monitor[Step 8: Monitor the upgrade]
* xref:Upgrade-process-step-9.adoc[OCP Upgrade Process Flow - Continued - Step 9]
** xref:Upgrade-process-step-9.adoc#step-9-upgrade-operators[Step 9: Upgrade OLM Operators]
** xref:Upgrade-process-step-9.adoc#if-then-goto[If-Then GO TO]
** xref:Upgrade-process-step-9.adoc#step-10-y-stream[Step 10: Second Y-stream update]
*** xref:Upgrade-process-step-9.adoc#admin-ack[Admin Acknowledge]
*** xref:Upgrade-process-step-9.adoc#start-y-stream-ctrl-pln-upgrade[Start Y-stream Control Plane Upgrade]
** xref:Upgrade-process-step-9.adoc#step-12-upgrade-operators[Step 12: Upgrade All of the OLM Operators]
* xref:Upgrade-process-step-13.adoc[OCP Upgrade Process Flow - Continued - Step 13]
** xref:Upgrade-process-step-13.adoc#step-13-un-pause-worker[Step 13: Un-Pause the worker MCP(s)]
** xref:Upgrade-process-step-13.adoc#step-14-verify-health[Step 14: Verify Health of Cluster]
3 changes: 0 additions & 3 deletions documentation/modules/ROOT/pages/API-Compatibility.adoc
@@ -12,9 +12,6 @@ The most important thing to understand when considering which Z-release to upgrade
== Kubernetes Version Skew
Each cluster operator must maintain support for specific API versions. New releases of operators bring new APIs, so the changes, or skews, between API versions need to be managed. To a certain extent, the APIs can remain compatible across several releases of an operator. The Kubernetes version skew policy, which lists the components and their compatible releases, is at: https://kubernetes.io/releases/version-skew-policy
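
To see the skew between your control plane and worker kubelets, you can read the minor version out of the node list. This is a sketch on sample data; the node names and versions below are illustrative, and on a live cluster you would pipe `oc get no --no-headers` into the same `awk` filter:

```shell
# Print each node's Kubernetes minor version from `oc get no`-style output.
# The sample variable stands in for live output from a cluster.
sample='ctrl-plane-0   Ready   control-plane,master   39d   v1.25.10+28ed2d7
worker-0       Ready   worker                 39d   v1.23.12+8a6bfe4'
# Field 5 is the kubelet version; split on "." so v[2] is the minor version.
echo "$sample" | awk '{split($5, v, "."); print $1, "minor:", v[2]}'
```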

The easiest way to verify that your application functionality will still work is to make sure that you follow


[#ocp-upgrade-path]
== OpenShift Upgrade Path
Can I choose any Z-release in the new EUS or Y-stream version? No.
90 changes: 76 additions & 14 deletions documentation/modules/ROOT/pages/OCP-upgrade-prep.adoc
@@ -37,7 +37,7 @@ openshift-operator-lifecycle-manager packageserver Pack

[#OLM-Operator-compatibility]
=== OLM Operator compatibility
There is a set of Red Hat Operators that are not part of the cluster operators; these are known as the OLM-installed Operators. To determine the compatibility of these OLM-installed Operators, there is a web-based tool that shows which versions of OCP are compatible with specific releases of an Operator. https://access.redhat.com/labs/ocpouic/?upgrade_path=4.12%20to%204.14[This tool] tells you whether you need to upgrade an Operator after each Y-stream upgrade or whether you can wait until you have fully upgraded to the next EUS release.
Step 9 in the “Upgrade Process Flow” section provides additional information about what you need to do if an Operator must be upgraded after performing the first Y-stream control plane upgrade.

NOTE: Some Operators are compatible with several releases of OCP. So, you may not need to upgrade until you complete the cluster upgrade. This is shown in Step 13 of the Upgrade Process Flow.
@@ -117,11 +117,11 @@ NOTE: Review what is listed in the “ROLES” column; this will get updated as
----
# oc get no
NAME STATUS ROLES AGE VERSION
ctrl-plane-0 Ready control-plane,master 39d v1.25.10+28ed2d7
ctrl-plane-1 Ready control-plane,master 39d v1.25.10+28ed2d7
ctrl-plane-2 Ready control-plane,master 39d v1.25.10+28ed2d7
worker-0 Ready worker 39d v1.25.10+28ed2d7
worker-1 Ready worker 39d v1.25.10+28ed2d7
----

Determine, from the above suggestions, how you would like to separate your worker nodes into machine config pools (MCPs).
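
If you have many worker nodes, generating the label commands in a loop keeps the split consistent. This is a hypothetical helper, not part of the documented procedure; it alternates example node names between two pools, mcp-1 and mcp-2, and only prints the commands so you can review them before running:

```shell
# Hypothetical helper: alternate worker nodes between two MCP labels so the
# pools end up the same size. Node names here are examples.
workers="worker-0 worker-1 worker-2 worker-3"
i=1
for w in $workers; do
  echo "oc label node $w node-role.kubernetes.io/mcp-$i="
  i=$(( i % 2 + 1 ))   # alternate between 1 and 2
done
```

Piping the output to `sh` would apply the labels; reviewing first avoids mislabeling nodes that must stay in a specific pool.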
@@ -143,8 +143,8 @@ We first need to label the nodes so that they can be put into MCPs. We will do t

[source, bash]
----
oc label node worker-0 node-role.kubernetes.io/mcp-1=
oc label node worker-1 node-role.kubernetes.io/mcp-2=
----

NOTE: The labels will show up when you run the “oc get node” command:
@@ -153,11 +153,11 @@ NOTE: The labels will show up when you run the “oc get node” command:
----
# oc get no
NAME STATUS ROLES AGE VERSION
ctrl-plane-0 Ready control-plane,master 39d v1.25.10+28ed2d7
ctrl-plane-1 Ready control-plane,master 39d v1.25.10+28ed2d7
ctrl-plane-2 Ready control-plane,master 39d v1.25.10+28ed2d7
worker-0 Ready mcp-1,worker 39d v1.25.10+28ed2d7
worker-1 Ready mcp-2,worker 39d v1.25.10+28ed2d7
----

[#applying-mcps-according-to-label]
@@ -237,4 +237,66 @@ master rendered-master-b…e83 True False False 3
mcp-1 rendered-mcp-1-2…c4f True False False 1 1 1 0 7m33s
mcp-2 rendered-mcp-2-2…c4f True False False 1 1 1 0 51s
worker rendered-worker-2…c4f True False False 0 0 0 0 25d
----

[#enviro-considerations]
== Environment considerations

In telecommunications environments, most clusters are kept in an “air gapped” or disconnected network, so you will need to update your offline image repository. When choosing which images to include, review the OCP API Compatibility Policy section to make sure the cluster will be able to upgrade to the new version of OCP. Setting up and managing an offline image repository is currently out of scope but will be added at a later date.

[#platform-prep]
== Platform preparation

This section should be used as a basic set of checks and verifications to make sure that your cluster is ready for an upgrade.

[#basic-cluster-checks]
=== Basic cluster checks

First, verify that there are no failed pods in the cluster that would stop the upgrade. A very easy first check is to run:

[source, bash]
----
[cnf@utility ~]$ oc get po -A | egrep -vi 'complete|running'
NAMESPACE NAME READY STATUS RESTARTS AGE
[cnf@utility ~]$
----

NOTE: You may need to run this twice if there are pods in a Pending state, as they may simply be rescheduling due to normal operating conditions of the cluster.

If there are problems with any pods, review the troubleshooting documentation to determine the cause.
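
To see how the `egrep` filter from the check above behaves, you can run it on captured sample output. The pod names below are made up for illustration; on a live cluster you would pipe `oc get po -A` instead of the sample variable:

```shell
# Demonstrate the pod-health filter on sample output. The header line and any
# pod not in a Running or Completed state survive the case-insensitive filter.
sample='NAMESPACE   NAME       READY   STATUS             RESTARTS   AGE
demo        good-pod   1/1     Running            0          5m
demo        done-job   0/1     Completed          0          9m
demo        bad-pod    0/1     CrashLoopBackOff   12         40m'
echo "$sample" | egrep -vi 'complete|running'
```

Only the header and the CrashLoopBackOff pod remain, which is exactly what makes an empty (header-only) result a quick "all healthy" signal.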

Next verify that all nodes within the cluster are available:
[source, bash]
----
[jcl@utility ~]$ oc get no
NAME STATUS ROLES AGE VERSION
ctrl-plane-0 Ready control-plane,master 32d v1.25.14+a52e8df
ctrl-plane-1 Ready control-plane,master 32d v1.25.14+a52e8df
ctrl-plane-2 Ready control-plane,master 32d v1.25.14+a52e8df
worker-0 Ready mcp-1,worker 32d v1.25.14+a52e8df
worker-1 Ready mcp-2,worker 32d v1.25.14+a52e8df
----

Verify that all bare-metal nodes are fully provisioned and ready in the cluster. In the following example all nodes are provisioned; a node that hit a provisioning error would show a message in the ERROR column:
[source, bash]
----
[cnf@utility ~]$ oc get bmh -n openshift-machine-api
NAME STATE CONSUMER ONLINE ERROR AGE
ctrl-plane-0 unmanaged cnf-58879-master-0 true 33d
ctrl-plane-1 unmanaged cnf-58879-master-1 true 33d
ctrl-plane-2 unmanaged cnf-58879-master-2 true 33d
worker-0 unmanaged cnf-58879-worker-0-45879 true 33d
worker-1 unmanaged cnf-58879-worker-0-dszsh true 33d
----

Now verify that all cluster operators are ready:
[source, bash]
----
[cnf@utility ~]$ oc get co
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
authentication 4.12.45 True False False 17h
baremetal 4.12.45 True False False 32d
...
service-ca 4.12.45 True False False 32d
storage 4.12.45 True False False 32d
----
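
As a quick sanity filter on the cluster-operator output, the AVAILABLE and DEGRADED columns can be screened with `awk`. The sample rows below are illustrative (not from a real cluster); on a live cluster you would pipe `oc get co` into the same filter:

```shell
# Print any cluster operator that is unavailable or degraded.
# Field 3 is AVAILABLE and field 5 is DEGRADED; NR > 1 skips the header.
sample='NAME             VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication   4.12.45   True        False         False      17h
storage          4.12.45   False       True          False      5m'
echo "$sample" | awk 'NR > 1 && ($3 != "True" || $5 == "True")'
```

An empty result means every operator is available and none are degraded, which is the state you want before starting the upgrade.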
136 changes: 136 additions & 0 deletions documentation/modules/ROOT/pages/Upgrade-process-step-13.adoc
@@ -0,0 +1,136 @@
= OCP Upgrade Process Flow - Continued
include::_attributes.adoc[]
:profile: core-lcm-lab

[#step-13-un-pause-worker]
== Step 13: Un-Pause the worker MCP(s)
Now you have gotten to the fun, but sometimes long, part of the upgrade process. Each worker node in the cluster needs to reboot to upgrade to the new EUS, Y-stream, or Z-stream version.

You will need to determine how many MCPs to upgrade at a time, depending on how many CNF pods can be taken down at a time and on how your pod disruption budgets (PDBs) and anti-affinity settings are configured.
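
A PDB that currently allows zero disruptions will stall the drain of any node hosting its pods. This sketch screens for that condition on sample data; the rows below are a condensed, illustrative stand-in for `oc get pdb -A` output (real headers contain spaces), which you would pipe into the same filter on a live cluster:

```shell
# Print the header plus any PDB whose ALLOWED-DISRUPTIONS is currently 0;
# draining a node that hosts such a pod will hang until the PDB is satisfied.
sample='NAMESPACE   NAME    MIN-AVAILABLE   MAX-UNAVAILABLE   ALLOWED-DISRUPTIONS   AGE
cnf-app     pdb-a   2               N/A               0                     10d
cnf-app     pdb-b   N/A             1                 1                     10d'
echo "$sample" | awk 'NR == 1 || $5 == 0'
```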

Here is a quick check of the MCPs, along with the nodes and their MCP labels:

[source, bash]
----
[cnf@utility ~]$ oc get mcp
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
master rendered-master-c9a52144456dbff9c9af9c5a37d1b614 True False False 3 3 3 0 36d
mcp-1 rendered-mcp-1-07fe50b9ad51fae43ed212e84e1dcc8e False False False 1 0 0 0 47h
mcp-2 rendered-mcp-2-07fe50b9ad51fae43ed212e84e1dcc8e False False False 1 0 0 0 47h
worker rendered-worker-f1ab7b9a768e1b0ac9290a18817f60f0 True False False 0 0 0 0 36d
[cnf@utility ~]$ oc get no
NAME STATUS ROLES AGE VERSION
ctrl-plane-0 Ready control-plane,master 36d v1.27.10+28ed2d7
ctrl-plane-1 Ready control-plane,master 36d v1.27.10+28ed2d7
ctrl-plane-2 Ready control-plane,master 36d v1.27.10+28ed2d7
worker-0 Ready mcp-1,worker 36d v1.25.14+a52e8df
worker-1 Ready mcp-2,worker 36d v1.25.14+a52e8df
[cnf@utility ~]$ oc get mcp -o json | jq -r '["MCP","Paused"], ["---","------"], (.items[] | [(.metadata.name), (.spec.paused)]) | @tsv' | grep -v worker
MCP Paused
--- ------
master false
mcp-1 true
mcp-2 true
----
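
The `jq` filter used above can be tried locally against a hand-written JSON sample shaped like `oc get mcp -o json` output (requires `jq`; the JSON below is illustrative, not from a real cluster):

```shell
# Run the pause-status filter on a minimal sample of `oc get mcp -o json`.
sample='{"items":[{"metadata":{"name":"master"},"spec":{"paused":false}},
         {"metadata":{"name":"mcp-1"},"spec":{"paused":true}}]}'
echo "$sample" | jq -r '["MCP","Paused"], ["---","------"], (.items[] | [(.metadata.name), (.spec.paused)]) | @tsv'
```

The filter emits a header row, a separator row, and then one tab-separated row per MCP with its name and `spec.paused` value, which is why the live command gives an at-a-glance pause table.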

Unpause an MCP with:
[source, bash]
----
[jcl@utility ~]$ oc patch mcp/mcp-1 --type merge --patch '{"spec":{"paused":false}}'

machineconfigpool.machineconfiguration.openshift.io/mcp-1 patched

[jcl@utility ~]$ oc get mcp -o json | jq -r '["MCP","Paused"], ["---","------"], (.items[] | [(.metadata.name), (.spec.paused)]) | @tsv' | grep -v worker

MCP Paused
--- ------
master false
mcp-1 false
mcp-2 true
----

As each MCP completes its update, you can unpause the next MCP. Node status during the rollout looks like this:
[source, bash]
----
[jcl@utility ~]$ oc get no
NAME STATUS ROLES AGE VERSION
ctrl-plane-0 Ready control-plane,master 36d v1.27.10+28ed2d7
ctrl-plane-1 Ready control-plane,master 36d v1.27.10+28ed2d7
ctrl-plane-2 Ready control-plane,master 36d v1.27.10+28ed2d7
worker-0 Ready mcp-1,worker 36d v1.27.10+28ed2d7
worker-1 NotReady,SchedulingDisabled mcp-2,worker 36d v1.25.14+a52e8df
----

[#step-14-verify-health]
== Step 14: Verify Health of Cluster

Here is a set of commands that you should run after upgrading the cluster to verify everything is back up and running properly:

* oc get clusterversion +
This should return the new cluster version, and the “PROGRESSING” column should show “False”
* oc get node +
All nodes in the cluster should have a status of “Ready” and should all be at the same version
* oc get mcp -o json | jq -r '["MCP","Paused"], ["---","------"], (.items[] | [(.metadata.name), (.spec.paused)]) | @tsv' | grep -v worker +
This should show “false” in the Paused column for all MCPs
* oc get co +
All cluster operators should show AVAILABLE = True, PROGRESSING = False, and DEGRADED = False
* oc get po -A | egrep -iv 'complete|running' +
This should return empty, but you may see a few pods still moving around right after the upgrade; you may need to watch this for a while to make sure everything is clear.

[source, bash]
----
[jcl@utility ~]$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.14.11 True False 3d21h Cluster version is 4.14.11
[jcl@utility ~]$ oc get no
NAME STATUS ROLES AGE VERSION
ctrl-plane-0 Ready control-plane,master 39d v1.27.10+28ed2d7
ctrl-plane-1 Ready control-plane,master 39d v1.27.10+28ed2d7
ctrl-plane-2 Ready control-plane,master 39d v1.27.10+28ed2d7
worker-0 Ready mcp-1,worker 39d v1.27.10+28ed2d7
worker-1 Ready mcp-2,worker 39d v1.27.10+28ed2d7
[jcl@utility ~]$ oc get mcp -o json | jq -r '["MCP","Paused"], ["---","------"], (.items[] | [(.metadata.name), (.spec.paused)]) | @tsv' | grep -v worker
MCP Paused
--- ------
master false
mcp-1 false
mcp-2 false
[jcl@utility ~]$ oc get co
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
authentication 4.14.11 True False False 7d13h
baremetal 4.14.11 True False False 39d
cloud-controller-manager 4.14.11 True False False 39d
cloud-credential 4.14.11 True False False 39d
cluster-autoscaler 4.14.11 True False False 39d
config-operator 4.14.11 True False False 39d
console 4.14.11 True False False 38d
control-plane-machine-set 4.14.11 True False False 39d
csi-snapshot-controller 4.14.11 True False False 39d
dns 4.14.11 True False False 39d
etcd 4.14.11 True False False 39d
image-registry 4.14.11 True False False 39d
ingress 4.14.11 True False False 38d
insights 4.14.11 True False False 39d
kube-apiserver 4.14.11 True False False 39d
kube-controller-manager 4.14.11 True False False 39d
kube-scheduler 4.14.11 True False False 39d
kube-storage-version-migrator 4.14.11 True False False 3d18h
machine-api 4.14.11 True False False 39d
machine-approver 4.14.11 True False False 39d
machine-config 4.14.11 True False False 39d
marketplace 4.14.11 True False False 39d
monitoring 4.14.11 True False False 38d
network 4.14.11 True False False 39d
node-tuning 4.14.11 True False False 3d22h
openshift-apiserver 4.14.11 True False False 7d13h
openshift-controller-manager 4.14.11 True False False 39d
openshift-samples 4.14.11 True False False 3d22h
operator-lifecycle-manager 4.14.11 True False False 39d
operator-lifecycle-manager-catalog 4.14.11 True False False 39d
operator-lifecycle-manager-packageserver 4.14.11 True False False 39d
service-ca 4.14.11 True False False 39d
storage 4.14.11 True False False 39d
[jcl@utility ~]$ oc get po -A | egrep -iv 'complete|running'
NAMESPACE NAME READY STATUS RESTARTS AGE
----