Merge pull request #4 from rfisher001/lab-4.12To4.14
Updated introductory paragraphs and added them to navigation
rfisher001 authored Dec 20, 2023
2 parents cd450ab + 2e19fc3 commit 0cfc96b
Showing 4 changed files with 47 additions and 25 deletions.
12 changes: 12 additions & 0 deletions documentation/modules/ROOT/nav.adoc
@@ -10,3 +10,15 @@
** xref:API-Compatibility.adoc#k8s-skew[Kubernetes Version Skew]
** xref:API-Compatibility.adoc#ocp-upgrade-path[OpenShift Upgrade Path]
* xref:CNF-Upgrade-Prep.adoc[CNF Upgrade Preparation]
** xref:CNF-Upgrade-Prep.adoc#life-of-a-pod[Life of a POD]
** xref:CNF-Upgrade-Prep.adoc#cnf-req-doc[CNF Requirements Document]
** xref:CNF-Upgrade-Prep.adoc#pdb[POD Disruption Budget]
** xref:CNF-Upgrade-Prep.adoc#pod-anti-affinity[POD Anti-affinity]
* xref:OCP-upgrade-prep.adoc[OCP Upgrade Preparation]
** xref:OCP-upgrade-prep.adoc#firmware-compatibility[Firmware compatibility]
** xref:OCP-upgrade-prep.adoc#layer-product-compatibility[Layer product compatibility]
** xref:OCP-upgrade-prep.adoc#prepare-mcp[Prepare MCPs]
* xref:Applying-MCPs.adoc[Applying MCPs]
15 changes: 8 additions & 7 deletions documentation/modules/ROOT/pages/API-Compatibility.adoc
@@ -17,17 +17,18 @@ The easiest way to verify that your application functionality will still work is to mak

[#ocp-upgrade-path]
== OpenShift Upgrade Path
Please also note that not all releases of OCP can be upgraded to any arbitrary Z-release, even if that Z-release contains all of the required patches.
Can I choose any Z-release in the new EUS or Y-stream version? No.
The new 4.Y+2.Z (or 4.Y+1.Z) release needs to have the same patch level that your current 4.Y.Z release has.

Why does 4.14.1 not have the same patches as 4.12.45?
All “new” patches are applied upstream first. This means that after 4.14.0 was released, 4.15 became the upstream version.
For example, patches reach both new and old releases through new z-releases, so a patch that is applied in X.Y+2.4 might also have been applied to X.Y.36. As a result, an early 4.14.z release may not yet contain every fix that a late 4.12.45 release already has.

The OpenShift upgrade process mandates that if fix “A” is present in a specific X.Y.Z release of OCP, then fix “A” MUST also be present in the X.Y+1.Z release that OCP is upgraded TO.

As a consequence, the chosen destination version of 4.12.z defines the maximum versions of OCP 4.11.z, OCP 4.10.z, and OCP 4.9.z that can appear on the upgrade path.
Not every 4.9.z version permits an upgrade to a given version of OCP 4.12.z; a given version of OCP 4.12.z places a maximum on the OCP 4.9.z version you can upgrade from.
This is due to how fixes are backported into older releases of OCP.

You can use the https://access.redhat.com/labs/ocpupgradegraph/update_path[upgrade graph tool] to determine if the path is valid for your z-release. You should also always verify with your Sales Engineer or Technical Account Manager at Red Hat to make sure the upgrade path is valid for Telco implementations.

.K8s Version Skew
image::k8s-vers-skew.png[]
39 changes: 24 additions & 15 deletions documentation/modules/ROOT/pages/CNF-Upgrade-Prep.adoc
@@ -5,37 +5,46 @@ include::_attributes.adoc[]
The life of a POD is an important topic to understand. This section describes several topics that are important to keeping your CNF PODs healthy and to allowing the cluster to properly schedule them during an upgrade.

[#life-of-a-pod]
== Life of a POD
Why is this important?

Pods don’t move or reboot. A pod is deleted and a new pod takes its place.

There isn’t (or shouldn’t be) a single pod with its own unique set of specifications. Instead, there should be a group of identical pods, called a deployment, and the deployment should spread the workload across all of its pods.

This is called out because we need to move away from the idea that each and every single pod must be cared for as if it were the only strand holding things together. As the old saying goes, a rope is made up of many strands, which is what makes it stronger than any single strand.
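
To make that grouping concrete, below is a minimal Deployment sketch. The name `example-cnf`, the image, and the replica count are illustrative assumptions rather than values from this lab; the point is that one spec produces a group of identical pods, any of which can be deleted and recreated during an upgrade.

[source,yaml]
----
# Hypothetical deployment: one spec, four identical pods.
# Any single pod can be deleted and replaced without special handling.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-cnf
  labels:
    app: example-cnf
spec:
  replicas: 4
  selector:
    matchLabels:
      app: example-cnf
  template:
    metadata:
      labels:
        app: example-cnf
    spec:
      containers:
      - name: example-cnf
        image: registry.example.com/example-cnf:1.0  # placeholder image
        ports:
        - containerPort: 8080
----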

[#cnf-req-doc]
== CNF Requirements Document

Before you go any further, please read through the https://connect.redhat.com/sites/default/files/2022-05/Cloud%20Native%20Network%20Function%20Requirements%201-3.pdf[CNF requirements document].
This section discusses a few of the most important points, but the CNF Requirements Document provides additional detail and covers other important topics.

[#pdb]
== POD Disruption Budget

Each set of PODs in a deployment can be given a specific minimum number of PODs that must remain running in order to keep from disrupting the functionality of the CNF; this is called the POD disruption budget (PDB). However, this budget can be improperly configured.

For example, if you have 4 PODs in a deployment and your PDB is set to 4, this means that you are telling the scheduler that you NEED 4 PODs running at all times. Therefore, in this scenario ZERO PODs can come down.

.Deployment with no PDB
image::PDB-full.jpg[]

To fix this, the PDB can be set to 2, letting 2 of the 4 PODs be scheduled as down, which would then let the worker nodes where those PODs are located be rebooted.

This does NOT mean that your deployment will be running on only 2 pods for a period of time. This means that 2 new pods can be created to replace 2 current pods and there can be a short period of time as the new pods come online and the old pods are deleted.

.Deployment with PDB
image::PDB-down-2.jpg[]
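
A minimal sketch of that fix is shown below, assuming the same illustrative `app: example-cnf` label used earlier; the names are assumptions, not values from this lab.

[source,yaml]
----
# Hypothetical PDB for the 4-pod deployment.
# minAvailable: 2 lets the cluster evict up to 2 pods at a time during a node drain.
# Setting minAvailable: 4 would block every voluntary eviction, as described above.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-cnf-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: example-cnf
----

Choosing `minAvailable` (or `maxUnavailable`) is a balance between how much capacity the CNF needs at all times and how quickly nodes can be drained during the upgrade.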

[#pod-anti-affinity]
== POD Anti-affinity

True high availability requires duplicate copies of a process to be running on separate hardware, making sure that an application will continue to run if one piece of hardware goes down. OpenShift can easily make that happen, since processes are automatically duplicated in separate PODs within a deployment. However, those PODs need to have anti-affinity set on them so that they are NOT running on the same hardware.

During an upgrade, anti-affinity is important so that there aren’t too many pods on a node when it is time for it to reboot. For example: if there are 4 pods from a single deployment on a node, and the PDB is set to only allow 1 pod to be deleted at a time, then it will take 4 times as long for that node to reboot because it will be waiting on all 4 pods to be deleted.
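
A minimal sketch of that setting is shown below. It is the `affinity` portion of a deployment's pod template (`spec.template.spec`), again assuming the illustrative `app: example-cnf` label; with this in place, no two pods carrying that label will be scheduled onto the same node.

[source,yaml]
----
# Hypothetical pod-template fragment: required anti-affinity keyed on the node
# hostname, so pods with the app=example-cnf label never share a worker node.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: example-cnf
      topologyKey: kubernetes.io/hostname
----

With 4 replicas this requires at least 4 schedulable worker nodes; `preferredDuringSchedulingIgnoredDuringExecution` is the softer variant when that cannot be guaranteed.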

== Liveness / Readiness Probes

6 changes: 3 additions & 3 deletions documentation/modules/ROOT/pages/OCP-upgrade-prep.adoc
@@ -45,13 +45,13 @@ section, below, for more details on the pause/un-pause process.

// insert image for MCP
.Worker node MCPs in a 5 rack cluster
image::5Rack-MCP.jpg[]

The division and size of these MCPs can vary depending on many factors. In general, the standard division is 8 to 10 nodes per MCP, which allows the operations team to control how many nodes are taken down at a time.
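
As a rough sketch of one such pool, the example below defines a custom MCP for a subset of worker nodes; the pool name `worker-rack1` and its node label are illustrative assumptions, not values from this lab.

[source,yaml]
----
# Hypothetical custom MCP covering one rack of worker nodes.
# Nodes join the pool when labeled node-role.kubernetes.io/worker-rack1="".
# paused: true holds back MachineConfig rollouts (and the reboots they trigger)
# until the operations team unpauses the pool.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker-rack1
spec:
  machineConfigSelector:
    matchExpressions:
    - key: machineconfiguration.openshift.io/role
      operator: In
      values: [worker, worker-rack1]
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker-rack1: ""
  paused: true
----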

.Separate MCPs inside of a group of Load Balancer or purpose built nodes
image::LBorHT-MCP.jpg[]

In larger clusters there is quite often a need to separate out several nodes for Load Balancing or other high-throughput purposes, and those nodes usually have different machine sets to configure SR-IOV. In these cases we do not want
@@ -60,7 +60,7 @@ out into at least 3 different MCPs and unpause them individually.

// insert image for MCP
.Small cluster worker MCPs
image::Worker-MCP.jpg[]

Smaller cluster example with 1 rack

