adding livenessProbe and readinessProbe in deployment.yaml for porch … #166

Open
wants to merge 3 commits into
base: main

mansoor17syed
Contributor

…server

@nephio-prow nephio-prow bot requested review from efiacor and liamfallon January 7, 2025 15:31
Contributor

@Catalin-Stratulat-Ericsson Catalin-Stratulat-Ericsson left a comment

  1. Looks good, but if I'm not mistaken this will only add the readiness probe to the development deployment of porch and not to the deployment of porch in the catalog.
  2. Changes made in the YAML files for deploying porch should be replicated in the catalog porch deployments to match, both here https://github.com/nephio-project/catalog/tree/main/nephio/core/porch and here https://github.com/nephio-project/catalog/tree/main/nephio/optional/porch-cert-manager-webhook
  3. To explain: there are basically 3 porch deployment packages. The one used for the development deployment, which you are changing here; the one used by the typical user of porch, which comes from the catalog in /main/nephio/core/porch; and lastly the one in /main/nephio/optional/porch-cert-manager-webhook, which is a porch deployment using cert-manager to handle some certificates.
  4. It's likely that, at the very least, a new PR needs to be created to add those changes there alongside this one. Or, if the liveness probe is not required in the development deployment of porch, this PR could possibly be closed and only the catalog PR merged.
  5. Here is an example of a PR where I made changes to the RBAC in the deployment YAML and had to replicate them in a separate PR so that the catalog deployment files matched: PR#126

@mansoor17syed mansoor17syed requested a review from kispaljr January 8, 2025 11:34
@kispaljr
Collaborator

kispaljr commented Jan 10, 2025

  1. Changes made in the YAML files for deploying porch should be replicated in the catalog porch deployments to match, both here https://github.com/nephio-project/catalog/tree/main/nephio/core/porch and here https://github.com/nephio-project/catalog/tree/main/nephio/optional/porch-cert-manager-webhook

Yes, that is unfortunately true for now.

IMHO this should be automated. It would be nice to have a GitHub action doing this automatically, maybe something similar to the "release" action, that generates a deployment kpt package as a release asset:

run: PATH=./bin:$PATH IMAGE_REPO=docker.io/nephio IMAGE_TAG=${{ github.ref_name }} make deployment-config
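
Until such an action exists, a drift check along these lines could catch the two repositories getting out of step. This is only a minimal sketch: the function name `packages_in_sync` and the directory locations in the usage comment are hypothetical, and where `make deployment-config` writes its output is not stated in this thread.

```shell
# Succeeds (exit 0) when both directories contain identical files;
# otherwise prints the differences and returns a non-zero status.
packages_in_sync() {
  diff -r "$1" "$2"
}

# Example (hypothetical checkout locations, not executed here):
#   packages_in_sync ./generated/porch ../catalog/nephio/core/porch \
#     || echo "catalog package is out of date"
```

A CI job could run this after generating the deployment package and fail the build when the catalog copy differs.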

@mansoor17syed
Contributor Author

Hi Team,

As per our discussion and suggestion, I have pushed the changes to the catalog repository. Here is the PR link for your reference:
nephio-project/catalog#92

Member

@liamfallon liamfallon left a comment

/approve

@nephio-prow nephio-prow bot added the approved label Jan 15, 2025
@kispaljr
Collaborator

  1. It's likely that, at the very least, a new PR needs to be created to add those changes there alongside this one. Or, if the liveness probe is not required in the development deployment of porch, this PR could possibly be closed and only the catalog PR merged.

Here is the related PR to the catalog: nephio-project/catalog#92

@kushnaidu
Contributor

/approve

Contributor

nephio-prow bot commented Jan 15, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kushnaidu, liamfallon, mansoor17syed

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mansoor17syed
Contributor Author

Hi Team,
Could you please share your feedback or advise on the next steps to move forward?

@kispaljr
Collaborator

kispaljr commented Feb 26, 2025

TL;DR: adding this readiness probe to the porch-server effectively prevents us from deploying porch to a Kubernetes cluster older than v1.29.

As discussed here, I believe the probes are correct, however we ran into the following problem while using these probes:
On one hand, Kubernetes clusters older than version 1.29 don't have the v1 PriorityLevelConfiguration and FlowSchema resources, since the v1 flow control API was introduced in 1.29. On the other hand, porch-server is currently built with client-go version 1.30, which tries to cache these resources, and this obviously fails on the old clusters. Normally this is harmless, other than leaving annoying errors in the log. However, it makes the /readyz check fail (returning a 500 response code). E.g.:

$ kubectl port-forward -n porch-system pod/porch-server-85bd69c887-lbnpg 8888:4443 &
$ curl -k https://localhost:8888/readyz
[+]ping ok
[+]log ok
[+]poststarthook/start-apiserver-admission-initializer ok
[+]poststarthook/generic-apiserver-start-informers ok
[-]informer-sync failed: reason withheld
[+]poststarthook/priority-and-fairness-config-consumer ok
[+]poststarthook/priority-and-fairness-filter ok
[+]poststarthook/storage-object-count-tracker-hook ok
[+]poststarthook/start-sample-server-informers ok
[+]shutdown ok
readyz check failed
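
To pick out only the failing checks from output like the above, the `[-]` prefix can be filtered on. A minimal sketch, assuming the output format shown here; the function name `failed_checks` is made up for illustration:

```shell
# Print the names of failed checks from /readyz output read on stdin.
# Failed checks are the lines prefixed with "[-]"; passing ones start with "[+]".
failed_checks() {
  # Capture the check name (everything up to the first space) on "[-]" lines.
  sed -n 's/^\[-]\([^ ]*\).*/\1/p'
}

# Example against the port-forward above (not executed here):
#   curl -sk https://localhost:8888/readyz | failed_checks
```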

after some investigation:

$ TOKEN=$(kubectl get secret -n kube-system sysadmin-token -o jsonpath='{.data.token}' | base64 --decode)
$ curl -k https://localhost:8888/readyz/informer-sync --header "Authorization: Bearer $TOKEN" 
internal server error: 2 informers not started yet: [*v1.FlowSchema *v1.PriorityLevelConfiguration]

we can see that the failure is indeed caused by the missing flow control resources.
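
Whether a given cluster serves these resources can be checked up front from the list of API versions it advertises. A rough sketch, assuming the informers in the error message correspond to the `flowcontrol.apiserver.k8s.io/v1` group/version; the function name `has_flowcontrol_v1` is hypothetical:

```shell
# Succeeds if the list of API group/versions on stdin (one per line,
# as printed by `kubectl api-versions`) contains the v1 flow control API.
has_flowcontrol_v1() {
  # -x: match whole lines only, so v1beta* versions do not count.
  # -F: treat the pattern as a fixed string rather than a regex.
  grep -qxF 'flowcontrol.apiserver.k8s.io/v1'
}

# Example (not executed here):
#   kubectl api-versions | has_flowcontrol_v1 \
#     || echo "v1 FlowSchema and PriorityLevelConfiguration are not served"
```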

@liamfallon @efiacor @vjayaramrh @gvbalaji @mansoor17syed
All in all, this makes v1.29 the minimum required K8s cluster version for porch.
Are we OK with this limitation?
(I am OK with this, since there is an easy workaround for those old clusters: remove the readiness probe.)
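
For reference, the workaround could look roughly like the sketch below. It rests on assumptions not confirmed in this thread: that the deployment is named `porch-server`, and that the porch-server container is the first one in the pod spec (index 0). Both helper function names are made up.

```shell
# Succeeds if a Kubernetes version string such as "v1.28.5" is older than v1.29.
older_than_1_29() {
  local version="${1#v}"          # strip a leading "v"
  local major="${version%%.*}"    # text before the first dot
  local rest="${version#*.}"      # text after the first dot
  local minor="${rest%%[!0-9]*}"  # leading digits of the minor version
  [ "$major" -lt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -lt 29 ]; }
}

# Remove the readiness probe with a JSON patch.
# Assumptions: deployment "porch-server", probe on container index 0.
remove_readiness_probe() {
  kubectl patch deployment porch-server -n porch-system --type=json \
    -p='[{"op":"remove","path":"/spec/template/spec/containers/0/readinessProbe"}]'
}

# Example (not executed here), with the server version supplied by the caller:
#   older_than_1_29 "v1.28.5" && remove_readiness_probe
```

Patching a live deployment is only a stopgap; for a lasting change the probe would be dropped from the deployment YAML itself.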

@efiacor
Collaborator

efiacor commented Feb 26, 2025

Hi @kispaljr, thanks for taking a closer look at this. I support mandating a k8s version >= 1.29 for sure. That should be OK for the catalog pkg as well, since I think we mention a k8s support matrix somewhere in the docs/sandbox setup.
