
feat: use 4h timeout default rh-advisories and rh-push-to-registry-redhat-io pipelines #792

Open
wants to merge 1 commit into base: development

Conversation


creydr commented Jan 28, 2025

By default, the task timeout in the release pipelines is 2h. This is especially short for larger applications without RELEASE-1291 in place.
This PR sets the timeout for the tasks in the rh-advisories pipeline to 4h (which is what we did for the OpenShift Serverless release).
It also updates rh-push-to-registry-redhat-io, since you try to keep those pipelines in sync.

An alternative would be to set the 4h timeout globally for all tasks in https://github.com/redhat-appstudio/infra-deployments
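
For reference, the change under discussion is the per-task timeout field in the Tekton pipeline definitions. A minimal sketch of one task entry, assuming standard Tekton syntax; the task name is taken from this thread, and the surrounding fields are illustrative rather than the actual diff:

  apiVersion: tekton.dev/v1
  kind: Pipeline
  metadata:
    name: rh-advisories
  spec:
    tasks:
      - name: verify-access-to-resources
        timeout: "4h0m0s"   # explicit per-task timeout; overrides the cluster default of (currently) 2h
        taskRef:
          name: verify-access-to-resources
      # ...the same timeout line repeated on each task entry

Tekton accepts Go-style duration strings here, so "4h0m0s" and "240m" are equivalent.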

creydr requested a review from a team as a code owner on January 28, 2025 at 08:00

openshift-ci bot commented Jan 28, 2025

Hi @creydr. Thanks for your PR.

I'm waiting for a konflux-ci member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

creydr (Author) commented Jan 28, 2025

/cc @ralphbean

openshift-ci bot requested a review from ralphbean on January 28, 2025 at 08:00
mmalina (Contributor) commented Jan 28, 2025

/ok-to-test

mmalina (Contributor) commented Jan 28, 2025

> This is especially short for larger applications with having RELEASE-1291 in place.

You mean "without having RELEASE-1291 in place", right?

It's fine with me, let's see what Johnny thinks. But could you please do the same for the push-to-registry-redhat-io pipeline? We try to keep them in sync with the exception of having/not-having the advisory stuff.

Also, you'll need to fix your commit message - see here: https://github.com/konflux-ci/release-service-catalog/blob/development/CONTRIBUTING.md#commit-message-formatting-and-standards

creydr force-pushed the use-4h-timeout-default-rh-advisories-pipeline branch from 3235f13 to b051eb7 on January 28, 2025 at 11:07
creydr (Author) commented Jan 28, 2025

> > This is especially short for larger applications with having RELEASE-1291 in place.
>
> You mean "without having RELEASE-1291 in place", right?

Oh, yes. Updated the PR description

> It's fine with me, let's see what Johnny thinks. But could you please do the same for the push-to-registry-redhat-io pipeline? We try to keep them in sync with the exception of having/not-having the advisory stuff.

Updated rh-push-to-registry-redhat-io as well

> Also, you'll need to fix your commit message - see here: https://github.com/konflux-ci/release-service-catalog/blob/development/CONTRIBUTING.md#commit-message-formatting-and-standards

Updated

creydr changed the title from "Use 4h timeout default rh-advisories pipeline" to "feat: Use 4h timeout default rh-advisories and rh-push-to-registry-redhat-io pipelines" on Jan 28, 2025
ralphbean (Member) commented

/ok-to-test

johnbieren (Collaborator) commented

I don't really get doing this for all tasks. Can you explain? For example, I would never expect verify-access-to-resources to not pass after 5 mins but pass after 200 minutes. Also, gitlint is failing. I think it is due to the capital U in Use in the commit title

johnbieren force-pushed the use-4h-timeout-default-rh-advisories-pipeline branch from b051eb7 to 3142d3d on January 28, 2025 at 16:19
davidmogar (Collaborator) commented

Agree with Johnny. Having a general timeout sounds bad to me to be honest. Maybe timeouts have to be raised but they shouldn't be treated in the same way.

mmalina (Contributor) commented Jan 28, 2025

> I don't really get doing this for all tasks. Can you explain? For example, I would never expect verify-access-to-resources to not pass after 5 mins but pass after 200 minutes. Also, gitlint is failing. I think it is due to the capital U in Use in the commit title

The current default is 2 hours. You could say the same today: 2 hours doesn't make sense in some cases either. Raising all of them is just the easiest way to improve the situation; otherwise it's a lot more work to analyse how much is reasonable for each task.

Besides, as was discussed in Slack recently, the main issue currently is that taskruns can wait a long time for the PV to be freed up, and that waiting eats away at the timeout value. So it doesn't matter that a task itself never takes more than 5 minutes: it can still time out if some other task blocks it by holding the shared PV. All of this is meant as a temporary measure until that issue is addressed.

Of course, in the case of verify-access-to-resources you could argue that it runs before any of the time-consuming tasks start, so it's unlikely that anything else will ever block it. But again, deciding where the longer timeout makes sense or not would mean analysing all the tasks.
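
To make the blocking scenario concrete: a sketch of a PipelineRun whose tasks share a single PVC-backed workspace, assuming a ReadWriteOnce volume; the workspace name is illustrative, not the catalog's actual definition. A TaskRun queued behind the volume still consumes its own timeout while it waits:

  apiVersion: tekton.dev/v1
  kind: PipelineRun
  metadata:
    generateName: rh-advisories-run-    # illustrative
  spec:
    pipelineRef:
      name: rh-advisories
    workspaces:
      - name: release-workspace         # assumed name; shared by the pipeline's tasks
        volumeClaimTemplate:
          spec:
            accessModes: ["ReadWriteOnce"]   # one consumer at a time; other tasks wait
            resources:
              requests:
                storage: 1Gi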

mmalina previously approved these changes Jan 28, 2025
mmalina (Contributor) commented Jan 28, 2025

/ok-to-test

johnbieren (Collaborator) commented

> > I don't really get doing this for all tasks. Can you explain? For example, I would never expect verify-access-to-resources to not pass after 5 mins but pass after 200 minutes. Also, gitlint is failing. I think it is due to the capital U in Use in the commit title

> The current default is 2 hours. You could say the same today: 2 hours doesn't make sense in some cases either. Raising all of them is just the easiest way to improve the situation; otherwise it's a lot more work to analyse how much is reasonable for each task. Besides, as was discussed in Slack recently, the main issue currently is that taskruns can wait a long time for the PV to be freed up, and that waiting eats away at the timeout value. So it doesn't matter that a task itself never takes more than 5 minutes: it can still time out if some other task blocks it by holding the shared PV. All of this is meant as a temporary measure until that issue is addressed.

We do not have 2 hour timeouts set for every task in our pipeline definitions. If we did, then the diff would just be changing a 2 to a 4. So I disagree on that. But I do agree that setting a 2 hour timeout for most of the tasks makes no sense. Nowhere in the commit or PR does it say this is a temporary workaround. So, for those reasons, I did not approve.

konflux-ci-qe-bot commented

@creydr: The following test has Failed, say /retest to rerun failed tests.

PipelineRun Name: konflux-e2e-tests-catalog-44tht
Status: Failed
Rerun command: /retest
Build Log: View Pipeline Log
Test Log: View Test Logs

Inspecting Test Artifacts

To inspect your test artifacts, follow these steps:

  1. Install ORAS (see the ORAS installation guide).
  2. Download artifacts with the following commands:
mkdir -p oras-artifacts
cd oras-artifacts
oras pull quay.io/konflux-test-storage/konflux-team/release-service-catalog:konflux-e2e-tests-catalog-44tht

Test results analysis

🚨 Failed to provision a cluster, see the log for more details:

INFO: Log in to your Red Hat account...
INFO: Configure AWS Credentials...
WARN: The current version (1.2.47) is not up to date with latest rosa cli released version (1.2.49).
WARN: It is recommended that you update to the latest version.
INFO: Logged in as 'konflux-ci-418295695583' on 'https://api.openshift.com'
INFO: Create ROSA with HCP cluster...
WARN: The current version (1.2.47) is not up to date with latest rosa cli released version (1.2.49).
WARN: It is recommended that you update to the latest version.
INFO: Creating cluster 'kx-d0b8565e81'
INFO: To view a list of clusters and their status, run 'rosa list clusters'
INFO: Cluster 'kx-d0b8565e81' has been created.
INFO: Once the cluster is installed you will need to add an Identity Provider before you can login into the cluster. See 'rosa create idp --help' for more information.

Name: kx-d0b8565e81
Domain Prefix: kx-d0b8565e81
Display Name: kx-d0b8565e81
ID: 2gjeqaouo8nd0q86u8e0hi6s5l6p4qk6
External ID: caf1922d-5bdb-458b-97f1-f6abf53f9c46
Control Plane: ROSA Service Hosted
OpenShift Version: 4.15.43
Channel Group: stable
DNS: Not ready
AWS Account: 418295695583
AWS Billing Account: 418295695583
API URL:
Console URL:
Region: us-east-1
Availability:

  • Control Plane: MultiAZ
  • Data Plane: SingleAZ

Nodes:

  • Compute (desired): 3
  • Compute (current): 0
    Network:
  • Type: OVNKubernetes
  • Service CIDR: 172.30.0.0/16
  • Machine CIDR: 10.0.0.0/16
  • Pod CIDR: 10.128.0.0/14
  • Host Prefix: /23
  • Subnets: subnet-05b9daa0609597f68, subnet-04cf6376374bf9e09
    EC2 Metadata Http Tokens: optional
    Role (STS) ARN: arn:aws:iam::418295695583:role/ManagedOpenShift-HCP-ROSA-Installer-Role
    Support Role ARN: arn:aws:iam::418295695583:role/ManagedOpenShift-HCP-ROSA-Support-Role
    Instance IAM Roles:
  • Worker: arn:aws:iam::418295695583:role/ManagedOpenShift-HCP-ROSA-Worker-Role
    Operator IAM Roles:
  • arn:aws:iam::418295695583:role/rosa-hcp-openshift-image-registry-installer-cloud-credentials
  • arn:aws:iam::418295695583:role/rosa-hcp-openshift-ingress-operator-cloud-credentials
  • arn:aws:iam::418295695583:role/rosa-hcp-kube-system-kms-provider
  • arn:aws:iam::418295695583:role/rosa-hcp-kube-system-kube-controller-manager
  • arn:aws:iam::418295695583:role/rosa-hcp-kube-system-capa-controller-manager
  • arn:aws:iam::418295695583:role/rosa-hcp-kube-system-control-plane-operator
  • arn:aws:iam::418295695583:role/rosa-hcp-openshift-cluster-csi-drivers-ebs-cloud-credentials
  • arn:aws:iam::418295695583:role/rosa-hcp-openshift-cloud-network-config-controller-cloud-credent
    Managed Policies: Yes
    State: waiting (Waiting for user action)
    Private: No
    Delete Protection: Disabled
    Created: Jan 28 2025 20:16:48 UTC
    User Workload Monitoring: Enabled
    Details Page: https://console.redhat.com/openshift/details/s/2sGwqKW71ErCwf7CPvomtShzu0g
    OIDC Endpoint URL: https://oidc.op1.openshiftapps.com/2du11g36ejmoo4624pofphlrgf4r9tf3 (Managed)
    Etcd Encryption: Disabled
    Audit Log Forwarding: Disabled
    External Authentication: Disabled
    Zero Egress: Disabled

INFO: Preparing to create operator roles.
INFO: Operator Roles already exists
INFO: Preparing to create OIDC Provider.
INFO: OIDC provider already exists
INFO: To determine when your cluster is Ready, run 'rosa describe cluster -c kx-d0b8565e81'.
INFO: To watch your cluster installation logs, run 'rosa logs install -c kx-d0b8565e81 --watch'.
INFO: Track the progress of the cluster creation...
WARN: The current version (1.2.47) is not up to date with latest rosa cli released version (1.2.49).
WARN: It is recommended that you update to the latest version.
W: Region flag will be removed from this command in future versions
INFO: Cluster 'kx-d0b8565e81' is in waiting state waiting for installation to begin. Logs will show up within 5 minutes
0001-01-01 00:00:00 +0000 UTC hostedclusters kx-d0b8565e81 Version
2025-01-28 20:21:47 +0000 UTC hostedclusters kx-d0b8565e81 ValidAWSIdentityProvider StatusUnknown
2025-01-28 20:21:50 +0000 UTC hostedclusters kx-d0b8565e81 Condition not found in the CVO.
2025-01-28 20:21:50 +0000 UTC hostedclusters kx-d0b8565e81 Condition not found in the CVO.
2025-01-28 20:21:50 +0000 UTC hostedclusters kx-d0b8565e81 The hosted control plane is not found
2025-01-28 20:21:50 +0000 UTC hostedclusters kx-d0b8565e81 The hosted control plane is not found
2025-01-28 20:21:50 +0000 UTC hostedclusters kx-d0b8565e81 The hosted control plane is not found
2025-01-28 20:21:50 +0000 UTC hostedclusters kx-d0b8565e81 The hosted control plane is not found
2025-01-28 20:21:50 +0000 UTC hostedclusters kx-d0b8565e81 The hosted control plane is not found
2025-01-28 20:21:50 +0000 UTC hostedclusters kx-d0b8565e81 Condition not found in the CVO.
2025-01-28 20:21:50 +0000 UTC hostedclusters kx-d0b8565e81 Waiting for hosted control plane to be healthy
2025-01-28 20:21:50 +0000 UTC hostedclusters kx-d0b8565e81 Condition not found in the CVO.
2025-01-28 20:21:50 +0000 UTC hostedclusters kx-d0b8565e81 Condition not found in the CVO.
2025-01-28 20:21:50 +0000 UTC hostedclusters kx-d0b8565e81 The hosted control plane is not found
2025-01-28 20:21:50 +0000 UTC hostedclusters kx-d0b8565e81 Ignition server deployment not found
2025-01-28 20:21:50 +0000 UTC hostedclusters kx-d0b8565e81 Configuration passes validation
2025-01-28 20:21:50 +0000 UTC hostedclusters kx-d0b8565e81 HostedCluster is supported by operator configuration
2025-01-28 20:21:50 +0000 UTC hostedclusters kx-d0b8565e81 Release image is valid
2025-01-28 20:21:50 +0000 UTC hostedclusters kx-d0b8565e81 The hosted control plane is not found
2025-01-28 20:21:50 +0000 UTC hostedclusters kx-d0b8565e81 Reconciliation active on resource
2025-01-28 20:21:52 +0000 UTC hostedclusters kx-d0b8565e81 Required platform credentials are found
2025-01-28 20:21:52 +0000 UTC hostedclusters kx-d0b8565e81 failed to get referenced secret ocm-production-2gjeqaouo8nd0q86u8e0hi6s5l6p4qk6/cluster-api-cert: Secret "cluster-api-cert" not found
2025-01-28 20:21:52 +0000 UTC hostedclusters kx-d0b8565e81 HostedCluster is at expected version
2025-01-28 20:23:27 +0000 UTC hostedclusters kx-d0b8565e81 OIDC configuration is valid
2025-01-28 20:23:27 +0000 UTC hostedclusters kx-d0b8565e81 Reconciliation completed successfully
2025-01-28 20:23:28 +0000 UTC hostedclusters kx-d0b8565e81 WebIdentityErr
2025-01-28 20:23:29 +0000 UTC hostedclusters kx-d0b8565e81 All is well
2025-01-28 20:23:29 +0000 UTC hostedclusters kx-d0b8565e81 lookup api.kx-d0b8565e81.4we6.p3.openshiftapps.com on 172.30.0.10:53: no such host
2025-01-28 20:23:29 +0000 UTC hostedclusters kx-d0b8565e81 capi-provider deployment has 1 unavailable replicas
2025-01-28 20:23:29 +0000 UTC hostedclusters kx-d0b8565e81 Configuration passes validation
2025-01-28 20:23:29 +0000 UTC hostedclusters kx-d0b8565e81 AWS KMS is not configured
2025-01-28 20:23:29 +0000 UTC hostedclusters kx-d0b8565e81 EtcdAvailable StatefulSetNotFound
2025-01-28 20:23:29 +0000 UTC hostedclusters kx-d0b8565e81 Kube APIServer deployment not found
2025-01-28 20:23:37 +0000 UTC hostedclusters kx-d0b8565e81 All is well
2025-01-28 20:24:37 +0000 UTC hostedclusters kx-d0b8565e81 EtcdAvailable QuorumAvailable
2025-01-28 20:25:41 +0000 UTC hostedclusters kx-d0b8565e81 Kube APIServer deployment is available
2025-01-28 20:25:49 +0000 UTC hostedclusters kx-d0b8565e81 All is well
2025-01-28 20:26:28 +0000 UTC hostedclusters kx-d0b8565e81 The hosted control plane is available
INFO: Cluster 'kx-d0b8565e81' is now ready
INFO: ROSA with HCP cluster is ready, create a cluster admin account for accessing the cluster
WARN: The current version (1.2.47) is not up to date with latest rosa cli released version (1.2.49).
WARN: It is recommended that you update to the latest version.
INFO: Storing login command...
INFO: Check if it's able to login to OCP cluster...
Retried 1 times...
Retried 2 times...
Retried 3 times...
INFO: Check if apiserver is ready...
[INFO] Checking cluster operators' status...
[INFO] Attempt 1/10
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
console
csi-snapshot-controller 4.15.43 True False False 4m39s
dns 4.15.43 False False True 4m39s DNS "default" is unavailable.
image-registry False True True 4m Available: The deployment does not have available replicas...
ingress False True True 4m5s The "default" ingress controller reports Available=False: IngressControllerUnavailable: One or more status conditions indicate unavailable: DeploymentAvailable=False (DeploymentUnavailable: The deployment has Available status condition set to False (reason: MinimumReplicasUnavailable) with message: Deployment does not have minimum availability.)
insights
kube-apiserver 4.15.43 True False False 4m28s
kube-controller-manager 4.15.43 True False False 4m28s
kube-scheduler 4.15.43 True False False 4m28s
kube-storage-version-migrator
monitoring
network 4.15.43 True True False 4m13s DaemonSet "/openshift-multus/multus-additional-cni-plugins" is not available (awaiting 2 nodes)...
node-tuning 4.15.43 True True False 35s Waiting for 2/3 Profiles to be applied
openshift-apiserver 4.15.43 True False False 4m28s
openshift-controller-manager 4.15.43 True False False 4m28s
openshift-samples
operator-lifecycle-manager 4.15.43 True False False 4m30s
operator-lifecycle-manager-catalog 4.15.43 True False False 4m25s
operator-lifecycle-manager-packageserver 4.15.43 True False False 4m28s
service-ca
storage 4.15.43 True False False 23s
[INFO] Cluster operators are accessible.
[INFO] Waiting for cluster operators to be in 'Progressing=false' state...
clusteroperator.config.openshift.io/console condition met
clusteroperator.config.openshift.io/csi-snapshot-controller condition met
clusteroperator.config.openshift.io/dns condition met
clusteroperator.config.openshift.io/image-registry condition met
clusteroperator.config.openshift.io/ingress condition met
clusteroperator.config.openshift.io/insights condition met
clusteroperator.config.openshift.io/kube-apiserver condition met
clusteroperator.config.openshift.io/kube-controller-manager condition met
clusteroperator.config.openshift.io/kube-scheduler condition met
clusteroperator.config.openshift.io/kube-storage-version-migrator condition met
clusteroperator.config.openshift.io/monitoring condition met
clusteroperator.config.openshift.io/network condition met
clusteroperator.config.openshift.io/node-tuning condition met
clusteroperator.config.openshift.io/openshift-apiserver condition met
clusteroperator.config.openshift.io/openshift-controller-manager condition met
clusteroperator.config.openshift.io/openshift-samples condition met
clusteroperator.config.openshift.io/operator-lifecycle-manager condition met
clusteroperator.config.openshift.io/operator-lifecycle-manager-catalog condition met
clusteroperator.config.openshift.io/operator-lifecycle-manager-packageserver condition met
clusteroperator.config.openshift.io/service-ca condition met
clusteroperator.config.openshift.io/storage condition met


…egistry-rh-io pipelines

Explicitly set the timeout for taskruns in the rh-advisories and
rh-push-to-registry-redhat-io pipelines to 4h and thus override the
cluster default of (currently) 2h.
This is especially helpful for larger components which are running
into issues related to RELEASE-1291.

Signed-off-by: Christoph Stäbler <[email protected]>
creydr force-pushed the use-4h-timeout-default-rh-advisories-pipeline branch from 3142d3d to 4f17876 on January 29, 2025 at 07:22
openshift-ci bot removed the lgtm label on Jan 29, 2025

openshift-ci bot commented Jan 29, 2025

New changes are detected. LGTM label has been removed.

creydr changed the title from "feat: Use 4h timeout default rh-advisories and rh-push-to-registry-redhat-io pipelines" to "feat: use 4h timeout default rh-advisories and rh-push-to-registry-redhat-io pipelines" on Jan 29, 2025
mmalina (Contributor) commented Jan 29, 2025

> We do not have 2 hour timeouts set for every task in our pipeline definitions. If we did, then the diff would just be changing a 2 to a 4. So I disagree on that. But I do agree that setting a 2 hour timeout for most of the tasks makes no sense.

What I meant is that 2 hours is the cluster default, so we currently have 2 hours even for tasks that are not expected to take more than a minute. But I guess that won't change your stance :)

> Nowhere in the commit or PR does it say this is a temporary workaround. So, for those reasons, I did not approve.

I think it's implied by mentioning that this is needed because RELEASE-1291 is not fixed. But I might be wrong.
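
For context on the cluster default mentioned above: in upstream Tekton the fallback timeout comes from the config-defaults ConfigMap, so the global alternative from the PR description would look roughly like the following. This is a sketch under assumptions: default-timeout-minutes is upstream Tekton's key, but the namespace, and whether infra-deployments manages the setting through this object, are not confirmed by this thread:

  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: config-defaults
    namespace: tekton-pipelines    # assumption; often openshift-pipelines on OpenShift
  data:
    default-timeout-minutes: "240"   # raise the fallback from 120 (2h) to 240 (4h)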
