STF 1.5.5 release ops #628

vkmc · 2024-08-26T16:05:13Z

Add a .gitleaks.toml file to avoid the false positive leak for the example certificate when deploying for Elasticsearch.

Update the check to use the bool filter instead of a bare var. By default, ansible parses vars as strings, and without the | bool filter this check is invalid: it always resolves to true, since the value is a non-empty string. Other instances of the same check did this, but this one was missed.
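The pitfall described above can be illustrated outside Ansible. The following is a minimal Python sketch, not Ansible's own code: `to_bool` is a simplified stand-in for the `| bool` filter, and the set of accepted truthy spellings is an assumption made for illustration.

```python
# Simplified stand-in for Ansible's `| bool` filter (an approximation only).
TRUTHY_STRINGS = {"yes", "on", "1", "true"}  # assumed spellings, for illustration


def to_bool(value):
    """Interpret common truthy spellings as True; other values as False."""
    if value is None or isinstance(value, bool):
        return value
    if isinstance(value, str):
        return value.lower() in TRUTHY_STRINGS
    return value == 1


# A variable that arrives as the string "false" is still a non-empty string.
flag = "false"
bare_check = bool(flag)         # only tests that the string is non-empty
filtered_check = to_bool(flag)  # interprets the text of the string
```

Here `bare_check` is true even though the value reads "false", which is the failure mode the commit fixes; `filtered_check` is false.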

* [allow_skip_clone] Use <repo>_dir instead of hardcoding all directories relative to base_dir
  This will allow configuration of the repo clone destination, so we can use pre-cloned dirs instead of explicitly cloning the dirs each time. This is essential for CI systems like zuul, that set-up the repos with particular versions/branches prior to running the test scripts.
* [zuul] List the other infrawatch repos as required for the job
* [zuul] Set the {sgo,sg-bridge,sg-core,prometheus-webhook-snmp}_dir vars
  Add in the repo dir locations where the repos should be pre-cloned by zuul
* Replace base_dir with sto_dir
* set sto_dir relative to base_dir if it isn't already set
* [ci] use absolute dir for requirements.txt
* [ci] Update sto_dir using explicit reference
  zuul.project.src_dir refers to the current project dir. When using the jobs in another infrawatch project, this becomes invalid. Instead, sto_dir is explicitly set using zuul.projects[<project_name>].src_dir, the same way that the other repo dirs are set in vars-zuul-common

---------

Co-authored-by: Chris Sibbitt <[email protected]>

* Fix qdr auth one_time_upgrade label check
* Fix incorrect variable naming on one_time_upgrade label check
* Adjust QDR authentication password generation (#520)
  Adjust the passwords being generated for QDR authentication since certain characters (such as colon) will cause a failure in the parsing routine within qpid-dispatch. Updates the lookup function to only use ascii_letters and digits and increases the length to 32 characters.

---------

Co-authored-by: Leif Madsen <[email protected]>
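The actual change adjusts an Ansible lookup. As a rough illustration of the same constraint (ASCII letters and digits only, 32 characters), here is a minimal Python sketch; the function name `generate_qdr_password` is hypothetical and does not exist in the repository.

```python
import secrets
import string

# Restrict the alphabet to ASCII letters and digits so that characters such as
# ':' can never appear in the generated password.
ALPHABET = string.ascii_letters + string.digits


def generate_qdr_password(length: int = 32) -> str:
    """Return a random password drawn only from ASCII letters and digits."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


password = generate_qdr_password()
```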

* [allow_skip_clone] Add docs for clone_repos and *_dir vars
* Align README table column spacing (#516)
* Align README table column spacing
* Update build/stf-run-ci/README.md

---------

Co-authored-by: Emma Foley <[email protected]>

---------

Co-authored-by: Leif Madsen <[email protected]>

It appears that STO is not included explicitly when running jobs from SGO [1]. This will be the case in all the other repos. This change explicitly adds it, in case it's not already included by zuul.

[1] https://review.rdoproject.org/zuul/build/edd8f17bfdac4360a94186b46c4cea3f

* QDR Auth in smoketest
* Added qdr-test as a mock of the OSP-side QDR
* Connection from qdr-test -> default-interconnect is TLS+Auth
* Collectors point at qdr-test instead of default-interconnect directly
* Much more realistic than the existing setup
* Eliminated a substitution in sensubility config
* Used default QDR basic auth in Jenkinsfile

* QDR Auth for infrared 17.1 script
* Fix missing substitution for AMQP_PASS in infrared script

* [allow_skip_clone] Use <repo>_dir instead of hardcoding all directories relative to base_dir
  This will allow configuration of the repo clone destination, so we can use pre-cloned dirs instead of explicitly cloning the dirs each time. This is essential for CI systems like zuul, that set-up the repos with particular versions/branches prior to running the test scripts.
* [zuul] List the other infrawatch repos as required for the job
* [zuul] Set the {sgo,sg-bridge,sg-core,prometheus-webhook-snmp}_dir vars
  Add in the repo dir locations where the repos should be pre-cloned by zuul
* Replace base_dir with sto_dir
* set sto_dir relative to base_dir if it isn't already set
* [ci] use absolute dir for requirements.txt
* [ci] Update sto_dir using explicit reference
  zuul.project.src_dir refers to the current project dir. When using the jobs in another infrawatch project, this becomes invalid. Instead, sto_dir is explicitly set using zuul.projects[<project_name>].src_dir, the same way that the other repo dirs are set in vars-zuul-common
* [zuul] Define a project template for stf-crc-jobs
  Instead of listing all the jobs for each project in-repo, and needing to update the list every time that a new job is added, the project template can be updated and the changes propagated to the other infrawatch projects
* [zuul] don't enable using the template
* Revert "[zuul] don't enable using the template"
  This reverts commit 56e2009.

---------

Co-authored-by: Chris Sibbitt <[email protected]>
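The "use the configured dir, otherwise fall back to a path under base_dir" behaviour described for the `<repo>_dir` vars can be sketched as follows. This is a Python illustration only: the real logic lives in Ansible vars, and the function name, example paths, and default sub-directory below are hypothetical.

```python
from pathlib import PurePosixPath


def resolve_repo_dir(configured, base_dir, default_relative):
    """Prefer an explicitly configured repo dir (for example one pre-cloned by
    the CI system); otherwise derive the location from base_dir."""
    if configured:
        return PurePosixPath(configured)
    return PurePosixPath(base_dir) / default_relative


# No override: the directory is derived from base_dir (hypothetical paths).
derived = resolve_repo_dir(None, "/home/ci/work", "service-telemetry-operator")

# Override: a pre-cloned checkout is used as-is and nothing is derived.
precloned = resolve_repo_dir("/src/pre-cloned/sto", "/home/ci/work", "service-telemetry-operator")
```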

* Restart QDR after changing the password
* Fixes bug reported here: #517 (comment)
* Avoids an extra manual step when changing password
* Would affect users who upgrade from earlier STF and subsequently enable basic auth
* Also users who need to change their passwords
* Fixing ansible lint
* Update roles/servicetelemetry/tasks/component_qdr.yml
* Adjust QDR restarts to account for HA
* [smoketest] Wait for qdr-test to be Running
* [smoketest] Wait for QDR password upgrade
* Remove zuul QDR auth override

* Add crc_ocp_bundle value to select OCP version
* zuul: add log collection post-task to get crc logs
* Add ocp v13 and a timeout to the job

* Update README for 17.1 IR test
  Update the 17.1 infrared test script README to show how to deploy a virtualized workload on the deployed overcloud infrastructure. Helps with testing by providing additional telemetry to STF required in certain dashboards.
* Update tests/infrared/17.1/README.md
  Co-authored-by: Chris Sibbitt <[email protected]>
* Update tests/infrared/17.1/README.md

---------

Co-authored-by: Chris Sibbitt <[email protected]>

Support STF 1.5.3 starting at OpenShift version 4.12 due to incompatibility with 4.11 due to dependency requirements. Our primary target is support of OCP EUS releases. Closes: STF-1632

The "Question the deployment" task didn't have ignore_errors: true set, so when the task fails, the play is finished. This means that we don't get to the "copy logs" task and can't see the job logs in zuul. ignore_errors is set to true to be consistent with other tasks

* update stf-collect-logs tasks
* Update log path
* solve log bugs in stf-run-ci tasks
* create log directory

Adjust the operator package dependency requirements to align to known required versions. Primarily reduce the version of openshift-cert-manager from 1.10 to 1.7 in order to support the tech-preview channel which was previously used. Lowering the version requirement allows for the openshift-cert-manager-operator installed previously to be used during the STF 1.5.2 to 1.5.3 update, removing the update from being blocked. Related: STF-1636

Update the stf-run-ci base setup to no longer need testing against OCP 4.10 and earlier, meaning we can rely on a single workflow for installation. Also update the deployment to use cluster-observability-operator via the redhat-operators CatalogSource for installation via use_redhat and use_hybrid strategies.

* [zuul] Add job to build locally and do an index-based deployment

* Only require Interconnect and Smart Gateway
  Update the dependency management within Service Telemetry Operator to only require AMQ Interconnect and Smart Gateway Operator, which is enough to deploy STF with observabilityStrategy: none. Other Operators can be installed in order to satisfy data storage of telemetry and events. Installation of cert-manager is also required, but needs to be pre-installed similar to Cluster Observability Operator, either as a cluster-scoped operator with the tech-preview channel, or a single time on the cluster as a namespace scoped operator, which is how the stable-v1 channel installs. Documentation will be updated to adjust for this change.
  Related: STF-1636
* Perform CI update to match docs install changes (#542)
* Perform CI update to match docs install changes
  Update the stf-run-ci scripting to match the documented installation procedures which landed in infrawatch/documentation#513. These changes are also reflected in #541.
* Update build/stf-run-ci/tasks/setup_base.yml
  Co-authored-by: Emma Foley <[email protected]>

---------

Co-authored-by: Emma Foley <[email protected]>

* Also drop cert-manager project
  The cert-manager project gets created with workload items when deploying the cert-manager from the cert-manager-operator project. When removing cert-manager this project is not cleaned up, so we need to delete it as well.

---------

Co-authored-by: Emma Foley <[email protected]>

…545)

In [1], the validate_deployment step is successful, despite the deployment not being successful. This causes the job to timeout because the following steps continue to run despite an invalid state.

To get the expected behaviour, the output should be checked for a string indicating success, i.e.

* [info] CI Build complete. You can now run tests.

[2] shows the output for a successful run.

[1] https://review.rdoproject.org/zuul/build/245ae63e41884dc09353d938ec9058d7/console#5/0/144/controller
[2] https://review.rdoproject.org/zuul/build/802432b23da24649b818985b7b1633bb/console#5/0/82/controller
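The idea of judging success by the output rather than by the step's exit status can be sketched in Python. The marker string is taken from the message above; the function name and the sample outputs are hypothetical, and the real check is implemented in the CI playbooks rather than in Python.

```python
# Success marker quoted in the commit message above.
SUCCESS_MARKER = "* [info] CI Build complete. You can now run tests."


def deployment_succeeded(output: str) -> bool:
    """Treat the run as successful only if the success marker was printed."""
    return SUCCESS_MARKER in output


# Hypothetical outputs: one run that finished, one that stopped early.
good_output = "...\n* [info] CI Build complete. You can now run tests.\n"
bad_output = "...\n* [info] Waiting for deployment...\n"
```

With this check, a step that exits cleanly but never prints the marker is still reported as a failure, so later steps do not run against an invalid state.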

* Implement dashboard management
  Implement a new configuration option graphing.grafana.dashboards.enabled which results in dashboards objects being created for the Grafana Operator. Previously loading dashboards would be done manually via 'oc apply' using instructions from documentation. The new CRD parameters to the ServiceTelemetry object allows the Service Telemetry Operator to now make the GrafanaDashboard objects directly.
  Related: OSPRH-825
* Drop unnecessary cluster roles
* Update CSV for owned parameter

* Only openshift auth will be allowed

* This matches recent changes in prometheus[1] and grafana[2]

[1] https://github.com/infrawatch/service-telemetry-operator/pull/549/files#diff-2cf84bcf66f12393c86949ec0d3f16c473a650173d55549bb02556d23aa22bd2R46
[2] https://github.com/infrawatch/service-telemetry-operator/pull/550/files#diff-ae71801975adb4f8dd4aa5479a66ad46e46f17de40f9d147b2e09e13ce26633eR45

This reverts commit 0f94fd5.

* Auth to prometheus using token instead of basicauth
* Add present/absent logic to prometheus-reader resources
* s/password/token in smoketest output
* [zuul] Make nightly_bundles jobs non-voting (#551)

---------

Co-authored-by: Emma Foley <[email protected]>

I think it got broken by an oops recently[1]. Since that change, working_branch (`branch` at that point) is never used because version_branches.sgo has a default value. This breaks the branch co-ordination in Jenkins[2] and in local testing[3].

[1] https://github.com/infrawatch/service-telemetry-operator/pull/512/files#diff-c073fe1e346d08112920aa0bbc8a7453bbd3032b7a9b09ae8cbc70df4db4ea2dR19
[2] https://github.com/infrawatch/service-telemetry-operator/blob/0f94fd577617aee6a85fc4141f98ebdfc49a9f92/Jenkinsfile#L157
[3] https://github.com/infrawatch/service-telemetry-operator/blob/0f94fd577617aee6a85fc4141f98ebdfc49a9f92/README.md?plain=1#L62
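The shadowing problem described above (a fallback that can never be reached because the preferred value always has a default) can be illustrated in Python. This is only a sketch of the general pattern: the default branch name, both function names, and the "coordinated" variant are assumptions for illustration and do not reproduce the repository's Ansible code or its actual fix.

```python
DEFAULT_SGO_BRANCH = "master"  # hypothetical default value


def sgo_branch_shadowed(working_branch, overrides=None):
    """'sgo' always has a value, so the working_branch fallback is unreachable."""
    version_branches = {"sgo": DEFAULT_SGO_BRANCH, **(overrides or {})}
    return version_branches.get("sgo", working_branch)


def sgo_branch_coordinated(working_branch, overrides=None):
    """Explicit override first, then the working branch, then the default."""
    overrides = overrides or {}
    return overrides.get("sgo") or working_branch or DEFAULT_SGO_BRANCH
```

In the shadowed variant a caller testing on a feature branch silently gets the default branch instead, which matches the symptom reported for Jenkins and local testing.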

* This matches recent changes in prometheus[1] and grafana[2]

[1] https://github.com/infrawatch/service-telemetry-operator/pull/549/files#diff-2cf84bcf66f12393c86949ec0d3f16c473a650173d55549bb02556d23aa22bd2R46
[2] https://github.com/infrawatch/service-telemetry-operator/pull/550/files#diff-ae71801975adb4f8dd4aa5479a66ad46e46f17de40f9d147b2e09e13ce26633eR45

The way we generate our CSVs uses OLM's skipRange functionality. This is fine, but using only this leads to older versions becoming unavailable after the fact -- see the warning at [1].

By adding an optional spec.replaces to our CSV we allow update testing as well as actual production updates for downstream builds that leverage it.

Populating the field requires knowledge of the latest-released bundle, so we take it from an environment variable to be provided by the builder. If this is unset we don't include the spec.replaces field at all -- leaving previous behavior unchanged.

Resolves #559
Related: STF-1658

[1] https://olm.operatorframework.io/docs/concepts/olm-architecture/operator-catalog/creating-an-update-graph/#skiprange
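The "include spec.replaces only when the builder provides it" behaviour can be sketched as below. This is a Python illustration under stated assumptions: the environment variable name `EXAMPLE_CSV_REPLACES`, the function name, and the example bundle names are hypothetical, and the real CSV generation is done by the repository's build scripts.

```python
def build_csv_spec(version, environ):
    """Build a minimal CSV spec dict, adding 'replaces' only when provided."""
    spec = {"version": version}
    # Hypothetical variable name; the builder is expected to supply the value.
    replaces = environ.get("EXAMPLE_CSV_REPLACES", "").strip()
    if replaces:
        spec["replaces"] = replaces
    return spec


# Unset variable: the field is omitted and previous behaviour is unchanged.
spec_without = build_csv_spec("1.5.5", {})

# Variable set by the builder: the field is included in the spec.
spec_with = build_csv_spec("1.5.5", {"EXAMPLE_CSV_REPLACES": "example-operator.v1.5.4"})
```

Passing the environment as a plain mapping keeps the sketch easy to exercise; a real build script would more likely read `os.environ` directly.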

Add optional spec.replaces field to CSV for update graph compliance

softwarefactory-project-zuul · 2024-08-26T16:56:07Z

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/4af0663b90ec4a8e9f316436ba3e4cc6

❌ stf-crc-ocp_414-local_build RETRY_LIMIT in 7m 18s
❌ stf-crc-ocp_416-local_build FAILURE in 27m 49s
❌ stf-crc-ocp_414-local_build-index_deploy RETRY_LIMIT in 7m 24s
❌ stf-crc-ocp_416-local_build-index_deploy FAILURE in 26m 55s
❌ stf-crc-ocp_414-nightly_bundles-index_deploy RETRY_LIMIT in 7m 31s
✔️ stf-crc-ocp_416-nightly_bundles-index_deploy SUCCESS in 29m 05s

csibbitt · 2024-08-26T17:01:57Z

deploy/olm-catalog/service-telemetry-operator/Dockerfile.in

@@ -13,7 +13,7 @@ LABEL operators.operatorframework.io.metrics.mediatype.v1=metrics+v1
 LABEL operators.operatorframework.io.metrics.builder=operator-sdk-v0.19.4
 LABEL operators.operatorframework.io.metrics.project_layout=ansible
 LABEL com.redhat.delivery.operator.bundle=true
-LABEL com.redhat.openshift.versions="v4.12-v4.14"
+LABEL com.redhat.openshift.versions="v4.12-v4.16"


Is this supposed to be "v4.14-v4.16"? See also infrawatch/smart-gateway-operator#161 (comment)

csibbitt

I think the spurious extra commits that GitHub shows are because the previous release prep branches were squashed before merging to stable-1.5. (For example, ad468f2.)
- The content from those commits is already on the branch, but the commits themselves are missing from the history, so github lists them.
- DO NOT SQUASH THIS PR when merging to stable-1.5 and this problem should go away for the future
Did a sanity-read on all content looking for version numbers going the wrong way, changes I couldn't explain, and reviewing things that didn't make immediate sense.
- Just one comment.
Verified that I can see expected changes pertaining to the major epics and bugfixes for 1.5.5
- IPv6 related (change to qdr port 5671 listen address and migration to scrapeconfigs)
- 4.16 support (new auth token, changes to zuul, Dockerfile(!) & Operator metadata + other misc changes)
- RBAC fixes (plural resource names and s/group/resourceAPIGroup/ in SARs)

I would be happy to approve if we sort out the OCP version range in the Dockerfile - either confirm it's intentional, or change it to match what we set in SGO.

vkmc · 2024-08-26T17:32:26Z

I think this was an intentional change at the moment of submitting it, since we had CI for OCP 4.12, OCP 4.14 and OCP 4.16. Now, we dropped OCP 4.12 CI. So it makes sense we also fix this. I will submit a fix for this.
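To make the effect of the label change concrete, here is a minimal Python sketch that checks whether an OCP version falls inside a `com.redhat.openshift.versions` value. It assumes the `vX.Y-vX.Z` form denotes an inclusive range and handles only that form; the helper names are hypothetical.

```python
def parse_version(tag):
    """Turn a tag such as 'v4.14' into a comparable tuple (4, 14)."""
    major, minor = tag.lstrip("v").split(".")
    return int(major), int(minor)


def in_supported_range(range_label, version):
    """Check a version against a 'vX.Y-vX.Z' label (inclusive on both ends)."""
    low, high = range_label.split("-")
    return parse_version(low) <= parse_version(version) <= parse_version(high)


# With the range from the diff, 4.12 is still advertised as supported.
old_allows_412 = in_supported_range("v4.12-v4.16", "v4.12")

# With the range suggested in review, 4.12 is excluded and 4.16 is included.
new_allows_412 = in_supported_range("v4.14-v4.16", "v4.12")
new_allows_416 = in_supported_range("v4.14-v4.16", "v4.16")
```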

softwarefactory-project-zuul · 2024-08-28T06:30:33Z

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/c84f93e1874a4967ae52bfe91cbb29e2

❌ stf-crc-ocp_414-local_build FAILURE in 26m 34s
❌ stf-crc-ocp_416-local_build FAILURE in 28m 04s
❌ stf-crc-ocp_414-local_build-index_deploy FAILURE in 25m 50s
❌ stf-crc-ocp_416-local_build-index_deploy FAILURE in 29m 42s
✔️ stf-crc-ocp_414-nightly_bundles-index_deploy SUCCESS in 32m 02s
✔️ stf-crc-ocp_416-nightly_bundles-index_deploy SUCCESS in 28m 10s

OCP 4.12 maintenance support has finished, so we should remove OCP 4.12 from the supported versions

softwarefactory-project-zuul · 2024-08-28T08:14:12Z

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/7c0f18a3eb1f4e5cbfe723b7469a5c75

❌ stf-crc-ocp_414-local_build FAILURE in 25m 53s
❌ stf-crc-ocp_416-local_build FAILURE in 28m 20s
❌ stf-crc-ocp_414-local_build-index_deploy POST_FAILURE in 25m 50s
❌ stf-crc-ocp_416-local_build-index_deploy FAILURE in 27m 24s
✔️ stf-crc-ocp_414-nightly_bundles-index_deploy SUCCESS in 35m 51s
✔️ stf-crc-ocp_416-nightly_bundles-index_deploy SUCCESS in 27m 54s

vkmc · 2024-08-29T05:13:41Z

recheck

softwarefactory-project-zuul · 2024-08-29T05:48:59Z

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/6d504a2a3a9d4b7782dc4ccfe6a572f5

❌ stf-crc-ocp_414-local_build FAILURE in 26m 00s
❌ stf-crc-ocp_416-local_build FAILURE in 28m 33s
❌ stf-crc-ocp_414-local_build-index_deploy FAILURE in 24m 39s
❌ stf-crc-ocp_416-local_build-index_deploy FAILURE in 27m 34s
✔️ stf-crc-ocp_414-nightly_bundles-index_deploy SUCCESS in 34m 23s
✔️ stf-crc-ocp_416-nightly_bundles-index_deploy SUCCESS in 29m 20s

vkmc · 2024-08-29T08:16:01Z

recheck

softwarefactory-project-zuul · 2024-08-29T09:12:30Z

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/86a14bd242824d4d99936b5e40ef3476

✔️ stf-crc-ocp_414-local_build SUCCESS in 35m 40s
✔️ stf-crc-ocp_416-local_build SUCCESS in 35m 42s
❌ stf-crc-ocp_414-local_build-index_deploy FAILURE in 43m 27s
❌ stf-crc-ocp_416-local_build-index_deploy FAILURE in 42m 47s
✔️ stf-crc-ocp_414-nightly_bundles-index_deploy SUCCESS in 32m 49s
✔️ stf-crc-ocp_416-nightly_bundles-index_deploy SUCCESS in 32m 03s

Set the BUNDLE_CHANNELS and BUNDLE_DEFAULT_CHANNEL when doing a local build. This is required when doing a deployment from index with local builds.

vkmc · 2024-08-29T10:54:20Z

recheck

softwarefactory-project-zuul · 2024-08-29T13:17:49Z

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/eb812f4cc5dd469d96b3a380629693ac

❌ stf-crc-ocp_414-local_build NODE_FAILURE Node request 100-0007556397 failed in 0s
❌ stf-crc-ocp_416-local_build NODE_FAILURE Node request 100-0007556398 failed in 0s
❌ stf-crc-ocp_414-local_build-index_deploy FAILURE in 41m 39s
❌ stf-crc-ocp_416-local_build-index_deploy FAILURE in 46m 02s
✔️ stf-crc-ocp_414-nightly_bundles-index_deploy SUCCESS in 35m 17s
✔️ stf-crc-ocp_416-nightly_bundles-index_deploy SUCCESS in 30m 19s

elfiesmelfie · 2024-08-29T14:14:04Z

build/Dockerfile

This might need updating

leifmadsen and others added 30 commits October 26, 2023 10:04

Add gitleaks.toml for rh-gitleaks (#510)

db1195a

Add a .gitleaks.toml file to avoid the false positive leak for the example certificate when deploying for Elasticsearch.

[stf-collect-logs] Move describe build|pod from ci/ to the role (#505)

9d7be76

QDR Auth for infrared 17.1 script (#517)

d12aa38

* QDR Auth for infrared 17.1 script
* Fix missing substitution for AMQP_PASS in infrared script

[zuul] Add jobs to test with different versions of OCP (#432)

d3d8ee5

* Add crc_ocp_bundle value to select OCP version
* zuul: add log collection post-task to get crc logs
* Add ocp v13 and a timeout to the job

Support OCP v4.12 through v4.14 (#535)

cba3874

Support STF 1.5.3 starting at OpenShift version 4.12 due to incompatibility with 4.11 due to dependency requirements. Our primary target is support of OCP EUS releases. Closes: STF-1632

Mgirgisf/stf 1580/fix log commands (#526)

2fc9c6c

* update stf-collect-logs tasks
* Update log path
* solve log bugs in stf-run-ci tasks
* create log directory

[zuul] Add job to build locally and do an index-based deployment (#495)

cd25646

* [zuul] Add job to build locally and do an index-based deployment

Remove basic-auth method from grafana (#550)

b29d023

* Only openshift auth will be allowed

Revert "Adjust Alertmanager SAR to be more specific"

28ce38e

This reverts commit 0f94fd5.

Merge pull request #560 from infrawatch/migarcia-bundle-spec-replaces

fe6c909

Add optional spec.replaces field to CSV for update graph compliance

Merge branch 'master' into release-prep-1.5.5

9b7880b

vkmc marked this pull request as ready for review August 26, 2024 16:05

vkmc mentioned this pull request Aug 26, 2024

STF 1.5.5 release ops infrawatch/sg-core#141

Merged

csibbitt reviewed Aug 26, 2024

csibbitt and others added 2 commits August 27, 2024 12:27

Check for COO API before trying to delete any COO objects (#631)

ee94763

Use openshift-ansible-operator 4.14 in STO (#630)

27cb54a

vkmc added 2 commits August 28, 2024 08:51

Drop OCP 4.12 from openshift.versions annotation (#629)

02e58ce

OCP 4.12 maintenance support has finished, so we should remove OCP 4.12 from the supported versions

Merge branch 'master' into release-prep-1.5.5

c612d04

vkmc force-pushed the release-prep-1.5.5 branch from c4aa65d to c612d04 Compare August 28, 2024 06:55

vkmc mentioned this pull request Aug 29, 2024

STF 1.5.5 release ops infrawatch/smart-gateway-operator#161

Merged

Set BUNDLE_CHANNELS and BUNDLE_DEFAULT_CHANNEL when doing a local build

17f2f50

Set the BUNDLE_CHANNELS and BUNDLE_DEFAULT_CHANNEL when doing a local build. This is required when doing a deployment from index with local builds.

Merge branch 'vkmc-set-local-build-channels' into release-prep-1.5.5

4364304

vkmc requested review from csibbitt, elfiesmelfie and ayefimov-1 August 29, 2024 14:03

elfiesmelfie approved these changes Aug 29, 2024

elfiesmelfie merged commit e694bd6 into stable-1.5 Aug 29, 2024
10 checks passed

elfiesmelfie deleted the release-prep-1.5.5 branch August 29, 2024 14:07

