Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Enforce NonAdminBackup spec field values #1584

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

mateusoliveira43
Copy link
Contributor

@mateusoliveira43 mateusoliveira43 commented Nov 7, 2024

Why the changes were made

Related to migtools/oadp-non-admin#37

This PR adds EnforceBackupSpec to DPA spec.nonAdmin. With it, admin user can enforce that all NonAdminBackups created have the enforced spec fields with the values admin user configured.

How to test the changes made

Follow install-from-source NAC testing documentation, pointing my NAC fork branch fix/enforce-backup-spec-field-values and this branch.

Test that:

  • when changing DPA spec.nonAdmin.enforceBackupSpec, NAC Pod recreates (for new enforcement to take effect)
  • when not changing DPA spec.nonAdmin.enforceBackupSpec, NAC Pod does not get recreated
  • when setting DPA spec.nonAdmin.enforceBackupSpec, NonAdminBackups fail validation if not respecting enforcement

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 7, 2024
Copy link

openshift-ci bot commented Nov 7, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Copy link

openshift-ci bot commented Nov 7, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mateusoliveira43

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 7, 2024
@@ -166,7 +174,7 @@ func ensureRequiredSpecs(deploymentObject *appsv1.Deployment, image string, imag
Name: nonAdminObjectName,
Image: image,
ImagePullPolicy: imagePullPolicy,
Env: []corev1.EnvVar{namespaceEnvVar},
Env: []corev1.EnvVar{namespaceEnvVar, enforceBackupSpecEnvVar},
Copy link
Contributor Author

@mateusoliveira43 mateusoliveira43 Nov 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, goal here is admin user will enforce some NonAdminBackup spec field values through DPA (for example, SnapshotVolumes must be false for every NonAdminBackup created)

During NAC startup, it reads these values admin user has set.

we could read from DPA directly, doing a DPA list call in OADP namespace before here https://github.com/migtools/oadp-non-admin/blob/8d8de636b68c0241b937403a98ab4384616458ba/cmd/main.go#L135 and passing result to NonAdminBackupReconciler. Problems with this approach are:

  1. need to change NAC RBAC (but easy to fix, just add to NAC role list DPA)
  2. how to tell NAC deployment to restart when that field changes?

Current implementation idea is passing DPA field value as env var to deployment. On NAC controller, it would Unmarshal the value and use it. Problem with this approach is that we can reach a char limit on env var value.

Any ideas on a third solution? Or how to fix problem 2 of first idea?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how to tell NAC deployment to restart when that field changes?

if NAC is in a deployment, simply os.Exit() should result in another pod coming up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed logic here

now I save DPA spec.nonAdmin.enforceBackupSpec that created Deployment and compare with current DPA spec.nonAdmin.enforceBackupSpec, only update Deployment when those are different

return false, errors.New("in order to enable/disable the non-admin feature please set dpa.spec.unsupportedOverrides[tech-preview-ack]: 'true'")
return false, errors.New("in order to enable the non-admin feature please set dpa.spec.unsupportedOverrides[tech-preview-ack]: 'true'")
}
if r.dpa.Spec.NonAdmin.EnforceBackupSpec != nil {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added this to avoid headaches with users creating bad configuration

I think this is NAC responsibility, not user. If user can set this, then all NonAdminBackups will fail (or only of a specific namespace)

do you agree with this safety check?

(if yes, other fields should be added here?)

@mateusoliveira43

This comment was marked as resolved.

@mateusoliveira43 mateusoliveira43 force-pushed the fix/enforce-non-admin-bckup-spec-field-values branch from ce46195 to e170dec Compare November 12, 2024 12:46
@mateusoliveira43 mateusoliveira43 marked this pull request as ready for review November 12, 2024 12:53
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 12, 2024
@openshift-ci openshift-ci bot requested review from mpryc and mrnold November 12, 2024 12:53
@weshayutin
Copy link
Contributor

ok.. checking this out.

  • Created a deployment from your branch

  • Created a dpa.

  • Added oadp-non-admin spec and overrides

  • added enforceBackupSpec

    enforceBackupSpec:
      storageLocation: dpa-sample-1
  • The controller properly restarted \0/
non-admin-controller-5bcd79f674-dtqls non-admin-controller 2024-11-13T01:18:50Z	INFO	Stopping and waiting for non leader election runnables
non-admin-controller-5bcd79f674-dtqls non-admin-controller 2024-11-13T01:18:50Z	INFO	Stopping and waiting for leader election runnables
non-admin-controller-5bcd79f674-dtqls non-admin-controller 2024-11-13T01:18:50Z	INFO	Shutdown signal received, waiting for all workers to finish	{"controller": "nonadminbackup", "controllerGroup": "nac.oadp.openshift.io", "controllerKind": "NonAdminBackup"}
non-admin-controller-5bcd79f674-dtqls non-admin-controller 2024-11-13T01:18:50Z	INFO	All workers finished	{"controller": "nonadminbackup", "controllerGroup": "nac.oadp.openshift.io", "controllerKind": "NonAdminBackup"}
non-admin-controller-5bcd79f674-dtqls non-admin-controller 2024-11-13T01:18:50Z	INFO	Stopping and waiting for caches
non-admin-controller-5bcd79f674-dtqls non-admin-controller W1113 01:18:50.319171       1 reflector.go:462] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:229: watch of *v1.Backup ended with: an error on the server ("unable to decode an event from the watch stream: context canceled") has prevented the request from succeeding
non-admin-controller-5bcd79f674-dtqls non-admin-controller 2024-11-13T01:18:50Z	INFO	Stopping and waiting for webhooks
non-admin-controller-5bcd79f674-dtqls non-admin-controller 2024-11-13T01:18:50Z	INFO	Stopping and waiting for HTTP servers
non-admin-controller-5bcd79f674-dtqls non-admin-controller 2024-11-13T01:18:50Z	INFO	controller-runtime.metrics	Shutting down metrics server with timeout of 1 minute
non-admin-controller-5bcd79f674-dtqls non-admin-controller 2024-11-13T01:18:50Z	INFO	shutting down server	{"kind": "health probe", "addr": "[::]:8081"}
non-admin-controller-5bcd79f674-dtqls non-admin-controller 2024-11-13T01:18:50Z	INFO	Wait completed, proceeding to shutdown the manager
non-admin-controller-6d5b9769dd-jpblw non-admin-controller 2024-11-13T01:18:49Z	INFO	Starting workers	{"controller": "nonadminbackup", "controllerGroup": "nac.oadp.openshift.io", "controllerKind": "NonAdminBackup", "worker count": 1}
- non-admin-controller-5bcd79f674-dtqls › non-admin-controller
  • Adding invalid backupspec key/values fails appropriate - see screenshot

Screenshot from 2024-11-12 18-22-12

It's not clear to me how to test:
when not changing DPA spec.nonAdmin.enforceBackupSpec, NAC Pod does not get recreated

If you can help me understand that, I'll poke at this a bit more.

@kaovilai
Copy link
Member

If you can help me understand that, I'll poke at this a bit more.

See design:
https://github.com/migtools/oadp-non-admin/pull/110/files#diff-788dfa1caf04072e867eaa6c490201f32d2e8bdf4cda035ba397d4d974737850R38

@kaovilai
Copy link
Member

If admin user changes any enforced field value, NAC Pod is recreated (and only NAC Pod) to always be up to date with admin user enforcements.

Therefore following should happen

when not changing DPA spec.nonAdmin.enforceBackupSpec, NAC Pod does not get recreated

@mateusoliveira43
Copy link
Contributor Author

@weshayutin change for example velero log level. That should restart Velero Pod, but not NAC Pod

@mateusoliveira43

This comment was marked as resolved.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 13, 2024
rework of logic

Signed-off-by: Mateus Oliveira <[email protected]>
fix nil pointer

Signed-off-by: Mateus Oliveira <[email protected]>
fix comparison check

Signed-off-by: Mateus Oliveira <[email protected]>
make update-non-admin-manifests

Signed-off-by: Mateus Oliveira <[email protected]>
run make update-non-admin-manifests

Signed-off-by: Mateus Oliveira <[email protected]>
@mateusoliveira43 mateusoliveira43 force-pushed the fix/enforce-non-admin-bckup-spec-field-values branch from e170dec to 9a7b12d Compare November 14, 2024 11:51
@mateusoliveira43
Copy link
Contributor Author

/unhold

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 14, 2024
Copy link

openshift-ci bot commented Nov 14, 2024

@mateusoliveira43: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/4.14-e2e-test-kubevirt-aws 9a7b12d link true /test 4.14-e2e-test-kubevirt-aws
ci/prow/4.14-e2e-test-aws 9a7b12d link true /test 4.14-e2e-test-aws
ci/prow/4.15-e2e-test-aws 9a7b12d link true /test 4.15-e2e-test-aws
ci/prow/4.17-e2e-test-kubevirt-aws 9a7b12d link true /test 4.17-e2e-test-kubevirt-aws
ci/prow/4.17-e2e-test-aws 9a7b12d link true /test 4.17-e2e-test-aws

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants