Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add deleting cells automation #846

Merged

Conversation

mrkisaolamb
Copy link
Contributor

@mrkisaolamb mrkisaolamb commented Aug 19, 2024

close: OSPRH-105

@mrkisaolamb mrkisaolamb requested review from gibizer and removed request for olliewalsh August 19, 2024 13:23
@mrkisaolamb mrkisaolamb changed the title Add deleting cells automation WIP: Add deleting cells automation Aug 19, 2024
@mrkisaolamb
Copy link
Contributor Author

Still require more testing and adding more func/kuttle tests

@mrkisaolamb mrkisaolamb force-pushed the cell_delete branch 3 times, most recently from f67ed5e to 8fc5a10 Compare August 22, 2024 13:16
@mrkisaolamb mrkisaolamb changed the title WIP: Add deleting cells automation Add deleting cells automation Aug 22, 2024
Copy link
Contributor

@gibizer gibizer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to use this locally but it fails with:

 ❯ oc logs nova-cell1-cell-delete-tw8ts
+ sudo -E kolla_set_configs
sudo: unable to send audit message: Operation not permitted
INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
INFO:__main__:Validating config file
INFO:__main__:Kolla config strategy set to: COPY_ALWAYS
INFO:__main__:Copying service configuration files
INFO:__main__:Deleting /etc/nova/nova.conf
INFO:__main__:Copying /var/lib/openstack/config/nova-blank.conf to /etc/nova/nova.conf
INFO:__main__:Setting permission for /etc/nova/nova.conf
INFO:__main__:Copying /var/lib/openstack/config/01-nova.conf to /etc/nova/nova.conf.d/01-nova.conf
INFO:__main__:Setting permission for /etc/nova/nova.conf.d/01-nova.conf
INFO:__main__:Copying /var/lib/openstack/bin/delete_cell.sh to /bin/delete_cell.sh
INFO:__main__:Setting permission for /bin/delete_cell.sh
INFO:__main__:Copying /var/lib/openstack/config/my.cnf to /etc/my.cnf
INFO:__main__:Setting permission for /etc/my.cnf
INFO:__main__:Writing out command to execute
++ cat /run_command
+ CMD=/bin/ensure_cell_mapping.sh
+ ARGS=
+ sudo kolla_copy_cacerts
sudo: unable to send audit message: Operation not permitted
+ [[ ! -n '' ]]
+ . kolla_extend_start
+ echo 'Running command: '\''/bin/ensure_cell_mapping.sh'\'''
Running command: '/bin/ensure_cell_mapping.sh'
+ umask 0022
+ exec /bin/ensure_cell_mapping.sh
/usr/local/bin/kolla_start: line 22: /bin/ensure_cell_mapping.sh: No such file or directory

templates/nova-manage/config/cell-delete-config.json Outdated Show resolved Hide resolved
templates/nova-manage/bin/delete_cell.sh Outdated Show resolved Hide resolved
Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/4f7309eceb0b40788c94630721b7dbec

openstack-meta-content-provider NODE_FAILURE Node request 100-0007561214 failed in 0s
⚠️ nova-operator-kuttl SKIPPED Skipped due to failed job openstack-meta-content-provider
⚠️ nova-operator-tempest-multinode SKIPPED Skipped due to failed job openstack-meta-content-provider
⚠️ nova-operator-tempest-multinode-ceph SKIPPED Skipped due to failed job openstack-meta-content-provider

@mrkisaolamb
Copy link
Contributor Author

recheck

@gibizer
Copy link
Contributor

gibizer commented Sep 3, 2024

Tried this again locally by deleting cell1.

I see two issues:

  • the cell_mapping is not deleted
❯ oc logs nova-cell1-cell-delete-t8ddb
+ sudo -E kolla_set_configs
sudo: unable to send audit message: Operation not permitted
INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
INFO:__main__:Validating config file
INFO:__main__:Kolla config strategy set to: COPY_ALWAYS
INFO:__main__:Copying service configuration files
INFO:__main__:Deleting /etc/nova/nova.conf
INFO:__main__:Copying /var/lib/openstack/config/nova-blank.conf to /etc/nova/nova.conf
INFO:__main__:Setting permission for /etc/nova/nova.conf
INFO:__main__:Copying /var/lib/openstack/config/01-nova.conf to /etc/nova/nova.conf.d/01-nova.conf
INFO:__main__:Setting permission for /etc/nova/nova.conf.d/01-nova.conf
INFO:__main__:Copying /var/lib/openstack/bin/delete_cell.sh to /bin/delete_cell.sh
INFO:__main__:Setting permission for /bin/delete_cell.sh
INFO:__main__:Copying /var/lib/openstack/config/my.cnf to /etc/my.cnf
INFO:__main__:Setting permission for /etc/my.cnf
INFO:__main__:Writing out command to execute
++ cat /run_command
+ CMD=/bin/delete_cell.sh
+ ARGS=
+ sudo kolla_copy_cacerts
sudo: unable to send audit message: Operation not permitted
+ [[ ! -n '' ]]
+ . kolla_extend_start
+ echo 'Running command: '\''/bin/delete_cell.sh'\'''
Running command: '/bin/delete_cell.sh'
+ umask 0022
+ exec /bin/delete_cell.sh
+ export CELL_NAME=cell1
+ CELL_NAME=cell1
++ nova-manage cell_v2 list_cells
++ tr ' ' '|'
++ tr --squeeze-repeats '|'
++ cut -d '|' -f 3
++ grep -e '^|cell1|'
Modules with known eventlet monkey patching issues were imported prior to eventlet monkey patching: urllib3. This warning can usually be ignored if the caller is only importing and not executing nova code.
+ cell_uuid=30e83114-9261-4476-bfa4-e1301953e85f
+ '[' -z 30e83114-9261-4476-bfa4-e1301953e85f ']'
❯ oc rsh nova-cell0-conductor-0 nova-manage cell_v2 list_cells
Modules with known eventlet monkey patching issues were imported prior to eventlet monkey patching: urllib3. This warning can usually be ignored if the caller is only importing and not executing nova code.
+-------+--------------------------------------+-----------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------+----------+
|  Name |                 UUID                 |                                      Transport URL                                      |                                             Database Connection                                             | Disabled |
+-------+--------------------------------------+-----------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------+----------+
| cell0 | 00000000-0000-0000-0000-000000000000 |                                         rabbit:                                         |    mysql+pymysql://****:****@openstack.openstack.svc/nova_cell0?read_default_file=/etc/my.cnf    |  False   |
| cell1 | 30e83114-9261-4476-bfa4-e1301953e85f | rabbit://****:****@rabbitmq-cell1.openstack.svc:5671/?ssl=1 | mysql+pymysql://****:****@openstack-cell1.openstack.svc/nova_cell1?read_default_file=/etc/my.cnf |  False   |
+-------+--------------------------------------+-----------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------+----------+
  • MariaDBAccount and MariaDBDatabase is not deleted
❯ oc get MariaDBAccount | grep nova
nova-api     18h
nova-cell0   18h
nova-cell1   18h
❯ oc get MariaDBDatabase | grep nova
nova-api     18h
nova-cell0   18h
nova-cell1   18h
 
❯ oc get MariaDBAccount/nova-cell1 -o yaml
apiVersion: mariadb.openstack.org/v1beta1
kind: MariaDBAccount
metadata:
  creationTimestamp: "2024-09-02T15:04:44Z"
  finalizers:
  - openstack.org/nova
  - openstack.org/mariadbaccount
  generation: 1
  labels:
    mariaDBDatabaseName: nova-cell1
  name: nova-cell1
  namespace: openstack
 
 
❯ oc get MariaDBDatabase/nova-cell1 -o yaml
apiVersion: mariadb.openstack.org/v1beta1
kind: MariaDBDatabase
metadata:
  creationTimestamp: "2024-09-02T15:04:44Z"
  finalizers:
  - openstack.org/nova
  - openstack.org/mariadbdatabase
  - openstack.org/mariadbaccount-nova-cell1
  generation: 1
  labels:
    dbName: openstack-cell1
  name: nova-cell1
  namespace: openstack

Copy link
Contributor

@gibizer gibizer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found more leftovers. See inline.

controllers/nova_controller.go Show resolved Hide resolved
@gibizer
Copy link
Contributor

gibizer commented Sep 3, 2024

Also we need to call cleanServiceFromNovaDb in the reconcileDelete of the NovaConductor to ensure that the nova service is also deleted form the nova api DB.

Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/2009990370ba428c97acb4d073f5198b

✔️ openstack-meta-content-provider SUCCESS in 2h 31m 38s
nova-operator-kuttl FAILURE in 49m 15s
✔️ nova-operator-tempest-multinode SUCCESS in 2h 14m 02s
nova-operator-tempest-multinode-ceph FAILURE in 1h 21m 37s

Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/c15b74bd5d604f8885fe61832ad48869

✔️ openstack-meta-content-provider SUCCESS in 3h 39m 35s
nova-operator-kuttl FAILURE in 47m 51s
✔️ nova-operator-tempest-multinode SUCCESS in 2h 16m 15s
nova-operator-tempest-multinode-ceph FAILURE in 2h 29m 23s

Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/9b2f094508b3481bb79b652b9a624e3b

✔️ openstack-meta-content-provider SUCCESS in 2h 47m 22s
nova-operator-kuttl FAILURE in 50m 14s
✔️ nova-operator-tempest-multinode SUCCESS in 1h 55m 46s
✔️ nova-operator-tempest-multinode-ceph SUCCESS in 2h 31m 27s

controllers/nova_controller.go Outdated Show resolved Hide resolved
controllers/nova_controller.go Outdated Show resolved Hide resolved
Copy link

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/nova-operator for 846,70ff3a21897276fed1cf76b23467c226c6f86575

@gibizer
Copy link
Contributor

gibizer commented Oct 4, 2024

@gibizer gibizer self-requested a review October 4, 2024 15:59
Copy link
Contributor

@gibizer gibizer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I consider this ready now (except the mariadb-operator dependency). I cannot approve it as I wrote a sizeable chunk of fixes for it.

Copy link
Contributor

@gibizer gibizer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened trackers for the remaining items

- script: |
oc get -n nova-kuttl-default Secret/nova-cell1-manage-config-data || exit 0
exit 1
# TODO(gibi): this needs an openstack-operator change to clean up the cell1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

controllers/nova_controller.go Show resolved Hide resolved
Both CR is created by the nova controller for the given cell, so when
that cell is deleted the CRs also needs to be deleted. We also need to
delete the secret connected to the MariaDBAccount as mariadb-operator
does not delete it today.

It also added graceful handling of NotFound during resource deletion and
refactored the cleanup logic to avoid bugs if the delete needs to be
retried.

This now depends on the
openstack-k8s-operators/mariadb-operator#268
instead of copying that change.
@gibizer
Copy link
Contributor

gibizer commented Oct 8, 2024

the mariadb-operator dependency now landed, so for me this is ready to go.

@zzzeek
Copy link
Contributor

zzzeek commented Oct 8, 2024

I'm assuming the nova cells use case is unusual enough that the routines here are still needing to deal with MariaDBAccount and MariaDBDatabase explicitly, rather than using the new DeleteDatabaseAndAccountFinalizers routine, is that correct?

@gibizer
Copy link
Contributor

gibizer commented Oct 8, 2024

I'm assuming the nova cells use case is unusual enough that the routines here are still needing to deal with MariaDBAccount and MariaDBDatabase explicitly, rather than using the new DeleteDatabaseAndAccountFinalizers routine, is that correct?

Yes. We do call DeleteDatabaseAndAccountFinalizers but we also need to explicitly delete the MariaDBDatabase CR afterwards as it is owned by the Nova CR but the Nova CR is not going away, just the NovaCell CR, so we cannot rely on the k8s garbage collector to delete the MariaDBDatabase.

@SeanMooney SeanMooney dismissed gibizer’s stale review October 8, 2024 16:17

gibis concerns have either been adress, are tracked by a follow up jira issue or we have delegated part of the db handeling to the maraidb operator so clearing this stale review

Copy link
Contributor

openshift-ci bot commented Oct 10, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mrkisaolamb, SeanMooney

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [SeanMooney,mrkisaolamb]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit 0721f8b into openstack-k8s-operators:main Oct 10, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants