Add role to deploy 17.1 env for adoption #2297
base: main
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Thanks for the PR! ❤️
It would need some more flexibility, but it could match the need with some tweaks/iterations.
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/3e0801ce95b34d3184964958b146d59f ✔️ openstack-k8s-operators-content-provider SUCCESS in 10h 11m 04s
Force-pushed 6908cc9 to d5f18ed
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/f96d929a98b34e69a20a3813f6e9f506 ✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 14m 09s
Force-pushed ae14a9b to cd97aff
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/179a53575c8e437a909be2c41711f18c ✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 33m 15s
Force-pushed cd97aff to be4fffd
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/100b262aeb704c85b86c6589151688bf ✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 46m 28s
Force-pushed be4fffd to 87dd378
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/11d5100f2b374d3d875c5cd958cdfef7 ✔️ openstack-k8s-operators-content-provider SUCCESS in 34m 34s
Force-pushed 87dd378 to 7413ab1
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/c8f71eecadfe40639c4e22942a5fb6c5 ✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 43m 57s
Force-pushed 7413ab1 to ee6fda9
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/c22a1ed7df85440d83b1bb6fc81d2aa0 ✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 40m 24s
Force-pushed ee6fda9 to bfc7d98
Force-pushed bfc7d98 to ae7d460
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/2c06710e48894922925227767bc3a40b ✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 02m 30s
Force-pushed ae7d460 to 1b62440
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/782d8ffaf82c413cab42bc91c2a5128d ✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 50m 00s
Force-pushed 1b62440 to e99f3b5
Force-pushed 809ddd1 to 9494ec2
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/adabf7ff0819468ea1b1c8a9aa1a76eb ✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 26m 40s
Force-pushed 9494ec2 to cff6ff5
Force-pushed cff6ff5 to c7b98d7
Introduce a scenarios folder that will contain the input needed to deploy a 17.1 environment using the cifmw role added in [1]. The scenario is defined by a variable file with undercloud-specific parameters, overcloud-specific parameters, hooks that can be called before or after both the undercloud and overcloud deployments, and two maps that relate the inventory groups produced by the playbook that created the infra to Roles and role hostnames, making it easier to work with different roles in different scenarios. [1] openstack-k8s-operators/ci-framework#2297
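The structure described in the commit message could be sketched as a scenario variable file like the one below. All key names here are illustrative assumptions that only mirror the pieces named above (undercloud/overcloud parameters, hooks, and the two group maps), not the actual schema consumed by the role:

```yaml
---
# Hypothetical 17.1 scenario definition; key names are illustrative only.
undercloud:
  # Undercloud-specific parameters, e.g. undercloud.conf overrides.
  config:
    - section: DEFAULT
      option: undercloud_hostname
      value: undercloud.example.com
overcloud:
  # Overcloud-specific parameters.
  stack_name: overcloud
hooks:
  # Hooks callable before/after the undercloud and overcloud deployments.
  pre_undercloud_deploy: []
  post_overcloud_deploy:
    - name: sanity_check
      type: playbook
      source: sanity.yml
# Map the infra inventory groups to TripleO Roles...
roles_groups_map:
  osp-controllers: Controller
  osp-computes: Compute
# ...and to the hostname prefix each role should use.
hostname_groups_map:
  osp-controllers: controller
  osp-computes: compute
```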
Force-pushed c7b98d7 to d707019
roles/adoption_osp_deploy/README.md (outdated):
deployment. Defaults to `pool.ntp.org`
* `cifmw_adoption_osp_deploy_repos`: (List) List of 17.1 repos to enable. Defaults to
  `[rhel-9-for-x86_64-baseos-eus-rpms, rhel-9-for-x86_64-appstream-eus-rpms, rhel-9-for-x86_64-highavailability-eus-rpms, openstack-17.1-for-rhel-9-x86_64-rpms, fast-datapath-for-rhel-9-x86_64-rpms, rhceph-6-tools-for-rhel-9-x86_64-rpms]`
* `cifmw_adoption_osp_deploy_skip_stages`: (String or List) Stages to skip
hmm, I'd use actual tags, `['undercloud', 'overcloud']`, and leverage `--skip-tags`. That would be better, since it would really skip the tasks without even having to gate each task behind a `when` condition.
UNLESS this would be needed in a CI context? Not sure whether a Zuul job definition allows passing down tags to skip...?
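The tags-based approach suggested above could look roughly like the following sketch (the play layout and file name are hypothetical; only the tagging pattern reflects the suggestion). Tagging whole plays lets `--skip-tags` drop them entirely instead of evaluating a `when` condition on every task:

```yaml
---
# deploy_17.yml -- hypothetical playbook layout illustrating the tags idea.
- name: Deploy the undercloud
  hosts: undercloud
  tags:
    - undercloud
  tasks:
    - name: Run undercloud install
      ansible.builtin.command: openstack undercloud install

- name: Deploy the overcloud
  hosts: undercloud
  tags:
    - overcloud
  tasks:
    - name: Run overcloud deploy
      ansible.builtin.command: openstack overcloud deploy --templates

# Re-run only the overcloud stage after a failure:
#   ansible-playbook deploy_17.yml --skip-tags undercloud
```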
I can't think of any use case for this in a CI job; I added it mostly to avoid having to re-run the undercloud deployment whenever something went wrong with the overcloud deploy. I remembered this was implemented in kustomize_deploy and copied the idea from there. But you're right that this could be much simpler with tags; I'll give it a try locally and update the PR.
I've updated the PR to use Ansible tags; it's indeed simpler, thanks @cjeanner
Looks great. The biggest issue I saw was the secrets handling, which needs to be worked out.
group-templates:
  osp-controllers:
  computes:
Which computes are these, then, in the adoption jobs? I mean, osp_computes are the source computes, which then become the destination data plane nodes.
These are the computes from the greenfield jobs; we don't use them for adoption (in https://github.com/openstack-k8s-operators/ci-framework/pull/2297/files#diff-60530f309d776a3851cf8d6bfb03bf6c4bb531cc5de3f57e66f5a14021d545f2R27 we set the number of VMs of this group to 0). But since we reuse the base network definition https://github.com/openstack-k8s-operators/ci-framework/blob/main/scenarios/reproducers/networking-definition.yml, we can't undefine anything that is already there, so I need to make sure the IPs for the unused computes group do not overlap with the ones from the osp-computes group. I'll add a comment clarifying this.
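A minimal sketch of the constraint described above, with entirely hypothetical key names (not the actual layout or networking-definition schema): the greenfield computes group stays defined so the shared network definition remains valid, but its VM count is zero and its reserved IP range must not overlap the osp-computes range:

```yaml
---
# Hypothetical layout excerpt -- illustrative key names only.
vms:
  computes:
    amount: 0          # greenfield computes: unused in adoption jobs
  osp-computes:
    amount: 2          # source computes that will later be adopted
networks:
  ctlplane:
    ranges:
      computes:        # still reserved (cannot be undefined), kept disjoint
        start: 100
        end: 109
      osp-computes:
        start: 110
        end: 119
```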
no_log: true
ansible.builtin.command: >-
  subscription-manager register --force
  --org "{{ cifmw_adoption_osp_deploy_rhsm_org }}"
How will this work, though? I mean, the secret cannot live in this repo, as it is not a trusted/config repo.
This is not intended to be used in CI jobs; it is here in case someone wants to run the deployment manually, like I've been doing to test it. Once I start building the Zuul jobs, I'll need to do something similar from a config repo.
ansible.builtin.command: >
  podman login
  --username "{{ cifmw_adoption_osp_deploy_container_user }}"
  --password "{{ cifmw_adoption_osp_deploy_container_password }}"
Same here; this will have to be moved to wherever these secrets live.
Force-pushed cd47816 to bca17bf
Add a new role that will deploy a TripleO environment that will serve as the source for adoption. This role is expected to consume the infra created by [1] and a 17.1 scenario definition from the data-plane-adoption repo, introduced by [2]. It also introduces a small fix to deploy-ocp.yml so the resulting OCP cluster is ready (the nodes needed to be uncordoned). [1] #2285 [2] openstack-k8s-operators/data-plane-adoption#597
Force-pushed bca17bf to 4c4af9d
Thank you @cescgina for starting this. I think we can have a testproject to prove parity with what we have in RDO and then start merging your changes. They might eventually be refined with follow-up patches if required.
{% endfor %}
computes:
  children:
    {{ _adoption_source_scenario.roles_groups_map['osp-computes'] }}: {}
I'm wondering if this inventory can be used in place of [1] when we trigger the ceph migration (after the adoption) and a playbook that is meant to prepare the TripleO environment [2].
That would save us from building a "ceph compatibility" layer when executing those playbooks. Do you think we can extend this to have a Ceph section? Maybe worth discussing in a follow-up.
[1] https://review.rdoproject.org/r/c/rdo-jobs/+/53695/59/playbooks/data_plane_adoption/templates/ceph_inventory.j2
[2] openstack-k8s-operators/data-plane-adoption#637
@fmount I can try adding a new section matching what you have in ceph_inventory once I finish the current testing run, and check that it doesn't interfere with what I currently have. If that works we can get it into this first version; if we need further changes we can leave it for a follow-up.
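A rough sketch of what such a Ceph section might look like in the inventory template, mirroring the Jinja pattern from the diff excerpt above. The `ceph` group name and the choice of child groups are assumptions based on this discussion, not the actual ceph_inventory layout:

```yaml
# Hypothetical addition to the inventory template: a "ceph" parent group
# whose children are the source groups running Ceph services.
ceph:
  children:
    {{ _adoption_source_scenario.roles_groups_map['osp-controllers'] }}: {}
    {{ _adoption_source_scenario.roles_groups_map['osp-computes'] }}: {}
```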
ack and thank you!
- {{ gateway_ip }}
domain: []
addresses:
  - ip_netmask: {{ ctlplane_ip }}/{{ ctlplane_cidr }}
I recently had intermittent SSH issues with this config, while having the ctlplane IP (which I assume we use to SSH to the node) on a dedicated NIC didn't present such problems.
I'm not asking to change this layout at this point; I just wanted to point out an issue that I faced "a lot" during [1] testing, while it was not present in the infrared-based jobs where the SSH interface is on a dedicated NIC.
@fmount tbh at first I thought this would pose the problems you mentioned, but it seems to work fine on the machine I'm currently testing on. What does the config look like when you have the ctlplane IP on a dedicated NIC? What IP do you assign to the bridge?
I have to check how the downstream env looks; I just faced that issue during the ceph migration playbook, when you need to run os-net-config to update the nodes' net config.
The interesting part is that I'm not facing it anymore in my last attempts, so I'm confused about this one.
I think downstream we simply do not have an ovs_bridge for the ssh net nic.
Example:
```
1: lo inet 127.0.0.1/8 scope host lo\ valid_lft forever preferred_lft forever
2: enp1s0 inet 192.168.24.8/24 brd 192.168.24.255 scope global enp1s0\ valid_lft forever preferred_lft forever
2: enp1s0 inet 192.168.24.54/32 brd 192.168.24.255 scope global enp1s0\ valid_lft forever preferred_lft forever
7: br-ex inet 10.0.0.119/24 brd 10.0.0.255 scope global br-ex\ valid_lft forever preferred_lft forever
8: vlan30 inet 172.17.3.109/24 brd 172.17.3.255 scope global vlan30\ valid_lft forever preferred_lft forever
8: vlan30 inet 172.17.3.135/32 brd 172.17.3.255 scope global vlan30\ valid_lft forever preferred_lft forever
9: vlan70 inet 172.17.5.142/24 brd 172.17.5.255 scope global vlan70\ valid_lft forever preferred_lft forever
9: vlan70 inet 172.17.5.63/32 brd 172.17.5.255 scope global vlan70\ valid_lft forever preferred_lft forever
10: vlan50 inet 172.17.2.69/24 brd 172.17.2.255 scope global vlan50\ valid_lft forever preferred_lft forever
11: vlan20 inet 172.17.1.119/24 brd 172.17.1.255 scope global vlan20\ valid_lft forever preferred_lft forever
12: vlan40 inet 172.17.4.131/24 brd 172.17.4.255 scope global vlan40\ valid_lft forever preferred_lft forever
```
and
```yaml
---
network_config:
  - type: interface
    name: nic1
    use_dhcp: false
    dns_servers: ['172.16.0.1', '10.0.0.1']
    domain: []
    addresses:
      - ip_netmask: 192.168.24.8/24
    routes: [{'default': True, 'nexthop': '192.168.24.1'}]
  - type: vlan
    vlan_id: 20
    device: nic1
    addresses:
      - ip_netmask: 172.17.1.119/24
    routes: []
  - type: vlan
    vlan_id: 40
    device: nic1
    addresses:
      - ip_netmask: 172.17.4.131/24
    routes: []
  - type: ovs_bridge
    name: br-tenant
    use_dhcp: false
    members:
      - type: interface
        name: nic2
        primary: true
      - type: vlan
        vlan_id: 50
        addresses:
          - ip_netmask: 172.17.2.69/24
        routes: []
      - type: vlan
        vlan_id: 30
        addresses:
          - ip_netmask: 172.17.3.109/24
        routes: []
      - type: vlan
        vlan_id: 70
        addresses:
          - ip_netmask: 172.17.5.142/24
        routes: []
  - type: ovs_bridge
    name: br-ex
    dns_servers: ['172.16.0.1', '10.0.0.1']
    domain: []
    use_dhcp: false
    addresses:
      - ip_netmask: 10.0.0.119/24
    routes: []
    members:
      - type: interface
        name: nic3
        primary: true
```
Not sure it helps, and you don't have to change what you have because of this, but I hope it gives you the idea.
Thanks, that's helpful. It's quite different from what I have, and tbh I'm not sure how much I would need to change on the TripleO side to use something like this. For now I'm leaning towards keeping the current one and having this discussion here for reference, in case we start seeing problems with the interface. Now I wonder, though: is there some topology where the current setup will be problematic and will need to be changed? Should we have this template as part of the scenario definition?
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/46a67ffcd92c485bad4a385db672d87a ✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 52m 36s