Add scenarios for 17.1 env for adoption #597

cescgina · 2024-09-10T14:05:56Z

Introduce a scenarios folder that will contain the needed input to
deploy a 17.1 environment using the cifmw role added in [1]. The
scenario is defined by a variable file, with undercloud specific
parameters, overcloud specific parameters, hooks that can be called
before or after both the undercloud and overcloud deployment, and two
maps that relate the groups in inventory produced by the playbook that
created the infra, to Roles and roles hostnames, to make it easier to
work with different roles in different scenarios.

[1] openstack-k8s-operators/ci-framework#2297

bogdando · 2024-09-11T10:25:19Z

scenarios/hci.yaml

+  osp-computes: "overcloud-computehci"
+  osp-controllers: "overcloud-controller"
+overcloud:
+  stackname: "overcloud"


this looks great, and allows multi-cell / dcn deployments in the future!

jistr · 2024-09-13T14:28:53Z

👍

scenarios/hci.yaml

softwarefactory-project-zuul · 2024-09-20T16:40:36Z

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/e04c2741d5604ca78867359472a4e21d

✔️ adoption-standalone-to-crc-ceph SUCCESS in 2h 33m 31s
❌ adoption-standalone-to-crc-no-ceph FAILURE in 2h 06m 45s

scenarios/hci.yaml

softwarefactory-project-zuul · 2024-09-25T17:10:00Z

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/a9cbacb4f4ee4665a931c153b5d3d154

❌ adoption-standalone-to-crc-ceph TIMED_OUT in 4h 16m 12s
❌ adoption-standalone-to-crc-no-ceph TIMED_OUT in 4h 16m 42s

Add a new role that will deploy a tripleo environment that will serve as source for adoption. This role is expected to cosume the infra created by [1], and a 17.1 scenario definition from the data-plane-adoption repo, introduced by [2]. It also introduce a small fix to the deploy-ocp.yml so the resulting ocp cluster is ready (the nodes needed to be uncordoned). [1] #2285 [2] openstack-k8s-operators/data-plane-adoption#597

softwarefactory-project-zuul · 2024-09-30T13:23:56Z

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/4037787d89be4b339a782b7567e1ca68

❌ adoption-standalone-to-crc-ceph FAILURE in 2h 34m 28s
✔️ adoption-standalone-to-crc-no-ceph SUCCESS in 2h 40m 10s

cescgina · 2024-09-30T13:26:13Z

recheck

marios · 2024-10-01T06:27:56Z

scenarios/hci/network_data.yaml

+  mtu: 1500
+  vip: true
+  name_lower: storage
+  dns_domain: storage.mydomain.tld.


for the TLS support (eventually) we'll need to jinjafy this for the domain

I was working under this assumption that this files should be entirely static, i.e. there would be a 1:1 mapping of scenario and configuration. But your comment make we wonder, will we have a tlse and a non-tlse version of some scenario, or will we have two different scenarios, one for each case? I would go for the latter, if need to change the setup, we should create a new scenario, even if we incur in some duplication

@cescgina the problem I see is that TLS-e is enabled by default in the topology you land (unigamma). You can choose to disable it in the ctlplane/dataplane CRs, or make it configurable.
I don't think we want to duplicate the entire configuration to have something like hci-tls and hci-no-tls, but making it a j2 parameter here means we can process it at ci-framework level where we consume those variables and generated pass it as an extravar to the playbook.
It seems a little effort compared to the complexity of maintaining two scenarios for the same HCI topology.

indeed templating this is simple enough from the ci-framework side, though I'm wary of adding too much complexity here, to avoid ending up with something similar to what we had in install_yamls devsetup, I'll give it a try. The important question here I think is, will we have variations in the scenarios like this, i.e a hci-tls and hci-no-tls? I worked under the assumption that here we would target a small subset of well defined configurations

I don't think we have too many variations: VAs/DTs are supposed to be already an "opinionated" combination of components that reflects what customers use, so I don't expect much difference other than things like tls. We can keep this bit also for a follow up, so we can do good choices when we already have a stable codebase to work on. My comment is more like: let's avoid duplication at this stage, and we can consolidate scenarios in a follow up.

ack, thanks for the clarification, as said, the change is simple enough, so I expect I'll be able to updated both PRs with it by the end of day, thanks for the review, it's really useful to clear this points now

softwarefactory-project-zuul · 2024-10-01T14:48:24Z

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/86fec2e0b2f24a13b1234d474fd2cfd1

✔️ adoption-standalone-to-crc-ceph SUCCESS in 2h 33m 50s
❌ adoption-standalone-to-crc-no-ceph RETRY_LIMIT in 13m 03s

Add a new role that will deploy a tripleo environment that will serve as source for adoption. This role is expected to cosume the infra created by [1], and a 17.1 scenario definition from the data-plane-adoption repo, introduced by [2]. It also introduce a small fix to the deploy-ocp.yml so the resulting ocp cluster is ready (the nodes needed to be uncordoned). [1] #2285 [2] openstack-k8s-operators/data-plane-adoption#597

softwarefactory-project-zuul · 2024-10-02T13:11:08Z

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/ee6667016a734dfa9a6fecc8961761fb

❌ adoption-standalone-to-crc-ceph TIMED_OUT in 4h 13m 50s
❌ adoption-standalone-to-crc-no-ceph NODE_FAILURE Node request 100-0007603188 failed in 0s
❌ adoption-docs-preview POST_FAILURE in 1m 14s

Add a new role that will deploy a tripleo environment that will serve as source for adoption. This role is expected to cosume the infra created by [1], and a 17.1 scenario definition from the data-plane-adoption repo, introduced by [2]. It also introduce a small fix to the deploy-ocp.yml so the resulting ocp cluster is ready (the nodes needed to be uncordoned). [1] #2285 [2] openstack-k8s-operators/data-plane-adoption#597

slawqo · 2024-10-03T10:26:00Z

I am definitely not Tripleo expert here but this looks good for me in general.

jistr

Thank you Joan for putting this together i think it is very good. Now that i was able to think through the whole thing in combo with the other patch, i put in some very "futursitic concerns" as comments but they are not a -1, just some food for thought. LGTM thanks :).

jistr · 2024-10-03T14:21:29Z

scenarios/README.md

+- `config_download_file`: path to the config-download file used to pass
+environment variables to the overcloud, required.
+- `ceph_osd_spec_file`: path to the osd spec file used to deploy ceph when
+applicable, optional.


I'm wondering if we'll have use cases with multiple overcloud stacks. IIRC the DCN topology is deployed in TripleO via multiple stacks. I'm not sure if it would be better to have overclouds or a single overcloud and we would later add extra_overclouds or something like that. Even in the DCN scenario i think one of the overcloud stacks is "primary" where the control plane is hosted, and the other stacks are additive with computes only. So a design with a single primary overcloud stack could perhaps be the right thing to start with anyway.

I should add a disclaimer that this area is somewhat outside my comfort zone so it's possible that i'm talking nonsense.

Will tag @bogdando and @gibizer in case they have a clearer idea.

good point, I think it's "relatively" easy to change the overcloud definition to be a list of overclouds with the same parameters. I can give it a try locally, but waiting for more details on what it would be needed before pushing anything

right, you can as well just call them stack or stacks, omitting overcloud terminology.
The env files used for each stack deployment, alongside the pre and post execution hooks, must be instantiated as well. Let them holding only static values for the beginning, no templating yet.

ack thanks @bogdando I'm preparing an update in this line, I'll update the PR once I've tested it

I changed the overcloud section to a stacks one, that should support deploying multiple stacks (each has its own network_data, config-download, etc.)

jistr · 2024-10-03T15:19:13Z

scenarios/hci.yaml

+    - "/usr/share/openstack-tripleo-heat-templates/environments/manila-cephfsnative-config.yaml"
+    - "/usr/share/openstack-tripleo-heat-templates/environments/enable-legacy-telemetry.yaml"
+    - "/usr/share/openstack-tripleo-heat-templates/environments/cephadm/cephadm.yaml"
+    - "/usr/share/openstack-tripleo-heat-templates/environments/cephadm/ceph-mds.yaml"


I wondered if we should get rid of args and vars and just put the whole deploy command as a variable to set. I mean, will the deployment command always be in the format that is defined in the other patch:

https://github.com/openstack-k8s-operators/ci-framework/pull/2297/files#diff-9e5f7c63638bbac9bfa4df077fe45e16b3acb56feaf89599aacce7a307624498R64

Same for the ceph deploy command, should we just export that out in full as a variable rather than have it in the other patch:

https://github.com/openstack-k8s-operators/ci-framework/pull/2297/files#diff-9a8face42b21d9562691b08533c92de7db4832e6314cb0b9c5479f42dd3d448cR131

In general i tend to be more in favor of exporting larger command chunks as variables perhaps due to being burned by Infrared in the past, where it tried to be very smart and compose the commands for us from various pieces of info, but we almost always ended up fighting it, trying to "trick it" to produce the commands we needed. So its "smartness" quite often stood in the way rather than helped.

I realize there is more complexity to this though, e.g. some of the overcloud deploy input files are generated inside the deployment role rather than provided from the scenario here, so what i'm suggesting above is perhaps easier said then done.

Maybe it would be best to stay the course and just have this as food for thought when designing/amending the overcloud deploy, ceph deploy and similar commands, that when we hardcode the structure for them, we should be hardcoding something quite un-opinionated.

As a last resort, we can introduce an override mechanism later in the form "If this variable is specified, use it as overcloud deploy command. If it is not specified, use the command hardcoded in the deployment role."

I see what you mean, but as you mentioned dealing with the files generated dynamically will be tricky, I like your last suggestion though, to have an optional argument that'll allow the user to override the whole command, I'll add that and update both patches after testing

Sounds good, i think it could be a separate PR later even, or done on a "when we need it" basis, since IIUC it wouldn't do breaking changes to the interface anyway. Up to you :).

marios · 2024-10-04T06:39:41Z

scenarios/hci/network_data.yaml.j2

+  mtu: 1500
+  vip: true
+  name_lower: storage
+  dns_domain: storage.{{ cloud_domain }}.tld.


remove the .tld bit too but leave the trailing '.'

storage.{{ cloud_domain }}.

good catch, fixed, thanks!

softwarefactory-project-zuul · 2024-10-04T09:51:02Z

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/82a14b9b5957476a9a6a99e0fd217aa6

✔️ adoption-standalone-to-crc-ceph SUCCESS in 2h 34m 23s
❌ adoption-standalone-to-crc-no-ceph NODE_FAILURE Node request 100-0007607025 failed in 0s
✔️ adoption-docs-preview SUCCESS in 1m 25s

marios · 2024-10-07T06:32:40Z

scenarios/hci/network_data.yaml.j2

+  service_net_map_replace: external
+  subnets:
+    external_subnet:
+      gateway_ip: '172.21.0.1'


just adding a note about what we discussed on friday; remove this one and add the External network to the controllers (otherwise tls setup fails during deployment)

done, will run a new test and upload the PR

scenarios/hci/roles.yaml

ciecierski

looks good to me

Introduce an scenarios folder that will contain the needed input to deploy a 17.1 environment using the cifmw role added in [1]. The scenario is defined by a variable file, with undercloud specific parameters, overcloud specific parameters, hooks that can be called before or after both the undercloud and overcloud deployment, and two maps that relate the groups in inventory produced by the playbook that created the infra, to Roles and roles hostnames, to make it easier to work with different roles in different scenarios. [1] openstack-k8s-operators/ci-framework#2297

Add a new role that will deploy a tripleo environment that will serve as source for adoption. This role is expected to cosume the infra created by [1], and a 17.1 scenario definition from the data-plane-adoption repo, introduced by [2]. It also introduce a small fix to the deploy-ocp.yml so the resulting ocp cluster is ready (the nodes needed to be uncordoned). [1] #2285 [2] openstack-k8s-operators/data-plane-adoption#597

jistr · 2024-10-08T12:52:38Z

@cescgina Ok to merge?

cescgina · 2024-10-08T12:54:13Z

@cescgina Ok to merge?

I'd say yes, I think this is a good first version

Add a new role that will deploy a tripleo environment that will serve as source for adoption. This role is expected to cosume the infra created by [1], and a 17.1 scenario definition from the data-plane-adoption repo, introduced by [2]. It also introduce a small fix to the deploy-ocp.yml so the resulting ocp cluster is ready (the nodes needed to be uncordoned). [1] #2285 [2] openstack-k8s-operators/data-plane-adoption#597

cescgina mentioned this pull request Sep 10, 2024

[POC] adoption 17.1 input variables openstack-k8s-operators/architecture#385

Closed

cescgina force-pushed the add_osp17_1_scenarios branch from 227f09b to b81cee4 Compare September 10, 2024 17:53

bogdando reviewed Sep 11, 2024

View reviewed changes

cescgina requested a review from marios September 13, 2024 09:32

marios reviewed Sep 16, 2024

View reviewed changes

scenarios/hci.yaml Outdated Show resolved Hide resolved

cescgina force-pushed the add_osp17_1_scenarios branch 3 times, most recently from ccaa33f to bc5992e Compare September 20, 2024 13:53

marios reviewed Sep 24, 2024

View reviewed changes

scenarios/hci.yaml Outdated Show resolved Hide resolved

cescgina force-pushed the add_osp17_1_scenarios branch from bc5992e to ad4b5ff Compare September 25, 2024 11:24

cescgina force-pushed the add_osp17_1_scenarios branch 2 times, most recently from 1e2a587 to 589a436 Compare September 28, 2024 14:05

cescgina mentioned this pull request Sep 30, 2024

Add role to deploy 17.1 env for adoption openstack-k8s-operators/ci-framework#2297

Merged

cescgina force-pushed the add_osp17_1_scenarios branch from 589a436 to fdc98a5 Compare September 30, 2024 10:07

cescgina changed the title ~~[POC] Add scenarios for 17.1 env for adoption~~ Add scenarios for 17.1 env for adoption Sep 30, 2024

cescgina marked this pull request as ready for review September 30, 2024 10:08

marios reviewed Oct 1, 2024

View reviewed changes

cescgina force-pushed the add_osp17_1_scenarios branch from fdc98a5 to 7b99023 Compare October 1, 2024 11:16

cescgina force-pushed the add_osp17_1_scenarios branch 3 times, most recently from 23d2024 to e908b21 Compare October 2, 2024 08:55

jistr approved these changes Oct 3, 2024

View reviewed changes

cescgina force-pushed the add_osp17_1_scenarios branch from e908b21 to 8b1256d Compare October 3, 2024 16:03

marios reviewed Oct 4, 2024

View reviewed changes

cescgina force-pushed the add_osp17_1_scenarios branch from 8b1256d to 7944a2f Compare October 4, 2024 07:15

marios reviewed Oct 7, 2024

View reviewed changes

cescgina force-pushed the add_osp17_1_scenarios branch from 7944a2f to 0715225 Compare October 7, 2024 09:07

ciecierski approved these changes Oct 7, 2024

View reviewed changes

cescgina force-pushed the add_osp17_1_scenarios branch from 0715225 to 6afb8d5 Compare October 7, 2024 11:06

jistr approved these changes Oct 8, 2024

View reviewed changes

jistr merged commit 1a2fb79 into openstack-k8s-operators:main Oct 8, 2024
2 checks passed

cescgina deleted the add_osp17_1_scenarios branch October 8, 2024 16:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add scenarios for 17.1 env for adoption #597

Add scenarios for 17.1 env for adoption #597

cescgina commented Sep 10, 2024 •

edited

Loading

bogdando Sep 11, 2024

jistr commented Sep 13, 2024

softwarefactory-project-zuul bot commented Sep 20, 2024

softwarefactory-project-zuul bot commented Sep 25, 2024

softwarefactory-project-zuul bot commented Sep 30, 2024

cescgina commented Sep 30, 2024

marios Oct 1, 2024

cescgina Oct 1, 2024

fmount Oct 3, 2024

cescgina Oct 3, 2024

fmount Oct 3, 2024

cescgina Oct 3, 2024

softwarefactory-project-zuul bot commented Oct 1, 2024

softwarefactory-project-zuul bot commented Oct 2, 2024

slawqo commented Oct 3, 2024

jistr left a comment

jistr Oct 3, 2024

cescgina Oct 3, 2024

bogdando Oct 4, 2024

cescgina Oct 4, 2024

cescgina Oct 7, 2024

jistr Oct 3, 2024

cescgina Oct 3, 2024

jistr Oct 3, 2024

marios Oct 4, 2024

cescgina Oct 4, 2024

softwarefactory-project-zuul bot commented Oct 4, 2024

marios Oct 7, 2024

cescgina Oct 7, 2024

ciecierski left a comment

jistr commented Oct 8, 2024

cescgina commented Oct 8, 2024

Add scenarios for 17.1 env for adoption #597

Add scenarios for 17.1 env for adoption #597

Conversation

cescgina commented Sep 10, 2024 • edited Loading

Choose a reason for hiding this comment

jistr commented Sep 13, 2024

softwarefactory-project-zuul bot commented Sep 20, 2024

softwarefactory-project-zuul bot commented Sep 25, 2024

softwarefactory-project-zuul bot commented Sep 30, 2024

cescgina commented Sep 30, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

softwarefactory-project-zuul bot commented Oct 1, 2024

softwarefactory-project-zuul bot commented Oct 2, 2024

slawqo commented Oct 3, 2024

jistr left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

softwarefactory-project-zuul bot commented Oct 4, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ciecierski left a comment

Choose a reason for hiding this comment

jistr commented Oct 8, 2024

cescgina commented Oct 8, 2024

cescgina commented Sep 10, 2024 •

edited

Loading