-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add scenarios for 17.1 env for adoption #597
Add scenarios for 17.1 env for adoption #597
Conversation
227f09b
to
b81cee4
Compare
scenarios/hci.yaml
Outdated
osp-computes: "overcloud-computehci" | ||
osp-controllers: "overcloud-controller" | ||
overcloud: | ||
stackname: "overcloud" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks great, and allows multi-cell / dcn deployments in the future!
👍 |
ccaa33f
to
bc5992e
Compare
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/e04c2741d5604ca78867359472a4e21d ✔️ adoption-standalone-to-crc-ceph SUCCESS in 2h 33m 31s |
bc5992e
to
ad4b5ff
Compare
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/a9cbacb4f4ee4665a931c153b5d3d154 ❌ adoption-standalone-to-crc-ceph TIMED_OUT in 4h 16m 12s |
1e2a587
to
589a436
Compare
Add a new role that will deploy a tripleo environment that will serve as source for adoption. This role is expected to cosume the infra created by [1], and a 17.1 scenario definition from the data-plane-adoption repo, introduced by [2]. It also introduce a small fix to the deploy-ocp.yml so the resulting ocp cluster is ready (the nodes needed to be uncordoned). [1] #2285 [2] openstack-k8s-operators/data-plane-adoption#597
Add a new role that will deploy a tripleo environment that will serve as source for adoption. This role is expected to cosume the infra created by [1], and a 17.1 scenario definition from the data-plane-adoption repo, introduced by [2]. It also introduce a small fix to the deploy-ocp.yml so the resulting ocp cluster is ready (the nodes needed to be uncordoned). [1] #2285 [2] openstack-k8s-operators/data-plane-adoption#597
589a436
to
fdc98a5
Compare
Add a new role that will deploy a tripleo environment that will serve as source for adoption. This role is expected to cosume the infra created by [1], and a 17.1 scenario definition from the data-plane-adoption repo, introduced by [2]. It also introduce a small fix to the deploy-ocp.yml so the resulting ocp cluster is ready (the nodes needed to be uncordoned). [1] #2285 [2] openstack-k8s-operators/data-plane-adoption#597
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/4037787d89be4b339a782b7567e1ca68 ❌ adoption-standalone-to-crc-ceph FAILURE in 2h 34m 28s |
recheck |
scenarios/hci/network_data.yaml
Outdated
mtu: 1500 | ||
vip: true | ||
name_lower: storage | ||
dns_domain: storage.mydomain.tld. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for the TLS support (eventually) we'll need to jinjafy this for the domain
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was working under this assumption that this files should be entirely static, i.e. there would be a 1:1 mapping of scenario and configuration. But your comment make we wonder, will we have a tlse and a non-tlse version of some scenario, or will we have two different scenarios, one for each case? I would go for the latter, if need to change the setup, we should create a new scenario, even if we incur in some duplication
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cescgina the problem I see is that TLS-e is enabled by default in the topology you land (unigamma). You can choose to disable it in the ctlplane/dataplane CRs, or make it configurable.
I don't think we want to duplicate the entire configuration to have something like hci-tls
and hci-no-tls
, but making it a j2 parameter here means we can process it at ci-framework level where we consume those variables and generated pass it as an extravar to the playbook.
It seems a little effort compared to the complexity of maintaining two scenarios for the same HCI topology.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indeed templating this is simple enough from the ci-framework side, though I'm wary of adding too much complexity here, to avoid ending up with something similar to what we had in install_yamls devsetup, I'll give it a try. The important question here I think is, will we have variations in the scenarios like this, i.e a hci-tls
and hci-no-tls
? I worked under the assumption that here we would target a small subset of well defined configurations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we have too many variations: VAs/DTs are supposed to be already an "opinionated" combination of components that reflects what customers use, so I don't expect much difference other than things like tls. We can keep this bit also for a follow up, so we can do good choices when we already have a stable codebase to work on. My comment is more like: let's avoid duplication at this stage, and we can consolidate scenarios in a follow up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack, thanks for the clarification, as said, the change is simple enough, so I expect I'll be able to updated both PRs with it by the end of day, thanks for the review, it's really useful to clear this points now
fdc98a5
to
7b99023
Compare
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/86fec2e0b2f24a13b1234d474fd2cfd1 ✔️ adoption-standalone-to-crc-ceph SUCCESS in 2h 33m 50s |
23d2024
to
e908b21
Compare
Add a new role that will deploy a tripleo environment that will serve as source for adoption. This role is expected to cosume the infra created by [1], and a 17.1 scenario definition from the data-plane-adoption repo, introduced by [2]. It also introduce a small fix to the deploy-ocp.yml so the resulting ocp cluster is ready (the nodes needed to be uncordoned). [1] #2285 [2] openstack-k8s-operators/data-plane-adoption#597
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/ee6667016a734dfa9a6fecc8961761fb ❌ adoption-standalone-to-crc-ceph TIMED_OUT in 4h 13m 50s |
Add a new role that will deploy a tripleo environment that will serve as source for adoption. This role is expected to cosume the infra created by [1], and a 17.1 scenario definition from the data-plane-adoption repo, introduced by [2]. It also introduce a small fix to the deploy-ocp.yml so the resulting ocp cluster is ready (the nodes needed to be uncordoned). [1] #2285 [2] openstack-k8s-operators/data-plane-adoption#597
I am definitely not Tripleo expert here but this looks good for me in general. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you Joan for putting this together i think it is very good. Now that i was able to think through the whole thing in combo with the other patch, i put in some very "futursitic concerns" as comments but they are not a -1, just some food for thought. LGTM thanks :).
- `config_download_file`: path to the config-download file used to pass | ||
environment variables to the overcloud, required. | ||
- `ceph_osd_spec_file`: path to the osd spec file used to deploy ceph when | ||
applicable, optional. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if we'll have use cases with multiple overcloud stacks. IIRC the DCN topology is deployed in TripleO via multiple stacks. I'm not sure if it would be better to have overclouds
or a single overcloud
and we would later add extra_overclouds
or something like that. Even in the DCN scenario i think one of the overcloud stacks is "primary" where the control plane is hosted, and the other stacks are additive with computes only. So a design with a single primary overcloud
stack could perhaps be the right thing to start with anyway.
I should add a disclaimer that this area is somewhat outside my comfort zone so it's possible that i'm talking nonsense.
Will tag @bogdando and @gibizer in case they have a clearer idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, I think it's "relatively" easy to change the overcloud definition to be a list of overclouds with the same parameters. I can give it a try locally, but waiting for more details on what it would be needed before pushing anything
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, you can as well just call them stack or stacks, omitting overcloud terminology.
The env files used for each stack deployment, alongside the pre and post execution hooks, must be instantiated as well. Let them holding only static values for the beginning, no templating yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack thanks @bogdando I'm preparing an update in this line, I'll update the PR once I've tested it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the overcloud
section to a stacks
one, that should support deploying multiple stacks (each has its own network_data, config-download, etc.)
scenarios/hci.yaml
Outdated
- "/usr/share/openstack-tripleo-heat-templates/environments/manila-cephfsnative-config.yaml" | ||
- "/usr/share/openstack-tripleo-heat-templates/environments/enable-legacy-telemetry.yaml" | ||
- "/usr/share/openstack-tripleo-heat-templates/environments/cephadm/cephadm.yaml" | ||
- "/usr/share/openstack-tripleo-heat-templates/environments/cephadm/ceph-mds.yaml" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wondered if we should get rid of args
and vars
and just put the whole deploy command as a variable to set. I mean, will the deployment command always be in the format that is defined in the other patch:
Same for the ceph deploy command, should we just export that out in full as a variable rather than have it in the other patch:
In general i tend to be more in favor of exporting larger command chunks as variables perhaps due to being burned by Infrared in the past, where it tried to be very smart and compose the commands for us from various pieces of info, but we almost always ended up fighting it, trying to "trick it" to produce the commands we needed. So its "smartness" quite often stood in the way rather than helped.
I realize there is more complexity to this though, e.g. some of the overcloud deploy input files are generated inside the deployment role rather than provided from the scenario here, so what i'm suggesting above is perhaps easier said then done.
Maybe it would be best to stay the course and just have this as food for thought when designing/amending the overcloud deploy
, ceph deploy
and similar commands, that when we hardcode the structure for them, we should be hardcoding something quite un-opinionated.
As a last resort, we can introduce an override mechanism later in the form "If this variable is specified, use it as overcloud deploy
command. If it is not specified, use the command hardcoded in the deployment role."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you mean, but as you mentioned dealing with the files generated dynamically will be tricky, I like your last suggestion though, to have an optional argument that'll allow the user to override the whole command, I'll add that and update both patches after testing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, i think it could be a separate PR later even, or done on a "when we need it" basis, since IIUC it wouldn't do breaking changes to the interface anyway. Up to you :).
e908b21
to
8b1256d
Compare
scenarios/hci/network_data.yaml.j2
Outdated
mtu: 1500 | ||
vip: true | ||
name_lower: storage | ||
dns_domain: storage.{{ cloud_domain }}.tld. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove the .tld bit too but leave the trailing '.'
storage.{{ cloud_domain }}.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch, fixed, thanks!
8b1256d
to
7944a2f
Compare
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/82a14b9b5957476a9a6a99e0fd217aa6 ✔️ adoption-standalone-to-crc-ceph SUCCESS in 2h 34m 23s |
scenarios/hci/network_data.yaml.j2
Outdated
service_net_map_replace: external | ||
subnets: | ||
external_subnet: | ||
gateway_ip: '172.21.0.1' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just adding a note about what we discussed on friday; remove this one and add the External network to the controllers (otherwise tls setup fails during deployment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, will run a new test and upload the PR
7944a2f
to
0715225
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good to me
Introduce an scenarios folder that will contain the needed input to deploy a 17.1 environment using the cifmw role added in [1]. The scenario is defined by a variable file, with undercloud specific parameters, overcloud specific parameters, hooks that can be called before or after both the undercloud and overcloud deployment, and two maps that relate the groups in inventory produced by the playbook that created the infra, to Roles and roles hostnames, to make it easier to work with different roles in different scenarios. [1] openstack-k8s-operators/ci-framework#2297
0715225
to
6afb8d5
Compare
Add a new role that will deploy a tripleo environment that will serve as source for adoption. This role is expected to cosume the infra created by [1], and a 17.1 scenario definition from the data-plane-adoption repo, introduced by [2]. It also introduce a small fix to the deploy-ocp.yml so the resulting ocp cluster is ready (the nodes needed to be uncordoned). [1] #2285 [2] openstack-k8s-operators/data-plane-adoption#597
@cescgina Ok to merge? |
I'd say yes, I think this is a good first version |
Add a new role that will deploy a tripleo environment that will serve as source for adoption. This role is expected to cosume the infra created by [1], and a 17.1 scenario definition from the data-plane-adoption repo, introduced by [2]. It also introduce a small fix to the deploy-ocp.yml so the resulting ocp cluster is ready (the nodes needed to be uncordoned). [1] #2285 [2] openstack-k8s-operators/data-plane-adoption#597
Introduce a scenarios folder that will contain the needed input to
deploy a 17.1 environment using the cifmw role added in [1]. The
scenario is defined by a variable file, with undercloud specific
parameters, overcloud specific parameters, hooks that can be called
before or after both the undercloud and overcloud deployment, and two
maps that relate the groups in inventory produced by the playbook that
created the infra, to Roles and roles hostnames, to make it easier to
work with different roles in different scenarios.
[1] openstack-k8s-operators/ci-framework#2297