Introduce DCN DT #392
Conversation
Hi @sbekkerm. Thanks for your PR. I'm waiting for an openstack-k8s-operators member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
@sbekkerm I see two changes needed up front.
The README files are incomplete. Please see the four stage READMEs for VA HCI: https://github.com/openstack-k8s-operators/architecture/tree/main/examples/va/hci#stages They contain English instructions that someone can read to implement the VA without ci-framework, using only the produced k8s manifests. If there are external automations for them, that's fine, but I should be able to read the directions and reproduce your work so that we can have independent verification. Right now it looks like the VA1 directions are still there and not updated. In my earlier example, someone could read my directions and get a full deployment: https://github.com/fultonj/dcn?tab=readme-ov-file#steps No code should be required to implement what I'm asking for in this request; just written instructions.
Would you please change this so that it puts the added files into the […] directory?
The main difference between the DCN VA and the HCI VA is in the values.yaml and service-values.yaml files. For example, the nncp values.yaml contains the configuration necessary for spine and leaf: https://github.com/sbekkerm/architecture/blob/dcn/examples/va/dcn/control-plane/nncp/values.yaml#L18 and the post-ceph service-values.yaml contains the Glance multi-store configuration.
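As a rough illustration of what such a block looks like (the AZ names, pool names, and Ceph config paths below are assumptions, not the PR's actual values), a Glance multi-store section of service-values.yaml follows this general pattern:

```yaml
# Illustrative sketch of Glance multi-store in service-values.yaml.
# AZ names, pools, and /etc/ceph paths are assumptions for this example.
glance:
  glanceAPIs:
    az0:
      customServiceConfig: |
        [DEFAULT]
        enabled_backends = az0:rbd,az1:rbd,az2:rbd
        [glance_store]
        default_backend = az0
        [az0]
        rbd_store_ceph_conf = /etc/ceph/az0.conf
        rbd_store_pool = images
        store_description = "az0 RBD backend"
        [az1]
        rbd_store_ceph_conf = /etc/ceph/az1.conf
        rbd_store_pool = images
        store_description = "az1 RBD backend"
```

Each `[azN]` section points Glance at that site's Ceph cluster, which is what allows one control plane to serve images to every DCN site.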
Yes, I understand, but I still need clear written instructions so I (or anyone else) can reproduce this. I want to collaborate with you on this and reproduce the results in my environment so I can find and fix bugs. I think the READMEs are missing too much, and filling them in will help other engineers and the docs team. At a very high level, it's so we can have reproducibility: https://en.wikipedia.org/wiki/Reproducibility
See https://github.com/openstack-k8s-operators/architecture/blob/main/examples/dt/README.md I don't think this should be an update to an existing DT; it should be a new DT. But it's a DT, not a VA: I wouldn't want to hand the field the Jinja2 files. This is something we do for our CI, but it is not yet ready to be a full-blown VA we could hand to someone in the field. Maybe it can evolve into a VA in the future. For now, in order to merge what you have, I think it should be a DT.
The README contains all the steps to reproduce the environment. Could you please clarify what specifically is unclear?
The Jinja templates are not involved in the deployment process; they are only used by CI to set the "CHANGEME" parameters in values.yaml and service-values.yaml.
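To make that mechanism concrete, a hypothetical CI-side template could look like the following; the file name, variable names, and node keys are invented for illustration and are not the PR's actual templates:

```yaml
# values.yaml.j2 -- hypothetical CI-side template, not the PR's actual file.
# CI renders it per site; anything left unset falls back to CHANGEME,
# matching the placeholders in the committed values.yaml.
data:
  nodeset:
    nodes:
      edpm-compute-az1-0:
        hostName: {{ az1_compute_hostname | default('CHANGEME') }}
        ansible:
          ansibleHost: {{ az1_compute_ip | default('CHANGEME') }}
```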
Please move this from the examples/va directory to examples/dt. I am putting it in my backlog to go through your README line by line and attempt to reproduce what you have deployed, and I'll ask you clarifying questions along the way, which will point out what is incomplete in the READMEs.
Moved from examples/va to examples/dt.
I am suggesting some changes based on what we had in the 17.1 OpenStack service configs and/or to make tempest tests pass. (I've managed to make the compute and volume tempest suites pass all tests with the proposed changes, which I applied manually.)
@sbekkerm Could this be rebased? On Thursday, Oct 3rd, I'll do a deployment to test this proposed patch and leave feedback (sorry for the delay).
One more suggestion for the CR
In my view, it's a great suggestion. This feature will be beneficial in cases where AZ sites contain many compute nodes. Additionally, we can use this when we spread the control plane across AZs.
I agree. My understanding is that there is agreement and that we can go ahead with using multicell in this DT. That said, there might be people who don't use multicell, but I still think we should test this with multicell. If there were a way to run the resultant job with or without multicell by flipping one switch, that would be really nice, but that might be too much to ask in this first edition.
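For context on what "multicell" would mean in the generated manifests, here is a minimal sketch in the shape of the openstack-operator Nova CR; the cell names and the database/message-bus instance names are assumptions, and the exact fields should be checked against the operator's samples:

```yaml
# Hypothetical multi-cell Nova layout in service-values; names are illustrative.
nova:
  template:
    cellTemplates:
      cell0:
        hasAPIAccess: true        # cell0 hosts no computes but needs API DB access
      cell1:
        cellDatabaseInstance: openstack-cell1    # assumed per-cell Galera instance
        cellMessageBusInstance: rabbitmq-cell1   # assumed per-cell RabbitMQ instance
        hasAPIAccess: true
```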
@krcmarik Do you want to try that in a stacked patch, i.e., create a new PR rebased on this one, so that we can merge this one sooner?
I am not a fan of the introduction of Jinja2 templates and the custom playbook to execute them.
@hjensas It's not just about handling multiple nodesets with spine and leaf; it's about managing multiple sites, each with its own configuration for services like Glance and Cinder. Plus, each site needs its own Ceph cluster, which adds another layer of complexity. That's why we went with Jinja2 templates and a custom playbook for this. But I do agree with you that eventually integrating this into the ci-framework would be ideal.
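As a sketch of the approach being described (the site list, template name, and output paths are invented for illustration), the custom playbook boils down to rendering the same template once per DCN site:

```yaml
# Hypothetical Ansible playbook: render per-site config from one template.
- hosts: localhost
  gather_facts: false
  vars:
    dcn_sites:
      - { az: az0, ceph_cluster: az0 }
      - { az: az1, ceph_cluster: az1 }
      - { az: az2, ceph_cluster: az2 }
  tasks:
    - name: Render a site-specific service-values.yaml per DCN site
      ansible.builtin.template:
        src: service-values.yaml.j2
        dest: "rendered/{{ item.az }}/service-values.yaml"
      loop: "{{ dcn_sites }}"
```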
These changes have been addressed.
The review thread below refers to this excerpt of the proposed service-values.yaml:

```yaml
        rbd_cluster_name = az2
        backend_availability_zone = az2
glance:
  customServiceConfig: |
```
This `customServiceConfig` at the `glance` level seems redundant to me. Given you're defining the backend for each API under `glanceAPIs` (including az0), you can basically remove from L69 to L79 [1]. You inherit the top-level `customServiceConfig` only if it's not defined for that specific instance, and that does not seem to be the case here.
@fultonj do you want to update the glance config?
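A minimal sketch of the inheritance rule being described (the AZ names and backend values are assumptions): an instance under `glanceAPIs` that defines its own `customServiceConfig` masks the top-level one, so duplicating the block at both levels adds nothing.

```yaml
glance:
  customServiceConfig: |        # used only by instances that define none of their own
    [DEFAULT]
    enabled_backends = az0:rbd
  glanceAPIs:
    az0:
      customServiceConfig: |    # defined here, so the top-level block is ignored for az0
        [DEFAULT]
        enabled_backends = az0:rbd,az1:rbd,az2:rbd
        [glance_store]
        default_backend = az0
```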
Technically it doesn't block your deployment because, as per the link I mentioned earlier, the right config ends up on the right instance, but I think this is simply redundant and should be patched.
I agree, but per our discussion I will submit a follow-up patch to address this.
Then I'm ok to move forward with this.
I'm +2 on merging this.
/lgtm
/lgtm
/approve
/approve
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: abays, fmount, fultonj, sbekkerm. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Build succeeded (gate pipeline). ✔️ noop SUCCESS in 0s
Merged commit 2b53bd2 into openstack-k8s-operators:main
/cherrypick 18.0-fr1
@fultonj: new pull request created: #426. In response to this: /cherrypick 18.0-fr1
To clarify, we have cells, but we're not using cells per site. I think we should address that in a follow-up patch.
This PR introduces the DCN VA, which builds upon the HCI VA architecture and is designed for multi-site deployment.
In addition to the regular configuration files, this PR includes Jinja templates for generating the values.yaml and service-values.yaml files. These templates are essential for Zuul job execution, allowing site-specific configuration files to be generated for each DCN site.