feat: dedicated hosts selectors #4553

preflightsiren · 2023-09-04T06:28:12Z

Description
Wanted to create a draft PR to start a conversation about the approach. I believe following https://karpenter.sh/v0.27.3/contributing/design-guide/ that adding to the AWSNodeTemplate best follows "Identify an opinionated default that solves the majority of use cases."
Of course I'd appreciate any and all help to mould this design.

This design focuses on a simplified configuration, just the minimum needed to use host resource groups.

apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: default
spec:
  subnetSelector: { ... }        # required, discovers tagged subnets to attach to instances
  securityGroupSelector: { ... } # required, discovers tagged security groups to attach to instances
  licenseSelector: 
    name: myLicense
  hostResourceGroup: 
    name: my-hrg-name

How was this change tested?

Does this change impact docs?

Yes, PR includes docs updates
Yes, issue opened: #
No

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

netlify · 2023-09-04T06:28:17Z

✅ Deploy Preview for karpenter-docs-prod ready!

Name	Link
🔨 Latest commit	`9fe0f2f`
🔍 Latest deploy log	https://app.netlify.com/sites/karpenter-docs-prod/deploys/6553feb9f0f43700085e5771
😎 Deploy Preview	https://deploy-preview-4553--karpenter-docs-prod.netlify.app
📱 Preview on mobile	Toggle QR Code... Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

preflightsiren · 2023-09-04T06:30:46Z

Created this to split away from #3859 since the approach has significantly changed from when it was first opened.

designs/dedicatedhosts.md

preflightsiren · 2023-09-15T07:20:42Z

Functionality wise this change is pretty complete, it's all wired up through the nodeclass controller, and extends the amifamily resolver to dynamicaly generate the Launch Template.

Todo:

update the website docs
fix/review the code comments
clean up any unused references to licenProvider
Add placement group provider

pkg/providers/license/license.go

pkg/utils/nodeclass/nodeclass.go

website/content/en/preview/concepts/node-templates.md

jonathan-innis

High-level question: We are thinking of supporting only v1beta1 and dropping support for v1alpha5 in the near-term. Does it make sense to just release these fields only under the v1beta1 API, marking them as alpha fields so that we don't have to reason about the conversion logic in this PR. Up to this point, we've just been handling conversion logic for existing API and trying to push new features into the soon-to-be released beta API

pkg/apis/v1alpha1/provider.go

pkg/apis/v1alpha1/awsnodetemplate.go

pkg/apis/v1beta1/ec2nodeclass.go

pkg/providers/launchtemplate/launchtemplate.go

pkg/providers/hostresourcegroup/hostresourcegroup.go

pkg/apis/v1beta1/ec2nodeclass.go

pkg/providers/amifamily/resolver.go

preflightsiren · 2023-09-21T05:08:52Z

Ok I think this PR is now essentially complete or very near to, and is ready for a thorough review.
I have addressed all the feedback so far and re-tested in our test clusters to ensure the selector logic works as expected.
I have added some test cases around the 3 different providers, updating all the test infrastructure to load fake APIs for the aws services, and plumb them into the providers should anyone want to expand the test usage. I have also added some sanity checks into the utils v1alpha1 <-> v1beta1 conversion to ensure they'll maintain compatibility.

jonathan-innis · 2023-09-21T08:01:44Z

@preflightsiren Did you get a chance to see this comment here on the v1beta1 API? I think it could save you a lot of conversion logic if you just supported the single, newer API

High-level question: We are thinking of supporting only v1beta1 and dropping support for v1alpha5 in the near-term. Does it make sense to just release these fields only under the v1beta1 API, marking them as alpha fields so that we don't have to reason about the conversion logic in this PR. Up to this point, we've just been handling conversion logic for existing API and trying to push new features into the soon-to-be released beta API

preflightsiren · 2023-09-21T23:40:02Z

@preflightsiren Did you get a chance to see this comment here on the v1beta1 API? I think it could save you a lot of conversion logic if you just supported the single, newer API

High-level question: We are thinking of supporting only v1beta1 and dropping support for v1alpha5 in the near-term. Does it make sense to just release these fields only under the v1beta1 API, marking them as alpha fields so that we don't have to reason about the conversion logic in this PR. Up to this point, we've just been handling conversion logic for existing API and trying to push new features into the soon-to-be released beta API

Thanks Jon, I must have only replied in my mind and not in github! reposting what I commented in slack:

My team currently uses the v1alpha1.AWSNodeTemplate resources today, and would require an uplift to EC2NodeClass.
The conversion logic is already implemented, is there value to removing it ahead of the v1alpha1 deprecations

preflightsiren · 2023-10-09T04:40:21Z

As per our conversation: I've removed the v1alpha1 and conversion code for awsnodetemplates, all the placement selectors and status fields only apply to the v1beta1.ec2nodeclass resource.

I've documented the fields in the ec2nodeclass documents in preview

…osts-selectors

jonathan-innis

/karpenter snapshot

…osts-selectors

preflightsiren · 2023-10-11T03:20:54Z

sorry @jonathan-innis, don't know how I missed that amongst all the make presubmit checks performed :(

…osts-selectors

pkg/apis/v1beta1/ec2nodeclass_status.go

jonathan-innis · 2023-10-26T06:46:49Z

pkg/providers/amifamily/resolver.go

@@ -174,6 +193,25 @@ func (r Resolver) Resolve(ctx context.Context, nodeClass *v1beta1.EC2NodeClass,
 	return resolvedTemplates, nil
 }

+func (r Resolver) resolvePlacement(nodeClass *v1beta1.EC2NodeClass) *Placement {
+	var placement *Placement
+	hrg := nodeClass.Status.HostResourceGroup


Since this is an asynchronous sync, how do we handle it if the current values in the spec haven't resolved to the status? Are we alright tolerating a bit of eventual consistency here or should we handle cases where the spec generation doesn't line up with the observed generation in the status? Thoughts?

I guess it all depends on how eventually consistent the system will be. I don't believe it's an issue unless the consistency goes beyond the observed behaviour (reconciled in seconds).

If only 1 property is set, provisioning will error and the reconciliation loop with start afresh.

I believe this is all a binary operation that will fail or succeed, and far less likely to partially succeed.

I don't believe it's an issue unless the consistency goes beyond the observed behaviour

It's more of an issue about correctly handling the failure scenarios when we don't resolve the host resource group details for whatever reason. We now shouldn't launch instances since we aren't going to accurately represent a user's desired state for the instance but we haven't done any checking in our resolve path to ensure that's the case.

I think it's quite a bit safer to just wholesale transfer over to using status across all of our resolved fields, where we intentionally think over how we are going to handle this eventually consistent resolution of the status consistently across all fields.

jonathan-innis · 2023-10-26T06:48:39Z

pkg/providers/amifamily/resolver.go

@@ -161,6 +178,8 @@ func (r Resolver) Resolve(ctx context.Context, nodeClass *v1beta1.EC2NodeClass,
 				DetailedMonitoring:  aws.BoolValue(nodeClass.Spec.DetailedMonitoring),
 				AMIID:               amiID,
 				InstanceTypes:       instanceTypes,
+				Licenses:            nodeClass.Status.Licenses,


Same question here on the licenses. I like that we are moving to using the resolved status information in this launch path (I think we want to go this direction long-term) but I also want to think about how we handle the eventual consistency problem here when we don't have a sync path to get this information. If you wanted a shorter path forward, you could just re-get the information here. The info is already cached in memory, so the expense of calling out to the provider again is minimal

I'm happy to leave this eventually consistent; but I'm happy to take input on this direction.

I think we should probably avoid using the status detail here for now until we thoroughly test using the status details or have a way for representing and checking the version of the spec that we have observed in the status (e.g. observedGeneration). Better for now to just re-get the info based on whatever the current spec is.

jonathan-innis · 2023-10-26T06:49:22Z

pkg/providers/launchtemplate/launchtemplate.go

@@ -260,6 +260,8 @@ func (p *Provider) createLaunchTemplate(ctx context.Context, options *amifamily.
 			TagSpecifications: []*ec2.LaunchTemplateTagSpecificationRequest{


Is there a need to add some testing to the launch template suite_test.go to handle these new parameters and make sure that everything resolves properly?

jonathan-innis · 2023-10-26T06:49:59Z

website/content/en/preview/concepts/nodeclasses.md

+  - arn:aws:license-manager:us-west-2:111122223333:license/lic-123456789
+  hostResourceGroup:
+    name: my-hrg
+    arn:aws:resource-groups:us-west-2:111122223333:group/my-hrg


Should we just store the ARN in the host resource group in the same way that we are storing it with the other info?

jonathan-innis · 2023-10-26T06:56:51Z

pkg/providers/placementgroup/placementgroup.go

+		logging.FromContext(ctx).Errorf("discovering placement groups, %w", err)
+		return nil, err
+	}
+	for i := range output.PlacementGroups {


Is it possible for the selector to resolve into multiple Placement Groups? Is there a reason that you would want to resolve into multiple placement groups?

not sure I understand your question. here: we're iterating through the list of placementgroups that is the output of the api response. We can only use a single Placement Group in any of our LaunchTemplate configurations, so we resolve this list down to a single placement group.

Right, but the question was more if the API should be represented as "terms" or if it need only be represented as a selector. Is there a need to have the flexibility to select between two placement groups in the API? From what you are saying, I don't think so. We may only need a placementGroupSelector if we don't need the terms component

jonathan-innis · 2023-10-26T06:58:50Z

pkg/apis/v1beta1/ec2nodeclass.go

@@ -52,6 +52,15 @@ type EC2NodeClassSpec struct {
 	// +kubebuilder:validation:Enum:={AL2,Bottlerocket,Ubuntu,Custom,Windows2019,Windows2022}
 	// +required
 	AMIFamily *string `json:"amiFamily"`
+	// LicenseSelectorTerms is a list of LicenseSelectors. The terms are ORed.
+	// +optional
+	LicenseSelectorTerms []LicenseSelectorTerm `json:"licenseSelectorTerms,omitempty" hash:"ignore"`


Quick high-level question: Which of these things actually need to be terms? Is there a need for someone to specify multiple of any of these things and have all of the options be passed through or us to somehow order and pick one based on some heuristic?

If there's not a reason that we should support more than one resolution of these things per-EC2NodeClass, it may be worth having this be a pure selector, no sets of terms that are ORed

jonathan-innis · 2023-10-26T06:59:24Z

pkg/providers/placementgroup/placementgroup.go

+			}
+		}
+	}
+	if p.cm.HasChanged(fmt.Sprintf("placementgroup/%t/%s", nodeClass.IsNodeTemplate, nodeClass.Name), match) {


Suggested change

if p.cm.HasChanged(fmt.Sprintf("placementgroup/%t/%s", nodeClass.IsNodeTemplate, nodeClass.Name), match) {

if p.cm.HasChanged(fmt.Sprintf("placementgroup/%s", nodeClass.IsNodeTemplate, nodeClass.Name), match) {

Same comment. If only for beta, no need to have the boolean

jonathan-innis · 2023-10-26T06:59:56Z

pkg/providers/placementgroup/placementgroup.go

+
+	var match *ec2.PlacementGroup
+	// Look up all ec2 placement groups
+	output, err := p.ec2api.DescribePlacementGroupsWithContext(ctx, &ec2.DescribePlacementGroupsInput{})


How big is this output? Is there a pages version so that we don't have to handle the massive output?

I can't imagine it being large, and perhaps the API owners felt the same, hence no paged function - https://docs.aws.amazon.com/sdk-for-go/api/service/ec2/#EC2.DescribePlacementGroups

jonathan-innis · 2023-10-26T07:00:25Z

pkg/providers/license/license.go

+
+	var licenses []string
+	// Look up all License Configurations
+	output, err := p.licensemanager.ListLicenseConfigurationsWithContext(ctx, &licensemanager.ListLicenseConfigurationsInput{})


How big is this output? Is there a pages version that we can use to reduce the memory overhead?

typically small - there's no paged option https://docs.aws.amazon.com/sdk-for-go/api/service/licensemanager/#LicenseManager.ListLicenseConfigurationsWithContext

…osts-selectors

jonathan-innis · 2023-11-27T04:05:59Z

pkg/apis/crds/karpenter.k8s.aws_ec2nodeclasses.yaml

+                      type: string
+                  type: object
+                type: array
+              licenseSelectorTerms:


Digging a little deeper into the docs, trying to understand why the license configuration piece is needed. Can you achieve dedicated hosts by associating license configurations with the AMIs that you are planning to launch? From what I can tell, if the AMI has an associated license configuration, you may not have a need to specify the configuration directly.

there's two options:

Associate it with each AMI

Specify it in launch template

In some cases it makes sense to associate with AMIs, in others it does not, such as when the an AMI is used both with & without license rules or where doing so just adds unnecessary workflow steps.

What's the use-case for an AMI being used without license rules? When you are bringing in software at runtime that will have a license attached to it? Why not bake it into the AMI in that case?

so dedicated host resource groups are a peculiar thing. They are implemented within license manager, but they don't have to have anything to do with software licenses https://docs.aws.amazon.com/license-manager/latest/userguide/host-resource-groups.html

You can use host resource groups to separate hosts by purpose, for example, development test hosts versus production, organizational unit, or license constraint.

We use them to get a dynamically allocated pool of dedicated hosts that can be shared across accounts.

jonathan-innis · 2023-11-27T04:13:22Z

pkg/providers/placementgroup/placementgroup.go

+
+	var match *ec2.PlacementGroup
+	// Look up all ec2 placement groups
+	output, err := p.ec2api.DescribePlacementGroupsWithContext(ctx, &ec2.DescribePlacementGroupsInput{})


Looks like there are filters that we can pass into the placement group: https://docs.aws.amazon.com/cli/latest/reference/ec2/describe-placement-groups.html. We may want to use this instead of getting all of the placement groups and filtering on the client-side.

preflightsiren · 2023-11-27T08:51:45Z

After reviewing the work needed to finish this feature, I can't justify it. I'm happy for others to build off its current state.

lorrrrrrrenzo · 2023-12-07T12:03:27Z

Hello @jonathan-innis, would you support a PR that specifically dealt with Placement Groups aspect of the work here? I'm willing to pick up a subset of the work done by @preflightsiren here, but cannot justify working on the full set of features it is trying to add.

Thanks.

preflightsiren requested a review from a team as a code owner September 4, 2023 06:28

preflightsiren requested a review from jmdeal September 4, 2023 06:28

preflightsiren mentioned this pull request Sep 4, 2023

feat: Dedicated host api structure #3859

Closed

3 tasks

preflightsiren force-pushed the feat-3182-dedicated-hosts-selectors branch from 0a56ee8 to 212e13d Compare September 4, 2023 06:32

jonathan-innis reviewed Sep 6, 2023

View reviewed changes

designs/dedicatedhosts.md Show resolved Hide resolved

preflightsiren force-pushed the feat-3182-dedicated-hosts-selectors branch 2 times, most recently from cc3a842 to 3754d8e Compare September 15, 2023 00:44

preflightsiren force-pushed the feat-3182-dedicated-hosts-selectors branch from 3754d8e to 98ebfb1 Compare September 18, 2023 01:26

preflightsiren commented Sep 18, 2023

View reviewed changes

pkg/providers/license/license.go Outdated Show resolved Hide resolved

preflightsiren commented Sep 18, 2023

View reviewed changes

pkg/providers/license/license.go Outdated Show resolved Hide resolved

preflightsiren commented Sep 18, 2023

View reviewed changes

pkg/utils/nodeclass/nodeclass.go Outdated Show resolved Hide resolved

preflightsiren commented Sep 18, 2023

View reviewed changes

website/content/en/preview/concepts/node-templates.md Outdated Show resolved Hide resolved

jonathan-innis reviewed Sep 18, 2023

View reviewed changes

preflightsiren force-pushed the feat-3182-dedicated-hosts-selectors branch from 7eb2915 to 67d0ba7 Compare September 19, 2023 06:00

preflightsiren commented Sep 19, 2023

View reviewed changes

pkg/providers/amifamily/resolver.go Outdated Show resolved Hide resolved

preflightsiren force-pushed the feat-3182-dedicated-hosts-selectors branch 2 times, most recently from 963af8c to 062a0bc Compare September 21, 2023 01:42

Sebastian Cole added 7 commits September 22, 2023 09:41

Document the api changes

7f08088

Adding design doc

4bf9719

Adding fields to API

d91d401

Establish new license provider

57a7a27

More license provider scaffolding

0bda01c

Generated code

77bb9d3

Implement licenses selector

bf0ac90

Updating documentation

914e187

preflightsiren force-pushed the feat-3182-dedicated-hosts-selectors branch from 5b26af9 to 914e187 Compare October 9, 2023 00:50

jonathan-innis self-assigned this Oct 9, 2023

preflightsiren added 4 commits October 10, 2023 09:48

Merge remote-tracking branch 'origin/main' into feat-3182-dedicated-h…

284b157

…osts-selectors

Merge remote-tracking branch 'origin/main' into feat-3182-dedicated-h…

5975167

…osts-selectors

Merge remote-tracking branch 'origin/main' into feat-3182-dedicated-h…

c70d149

…osts-selectors

Merge remote-tracking branch 'origin/main' into feat-3182-dedicated-h…

b0ebfef

…osts-selectors

jonathan-innis reviewed Oct 10, 2023

View reviewed changes

preflightsiren added 2 commits October 11, 2023 14:14

Merge remote-tracking branch 'origin/main' into feat-3182-dedicated-h…

78efb9a

…osts-selectors

Commit missing generated functions

25d1e91

preflightsiren added 2 commits October 12, 2023 10:10

Merge remote-tracking branch 'origin/main' into feat-3182-dedicated-h…

48a0262

…osts-selectors

Fixes post merge

03ba9be

preflightsiren force-pushed the feat-3182-dedicated-hosts-selectors branch from f9788c7 to 03ba9be Compare October 20, 2023 08:07

Merge remote-tracking branch 'origin/main' into feat-3182-dedicated-h…

57c79ee

…osts-selectors

preflightsiren force-pushed the feat-3182-dedicated-hosts-selectors branch from 56d74e7 to 57c79ee Compare October 26, 2023 00:27

Merge remote-tracking branch 'origin/main' into feat-3182-dedicated-h…

f1e23d5

…osts-selectors

jonathan-innis reviewed Oct 26, 2023

View reviewed changes

preflightsiren added 5 commits November 3, 2023 11:42

Merge remote-tracking branch 'origin/main' into feat-3182-dedicated-h…

9e2e1bc

…osts-selectors

feedback: Remove HRG Name, use ARN only

85f0860

Merge remote-tracking branch 'origin/main' into feat-3182-dedicated-h…

9a23247

…osts-selectors

Merge remote-tracking branch 'origin/main' into feat-3182-dedicated-h…

5e0c322

…osts-selectors

Merge remote-tracking branch 'origin/main' into feat-3182-dedicated-h…

9fe0f2f

…osts-selectors

jonathan-innis reviewed Nov 27, 2023

View reviewed changes

preflightsiren closed this Nov 27, 2023

lorrrrrrrenzo mentioned this pull request Dec 12, 2023

feat: Placement Group Support #5307

Closed

3 tasks

aws locked and limited conversation to collaborators Oct 15, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: dedicated hosts selectors #4553

feat: dedicated hosts selectors #4553

preflightsiren commented Sep 4, 2023 •

edited by jonathan-innis

Loading

netlify bot commented Sep 4, 2023 •

edited

Loading

preflightsiren commented Sep 4, 2023

preflightsiren commented Sep 15, 2023

jonathan-innis left a comment

preflightsiren commented Sep 21, 2023

jonathan-innis commented Sep 21, 2023

preflightsiren commented Sep 21, 2023

preflightsiren commented Oct 9, 2023

jonathan-innis left a comment

preflightsiren commented Oct 11, 2023

jonathan-innis Oct 26, 2023

preflightsiren Oct 30, 2023

jonathan-innis Nov 27, 2023

jonathan-innis Oct 26, 2023

preflightsiren Oct 30, 2023

jonathan-innis Nov 27, 2023

jonathan-innis Oct 26, 2023

jonathan-innis Oct 26, 2023

jonathan-innis Oct 26, 2023

preflightsiren Oct 30, 2023

jonathan-innis Nov 27, 2023

jonathan-innis Oct 26, 2023

jonathan-innis Oct 26, 2023

jonathan-innis Oct 26, 2023

preflightsiren Oct 30, 2023

jonathan-innis Oct 26, 2023

preflightsiren Nov 1, 2023

jonathan-innis Nov 27, 2023

cspargo Nov 27, 2023

jonathan-innis Nov 27, 2023

cspargo Nov 27, 2023

jonathan-innis Nov 27, 2023

preflightsiren commented Nov 27, 2023

lorrrrrrrenzo commented Dec 7, 2023

		@@ -260,6 +260,8 @@ func (p Provider) createLaunchTemplate(ctx context.Context, options amifamily.
		TagSpecifications: []*ec2.LaunchTemplateTagSpecificationRequest{

	if p.cm.HasChanged(fmt.Sprintf("placementgroup/%t/%s", nodeClass.IsNodeTemplate, nodeClass.Name), match) {
	if p.cm.HasChanged(fmt.Sprintf("placementgroup/%s", nodeClass.IsNodeTemplate, nodeClass.Name), match) {

feat: dedicated hosts selectors #4553

feat: dedicated hosts selectors #4553

Conversation

preflightsiren commented Sep 4, 2023 • edited by jonathan-innis Loading

netlify bot commented Sep 4, 2023 • edited Loading

✅ Deploy Preview for karpenter-docs-prod ready!

preflightsiren commented Sep 4, 2023

preflightsiren commented Sep 15, 2023

jonathan-innis left a comment

Choose a reason for hiding this comment

preflightsiren commented Sep 21, 2023

jonathan-innis commented Sep 21, 2023

preflightsiren commented Sep 21, 2023

preflightsiren commented Oct 9, 2023

jonathan-innis left a comment

Choose a reason for hiding this comment

preflightsiren commented Oct 11, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

preflightsiren commented Nov 27, 2023

lorrrrrrrenzo commented Dec 7, 2023

preflightsiren commented Sep 4, 2023 •

edited by jonathan-innis

Loading

netlify bot commented Sep 4, 2023 •

edited

Loading