feat: dedicated hosts selectors #4553

Closed
Changes from 36 commits
Commits
47 commits
7f08088
Document the api changes
Sep 4, 2023
4bf9719
Adding design doc
Sep 4, 2023
d91d401
Adding fields to API
Sep 4, 2023
57a7a27
Establish new license provider
Sep 11, 2023
0bda01c
More license provider scaffolding
Sep 11, 2023
77bb9d3
Generated code
Sep 11, 2023
bf0ac90
Implement licenses selector
Sep 12, 2023
c050b87
Add license provider into launchtemplate creation
Sep 13, 2023
1e6f05d
Add api for HostResourceGroups selection
Sep 14, 2023
1410dce
Plumbing for host resource group provider into launchtemplates
Sep 14, 2023
cd8035b
Add missing provider
Sep 14, 2023
fbcbab9
handling v1beta1.NodeClass -> EC2NodeClass
Sep 15, 2023
a66c663
Add provider for placement groups
Sep 18, 2023
f7fff5a
Fixes to api documentation and testing environment
Sep 19, 2023
44fdeef
Refactor host resource group
Sep 19, 2023
4381e8d
Addressing PR feedback for providers
Sep 19, 2023
7246d16
Use nodeClass.Status to populate launch template
Sep 19, 2023
d514af5
fixes
Sep 19, 2023
08f41a1
Fix fake/rgapi methods
Sep 20, 2023
cbf61a7
Refactor utils conversions for Selector terms
Sep 20, 2023
c5bc5ee
cleanup utils conversion functions
Sep 20, 2023
b2e8fa8
Adding nodeclass test for LicenseSelectors
Sep 20, 2023
a3a7f5d
Address linting errors
Sep 21, 2023
5b31e38
Fix nodeclass suite_test
Sep 21, 2023
04b081a
Add license provider tests
Sep 21, 2023
7416e23
Adding test suites for host resource groups provider
Sep 21, 2023
1e4ed56
Adding suite_tests for placement group provider
Sep 21, 2023
16aa9c6
Fix fake api signatures
Sep 21, 2023
16ff1cf
Fix pagination logic for host resource group provider
Sep 21, 2023
42cbd6b
Revert changes, remove support in v1alpha1/awsnodetemplate
preflightsiren Oct 8, 2023
2069783
Merge remote-tracking branch 'origin/main' into feat-3182-dedicated-h…
preflightsiren Oct 8, 2023
914e187
Updating documentation
preflightsiren Oct 9, 2023
284b157
Merge remote-tracking branch 'origin/main' into feat-3182-dedicated-h…
preflightsiren Oct 9, 2023
5975167
Merge remote-tracking branch 'origin/main' into feat-3182-dedicated-h…
preflightsiren Oct 10, 2023
c70d149
Merge remote-tracking branch 'origin/main' into feat-3182-dedicated-h…
preflightsiren Oct 10, 2023
b0ebfef
Merge remote-tracking branch 'origin/main' into feat-3182-dedicated-h…
preflightsiren Oct 10, 2023
78efb9a
Merge remote-tracking branch 'origin/main' into feat-3182-dedicated-h…
preflightsiren Oct 11, 2023
25d1e91
Commit missing generated functions
preflightsiren Oct 11, 2023
48a0262
Merge remote-tracking branch 'origin/main' into feat-3182-dedicated-h…
preflightsiren Oct 11, 2023
03ba9be
Fixes post merge
preflightsiren Oct 20, 2023
57c79ee
Merge remote-tracking branch 'origin/main' into feat-3182-dedicated-h…
preflightsiren Oct 26, 2023
f1e23d5
Merge remote-tracking branch 'origin/main' into feat-3182-dedicated-h…
preflightsiren Oct 26, 2023
9e2e1bc
Merge remote-tracking branch 'origin/main' into feat-3182-dedicated-h…
preflightsiren Nov 3, 2023
85f0860
feedback: Remove HRG Name, use ARN only
preflightsiren Nov 3, 2023
9a23247
Merge remote-tracking branch 'origin/main' into feat-3182-dedicated-h…
preflightsiren Nov 7, 2023
5e0c322
Merge remote-tracking branch 'origin/main' into feat-3182-dedicated-h…
preflightsiren Nov 9, 2023
9fe0f2f
Merge remote-tracking branch 'origin/main' into feat-3182-dedicated-h…
preflightsiren Nov 14, 2023
3 changes: 3 additions & 0 deletions cmd/controller/main.go
@@ -66,6 +66,9 @@ func main() {
op.InstanceProfileProvider,
op.PricingProvider,
op.AMIProvider,
op.LicenseProvider,
op.HostResourceGroupProvider,
op.PlacementGroupProvider,
)...).
WithWebhooks(ctx, webhooks.NewWebhooks()...).
Start(ctx)
107 changes: 107 additions & 0 deletions designs/dedicatedhosts.md
@@ -0,0 +1,107 @@
# Dedicated host support
## Background

As described in [3182](https://github.com/aws/karpenter/issues/3182), AWS provides the
ability to launch EC2 instances on [Dedicated Host hardware](https://docs.aws.amazon.com/license-manager/latest/userguide/host-resource-groups.html).
Dedicated Hosts are often used to pay for EC2 usage as capital expenditure (CapEx)
rather than as an operating expense (OpEx). This is often useful for publicly listed
companies that want to manage their revenue-to-OpEx ratio.

AWS allows EC2 instances to be allocated to Dedicated Host Resource Groups (HRGs)
through the use of launch templates.
Detailed documentation for launch templates and host resource groups can be found here:
- https://docs.aws.amazon.com/autoscaling/ec2/userguide/create-launch-template.html#advanced-settings-for-your-launch-template
- https://docs.aws.amazon.com/license-manager/latest/userguide/host-resource-groups.html

## Solutions

aws-karpenter already supports a set of [LaunchTemplate configuration](https://github.com/aws/karpenter/blob/main/pkg/apis/v1alpha1/awsnodetemplate.go)
with design document: https://github.com/aws/karpenter/blob/main/designs/aws-launch-templates-v2.md

1. Implement the minimal fields from https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-ec2-launchtemplate-launchtemplatedata-placement.html,
specifically `placement.hostResourceGroupArn` and `licenseConfiguration`.

This would look like:

```
apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: default
spec:
  licenseConfiguration: arn:aws:license-manager:us-east-1:123456789012:license-configuration:lic-edf7f9e241f5e16f29996c842111f448 # optional, ARN of the license configuration
  placement:
    hostResourceGroupArn: arn:aws:resource-groups:us-east-1:123456789012:group/my-hrg-name # optional, the ARN of the host resource group in which to launch the instances. If you specify a host resource group ARN, omit the Tenancy parameter or set it to host.
```

2. Implement the complete set of fields from AWS Launch Templates

e.g.
```
apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: default
spec:
  licenseConfiguration: arn:aws:license-manager:us-east-1:123456789012:license-configuration:lic-edf7f9e241f5e16f29996c842111f448 # optional, ARN of the license configuration
  placement:
    affinity: # optional
    availabilityZone: # optional
    groupId: # optional, ID of the placement group
    groupName: # optional
    hostId: # optional, ID of the dedicated host
    hostResourceGroupArn: arn:aws:resource-groups:us-east-1:123456789012:group/my-hrg-name # optional, the ARN of the host resource group in which to launch the instances. If you specify a host resource group ARN, omit the Tenancy parameter or set it to host.
    partitionNumber: # optional, the number of the partition the instance should launch in. Valid only if the placement group strategy is set to partition.
    spreadDomain: # optional, reserved for future use
    tenancy: dedicated # optional, the tenancy of the instance. An instance with a tenancy of dedicated runs on single-tenant hardware. One of dedicated | default | host
```

3. Implement a simplified API.
AWS Launch Templates include many extra fields; this option exposes only those needed for dedicated hosts.

```
apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: default
spec:
  dedicatedHostConfig:
    licenseConfiguration: arn:aws:license-manager:us-east-1:123456789012:license-configuration:lic-edf7f9e241f5e16f29996c842111f448 # optional, ARN of the license configuration
    resourceHostGroup: arn:aws:resource-groups:us-east-1:123456789012:group/my-hrg-name # optional, ARN of the HRG
```

4. Implement Selectors for all relevant fields

```
apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: default
spec:
  licenseSelector:
    name: "myLicense"
  hostResourceGroupSelector:
    name: "myHrg"
```

This would require inexpensive or filterable APIs to query for available license configurations, host resource groups, and placement groups.
Today the recommended APIs are:
* `aws license-manager list-license-configurations`
* `aws resource-groups list-groups`
* `aws ec2 describe-placement-groups`


## Recommendations

1. Middle-ground solution: sticks to the AWS Launch Template API but implements the smallest practical set of configuration.
This limits the number of checks Karpenter has to perform, e.g. for a configuration that sets both hostId and hostResourceGroupArn.

2. Completely copies the AWS API, which would allow the full set of configurations supported by AWS. As documented in launch-template-options.md, it also requires the most work to implement and support.

3. Focuses entirely on the dedicated host feature, ignoring other configuration options. This reduces confusion.
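
To illustrate the kind of check the first recommendation refers to, here is a minimal Go sketch of validating a placement block. `Placement` and `ValidatePlacement` are hypothetical names, not part of this PR. The tenancy rule comes from the field comments in option 2; treating hostId and hostResourceGroupArn as mutually exclusive is an assumption based on the example given in recommendation 1.

```go
package main

import (
	"errors"
	"fmt"
)

// Placement is a hypothetical subset of the placement fields listed in option 2.
type Placement struct {
	HostID               string
	HostResourceGroupARN string
	Tenancy              string
}

// ValidatePlacement applies the cross-field checks described in the design doc.
func ValidatePlacement(p Placement) error {
	// Tenancy is optional, but if set it must be one of the documented values.
	switch p.Tenancy {
	case "", "default", "dedicated", "host":
	default:
		return fmt.Errorf("tenancy %q must be one of dedicated, default, host", p.Tenancy)
	}
	// With a host resource group ARN, tenancy must be omitted or set to host.
	if p.HostResourceGroupARN != "" && p.Tenancy != "" && p.Tenancy != "host" {
		return errors.New("tenancy must be omitted or set to host when hostResourceGroupArn is set")
	}
	// Assumption: a specific host and a host resource group should not be combined.
	if p.HostID != "" && p.HostResourceGroupARN != "" {
		return errors.New("hostId and hostResourceGroupArn should not both be set")
	}
	return nil
}

func main() {
	ok := Placement{HostResourceGroupARN: "arn:aws:resource-groups:us-east-1:123456789012:group/my-hrg-name"}
	bad := Placement{HostResourceGroupARN: ok.HostResourceGroupARN, Tenancy: "dedicated"}
	fmt.Println(ValidatePlacement(ok))
	fmt.Println(ValidatePlacement(bad))
}
```

Each additional placement field exposed by the API adds checks of this kind, which is the cost that options 1 and 3 try to keep small.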

## Decision from Working Group meeting

Implement selectors for all relevant fields to improve portability of configuration between clusters / regions / accounts.
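
The selector terms added to the EC2NodeClass API in this PR are documented as ORed across terms and ANDed across the fields of a single term. A minimal Go sketch of that matching rule follows. `Term`, `Candidate`, and `Select` are hypothetical names; the `ARN` field on `Term` is an assumption added only to make the AND behaviour visible, since the terms in this PR carry just `Name`.

```go
package main

import "fmt"

// Term is a hypothetical selector term. Name mirrors the field in this PR;
// ARN is an assumed second field, included only to show AND semantics.
type Term struct {
	Name string
	ARN  string
}

// Candidate is a hypothetical resource returned by a list call.
type Candidate struct {
	Name string
	ARN  string
}

// matches reports whether every field set on the term equals the candidate's
// value (fields are ANDed). A term with no fields set matches nothing here.
func (t Term) matches(c Candidate) bool {
	if t.Name == "" && t.ARN == "" {
		return false
	}
	if t.Name != "" && t.Name != c.Name {
		return false
	}
	if t.ARN != "" && t.ARN != c.ARN {
		return false
	}
	return true
}

// Select returns the candidates matched by at least one term (terms are ORed),
// preserving candidate order and returning each candidate at most once.
func Select(terms []Term, candidates []Candidate) []Candidate {
	var out []Candidate
	for _, c := range candidates {
		for _, t := range terms {
			if t.matches(c) {
				out = append(out, c)
				break
			}
		}
	}
	return out
}

func main() {
	candidates := []Candidate{
		{Name: "myLicense", ARN: "arn:example:1"},
		{Name: "other", ARN: "arn:example:2"},
		{Name: "third", ARN: "arn:example:3"},
	}
	terms := []Term{
		{Name: "myLicense"},
		{Name: "other", ARN: "arn:example:999"}, // name matches, ARN does not
		{ARN: "arn:example:3"},
	}
	for _, c := range Select(terms, candidates) {
		fmt.Println(c.Name)
	}
}
```

In this example `myLicense` and `third` are selected, while `other` is not, because only one of the two fields on its term matches.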
60 changes: 60 additions & 0 deletions pkg/apis/crds/karpenter.k8s.aws_ec2nodeclasses.yaml
@@ -163,6 +163,33 @@ spec:
description: DetailedMonitoring controls if detailed monitoring is
enabled for instances that are launched
type: boolean
hostResourceGroupSelectorTerms:
description: HostResourceGroupSelectorTerms is a list of HostResourceGroupSelectors.
The terms are ORed.
items:
description: HostResourceGroupSelectorTerm defines the selection
logic for host Resource groups that are used to launch nodes.
If multiple fields are used for selection, the requirements are
ANDed
properties:
name:
description: Name of the hrg to be selected
type: string
type: object
type: array
licenseSelectorTerms:
Contributor
Digging a little deeper into the docs, trying to understand why the license configuration piece is needed. Can you achieve dedicated hosts by associating license configurations with the AMIs that you are planning to launch? From what I can tell, if the AMI has an associated license configuration, you may not have a need to specify the configuration directly.

there's two options:

  1. Associate it with each AMI
  2. Specify it in launch template

In some cases it makes sense to associate with AMIs; in others it does not, such as when an AMI is used both with and without license rules, or where doing so just adds unnecessary workflow steps.

Contributor
What's the use-case for an AMI being used without license rules? When you are bringing in software at runtime that will have a license attached to it? Why not bake it into the AMI in that case?

so dedicated host resource groups are a peculiar thing. They are implemented within license manager, but they don't have to have anything to do with software licenses https://docs.aws.amazon.com/license-manager/latest/userguide/host-resource-groups.html

You can use host resource groups to separate hosts by purpose, for example, development test hosts versus production, organizational unit, or license constraint.

We use them to get a dynamically allocated pool of dedicated hosts that can be shared across accounts.

description: LicenseSelectorTerms is a list of LicenseSelectors. The
terms are ORed.
items:
description: LicenseSelectorTerm defines selection logic for a licenseConfigurationSpecification
used to launch nodes. If multiple fields are used for selection,
the requirements are ANDed.
properties:
name:
description: Name of the license to be selected.
type: string
type: object
type: array
metadataOptions:
default:
httpEndpoint: enabled
@@ -220,6 +247,19 @@ spec:
credentials are not available."
type: string
type: object
placementGroupSelectorTerms:
description: PlacementGroupSelectorTerms is a list of PlacementGroupSelector.
The terms are ORed.
items:
description: PlacementGroupSelectorTerm defines the selection logic
for ec2 placement groups that are used to launch nodes. If multiple
fields are used for selection, the requirements are ANDed
properties:
name:
description: Name of the placement group to be selected
type: string
type: object
type: array
role:
description: Role is the AWS identity that nodes use. This field is
immutable. Marking this field as immutable avoids concerns around
@@ -345,10 +385,30 @@ spec:
- requirements
type: object
type: array
hostResourceGroup:
description: HostResourceGroup contains the resolved HRG name and arn
properties:
arn:
description: Arn of the HRG
type: string
name:
description: Name of the HRG
type: string
type: object
instanceProfile:
description: InstanceProfile contains the resolved instance profile
for the role
type: string
licenses:
description: Licenses contains the license arns
items:
type: string
type: array
placementGroups:
description: PlacementGroups contains the ec2 placement group arns
items:
type: string
type: array
securityGroups:
description: SecurityGroups contains the current Security Groups values
that are available to the cluster under the SecurityGroups selectors.
33 changes: 33 additions & 0 deletions pkg/apis/v1beta1/ec2nodeclass.go
@@ -39,6 +39,15 @@ type EC2NodeClassSpec struct {
// +kubebuilder:validation:Enum:={AL2,Bottlerocket,Ubuntu,Custom,Windows2019,Windows2022}
// +required
AMIFamily *string `json:"amiFamily"`
// LicenseSelectorTerms is a list of LicenseSelectors. The terms are ORed.
// +optional
LicenseSelectorTerms []LicenseSelectorTerm `json:"licenseSelectorTerms,omitempty" hash:"ignore"`
Contributor
Quick high-level question: Which of these things actually need to be terms? Is there a need for someone to specify multiple of any of these things and have all of the options be passed through or us to somehow order and pick one based on some heuristic?

If there's not a reason that we should support more than one resolution of these things per-EC2NodeClass, it may be worth having this be a pure selector, no sets of terms that are ORed

// HostResourceGroupSelectorTerms is a list of HostResourceGroupSelectors. The terms are ORed.
// +optional
HostResourceGroupSelectorTerms []HostResourceGroupSelectorTerm `json:"hostResourceGroupSelectorTerms,omitempty" hash:"ignore"`
// PlacementGroupSelectorTerms is a list of PlacementGroupSelector. The terms are ORed.
// +optional
PlacementGroupSelectorTerms []PlacementGroupSelectorTerm `json:"placementGroupSelectorTerms,omitempty" hash:"ignore"`
// UserData to be applied to the provisioned nodes.
// It must be in the appropriate format based on the AMIFamily in use. Karpenter will merge certain fields into
// this UserData to ensure nodes are being provisioned with the correct configuration.
@@ -157,6 +166,30 @@ type AMISelectorTerm struct {
Owner string `json:"owner,omitempty"`
}

// LicenseSelectorTerm defines selection logic for a licenseConfigurationSpecification used to launch nodes.
// If multiple fields are used for selection, the requirements are ANDed.
type LicenseSelectorTerm struct {
// Name of the license to be selected.
// +optional
Name string `json:"name,omitempty"`
}

// HostResourceGroupSelectorTerm defines the selection logic for host Resource groups
// that are used to launch nodes. If multiple fields are used for selection, the requirements are ANDed
type HostResourceGroupSelectorTerm struct {
// Name of the hrg to be selected
// +optional
Name string `json:"name,omitempty"`
}

// PlacementGroupSelectorTerm defines the selection logic for ec2 placement groups
// that are used to launch nodes. If multiple fields are used for selection, the requirements are ANDed
type PlacementGroupSelectorTerm struct {
// Name of the placement group to be selected
// +optional
Name string `json:"name,omitempty"`
}

// MetadataOptions contains parameters for specifying the exposure of the
// Instance Metadata Service to provisioned EC2 nodes.
type MetadataOptions struct {
19 changes: 19 additions & 0 deletions pkg/apis/v1beta1/ec2nodeclass_status.go
@@ -49,6 +49,16 @@ type AMI struct {
Requirements []v1.NodeSelectorRequirement `json:"requirements"`
}

// HostResourceGroup contains the resolved host resource group name and arn for node launch
type HostResourceGroup struct {
Contributor
I'm curious what your thoughts are on this: If we're just using the name selector across all of these new entities in the EC2NodeClass, what's the value in resolving the ARN back into the status? For the entities that already use the name as an "id"-like thing, this is just the name with some additional "syntactic sugar" to add the account and region information

// Name of the HRG
// +optional
Name string `json:"name,omitempty"`
// Arn of the HRG
// +optional
ARN string `json:"arn,omitempty"`
}

// EC2NodeClassStatus contains the resolved state of the EC2NodeClass
type EC2NodeClassStatus struct {
// Subnets contains the current Subnet values that are available to the
@@ -63,6 +73,15 @@ type EC2NodeClassStatus struct {
// cluster under the AMI selectors.
// +optional
AMIs []AMI `json:"amis,omitempty"`
// Licenses contains the license arns
// +optional
Licenses []string `json:"licenses,omitempty"`
// HostResourceGroup contains the resolved HRG name and arn
// +optional
HostResourceGroup *HostResourceGroup `json:"hostResourceGroup,omitempty"`
// PlacementGroups contains the ec2 placement group arns
// +optional
PlacementGroups []string `json:"placementGroups,omitempty"`
// InstanceProfile contains the resolved instance profile for the role
// +optional
InstanceProfile string `json:"instanceProfile,omitempty"`