Merge remote-tracking branch 'origin/main' into feat-3182-dedicated-hosts-selectors
preflightsiren committed Oct 11, 2023
2 parents 25d1e91 + c5e011c commit 48a0262
Showing 23 changed files with 71 additions and 42 deletions.
16 changes: 12 additions & 4 deletions .github/actions/e2e/create-cluster/action.yaml
@@ -64,8 +64,8 @@ runs:
- name: deploy alpha instance profile
shell: bash
run: |
aws iam create-instance-profile --instance-profile-name "KarpenterNodeInstanceProfile-${{ inputs.cluster_name }}" --tags Key=testing/type,Value=e2e Key=testing/cluster,Value=${{ inputs.cluster_name }}
aws iam add-role-to-instance-profile --instance-profile-name "KarpenterNodeInstanceProfile-${{ inputs.cluster_name }}" --role-name "KarpenterNodeRole-${{ inputs.cluster_name }}"
aws iam create-instance-profile --instance-profile-name "KarpenterNodeInstanceProfile-${{ inputs.cluster_name }}" --tags Key=testing/type,Value=e2e Key=testing/cluster,Value=${{ inputs.cluster_name }} || true
aws iam add-role-to-instance-profile --instance-profile-name "KarpenterNodeInstanceProfile-${{ inputs.cluster_name }}" --role-name "KarpenterNodeRole-${{ inputs.cluster_name }}" || true
- name: deploy alpha policy
shell: bash
run: |
@@ -77,15 +77,15 @@ runs:
POLICY_DOCUMENT=$(envsubst < .github/actions/e2e/create-cluster/alpha-controller-policy.json)
POLICY_NAME="KarpenterControllerPolicy-Alpha-${CLUSTER_NAME}"
echo "Creating policy $POLICY_NAME..."
aws iam create-policy --policy-name "$POLICY_NAME" --policy-document "$POLICY_DOCUMENT"
aws iam create-policy --policy-name "$POLICY_NAME" --policy-document "$POLICY_DOCUMENT" || true
- name: create or upgrade cluster
shell: bash
run: |
# Create or Upgrade the cluster based on whether the cluster already exists
cmd="create"
eksctl get cluster --name ${{ inputs.cluster_name }} && cmd="upgrade"
eksctl ${cmd} cluster -f - <<EOF
cat << EOF >> clusterconfig.yaml
---
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
@@ -146,6 +146,14 @@ runs:
wellKnownPolicies:
ebsCSIController: true
EOF
eksctl ${cmd} cluster -f clusterconfig.yaml
# We need to call these update iamserviceaccount commands again since the "eksctl upgrade cluster" action
# doesn't handle updates to IAM serviceaccounts correctly when the roles assigned to them change
eksctl update iamserviceaccount -f clusterconfig.yaml --approve
- name: tag oidc provider of the cluster
if: always()
shell: bash
7 changes: 4 additions & 3 deletions .github/workflows/e2e-matrix.yaml
@@ -63,9 +63,10 @@ jobs:
e2e-upgrade:
uses: ./.github/workflows/e2e-upgrade.yaml
with:
# This version matches the steps of the newest version of the install-eksctl action
# which will take in the eksctl_version into the composite action
from_git_ref: 3519331035579ac0caf66a7f5a5282a2fef9b409
# This version matches the steps of the newest version that contains the additional step
# of deploying the instance profile so that the pre-upgrade and post-upgrade create-cluster
# actions have the same number of steps and don't fail during post-cleanup
from_git_ref: 62c25a3ea85c7d00165e60a913fff1ec7c1f29fd
to_git_ref: ${{ inputs.git_ref }}
region: ${{ inputs.region }}
k8s_version: ${{ inputs.k8s_version }}
2 changes: 2 additions & 0 deletions .github/workflows/release.yaml
@@ -4,6 +4,8 @@ on:
tags: [ 'v*.*.*' ]
permissions:
id-token: write
pull-requests: write
contents: write
jobs:
release:
if: github.repository == 'aws/karpenter'
4 changes: 2 additions & 2 deletions README.md
@@ -16,9 +16,9 @@ Karpenter improves the efficiency and cost of running workloads on Kubernetes cl
* **Provisioning** nodes that meet the requirements of the pods
* **Removing** the nodes when the nodes are no longer needed

Come discuss Karpenter in the [#karpenter](https://kubernetes.slack.com/archives/C02SFFZSA2K) channel in the [Kubernetes slack](https://slack.k8s.io/) or join the [Karpenter working group](https://karpenter.sh/docs/contributing/working-group/) bi-weekly calls.
Come discuss Karpenter in the [#karpenter](https://kubernetes.slack.com/archives/C02SFFZSA2K) channel, in the [Kubernetes slack](https://slack.k8s.io/) or join the [Karpenter working group](https://karpenter.sh/docs/contributing/working-group/) bi-weekly calls.

Check out the [Docs](https://karpenter.sh/) to learn more.
Check out the [Docs](https://karpenter.sh/docs/) to learn more.

## Talks
- 09/08/2022 [Workload Consolidation with Karpenter](https://youtu.be/BnksdJ3oOEs)
2 changes: 1 addition & 1 deletion charts/karpenter/README.md
@@ -99,7 +99,7 @@ helm upgrade --install --namespace karpenter --create-namespace \
| tolerations | list | `[{"key":"CriticalAddonsOnly","operator":"Exists"}]` | Tolerations to allow the pod to be scheduled to nodes with taints. |
| topologySpreadConstraints | list | `[{"maxSkew":1,"topologyKey":"topology.kubernetes.io/zone","whenUnsatisfiable":"ScheduleAnyway"}]` | Topology spread constraints to increase the controller resilience by distributing pods across the cluster zones. If an explicit label selector is not provided one will be created from the pod selector labels. |
| webhook.enabled | bool | `true` | Whether to enable the webhooks and webhook permissions. |
| webhook.logLevel | string | `"error"` | |
| webhook.logLevel | string | `"error"` | Webhook log level (Deprecated: Use logConfig.logLevel.webhook instead) |
| webhook.metrics.port | int | `8001` | The container port to use for webhook metrics. |
| webhook.port | int | `8443` | The container port to use for the webhook. |

6 changes: 3 additions & 3 deletions hack/docgen.sh
@@ -8,7 +8,7 @@ compatibilitymatrix() {


compatibilitymatrix
go run hack/docs/metrics_gen_docs.go pkg/ ${KARPENTER_CORE_DIR}/pkg website/content/en/preview/concepts/metrics.md
go run hack/docs/instancetypes_gen_docs.go website/content/en/preview/concepts/instance-types.md
go run hack/docs/configuration_gen_docs.go website/content/en/preview/concepts/settings.md
go run hack/docs/metrics_gen_docs.go pkg/ ${KARPENTER_CORE_DIR}/pkg website/content/en/preview/reference/metrics.md
go run hack/docs/instancetypes_gen_docs.go website/content/en/preview/reference/instance-types.md
go run hack/docs/configuration_gen_docs.go website/content/en/preview/reference/settings.md
cd charts/karpenter && helm-docs
1 change: 1 addition & 0 deletions pkg/cloudprovider/cloudprovider.go
@@ -379,5 +379,6 @@ func (c *CloudProvider) instanceToNodeClaim(i *instance.Instance, instanceType *
nodeClaim.DeletionTimestamp = &metav1.Time{Time: time.Now()}
}
nodeClaim.Status.ProviderID = fmt.Sprintf("aws:///%s/%s", i.Zone, i.ID)
nodeClaim.Status.ImageID = i.ImageID
return nodeClaim
}
6 changes: 3 additions & 3 deletions pkg/cloudprovider/machine_test.go
@@ -572,9 +572,9 @@ var _ = Describe("Machine/CloudProvider", func() {
if *ov.InstanceType == "m5.large" {
foundNonGPULT = true
Expect(v.Overrides).To(ContainElements(
&ec2.FleetLaunchTemplateOverridesRequest{SubnetId: aws.String("subnet-test1"), InstanceType: aws.String("m5.large"), AvailabilityZone: aws.String("test-zone-1a")},
&ec2.FleetLaunchTemplateOverridesRequest{SubnetId: aws.String("subnet-test2"), InstanceType: aws.String("m5.large"), AvailabilityZone: aws.String("test-zone-1b")},
&ec2.FleetLaunchTemplateOverridesRequest{SubnetId: aws.String("subnet-test3"), InstanceType: aws.String("m5.large"), AvailabilityZone: aws.String("test-zone-1c")},
&ec2.FleetLaunchTemplateOverridesRequest{SubnetId: aws.String("subnet-test1"), ImageId: ov.ImageId, InstanceType: aws.String("m5.large"), AvailabilityZone: aws.String("test-zone-1a")},
&ec2.FleetLaunchTemplateOverridesRequest{SubnetId: aws.String("subnet-test2"), ImageId: ov.ImageId, InstanceType: aws.String("m5.large"), AvailabilityZone: aws.String("test-zone-1b")},
&ec2.FleetLaunchTemplateOverridesRequest{SubnetId: aws.String("subnet-test3"), ImageId: ov.ImageId, InstanceType: aws.String("m5.large"), AvailabilityZone: aws.String("test-zone-1c")},
))
}
}
13 changes: 10 additions & 3 deletions pkg/cloudprovider/nodeclaim_test.go
@@ -87,6 +87,13 @@ var _ = Describe("NodeClaim/CloudProvider", func() {
Expect(corecloudproivder.IsInsufficientCapacityError(err)).To(BeTrue())
Expect(cloudProviderNodeClaim).To(BeNil())
})
It("should set ImageID in the status field of the nodeClaim", func() {
ExpectApplied(ctx, env.Client, nodePool, nodeClass, nodeClaim)
cloudProviderNodeClaim, err := cloudProvider.Create(ctx, nodeClaim)
Expect(err).To(BeNil())
Expect(cloudProviderNodeClaim).ToNot(BeNil())
Expect(cloudProviderNodeClaim.Status.ImageID).ToNot(BeEmpty())
})
It("should return NodeClass Hash on the nodeClaim", func() {
ExpectApplied(ctx, env.Client, nodePool, nodeClass, nodeClaim)
cloudProviderNodeClaim, err := cloudProvider.Create(ctx, nodeClaim)
@@ -373,9 +380,9 @@ var _ = Describe("NodeClaim/CloudProvider", func() {
if *ov.InstanceType == "m5.large" {
foundNonGPULT = true
Expect(v.Overrides).To(ContainElements(
&ec2.FleetLaunchTemplateOverridesRequest{SubnetId: aws.String("subnet-test1"), InstanceType: aws.String("m5.large"), AvailabilityZone: aws.String("test-zone-1a")},
&ec2.FleetLaunchTemplateOverridesRequest{SubnetId: aws.String("subnet-test2"), InstanceType: aws.String("m5.large"), AvailabilityZone: aws.String("test-zone-1b")},
&ec2.FleetLaunchTemplateOverridesRequest{SubnetId: aws.String("subnet-test3"), InstanceType: aws.String("m5.large"), AvailabilityZone: aws.String("test-zone-1c")},
&ec2.FleetLaunchTemplateOverridesRequest{SubnetId: aws.String("subnet-test1"), ImageId: ov.ImageId, InstanceType: aws.String("m5.large"), AvailabilityZone: aws.String("test-zone-1a")},
&ec2.FleetLaunchTemplateOverridesRequest{SubnetId: aws.String("subnet-test2"), ImageId: ov.ImageId, InstanceType: aws.String("m5.large"), AvailabilityZone: aws.String("test-zone-1b")},
&ec2.FleetLaunchTemplateOverridesRequest{SubnetId: aws.String("subnet-test3"), ImageId: ov.ImageId, InstanceType: aws.String("m5.large"), AvailabilityZone: aws.String("test-zone-1c")},
))
}
}
1 change: 1 addition & 0 deletions pkg/fake/ec2api.go
@@ -176,6 +176,7 @@ func (e *EC2API) CreateFleetWithContext(_ context.Context, input *ec2.CreateFlee
LaunchTemplateAndOverrides: &ec2.LaunchTemplateAndOverridesResponse{
Overrides: &ec2.FleetLaunchTemplateOverrides{
SubnetId: input.LaunchTemplateConfigs[0].Overrides[0].SubnetId,
ImageId: input.LaunchTemplateConfigs[0].Overrides[0].ImageId,
InstanceType: input.LaunchTemplateConfigs[0].Overrides[0].InstanceType,
AvailabilityZone: input.LaunchTemplateConfigs[0].Overrides[0].AvailabilityZone,
},
9 changes: 5 additions & 4 deletions pkg/providers/instance/instance.go
@@ -301,11 +301,11 @@ func (p *Provider) getLaunchTemplateConfigs(ctx context.Context, nodeClass *v1be
if err != nil {
return nil, fmt.Errorf("getting launch templates, %w", err)
}
for launchTemplateName, instanceTypes := range launchTemplates {
for _, launchTemplate := range launchTemplates {
launchTemplateConfig := &ec2.FleetLaunchTemplateConfigRequest{
Overrides: p.getOverrides(instanceTypes, zonalSubnets, scheduling.NewNodeSelectorRequirements(nodeClaim.Spec.Requirements...).Get(v1.LabelTopologyZone), capacityType),
Overrides: p.getOverrides(launchTemplate.InstanceTypes, zonalSubnets, scheduling.NewNodeSelectorRequirements(nodeClaim.Spec.Requirements...).Get(v1.LabelTopologyZone), capacityType, launchTemplate.ImageID),
LaunchTemplateSpecification: &ec2.FleetLaunchTemplateSpecificationRequest{
LaunchTemplateName: aws.String(launchTemplateName),
LaunchTemplateName: aws.String(launchTemplate.Name),
Version: aws.String("$Latest"),
},
}
@@ -321,7 +321,7 @@ func (p *Provider) getLaunchTemplateConfigs(ctx context.Context, nodeClass *v1be

// getOverrides creates and returns launch template overrides for the cross product of InstanceTypes and subnets (with subnets being constrained by
// zones and the offerings in InstanceTypes)
func (p *Provider) getOverrides(instanceTypes []*cloudprovider.InstanceType, zonalSubnets map[string]*ec2.Subnet, zones *scheduling.Requirement, capacityType string) []*ec2.FleetLaunchTemplateOverridesRequest {
func (p *Provider) getOverrides(instanceTypes []*cloudprovider.InstanceType, zonalSubnets map[string]*ec2.Subnet, zones *scheduling.Requirement, capacityType string, image string) []*ec2.FleetLaunchTemplateOverridesRequest {
// Unwrap all the offerings to a flat slice that includes a pointer
// to the parent instance type name
type offeringWithParentName struct {
@@ -354,6 +354,7 @@ func (p *Provider) getOverrides(instanceTypes []*cloudprovider.InstanceType, zon
overrides = append(overrides, &ec2.FleetLaunchTemplateOverridesRequest{
InstanceType: aws.String(offering.parentInstanceTypeName),
SubnetId: subnet.SubnetId,
ImageId: aws.String(image),
// This is technically redundant, but is useful if we have to parse insufficient capacity errors from
// CreateFleet so that we can figure out the zone rather than additional API calls to look up the subnet
AvailabilityZone: subnet.AvailabilityZone,
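The getOverrides change above pairs every instance-type offering with the subnet of its zone and now also threads the launch template's image ID into each override. Below is a minimal, self-contained sketch of that cross-product construction, not the repository's code: the offering, instanceType, and override types are simplified stand-ins for Karpenter's internal types, and the zone, subnet, and AMI values are hypothetical.

package main

import "fmt"

type offering struct {
	Zone         string
	CapacityType string
}

type instanceType struct {
	Name      string
	Offerings []offering
}

type override struct {
	InstanceType     string
	SubnetID         string
	AvailabilityZone string
	ImageID          string // newly threaded through from the resolved launch template
}

// buildOverrides mimics the cross product: for each instance type, emit one
// override per zone where it is offered and a subnet is available.
func buildOverrides(instanceTypes []instanceType, zonalSubnets map[string]string, capacityType, imageID string) []override {
	var overrides []override
	for _, it := range instanceTypes {
		for _, of := range it.Offerings {
			if of.CapacityType != capacityType {
				continue
			}
			subnet, ok := zonalSubnets[of.Zone]
			if !ok {
				continue
			}
			overrides = append(overrides, override{
				InstanceType:     it.Name,
				SubnetID:         subnet,
				AvailabilityZone: of.Zone,
				ImageID:          imageID,
			})
		}
	}
	return overrides
}

func main() {
	its := []instanceType{{Name: "m5.large", Offerings: []offering{
		{Zone: "test-zone-1a", CapacityType: "on-demand"},
		{Zone: "test-zone-1b", CapacityType: "on-demand"},
	}}}
	subnets := map[string]string{"test-zone-1a": "subnet-test1", "test-zone-1b": "subnet-test2"}
	for _, o := range buildOverrides(its, subnets, "on-demand", "ami-0123456789abcdef0") {
		fmt.Printf("%+v\n", o)
	}
}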
2 changes: 1 addition & 1 deletion pkg/providers/instance/types.go
@@ -62,7 +62,7 @@ func NewInstanceFromFleet(out *ec2.CreateFleetInstance, tags map[string]string)
LaunchTime: time.Now(), // estimate the launch time since we just launched
State: ec2.StatePending,
ID: aws.StringValue(out.InstanceIds[0]),
ImageID: "", // we don't know the image id when we get the output from fleet
ImageID: aws.StringValue(out.LaunchTemplateAndOverrides.Overrides.ImageId),
Type: aws.StringValue(out.InstanceType),
Zone: aws.StringValue(out.LaunchTemplateAndOverrides.Overrides.AvailabilityZone),
CapacityType: aws.StringValue(out.Lifecycle),
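The NewInstanceFromFleet change above is the other half of the image-ID plumbing: because the CreateFleet request overrides now carry ImageId, the matched override echoed back in the CreateFleet response carries it too (as the pkg/fake/ec2api.go change mirrors), so Instance.ImageID no longer has to be left empty. A standalone sketch of reading it back, using the aws-sdk-go v1 ec2 types already referenced in this diff; the instance ID and AMI ID are hypothetical, and the SDK module is assumed to be available on the Go module path.

package main

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/ec2"
)

func main() {
	// Roughly what a CreateFleet response instance looks like once the request
	// overrides include ImageId.
	out := &ec2.CreateFleetInstance{
		InstanceIds:  []*string{aws.String("i-0123456789abcdef0")},
		InstanceType: aws.String("m5.large"),
		Lifecycle:    aws.String("on-demand"),
		LaunchTemplateAndOverrides: &ec2.LaunchTemplateAndOverridesResponse{
			Overrides: &ec2.FleetLaunchTemplateOverrides{
				SubnetId:         aws.String("subnet-test1"),
				ImageId:          aws.String("ami-0123456789abcdef0"),
				AvailabilityZone: aws.String("test-zone-1a"),
			},
		},
	}

	// Previously ImageID was hard-coded to "" because the fleet output did not
	// carry it; now it can be read straight off the echoed override.
	imageID := aws.StringValue(out.LaunchTemplateAndOverrides.Overrides.ImageId)
	fmt.Println("instance image:", imageID)
}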
15 changes: 11 additions & 4 deletions pkg/providers/launchtemplate/launchtemplate.go
@@ -54,6 +54,12 @@ const (
karpenterManagedTagKey = "karpenter.k8s.aws/cluster"
)

type LaunchTemplate struct {
Name string
InstanceTypes []*cloudprovider.InstanceType
ImageID string
}

type Provider struct {
sync.Mutex
ec2api ec2iface.EC2API
@@ -97,14 +103,15 @@ func NewProvider(ctx context.Context, cache *cache.Cache, ec2api ec2iface.EC2API
}

func (p *Provider) EnsureAll(ctx context.Context, nodeClass *v1beta1.EC2NodeClass, nodeClaim *corev1beta1.NodeClaim,
instanceTypes []*cloudprovider.InstanceType, additionalLabels map[string]string, tags map[string]string) (map[string][]*cloudprovider.InstanceType, error) {
instanceTypes []*cloudprovider.InstanceType, additionalLabels map[string]string, tags map[string]string) ([]*LaunchTemplate, error) {

p.Lock()
defer p.Unlock()
// If Launch Template is directly specified then just use it
if nodeClass.Spec.LaunchTemplateName != nil {
return map[string][]*cloudprovider.InstanceType{ptr.StringValue(nodeClass.Spec.LaunchTemplateName): instanceTypes}, nil
return []*LaunchTemplate{{Name: ptr.StringValue(nodeClass.Spec.LaunchTemplateName), InstanceTypes: instanceTypes}}, nil
}

options, err := p.createAMIOptions(ctx, nodeClass, lo.Assign(nodeClaim.Labels, additionalLabels), tags)
if err != nil {
return nil, err
@@ -113,14 +120,14 @@ func (p *Provider) EnsureAll(ctx context.Context, nodeClass *v1beta1.EC2NodeClas
if err != nil {
return nil, err
}
launchTemplates := map[string][]*cloudprovider.InstanceType{}
var launchTemplates []*LaunchTemplate
for _, resolvedLaunchTemplate := range resolvedLaunchTemplates {
// Ensure the launch template exists, or create it
ec2LaunchTemplate, err := p.ensureLaunchTemplate(ctx, resolvedLaunchTemplate)
if err != nil {
return nil, err
}
launchTemplates[*ec2LaunchTemplate.LaunchTemplateName] = resolvedLaunchTemplate.InstanceTypes
launchTemplates = append(launchTemplates, &LaunchTemplate{Name: *ec2LaunchTemplate.LaunchTemplateName, InstanceTypes: resolvedLaunchTemplate.InstanceTypes, ImageID: resolvedLaunchTemplate.AMIID})
}
return launchTemplates, nil
}
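The EnsureAll change above replaces the map keyed by launch-template name with a slice of LaunchTemplate structs, so the resolved AMI ID travels alongside the name and instance types. A simplified sketch of how a caller such as getLaunchTemplateConfigs iterates the new return type; InstanceType and buildConfigs here are stand-ins for illustration, not Karpenter's real types or functions.

package main

import "fmt"

type InstanceType struct{ Name string }

// LaunchTemplate mirrors the new struct added in pkg/providers/launchtemplate.
type LaunchTemplate struct {
	Name          string
	InstanceTypes []*InstanceType
	ImageID       string
}

// buildConfigs stands in for getLaunchTemplateConfigs: one config per launch
// template, with the template's image ID now available for its overrides.
func buildConfigs(lts []*LaunchTemplate) []string {
	var configs []string
	for _, lt := range lts { // was: for launchTemplateName, instanceTypes := range launchTemplates
		configs = append(configs, fmt.Sprintf("template=%s image=%s instanceTypes=%d",
			lt.Name, lt.ImageID, len(lt.InstanceTypes)))
	}
	return configs
}

func main() {
	lts := []*LaunchTemplate{
		{Name: "karpenter.k8s.aws/example", InstanceTypes: []*InstanceType{{Name: "m5.large"}}, ImageID: "ami-0123456789abcdef0"},
	}
	for _, c := range buildConfigs(lts) {
		fmt.Println(c)
	}
}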
1 change: 1 addition & 0 deletions test/cloudformation/iam_cloudformation.yaml
@@ -123,6 +123,7 @@ Resources:
- cloudformation:DeleteStack
- cloudformation:DescribeChangeSet
- cloudformation:DescribeStackEvents
- cloudformation:DescribeStackResources
- cloudformation:ExecuteChangeSet
- cloudformation:GetTemplate
- cloudformation:GetTemplateSummary
2 changes: 1 addition & 1 deletion website/content/en/preview/concepts/disruption.md
@@ -183,7 +183,7 @@ __Behavioral Fields__
| Block Device Mappings | x | |
| Detailed Monitoring | x | |
To enable the drift feature flag, refer to the [Settings Feature Gates]({{<ref "./settings#feature-gates" >}}).
To enable the drift feature flag, refer to the [Settings Feature Gates]({{<ref "../reference/settings#feature-gates" >}}).
Karpenter will add `MachineDrifted` status condition on the machines if the machine is drifted, and does not have the status condition,
2 changes: 1 addition & 1 deletion website/content/en/preview/concepts/pod-density.md
@@ -56,7 +56,7 @@ Environment variables for the Karpenter controller may be specified as [helm cha

### VPC CNI Custom Networking

By default, the VPC CNI allocates IPs for a node and pods from the same subnet. With [VPC CNI Custom Networking](https://aws.github.io/aws-eks-best-practices/networking/custom-networking), the pods will receive IP addresses from another subnet dedicated to pod IPs. This approach makes it easier to manage IP addresses and allows for separate Network Access Control Lists (NACLs) applied to your pods. VPC CNI Custom Networking reduces the pod density of a node since one of the ENI attachments will be used for the node and cannot share the allocated IPs on the interface to pods. Karpenter supports VPC CNI Custom Networking and similar CNI setups where the primary node interface is separated from the pods interfaces through a global [setting](./settings.md#configmap) within the karpenter-global-settings configmap: `aws.reservedENIs`. In the common case, `aws.reservedENIs` should be set to `"1"` if using Custom Networking.
By default, the VPC CNI allocates IPs for a node and pods from the same subnet. With [VPC CNI Custom Networking](https://aws.github.io/aws-eks-best-practices/networking/custom-networking), the pods will receive IP addresses from another subnet dedicated to pod IPs. This approach makes it easier to manage IP addresses and allows for separate Network Access Control Lists (NACLs) applied to your pods. VPC CNI Custom Networking reduces the pod density of a node since one of the ENI attachments will be used for the node and cannot share the allocated IPs on the interface to pods. Karpenter supports VPC CNI Custom Networking and similar CNI setups where the primary node interface is separated from the pods interfaces through a global [setting](../reference/settings.md#configmap) within the karpenter-global-settings configmap: `aws.reservedENIs`. In the common case, `aws.reservedENIs` should be set to `"1"` if using Custom Networking.

{{% alert title="Windows Support Notice" color="warning" %}}
It's currently not possible to specify custom networking with Windows nodes.