fix: some fixes for karpenter deploy #358

Merged
kvvit merged 3 commits into terragrunt from hotfix/karpenter-improve
Sep 30, 2024

@kvvit kvvit commented Sep 25, 2024

Fixes for karpenter deploy

First error:

Error: karpenter/private failed to create kubernetes rest client for update of resource: resource [karpenter.sh/v1/EC2NodeClass] isn't valid for cluster, check the APIVersion and Kind fields are valid

   with kubectl_manifest.ec2nodeclass_private[0],
   on main.tf line 71, in resource "kubectl_manifest" "ec2nodeclass_private":
   71: resource "kubectl_manifest" "ec2nodeclass_private" {

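Solution: set apiVersion: karpenter.k8s.aws/v1 on the kubectl_manifest resources ec2nodeclass_private and ec2nodeclass_public (file terraform/modules/k8s-karpenter/main.tf). EC2NodeClass belongs to the karpenter.k8s.aws API group, not to karpenter.sh, which is why the cluster rejected karpenter.sh/v1/EC2NodeClass.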
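
For illustration, here is a minimal sketch of what a corrected EC2NodeClass manifest could look like. The provider source, both variables, the metadata name, and the whole spec body are assumptions made to keep the example self-contained; the module's real manifest is not shown in this PR.

```hcl
terraform {
  required_providers {
    kubectl = {
      # Assumed provider source; the module may use a different fork.
      source = "gavinbunney/kubectl"
    }
  }
}

# Hypothetical inputs, present only so the sketch stands on its own.
variable "cluster_name" { type = string }
variable "node_role_name" { type = string }

resource "kubectl_manifest" "ec2nodeclass_private" {
  # The error message references index [0], so the real resource is counted.
  count = 1

  yaml_body = yamlencode({
    # The fix: EC2NodeClass lives in the AWS-specific API group.
    apiVersion = "karpenter.k8s.aws/v1"
    kind       = "EC2NodeClass"
    metadata   = { name = "private" }
    spec = {
      # Placeholder values, not taken from the PR.
      role             = var.node_role_name
      amiSelectorTerms = [{ alias = "al2023@latest" }]
      subnetSelectorTerms = [
        { tags = { "karpenter.sh/discovery" = var.cluster_name } }
      ]
      securityGroupSelectorTerms = [
        { tags = { "karpenter.sh/discovery" = var.cluster_name } }
      ]
    }
  })
}
```

The only line that matters for the first error is apiVersion; NodePool resources keep the karpenter.sh/v1 API version.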

Second error:

 Error: karpenter/default failed to run apply: error when creating "/tmp/879917200kubectl_manifest.yaml": NodePool.karpenter.sh "default" is invalid: [spec.disruption.consolidateAfter: Required value, spec.template.spec.nodeClassRef.group: Required value, spec.template.spec.nodeClassRef.kind: Required value, <nil>: Invalid value: "null": some validation rules were not checked because the object was invalid; correct the existing errors to complete validation]

   with kubectl_manifest.nodepool["default"],
   on main.tf line 135, in resource "kubectl_manifest" "nodepool":
  135: resource "kubectl_manifest" "nodepool" {

Solution: added the group and kind fields to nodeClassRef for the private and public NodePools (file terragrunt/ACCOUNT_ID/us-east-1/demo/env.yaml)

        spec:
          nodeClassRef:
            group: karpenter.k8s.aws
            kind: EC2NodeClass
            name: public

Third error:

 Error: karpenter/ci failed to run apply: error when creating "/tmp/969255726kubectl_manifest.yaml": NodePool.karpenter.sh "ci" is invalid: [spec.disruption.consolidateAfter: Required value, <nil>: Invalid value: "null": some validation rules were not checked because the object was invalid; correct the existing errors to complete validation]

   with kubectl_manifest.nodepool["ci"],
   on main.tf line 135, in resource "kubectl_manifest" "nodepool":
  135: resource "kubectl_manifest" "nodepool" {

Solution: added the field consolidateAfter: 1m to the disruption block of the private and public NodePools (file terragrunt/ACCOUNT_ID/us-east-1/demo/env.yaml)

     disruption:
       consolidationPolicy: WhenEmptyOrUnderutilized
       consolidateAfter: 1m
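
Taken together, the second and third fixes give a NodePool that passes the karpenter.sh/v1 validation quoted in the errors. The sketch below shows one way such a manifest could be rendered from Terraform. The local.nodepools map, the node_class key, and the requirements entry are hypothetical stand-ins for values the module reads from env.yaml. The provider setup is assumed to be configured elsewhere.

```hcl
locals {
  # Hypothetical stand-in for the NodePool settings loaded from env.yaml.
  nodepools = {
    default = { node_class = "private" }
  }
}

resource "kubectl_manifest" "nodepool" {
  # The errors reference nodepool["default"] and nodepool["ci"],
  # so the real resource is keyed with for_each.
  for_each = local.nodepools

  yaml_body = yamlencode({
    apiVersion = "karpenter.sh/v1"
    kind       = "NodePool"
    metadata   = { name = each.key }
    spec = {
      template = {
        spec = {
          nodeClassRef = {
            group = "karpenter.k8s.aws" # reported as a required value
            kind  = "EC2NodeClass"      # reported as a required value
            name  = each.value.node_class
          }
          # Placeholder requirement, not taken from the PR.
          requirements = [{
            key      = "karpenter.sh/capacity-type"
            operator = "In"
            values   = ["on-demand"]
          }]
        }
      }
      disruption = {
        consolidationPolicy = "WhenEmptyOrUnderutilized"
        consolidateAfter    = "1m" # reported as a required value
      }
    }
  })
}
```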

Last one:

I removed the unnecessary data block data "aws_ecrpublic_authorization_token" "token" {} and the fields from the helm_release resource: repository_username = data.aws_ecrpublic_authorization_token.token.user_name and repository_password = data.aws_ecrpublic_authorization_token.token.password. I tested these changes, and everything works perfectly.

I also updated the locals block for Karpenter. I set the chart value to oci://public.ecr.aws/karpenter/karpenter and left the repository field empty.

    chart         = try(var.helm.chart_name, "oci://public.ecr.aws/karpenter/karpenter")
    repository    = try(var.helm.repository, "")

This configuration has been tested across different Helm releases using OCI repositories, and everything works flawlessly. I successfully tested the Karpenter deployment from a Docker container in our sandbox account, and it was deployed without any issues.
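
As a rough sketch of the resulting setup, a helm_release that pulls the chart straight from the OCI URL might look like the block below. The resource name, the namespace handling, and the chart_version variable are assumptions; the module itself uses local.karpenter values and a separate namespace module. No repository credentials are set, on the assumption that public.ecr.aws allows anonymous pulls of this chart.

```hcl
# Hypothetical input: the chart version to install.
variable "chart_version" {
  type = string
}

resource "helm_release" "karpenter" {
  name             = "karpenter"
  namespace        = "karpenter"
  create_namespace = true # simplification; the module creates the namespace separately

  # Full OCI reference in chart, with repository left unset.
  chart       = "oci://public.ecr.aws/karpenter/karpenter"
  version     = var.chart_version
  max_history = 3

  # repository_username and repository_password are intentionally omitted.
}
```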

There was another issue with the aws_ecrpublic_authorization_token data source, related to the AWS provider. Because public ECR is located in the us-east-1 region, this data source does not work when the provider is configured for another region. An additional AWS provider with an alias is required to handle this.
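
For reference, had the data source been kept, the aliased provider could be wired up roughly as follows. The alias name ecr_public is an arbitrary choice for this sketch. Inside a reusable module, the aliased provider would typically be passed in from the root configuration instead of being declared in place.

```hcl
# Second AWS provider pinned to the region where public ECR is located.
provider "aws" {
  alias  = "ecr_public" # hypothetical alias name
  region = "us-east-1"
}

# The token is requested through the aliased provider, regardless of
# which region the default provider targets.
data "aws_ecrpublic_authorization_token" "token" {
  provider = aws.ecr_public
}
```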

@kvvit kvvit requested a review from mglotov September 25, 2024 17:28
@kvvit kvvit self-assigned this Sep 25, 2024
terraform/modules/k8s-karpenter/main.tf (outdated, resolved)
@@ -3,8 +3,8 @@ locals {
   karpenter = {
     name = try(var.helm.release_name, "karpenter")
     enabled = true
-    chart = try(var.helm.chart_name, "karpenter")
-    repository = try(var.helm.repository, "oci://public.ecr.aws/karpenter")
+    chart = try(var.helm.chart_name, "oci://public.ecr.aws/karpenter/karpenter")
Contributor

Why did you change this? What was the problem here?

@kvvit kvvit (Author) commented Sep 26, 2024

This configuration already works for github-runners, which also use an OCI Helm repository, and it now works for the Karpenter deployment in one of the projects.

Author

I've added information about these changes to the PR description.

@@ -144,8 +148,6 @@ resource "helm_release" "this" {
   version = local.karpenter.chart_version
   namespace = module.namespace[count.index].name
   max_history = 3
-  repository_username = data.aws_ecrpublic_authorization_token.token.user_name
Contributor

I'm not sure it works. Let's test it together

@kvvit kvvit (Author) commented Sep 26, 2024

I tested this yesterday in the sandbox account and in one of the projects, and everything works fine. Moreover, this data source works only in the us-east-1 region. For other regions we would need an additional AWS provider with an alias.

@mglotov mglotov changed the title from "Fixes for karpenter deploy" to "fix: some fixes for karpenter deploy" Sep 26, 2024

@kvvit kvvit merged commit 8ace456 into terragrunt Sep 30, 2024
4 of 10 checks passed
@kvvit kvvit deleted the hotfix/karpenter-improve branch September 30, 2024 06:20