feat: add networkInterfaces configuration to launchTemplate #4353

Closed
wants to merge 24 commits into from

Conversation

myaser
Contributor

@myaser myaser commented Jul 29, 2023

Fixes #2026

Description
A reduced version of #3819.
Adds a networkInterfaces struct to the AWSNodeTemplate CRD and uses its parameters when creating AWS launch templates.

How was this change tested?
Unit and integration tests were added; the change was also tested manually on a test cluster.

Does this change impact docs?

  • Yes, PR includes docs updates
  • Yes, issue opened: #
  • No

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@myaser myaser requested a review from a team as a code owner July 29, 2023 21:34
@myaser myaser requested a review from bwagner5 July 29, 2023 21:34
@netlify

netlify bot commented Jul 29, 2023

Deploy Preview for karpenter-docs-prod ready!

Name Link
🔨 Latest commit 3d378e3
🔍 Latest deploy log https://app.netlify.com/sites/karpenter-docs-prod/deploys/653aaca8262aaa000863e87d
😎 Deploy Preview https://deploy-preview-4353--karpenter-docs-prod.netlify.app

myaser added a commit to zalando-incubator/kubernetes-on-aws that referenced this pull request Jul 31, 2023
@github-actions
Contributor

This PR has been inactive for 14 days. StaleBot will close this stale PR after 14 more days of inactivity.

@bwagner5
Contributor

I've taken a first pass and then got sidetracked by some other issues. I'll pull this down tomorrow and take it for a test drive.

@myaser myaser force-pushed the networkinterfaces-simplified branch from 6cc4b49 to c12aaaf Compare August 16, 2023 09:58
@bwagner5
Contributor

I've tested this out on my side and verified everything is updating properly with the LaunchTemplate, instance, and drift when changing between params within NetworkInterfaces!

Nice! Glad we can get some functionality in so that we can build on this base!

Remaining items are just to fix the DeviceIndex and Description passing to the NetworkInterfaces and that should also fix the test that is failing.

@myaser
Contributor Author

myaser commented Aug 18, 2023

> I've tested this out on my side and verified everything is updating properly with the LaunchTemplate, instance, and drift when changing between params within NetworkInterfaces!
>
> Nice! Glad we can get some functionality in so that we can build on this base!
>
> Remaining items are just to fix the DeviceIndex and Description passing to the NetworkInterfaces and that should also fix the test that is failing.

Great to hear 🎉
I have added the missing fields.

test/suites/integration/networkinterfaces_test.go (7 outdated review threads, resolved)
Contributor

@bwagner5 bwagner5 left a comment

/karpenter snapshot

@github-actions
Contributor

Snapshot successfully published to oci://public.ecr.aws/karpenter/karpenter:v0-a2843091fafed84c6e3557d460954c7148e2a1ae. Find the image tag and installation instructions at https://gallery.ecr.aws/karpenter/karpenter/

Contributor

@bwagner5 bwagner5 left a comment

/karpenter snapshot

@github-actions
Contributor

Snapshot successfully published to oci://public.ecr.aws/karpenter/karpenter:v0-986c25ea65854d41f518c7527a1fe93e832b39c5. Find the image tag and installation instructions at https://gallery.ecr.aws/karpenter/karpenter/

@bwagner5
Contributor

bwagner5 commented Aug 21, 2023

Looks like the network interfaces e2e are still failing:

2023-08-21T19:16:12.6161277Z • [FAILED] [51.744 seconds]
2023-08-21T19:16:12.6161643Z NetworkInterfaces
2023-08-21T19:16:12.6162973Z /home/runner/work/karpenter/karpenter/test/suites/integration/networkinterfaces_test.go:32
2023-08-21T19:16:12.6163764Z   [It] should create a node with more than one NetworkInterface
2023-08-21T19:16:12.6164422Z   /home/runner/work/karpenter/karpenter/test/suites/integration/networkinterfaces_test.go:124
2023-08-21T19:16:12.6165081Z ------------------------------
2023-08-21T19:16:12.6165298Z
2023-08-21T19:16:12.6165493Z Summarizing 2 Failures:
2023-08-21T19:16:12.6166305Z   [FAIL] NetworkInterfaces [It] should use the specified NetworkInterface
2023-08-21T19:16:12.6167362Z   /home/runner/work/karpenter/karpenter/test/suites/integration/networkinterfaces_test.go:121
2023-08-21T19:16:12.6168145Z   [FAIL] NetworkInterfaces [It] should create a node with more than one NetworkInterface
2023-08-21T19:16:12.6168893Z   /home/runner/work/karpenter/karpenter/test/suites/integration/networkinterfaces_test.go:155
2023-08-21T19:16:12.6169225Z
2023-08-21T19:16:12.6169482Z Ran 91 of 93 Specs in 5523.602 seconds
2023-08-21T19:16:12.6170324Z FAIL! -- 89 Passed | 2 Failed | 0 Pending | 2 Skipped
  [FAILED] Expected
      <*string | 0xc00141d648>: a test network interface
  to equal
      <string>: a test network interface
  In [It] at: /home/runner/work/karpenter/karpenter/test/suites/integration/networkinterfaces_test.go:121 @ 08/21/23 19:15:08.693
  < Exit [It] should use the specified NetworkInterface - 
  [FAILED] Expected
      <*string | 0xc001279b28>: a test network interface
  to equal
      <string>: a test network interface
  In [It] at: /home/runner/work/karpenter/karpenter/test/suites/integration/networkinterfaces_test.go:155 @ 08/21/23 19:16:00.53
  < Exit [It] should create a node with more than one NetworkInterface - 
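
The failure output above shows identical text on both sides but different types: the actual value is a `*string`, the expected one a `string`. Gomega's `Equal` matcher is documented as strict about types, so a comparison like this fails until the pointer is dereferenced. A minimal stdlib-only sketch of the same effect, using `reflect.DeepEqual` as a stand-in for the matcher; `strictEqual` and `deref` are hypothetical helpers, not code from this PR:

```go
package main

import (
	"fmt"
	"reflect"
)

// strictEqual is a hypothetical stand-in for a type-strict equality matcher.
// reflect.DeepEqual reports false whenever the two dynamic types differ.
func strictEqual(actual, expected any) bool {
	return reflect.DeepEqual(actual, expected)
}

// deref returns the pointed-to string, or "" for a nil pointer.
func deref(s *string) string {
	if s == nil {
		return ""
	}
	return *s
}

func main() {
	description := "a test network interface"
	fromAPI := &description // assumption: the API response exposes the field as *string

	// Pointer compared with a value: the types differ, so the check fails.
	fmt.Println(strictEqual(fromAPI, description))
	// Value compared with a value: the check passes.
	fmt.Println(strictEqual(deref(fromAPI), description))
}
```

In a test, the same fix amounts to dereferencing the field before handing it to the matcher.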

@myaser myaser force-pushed the networkinterfaces-simplified branch from 5155663 to a5558ac Compare August 22, 2023 09:03
@myaser
Contributor Author

myaser commented Aug 22, 2023

> Looks like the network interfaces e2e are still failing:

Sorry about that; I did not manage to run the e2e tests myself because of a different environment setup. Previously I had to hack around it and introduce changes to Makefile and test runner, which soon became outdated with upstream changes.

I hope the last commits have fixed it

@bwagner5
Contributor

> > Looks like the network interfaces e2e are still failing:
>
> Sorry about that; I did not manage to run the e2e tests myself because of a different environment setup. Previously I had to hack around it and introduce changes to Makefile and test runner, which soon became outdated with upstream changes.
>
> I hope the last commits have fixed it

No worries! I'll kick it off!

Contributor

@bwagner5 bwagner5 left a comment

/karpenter snapshot

@github-actions
Contributor

Snapshot successfully published to oci://public.ecr.aws/karpenter/karpenter:v0-a5558ac80077f5c1dd6856945b73731dfe66904a. Find the image tag and installation instructions at https://gallery.ecr.aws/karpenter/karpenter/

Contributor

@bwagner5 bwagner5 left a comment

/karpenter snapshot

@github-actions
Contributor

Snapshot successfully published to oci://public.ecr.aws/karpenter/karpenter:v0-902336244f2750618749aab26e9f6cf5eba811c3. Find the image tag and installation instructions at https://gallery.ecr.aws/karpenter/karpenter/

@bwagner5
Contributor

Ok, looks like the non-Integ e2e tests failed due to some transient infra issue. But the Integ tests now only have 2 failures, looking at those now:

Summarizing 2 Failures:
  [FAIL] NetworkInterfaces [It] should create a default NetworkInterface if none specified, with no public IP auto assignment
  /home/runner/work/karpenter/karpenter/test/suites/integration/networkinterfaces_test.go:50
  [FAIL] Scheduling [It] should provision three nodes for a zonal topology spread
  /home/runner/work/karpenter/karpenter/test/suites/integration/scheduling_test.go:325

Contributor

@bwagner5 bwagner5 left a comment

/karpenter snapshot

Contributor

@jmdeal jmdeal left a comment

/karpenter snapshot

@github-actions
Contributor

Snapshot successfully published to oci://071440425669.dkr.ecr.us-east-1.amazonaws.com/karpenter/snapshot/karpenter:v0-6771deec5719c4b408781a8e1445bc455d731a1b.

Contributor

@jmdeal jmdeal left a comment

/karpenter snapshot

@github-actions
Contributor

Snapshot successfully published to oci://071440425669.dkr.ecr.us-east-1.amazonaws.com/karpenter/snapshot/karpenter:v0-3ecd2d2efc197672c14dfdc58f666dcb852903be.

@jmdeal jmdeal force-pushed the networkinterfaces-simplified branch from 3009c77 to 8340b00 Compare October 26, 2023 17:54
@jmdeal jmdeal force-pushed the networkinterfaces-simplified branch from 8340b00 to 44d2973 Compare October 26, 2023 17:55
Contributor

@jmdeal jmdeal left a comment

/karpenter snapshot

@github-actions
Contributor

Snapshot successfully published to oci://071440425669.dkr.ecr.us-east-1.amazonaws.com/karpenter/snapshot/karpenter:v0-3d378e3739908c63104376fc94f8ee28f2305c35.

		Groups: lo.Map(options.SecurityGroups, func(s v1beta1.SecurityGroup, _ int) *string { return aws.String(s.ID) }),
	},
func (p *Provider) generateNetworkInterfaces(options *amifamily.LaunchTemplate) []*ec2.LaunchTemplateInstanceNetworkInterfaceSpecificationRequest {
	if len(options.NetworkInterfaces) == 0 {
Contributor

I'm thinking we should drop this behavior now. Looking at the quoted issue, it seems the desired long-term solution was adding these explicit fields. Thoughts @jonathan-innis?

Contributor

I could see us marking this behavior as deprecated and then choosing to drop this behavior entirely at v1. We can add this to the v1 laundry list that is getting tracked in this issue: #4993

Contributor

@jonathan-innis jonathan-innis left a comment

We should consider adding an example to the examples/v1beta1 directory as well when we add the interfaces to the API

AssociatePublicIPAddress *bool `json:"associatePublicIPAddress,omitempty"`

// A description for the network interface.
Description *string `json:"description,omitempty"`
Contributor

How necessary is the description here? This feels ancillary and probably something that we can wait to add until users ask for it

Contributor

It's not really necessary. It's used in tests to grab the correct network interface, but that can be done with the device index.

@@ -312,6 +315,18 @@ type BlockDevice struct {
VolumeType *string `json:"volumeType,omitempty"`
}

type NetworkInterface struct {
Contributor

Is there a reason that all of these values need to be pointers? I think an empty value is probably equal to not set? Typically, you need pointers when an empty value is still a valid value, but I'm not sure that that's the case for any of these fields

Contributor

I'll need to double-check, but I believe the zero value for AssociatePublicIPAddress would not be an acceptable default. If it's not specified in the EC2NodeClass, we don't want to specify it in the launch template either, so that we don't override the subnet's behavior.
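
The distinction discussed in this thread can be illustrated with plain `encoding/json`: with a pointer field, "not specified" (nil) and "explicitly false" stay distinguishable, whereas a plain `bool` with `omitempty` collapses both into an omitted field. A small sketch; the field name and JSON tag come from the diff above, while `withPointer`, `withValue`, and `mustJSON` are hypothetical names used only for illustration:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// withPointer: a nil pointer means "not specified".
type withPointer struct {
	AssociatePublicIPAddress *bool `json:"associatePublicIPAddress,omitempty"`
}

// withValue: false and "not specified" cannot be told apart.
type withValue struct {
	AssociatePublicIPAddress bool `json:"associatePublicIPAddress,omitempty"`
}

// mustJSON marshals v and panics on error (acceptable for a sketch).
func mustJSON(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	disabled := false

	fmt.Println(mustJSON(withPointer{}))                                    // {}
	fmt.Println(mustJSON(withPointer{AssociatePublicIPAddress: &disabled})) // {"associatePublicIPAddress":false}
	fmt.Println(mustJSON(withValue{AssociatePublicIPAddress: false}))       // {}
}
```

With the pointer variant, code that builds the launch template can leave the field out entirely when it is nil and so defer to the subnet's setting.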

env.EventuallyExpectHealthy(pod)
env.ExpectCreatedNodeCount("==", 1)
instance := env.GetInstance(pod.Spec.NodeName)
for _, interfaceSpec := range interfaces {
Contributor

I think that we have a function that grabs the interfaces from the instance? Can we leverage this function rather than grabbing the instance and doing things with checking the fields?

Contributor

I'll take a look 👍

}
}
},
Entry("when a single interface is specified", &v1beta1.NetworkInterface{
Contributor

Each of these tests adds time. I like our bias for code coverage, but I think that we might be able to get away with testing the most complex scenario in the E2E test environment, condensing this down to a single test and then testing a bunch of different edge-case scenarios within the functional/unit testing
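
One way to read the suggestion above: keep a single E2E test for the most complex configuration and move the edge cases into a fast table-driven unit test. A rough sketch under stated assumptions: `NetworkInterface` is a trimmed copy of the type in this PR (the `DeviceIndex` type is a guess), while `requestSpec`, `toRequestSpec`, `ptr`, and `failedCases` are hypothetical stand-ins. The real code targets the EC2 SDK request type and would live in the repo's Ginkgo suites.

```go
package main

import "fmt"

// NetworkInterface is a trimmed, illustrative copy of the API type.
type NetworkInterface struct {
	AssociatePublicIPAddress *bool
	Description              *string
	DeviceIndex              *int64 // assumption: the exact type is not shown in the thread
}

// requestSpec is a hypothetical stand-in for the launch template request type.
type requestSpec struct {
	AssociatePublicIPAddress *bool
	Description              *string
	DeviceIndex              *int64
}

// toRequestSpec copies fields through unchanged, so unset fields stay unset.
func toRequestSpec(ni NetworkInterface) requestSpec {
	return requestSpec{
		AssociatePublicIPAddress: ni.AssociatePublicIPAddress,
		Description:              ni.Description,
		DeviceIndex:              ni.DeviceIndex,
	}
}

// ptr returns a pointer to v.
func ptr[T any](v T) *T { return &v }

// failedCases runs the table and returns the names of failing cases.
func failedCases() []string {
	cases := []struct {
		name            string
		in              NetworkInterface
		wantPublicIPSet bool
	}{
		{name: "nothing specified", in: NetworkInterface{}, wantPublicIPSet: false},
		{name: "public IP explicitly disabled", in: NetworkInterface{AssociatePublicIPAddress: ptr(false)}, wantPublicIPSet: true},
		{name: "secondary interface", in: NetworkInterface{Description: ptr("secondary"), DeviceIndex: ptr(int64(1))}, wantPublicIPSet: false},
	}
	var failed []string
	for _, tc := range cases {
		got := toRequestSpec(tc.in)
		if (got.AssociatePublicIPAddress != nil) != tc.wantPublicIPSet {
			failed = append(failed, tc.name)
		}
	}
	return failed
}

func main() {
	fmt.Println("failed cases:", len(failedCases()))
}
```

Each additional edge case then costs one table row rather than another node launch.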

# optional, configures IMDS for the instance
metadataOptions:
  httpEndpoint: enabled
  httpProtocolIPv6: disabled
  httpPutResponseHopLimit: 2
  httpTokens: required

# optional, configure network interfaces for the instance
Contributor

This is missing the networkInterfaces header

Contributor

This PR has been inactive for 14 days. StaleBot will close this stale PR after 14 more days of inactivity.

@jmdeal
Contributor

jmdeal commented Nov 20, 2023

Removing lifecycle/stale; this PR is still under consideration. There's been some discussion about whether Karpenter is the ideal place for this configuration surface to live. I plan to move on this soon.

@garvinp-stripe
Contributor

Can you provide an update on the current thinking?

@jmdeal
Contributor

jmdeal commented Nov 22, 2023

We're hesitant to expand the EC2NodeClass to allow for configuring multiple network interfaces if it isn't strictly necessary. There were two separate use cases in the original issue: configuring the AssignPublicIpAddress field, and configuring multiple network interfaces for EFA support. AssignPublicIpAddress can only be used if you're configuring a single network interface; you can't assign a public IP in this manner to an instance launched with multiple network interfaces. As for EFA, we currently have a PR out which enables launching EFA instances without the need for explicit network interface configuration. The path forward here will likely be scoping this PR back to only include an AssignPublicIpAddress field on the EC2NodeClass rather than a NetworkInterfaces field.
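
The constraint described above (a public IP setting is only usable with a single network interface) could be checked up front with a small validation step. This is a sketch, not the PR's code: `validateNetworkInterfaces` and its error are hypothetical, the struct is a trimmed copy of the type under review, and treating any non-nil value as "specified" in the multi-interface case is an assumption about how EC2 enforces the rule.

```go
package main

import (
	"errors"
	"fmt"
)

// NetworkInterface is a trimmed, illustrative copy of the API type.
type NetworkInterface struct {
	AssociatePublicIPAddress *bool
}

// errPublicIPWithMultipleInterfaces is a hypothetical validation error.
var errPublicIPWithMultipleInterfaces = errors.New(
	"associatePublicIPAddress can only be set when a single network interface is configured")

// validateNetworkInterfaces rejects configurations that set the public IP
// field on any interface when more than one interface is configured.
func validateNetworkInterfaces(interfaces []NetworkInterface) error {
	if len(interfaces) <= 1 {
		return nil
	}
	for _, ni := range interfaces {
		if ni.AssociatePublicIPAddress != nil {
			return errPublicIPWithMultipleInterfaces
		}
	}
	return nil
}

func main() {
	enabled := true
	single := []NetworkInterface{{AssociatePublicIPAddress: &enabled}}
	multi := []NetworkInterface{{AssociatePublicIPAddress: &enabled}, {}}

	fmt.Println(validateNetworkInterfaces(single)) // <nil>
	fmt.Println(validateNetworkInterfaces(multi) != nil)
}
```

A check like this only matters if a multi-interface field is exposed; with a single top-level public IP field, as proposed above, the invalid combination cannot be expressed at all.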

@garvinp-stripe
Contributor

We might be the odd one out, but we do attach multiple network interfaces to our instances for reasons other than public IPs and EFA: we use a custom CNI that expects a secondary ENI. This PR solves that for us, although we could work around the issue.

One thing I am curious about is Karpenter's position here. The AWS provider variant of Karpenter essentially replaces ASGs and launch templates for us, yet there is a feature disparity between ASGs and Karpenter node management. It's true we may not need to match every feature, but in my opinion the bias should be towards closing the disparity rather than widening it. This isn't specific to network interface support: more broadly, how does (or should) Karpenter decide which features to keep from the components it replaces, and which to drop?

@jmdeal
Contributor

jmdeal commented Dec 11, 2023

@garvinp-stripe are you able to elaborate about your use-case, preferably in an issue where it's easier to track? We're certainly not against bringing Karpenter closer to feature parity with launch templates, but we want to drive those decisions by user need rather than adding features because they are in launch templates. In this case, the problems outlined in the original issue either didn't need the full scope or were solved through other mechanisms. If you have a use-case for this we're more than happy to continue the discussion.

@myaser
Contributor Author

myaser commented Dec 18, 2023

> The path forward here will likely be scoping this PR back to only include an AssignPublicIpAddress field on the EC2NodeClass rather than a NetworkInterfaces field.

@jmdeal

OK, I can work on that, but I will do it in a separate PR.

@jmdeal
Contributor

jmdeal commented Dec 18, 2023

Sounds good! I'll go ahead and close this PR out then.

Development

Successfully merging this pull request may close these issues.

configure network interfaces
5 participants