Add network topology information to nodes #1025

wwvela · 2024-09-19T00:32:10Z

What type of PR is this?
/kind feature

What this PR does / why we need it:
This PR is to Add network topology information to nodes for Issue #998.

Which issue(s) this PR fixes:
Fixes #998

Does this PR introduce a user-facing change?:
NONE

k8s-ci-robot · 2024-09-19T00:32:12Z

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot · 2024-09-19T00:32:17Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign kishorj for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot · 2024-09-19T00:32:18Z

This issue is currently awaiting triage.

If cloud-provider-aws contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot · 2024-09-19T00:32:19Z

Hi @wwvela. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

cartermckinnon · 2024-09-19T04:49:14Z

pkg/providers/v1/instances_v2.go

+		topology, err := c.ec2.DescribeInstanceTopology(topologyRequest)
+		if err != nil || topology == nil {
+			klog.Infof("topology api not supported for instance type: %s or region: %s", instanceType, region)
+			return additionalLabels, nil


you shouldn't return here, you should wrap the following block in an else

As we will use the supportedInstanceType instead of instance prefix now, the error throw our there should not because not supported issue. I throw the error directly here in new version

cartermckinnon · 2024-09-19T04:49:32Z

pkg/providers/v1/instances_v2.go

+	if c.topologySupportedInstance(instanceType) && slices.Contains(c.cfg.Global.TopologySupportedRegions, region) {
+		topologyRequest := &ec2.DescribeInstanceTopologyInput{InstanceIds: []*string{&instanceID}}
+		topology, err := c.ec2.DescribeInstanceTopology(topologyRequest)
+		if err != nil || topology == nil {


we need to check what the err is before we say that the instance isn't supported

I revised the topologySupportedInstance to check the supported instance types instead of the instance prefix, so the DescribeInstanceTopology api should supported here. I will throw error directly

cartermckinnon · 2024-09-19T04:50:06Z

pkg/providers/v1/instances_v2.go

 	return additionalLabels, nil
 }

+// TopologySupportedInstancePrefixes is help to filter instance types supported by topology API
+func (c *Cloud) topologySupportedInstance(instanceType string) bool {


Suggested change

func (c *Cloud) topologySupportedInstance(instanceType string) bool {

func (c *Cloud) isTopologySupportedInstance(instanceType string) bool {

cartermckinnon · 2024-09-19T04:55:07Z

pkg/providers/v1/instances_v2.go

+// TopologySupportedInstancePrefixes is help to filter instance types supported by topology API
+func (c *Cloud) topologySupportedInstance(instanceType string) bool {
+	for _, prefix := range c.cfg.Global.TopologySupportedInstancePrefixes {
+		if strings.HasPrefix(instanceType, prefix) {


I think we probably just want to work with a list of fully-formed instance type names, prefix matching has some ambiguity and configuration footguns ("p4" would match all p4dn instances, for example). The list isn't going to change very often and we don't need the benefit of shorthand IMO

Do we want to be updating the cloud-provider each time there's a new AWS instance type, though?

would it be a bad idea to do some dynamic discover based on an DescribeInstanceTopology result? then cache the instance types' support with a TTL? from what i can tell the api just returns an empty list when the instance-id's type is not supported

I discussed this with Carter before. We don't want to trigger this api for all nodes, because some cx might have hundreds or thousands of nodes in their cluster, if we call this api for all nodes, it very likely trigger throttle issue.

And DescribeInstanceTopology only supported for very limited instances types which lots of cx might not use. check the supported instance type and regions here. https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_DescribeInstanceTopology.html

Do we want to be updating the cloud-provider each time there's a new AWS instance type, though?

We don't need to update cloud-provider, TopologySupportedInstanceTypes and TopologySupportedRegions is added in cloud-config where will accept the input in KCP CCM manifest. We only need to update the manifest with the cloud-config input.

cartermckinnon · 2024-09-19T05:02:34Z

pkg/providers/v1/instances_v2.go

+		for index, networkNode := range topology.NetworkNodes {
+			label := LabelNetworkNode + strconv.Itoa(index)
+			additionalLabels[label] = *networkNode
+		}


I'm not clear how you would use this. How would you "query" the labels using a pod's nodeSelector, or a telemetry system, if the index of the networkNode appears in the label key?

after reading through the instance topology docs i think i have an idea how this works?

Assuming the goal is to run workloads that share the same topology, you'd need to know the network nodes that your capacity sits on first, then use that in a nodeSelector for the corresponding layer index. AFAIK there's no way to apply a nodeSelector based on some heuristic like "schedule pods to nodes where the label topology.k8s.aws/network-node-layer-1 matches", so this logic would have to be handled by a high level system. the only guarentee is that there are always at least 3 layers.

However, this scheme breaks down if network nodes can be part of different layers, which i couldn't immediately grok from the docs but i assumed wasn't possible.

this scheme breaks down if network nodes can be part of different layers

yep, if trees with different roots can overlap (share network nodes).

The long form doc is helpful, the API doc doesn’t even say that order is meaningful in the networkNodeSet. 😄

I’m wondering if the “layer” number needs to be in the labels. Is it enough to just say “schedule this pod within this networkNode” instead of “within this networkNode which is the layer 2 networkNode for the instance”?

—-

A primary motivation for this feature was observability, being able to correlate issues with something more granular than AZ. If you want to slice and dice your instance metrics using network nodes, should you care about the layer of the network node?

If we inverted this, so that the label key contains the networkNode and the value is the layer number, that would allow you to use the Exists operator in a nodeAffinity to place the pod within a networkNode, without needing the layer to also match: https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/node-selector-requirement/#NodeSelectorRequirement

yep, if trees with different roots can overlap (share network nodes).

We can assume that's not the case for now.

A primary motivation for this feature was observability, being able to correlate issues with something more granular than AZ

Do you mean the feature to apply arbitrary labels from within the cloud provider? I think observability at the zone-id level is where we started, but I'd like to think we can help with scheduling here too. The next thing I'm thinking is a label that describes whether an instance has a GPU or not, so we don't need to schedule GPU pods (Device Plugin, EFA etc) on nodes that don't need it etc.

Assuming the goal is to run workloads that share the same topology, you'd need to know the network nodes that your capacity sits on first, then use that in a nodeSelector for the corresponding layer index.

Right that's what I was expecting you'd need to do. You don't know the topology until you launch the instance first, so it forces this ordering of sorts where you launch the instance first, you realize what your network nodes are, and you should ideally see a common network node across your instances. I think you could maybe define weighted node affinities across the network node layers so you pack it as close as possible?

cartermckinnon · 2024-09-19T05:05:15Z

/ok-to-test

cartermckinnon · 2024-09-19T05:05:57Z

/kind feature

ndbaker1 · 2024-09-19T07:19:53Z

pkg/providers/v1/aws_ec2.go

+	} else if len(resp.Instances) == 0 {
+		return nil, nil
+	}
+	return resp.Instances[0], err


this wrapper's logic seems strange, shouldn't we return the list of []*ec2.InstanceTopology instead of just the first item?

DescribeInstanceTopology can accept a list of instance IDs, if we pass only 1 instance ID, it will return 1 according topology info something like below.

"Instances": [ { "InstanceId": "i-xxx", "InstanceType": "p5e.48xlarge", "NetworkNodes": [ "nn-xxxxx", "nn-xxxxx", "nn-xxxxx" ], "AvailabilityZone": "us-west-2c", "ZoneId": "usw2-az3" } ]

But yeah, it seems more reasonable to return all instances in api and get the first instance in where trigger it and get result. Will revised in next

agreed, because we still allow the full ec2.DescribeInstanceTopologyInput payload which could specify more than one instance id

ndbaker1 · 2024-09-19T07:37:25Z

pkg/providers/v1/well_known_labels.go

@@ -20,4 +20,5 @@ const (
 	// LabelZoneID is a topology label that can be applied to any resource
 	// but will be initially applied to nodes.
 	LabelZoneID = "topology.k8s.aws/zone-id"
+	LabelNetworkNode = "topology.k8s.aws/network-node-"


Suggested change

LabelNetworkNode = "topology.k8s.aws/network-node-"

LabelNetworkNode = "topology.k8s.aws/network-node-layer-"

IIUC, the info you're embedding into the key is about the layer of the network node in the node set

revised in next version

ndbaker1 · 2024-09-19T07:47:30Z

pkg/providers/v1/instances_v2.go

+		for index, networkNode := range topology.NetworkNodes {
+			label := LabelNetworkNode + strconv.Itoa(index)
+			additionalLabels[label] = *networkNode
+		}


after reading through the instance topology docs i think i have an idea how this works?

Assuming the goal is to run workloads that share the same topology, you'd need to know the network nodes that your capacity sits on first, then use that in a nodeSelector for the corresponding layer index. AFAIK there's no way to apply a nodeSelector based on some heuristic like "schedule pods to nodes where the label topology.k8s.aws/network-node-layer-1 matches", so this logic would have to be handled by a high level system. the only guarentee is that there are always at least 3 layers.

However, this scheme breaks down if network nodes can be part of different layers, which i couldn't immediately grok from the docs but i assumed wasn't possible.

ndbaker1 · 2024-10-19T05:42:57Z

pkg/providers/v1/aws_ec2.go

+func (s *awsSdkEC2) DescribeInstanceTopology(request *ec2.DescribeInstanceTopologyInput) ([]*ec2.InstanceTopology, error) {
+	resp, err := s.ec2.DescribeInstanceTopology(request)


just realized this api is paginated, should we use logic similar to the DescribeInstances wrapper below?

The result of the DescribeInstanceTopology is actually only 1 element in the array because we will not do group describe for multiple instances. And it will only return 1 topology info for 1 instance something like

{ "Instances": [ { "InstanceId": "i-xxx", "InstanceType": "p5e.48xlarge", "NetworkNodes": [ "nn-xxx", "nn-xxx", "nn-xxx" ], "AvailabilityZone": "us-west-2c", "ZoneId": "usw2-az3" } ] }

did we test this with large instances because even though the result can be only 1 instance because of filters, not all filters are applied at the server side for the ec2 and for filter which are applied later, we have to go over multiple pages?

For our current usage, a lack of pagination should be fine. The only risk would be if we were re-using for some other use where we were describing multiple instances, which, AFAIK, isn't the type of thing that we ever do in the cloud provider.

I could be wrong, but when you're supplying a primary key like an instance ID, I've never seen EC2 not return the resource in question on the first call.

ndbaker1 · 2024-10-19T05:42:59Z

pkg/providers/v1/aws_ec2.go

+	} else if len(resp.Instances) == 0 {
+		return nil, nil


is this branch necessary? wouldn't it be better to have an empty array as a default with no error?

changed to return []*ec2.InstanceTopology{}, nil

ndbaker1 · 2024-10-19T05:47:14Z

pkg/providers/v1/topology.go

+func (t *topologyCache) mightSupportTopology(instanceType string, region string) bool {
+	t.mutex.RLock()
+	defer t.mutex.RUnlock()


this lock might need to belong in the caller getNodeTopology? since that's where the possible writes occur to the data

ndbaker1 · 2024-10-19T05:49:29Z

pkg/providers/v1/topology.go

+	unsupportedInstance []string
+	unsupportedRegion   []string
+}


just future proofing here, is there a minimal cache with TTL implementation we could use?

I think it is not neccessary for the topology catch to have TTL implementation, as the already supported instance type and region will not changed

ndbaker1 · 2024-10-19T05:53:35Z

pkg/providers/v1/topology.go

+		}
+		return topology[0], nil
+	}
+	return nil, nil


maybe a static error to throw here is better? it seems odd to return no error with nil

then we could check something like errors.Is(err, ErrTopologyUnsupported)

Why we need to throw error if the instance not supported by the api? as we will just skip adding the topology label for it

kmala · 2024-10-21T16:41:09Z

pkg/providers/v1/iface/types.go

@@ -8,6 +8,7 @@ import "github.com/aws/aws-sdk-go/service/ec2"
 type EC2 interface {
 	// Query EC2 for instances matching the filter
 	DescribeInstances(request *ec2.DescribeInstancesInput) ([]*ec2.Instance, error)
+	DescribeInstanceTopology(request *ec2.DescribeInstanceTopologyInput) ([]*ec2.InstanceTopology, error)


we need to update the docs right for this new permission?
https://github.com/kubernetes/cloud-provider-aws/blob/master/docs/prerequisites.md
Also can you update the description with release notes to make sure users are aware of this while upgrading.

kmala · 2024-10-21T16:42:53Z

pkg/providers/v1/instances_v2.go

 	v1 "k8s.io/api/core/v1"
 	"k8s.io/apimachinery/pkg/types"
 	cloudprovider "k8s.io/cloud-provider"
+	"strconv"


nit: can you sort the imports

kmala · 2024-10-21T16:55:24Z

pkg/providers/v1/instances_v2.go

+		return nil, err
+	}
+
+	additionalLabels, err := c.getAdditionalLabels(zone.FailureDomain, string(instanceID), instanceType, zone.Region)


This function is called regularly for each node, what would be the impact on the performance and also will it cause any ec2 limits issue as we are making another ec2 api?

it should not cause ec2 limit. The topologyCache is designed to avoid this situation, the describeInstanceTopology api will only triggered once for those unsupported regions and nodes with unsupported instance types and will cache it in the topologyCache. So describeInstanceTopology api will only triggered for supported instance type or those not cached unsupported instance types.

kmala · 2024-10-21T17:02:11Z

pkg/providers/v1/aws_ec2.go

+func (s *awsSdkEC2) DescribeInstanceTopology(request *ec2.DescribeInstanceTopologyInput) ([]*ec2.InstanceTopology, error) {
+	resp, err := s.ec2.DescribeInstanceTopology(request)


did we test this with large instances because even though the result can be only 1 instance because of filters, not all filters are applied at the server side for the ec2 and for filter which are applied later, we have to go over multiple pages?

mmerkes · 2024-10-21T20:23:22Z

pkg/providers/v1/topology.go

+	defer t.mutex.RUnlock()
+	if t.mightSupportTopology(instanceType, region) {
+		topologyRequest := &ec2.DescribeInstanceTopologyInput{InstanceIds: []*string{&instanceID}}
+		topology, err := t.cloud.ec2.DescribeInstanceTopology(topologyRequest)


Per offline discussion, we need to gracefully handle access denied when making this API call. Non-EKS customers may not have enough permissions to make their call, and they can update them if they want this functionality.

add the grace handle for permission missing and add the e2e test

mmerkes · 2024-10-22T16:47:54Z

hack/e2e/run.sh

@@ -128,7 +128,7 @@ if [[ "${UP}" = "yes" ]]; then
      --run-id="${test_run_id}" \
      --cloud-provider=aws \
      --cluster-name="${CLUSTER_NAME}" \
-      --create-args="--dns=none --zones=${ZONES} --node-size=m5.large --master-size=m5.large --override=cluster.spec.cloudControllerManager.cloudProvider=aws --override=cluster.spec.cloudControllerManager.image=${IMAGE_NAME}:${IMAGE_TAG}" \
+      --create-args="--dns=none --zones=${ZONES} --node-size=p4d.24xlarge --master-size=m5.large --override=cluster.spec.cloudControllerManager.cloudProvider=aws --override=cluster.spec.cloudControllerManager.image=${IMAGE_NAME}:${IMAGE_TAG}" \


This is not going to work. These instance types are super expensive and not readily available, so it won't be practical to test here.

Gotcha. I have change the instance type back to m5.large in latest change

But the instance type supported by the DescribeInstanceTopology are expensive. So I'm not sure if can still use the e2e to test this change or it will be better to test it by creating a dev cluster and replace the CCM manually

mmerkes · 2024-10-22T17:09:38Z

tests/e2e/nodes.go

+    	framework.ExpectNoError(err)
+
+    	client, err := kubernetes.NewForConfig(clientConfig)
+    	framework.ExpectNoError(err)


spacing is off here

Should be able to just run go fmt or whatever the k8s dev command is.

Revised and verified by make check

mmerkes · 2024-10-22T17:10:01Z

tests/e2e/nodes.go

@@ -44,4 +47,62 @@ var _ = Describe("[cloud-provider-aws-e2e] nodes", func() {
 			gomega.Expect(node.Labels).To(gomega.HaveKey("topology.k8s.aws/zone-id"))
 		}
 	})
+
+	It("should correctly label nodes with instance type p3dn.24xlarge", func(ctx context.Context) {


Test name is out of date.

mmerkes · 2024-10-22T17:37:18Z

pkg/providers/v1/aws_ec2.go

+func (s *awsSdkEC2) DescribeInstanceTopology(request *ec2.DescribeInstanceTopologyInput) ([]*ec2.InstanceTopology, error) {
+	resp, err := s.ec2.DescribeInstanceTopology(request)


For our current usage, a lack of pagination should be fine. The only risk would be if we were re-using for some other use where we were describing multiple instances, which, AFAIK, isn't the type of thing that we ever do in the cloud provider.

I could be wrong, but when you're supplying a primary key like an instance ID, I've never seen EC2 not return the resource in question on the first call.

mmerkes · 2024-10-22T17:41:54Z

pkg/providers/v1/topology.go

+		topologyRequest := &ec2.DescribeInstanceTopologyInput{InstanceIds: []*string{&instanceID}}
+		topology, err := t.cloud.ec2.DescribeInstanceTopology(topologyRequest)
+		if err != nil {
+			klog.Errorf("Error describing instance topology: %q", err)


nit: This should be a warn, or have a log for each branch so that there's an explicit log for each scenario.

mmerkes · 2024-10-22T17:42:56Z

pkg/providers/v1/topology.go

+				// gracefully handle the DecribeInstanceTopology access missing error
+				klog.Infof("Not authorized to perform: ec2:DescribeInstanceTopology, permission missing")
+				return nil, nil
+			}


@cartermckinnon Does this cover all of the scenarios that you talked about that should be gracefully handled?

Yep I believe so

mmerkes · 2024-10-22T17:44:56Z

pkg/providers/v1/topology.go

+	unsupportedInstance []string
+	unsupportedRegion   []string


nit: Wouldn't a map be more efficient for these?

mmerkes · 2024-10-22T17:46:01Z

pkg/providers/v1/topology.go

+	// if both instanceType and region are not in unsupported cache, the instance type and region might be supported
+	// or we haven't check the supportness and cache it yet. If they are unsupported and not cached, we will run
+	// describeTopology api once for them
+	// Initialize the map if it's unset


nit: remove

mmerkes · 2024-10-22T17:47:46Z

tests/e2e/nodes.go

+			framework.Failf("Error checking EC2 describeInstanceTopology access: %v", err)
+		}
+		allowed := result.Status.Allowed
+		supportedInstanceType := "p4d.24xlarge"


nit: May be brittle if topology information is ever backported to vanilla instances, though would be easy enough to fix.

mmerkes · 2024-10-22T17:48:58Z

tests/e2e/nodes.go

+		if err != nil {
+			framework.Failf("Error checking EC2 describeInstanceTopology access: %v", err)
+		}
+		allowed := result.Status.Allowed


Is this not something that we can have expectations on what it will be?

mmerkes · 2024-10-22T18:28:54Z

/retest

mmerkes · 2024-10-22T22:53:16Z

/retest

mmerkes · 2024-10-23T04:54:50Z

/retest

cartermckinnon · 2024-10-23T05:10:30Z

pkg/providers/v1/topology.go

+		topology, err := t.cloud.ec2.DescribeInstanceTopology(topologyRequest)
+		if err != nil {
+			klog.Errorf("Error describing instance topology: %q", err)
+			if strings.Contains(err.Error(), "The functionality you requested is not available in this region") {


we should be casting these to an awserr and then comparing against error code constants, instead of doing string things.

see: https://docs.aws.amazon.com/sdk-for-go/v1/developer-guide/handling-errors.html

cartermckinnon · 2024-10-23T05:14:59Z

pkg/providers/v1/topology.go

+			return nil, err
+		} else if len(topology) == 0 {
+			// instanceType is not support topology info and the result is empty
+			t.unsupportedInstance = append(t.unsupportedInstance, instanceType)


you haven't acquired a write lock before modifying supportedInstance/supportedRegion

cartermckinnon · 2024-10-23T05:17:26Z

pkg/providers/v1/topology.go

+	if t.unsupportedInstance == nil {
+		t.unsupportedInstance = []string{}
+	}
+	if t.unsupportedRegion == nil {
+		t.unsupportedRegion = []string{}
+	}


This can be done when you create the topologyCache, and I'd also move the locking down into this function instead of assuming the caller has acquired the correct lock

cartermckinnon · 2024-10-23T05:19:13Z

pkg/providers/v1/topology.go

+				// gracefully handle the DecribeInstanceTopology access missing error
+				klog.Infof("Not authorized to perform: ec2:DescribeInstanceTopology, permission missing")
+				return nil, nil
+			}


Yep I believe so

k8s-ci-robot · 2024-10-23T05:47:39Z

@wwvela: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
pull-cloud-provider-aws-e2e-kubetest2-quick	`6baf290`	link	false	`/test pull-cloud-provider-aws-e2e-kubetest2-quick`
pull-cloud-provider-aws-e2e-kubetest2	`6baf290`	link	false	`/test pull-cloud-provider-aws-e2e-kubetest2`

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

mmerkes · 2024-10-24T15:04:15Z

I'm taking over this change from @wwvela in #1038. She's on vacation, so I'm trying to get this over the line. I've done some significant refactoring to address issues discussed in this review, even some I said I wouldn't fix...

Paginate DescribeInstanceTopology for completeness.
Expire cache after 24 hours when checking of network topology information is available.
Optimize adding additional labels: Both the zone ID and network topology labels will only be added if they don't already exist. In the previous implementation, we'd call DescribeInstanceTopology every time a node with a topology was available.
Properly handles error codes from AWS API call
The caching implementation should solve concurrency issues and remove the need to use a mutex

I don't see much for unit tests here, so I plan on adding those this morning. FYI @cartermckinnon @ndbaker1 @kmala

mmerkes · 2024-10-24T22:26:49Z

/close

Completed in #1038

k8s-ci-robot · 2024-10-24T22:26:55Z

@mmerkes: Closed this PR.

In response to this:

/close

Completed in #1038

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot · 2024-10-24T22:26:58Z

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

add topology labels

374e420

k8s-ci-robot added do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Sep 19, 2024

k8s-ci-robot requested review from kmala and mmerkes September 19, 2024 00:32

k8s-ci-robot added the needs-kind Indicates a PR lacks a `kind/foo` label and requires one. label Sep 19, 2024

k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Sep 19, 2024

k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Sep 19, 2024

cartermckinnon reviewed Sep 19, 2024

View reviewed changes

k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Sep 19, 2024

k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. and removed needs-kind Indicates a PR lacks a `kind/foo` label and requires one. labels Sep 19, 2024

ndbaker1 reviewed Sep 19, 2024

View reviewed changes

wwvela and others added 4 commits September 19, 2024 12:28

Add network topology information to nodes

9e72a88

Merge branch 'kubernetes:master' into master

cd5d92a

use topology cache

150f58e

fix the pull-cloud-provider-aws-check failure

984187e

ndbaker1 reviewed Oct 19, 2024

View reviewed changes

kmala reviewed Oct 21, 2024

View reviewed changes

fix the test failure

05a0a88

mmerkes reviewed Oct 21, 2024

View reviewed changes

graceful handle access missing and add e2e test

08e2c6c

update the supported instance type in e2e test

ee9743a

mmerkes reviewed Oct 22, 2024

View reviewed changes

update thee2e test

49abbbe

mmerkes reviewed Oct 22, 2024

View reviewed changes

address the comments

6baf290

mmerkes reviewed Oct 22, 2024

View reviewed changes

cartermckinnon reviewed Oct 23, 2024

View reviewed changes

mmerkes mentioned this pull request Oct 24, 2024

Add network topology information to nodes (continued) #1038

Merged

k8s-ci-robot closed this Oct 24, 2024

k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 24, 2024

	func (c *Cloud) topologySupportedInstance(instanceType string) bool {
	func (c *Cloud) isTopologySupportedInstance(instanceType string) bool {

	LabelNetworkNode = "topology.k8s.aws/network-node-"
	LabelNetworkNode = "topology.k8s.aws/network-node-layer-"

		func (s awsSdkEC2) DescribeInstanceTopology(request ec2.DescribeInstanceTopologyInput) ([]*ec2.InstanceTopology, error) {
		resp, err := s.ec2.DescribeInstanceTopology(request)

Add network topology information to nodes #1025

Add network topology information to nodes #1025

Conversation

wwvela commented Sep 19, 2024 • edited Loading

k8s-ci-robot commented Sep 19, 2024

k8s-ci-robot commented Sep 19, 2024

k8s-ci-robot commented Sep 19, 2024

k8s-ci-robot commented Sep 19, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ndbaker1 Sep 19, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cartermckinnon Sep 19, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cartermckinnon commented Sep 19, 2024

cartermckinnon commented Sep 19, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ndbaker1 Sep 19, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mmerkes commented Oct 22, 2024

mmerkes commented Oct 22, 2024

mmerkes commented Oct 23, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

wwvela commented Sep 19, 2024 •

edited

Loading

ndbaker1 Sep 19, 2024 •

edited

Loading

cartermckinnon Sep 19, 2024 •

edited

Loading

ndbaker1 Sep 19, 2024 •

edited

Loading