Support IP configuration for multicard instances #2031

Open
wants to merge 3 commits into
base: main

Contributor

@nkvetsinski nkvetsinski commented Oct 30, 2024

Issue #, if available:

Description of changes:

This PR is a follow-up to a previous one, in which we decided to change the approach and let nodeadm generate .network files that are handled by the systemd-networkd service.

In this PR, nodeadm creates a /etc/systemd/network/70-eni-${eni_id}.network file for each network card with a non-zero index. It skips cards with index 0, as well as non-zero cards that have no IP configured by EC2.
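The selection rule described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `eni` struct, its fields, and the `networkFilePath` helper are hypothetical names, and the sketch assumes the ENI ID already carries its `eni-` prefix.

```go
package main

import (
	"fmt"
	"path"
)

// eni is a hypothetical, simplified view of one interface as reported by EC2 metadata.
type eni struct {
	ID        string // assumed to include the "eni-" prefix
	CardIndex int    // index of the network card the ENI is attached to
	IPv4      string // empty when EC2 has not assigned an address
}

// networkDir is where systemd-networkd picks up administrator-provided .network files.
const networkDir = "/etc/systemd/network"

// networkFilePath returns the .network file path for an ENI, and false when
// the ENI should be skipped (card index 0, or no IP configured by EC2).
func networkFilePath(e eni) (string, bool) {
	if e.CardIndex == 0 || e.IPv4 == "" {
		return "", false
	}
	return path.Join(networkDir, fmt.Sprintf("70-%s.network", e.ID)), true
}

func main() {
	// Illustrative values only.
	enis := []eni{
		{ID: "eni-primary", CardIndex: 0, IPv4: "192.168.181.144"},
		{ID: "eni-0fd24c0f08dc8a05e", CardIndex: 1, IPv4: "192.168.162.178"},
		{ID: "eni-noaddress", CardIndex: 2},
	}
	for _, e := range enis {
		if p, ok := networkFilePath(e); ok {
			fmt.Println("write", p)
		} else {
			fmt.Println("skip", e.ID)
		}
	}
}
```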

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Testing Done

I tested pod-to-pod networking, and also using the non-zero-indexed interfaces from pods running in the host network namespace. Here are some test results:

[root@ip-192-168-181-144 bin]# networkctl 
IDX LINK           TYPE     OPERATIONAL SETUP
  1 lo             loopback carrier     unmanaged
  2 ens32          ether    routable    configured
  3 ens65          ether    routable    configured
  4 ens129         ether    routable    configured
  5 ens161         ether    routable    configured
  6 eni35db1707153 ether    degraded    unmanaged
  7 enifa1e02dfca4 ether    degraded    unmanaged
  9 eni9e39c60014c ether    degraded    unmanaged
102 ens34          ether    routable    unmanaged

9 links listed.
[root@ip-192-168-181-144 bin]# networkctl status ens65 --no-pager
● 3: ens65                      
                     Link File: /usr/lib/systemd/network/99-default.link
                  Network File: /etc/systemd/network/70-eni-0fd24c0f08dc8a05e.network
                         State: routable (configured)
                  Online state: online
                          Type: ether
                          Path: pci-0000:20:01.0
                        Driver: ena
                        Vendor: Amazon.com, Inc.
                         Model: Elastic Network Adapter (ENA)
             Alternative Names: enp32s1
              Hardware Address: 0e:06:85:0a:d1:37
                           MTU: 9001 (min: 128, max: 9216)
                         QDisc: mq
  IPv6 Address Generation Mode: eui64
      Number of Queues (Tx/Rx): 96/96
                       Address: 192.168.162.178 (DHCP4 via 192.168.128.1)
                                2600:1f14:2322:ad02:fa95:2deb:2218:265c
                                fe80::c06:85ff:fe0a:d137
                       Gateway: 192.168.128.1
                                fe80::c5d:c8ff:fed5:e829
                           DNS: 192.168.0.2
                Search Domains: us-west-2.compute.internal
             Activation Policy: up
           Required For Online: yes
               DHCP4 Client ID: IAID:0x5430a1e6/DUID
             DHCP6 Client IAID: 0x5430a1e6
             DHCP6 Client DUID: DUID-EN/Vendor:0000ab11b0953346daeed0ad

Nov 15 01:08:12 localhost systemd-networkd[24406]: ens65: Link UP
Nov 15 01:08:12 localhost systemd-networkd[24406]: ens65: Gained carrier
Nov 15 01:08:13 localhost systemd-networkd[24406]: ens65: DHCPv4 address 192.168.162.178/18, gateway 192.168.128.1 acquired from 192.168.128.1
Nov 15 01:08:13 localhost systemd-networkd[24406]: ens65: Gained IPv6LL
Nov 15 01:08:13 localhost systemd-networkd[24406]: ens65: DHCPv6 address 2600:1f14:2322:ad02:fa95:2deb:2218:265c/128 (valid for 7min 29s, preferred for 2min 19s)
Nov 15 01:08:27 ip-192-168-181-144.us-west-2.compute.internal systemd-networkd[24406]: ens65: Reconfiguring with /etc/systemd/network/70-eni-0fd24c0f08dc8a05e.network.
Nov 15 01:08:27 ip-192-168-181-144.us-west-2.compute.internal systemd-networkd[24406]: ens65: DHCP lease lost
Nov 15 01:08:27 ip-192-168-181-144.us-west-2.compute.internal systemd-networkd[24406]: ens65: DHCPv6 lease lost
Nov 15 01:08:28 ip-192-168-181-144.us-west-2.compute.internal systemd-networkd[24406]: ens65: DHCPv6 address 2600:1f14:2322:ad02:fa95:2deb:2218:265c/128 (valid for 7min 29s, preferred for 2min 19s)
Nov 15 01:08:28 ip-192-168-181-144.us-west-2.compute.internal systemd-networkd[24406]: ens65: DHCPv4 address 192.168.162.178/18, gateway 192.168.128.1 acquired from 192.168.128.1

Route tables:

[root@ip-192-168-181-144 bin]# ip route show table main
default via 192.168.128.1 dev ens32 proto dhcp src 192.168.181.144 metric 512 
default via 192.168.128.1 dev ens65 proto dhcp src 192.168.162.178 metric 613 
default via 192.168.128.1 dev ens161 proto dhcp src 192.168.130.21 metric 713 
default via 192.168.128.1 dev ens129 proto dhcp src 192.168.191.109 metric 813 
192.168.0.2 via 192.168.128.1 dev ens32 proto dhcp src 192.168.181.144 metric 512 
192.168.0.2 via 192.168.128.1 dev ens65 proto dhcp src 192.168.162.178 metric 613 
192.168.0.2 via 192.168.128.1 dev ens161 proto dhcp src 192.168.130.21 metric 713 
192.168.0.2 via 192.168.128.1 dev ens129 proto dhcp src 192.168.191.109 metric 813 
192.168.128.0/18 dev ens32 proto kernel scope link src 192.168.181.144 metric 512 
192.168.128.0/18 dev ens65 proto kernel scope link src 192.168.162.178 metric 613 
192.168.128.0/18 dev ens161 proto kernel scope link src 192.168.130.21 metric 713 
192.168.128.0/18 dev ens129 proto kernel scope link src 192.168.191.109 metric 813 
192.168.128.1 dev ens32 proto dhcp scope link src 192.168.181.144 metric 512 
192.168.128.1 dev ens65 proto dhcp scope link src 192.168.162.178 metric 613 
192.168.128.1 dev ens161 proto dhcp scope link src 192.168.130.21 metric 713 
192.168.128.1 dev ens129 proto dhcp scope link src 192.168.191.109 metric 813 
192.168.153.44 dev enifa1e02dfca4 scope link 
192.168.184.173 dev eni9e39c60014c scope link 
192.168.190.173 dev eni35db1707153 scope link 
[root@ip-192-168-181-144 bin]# 
[root@ip-192-168-181-144 bin]# ip rule show
0:	from all lookup local
512:	from all to 192.168.190.173 lookup main
512:	from all to 192.168.153.44 lookup main
512:	from all to 192.168.184.173 lookup main
1024:	from all fwmark 0x80/0x80 lookup main
10101:	from 192.168.162.178 lookup 10101 proto static
10201:	from 192.168.130.21 lookup 10201 proto static
10301:	from 192.168.191.109 lookup 10301 proto static
32766:	from all lookup main
32767:	from all lookup default
[root@ip-192-168-181-144 bin]# 
[root@ip-192-168-181-144 bin]# ip route show table 10101
default via 192.168.128.1 dev ens65 proto dhcp metric 613 
192.168.128.0/18 dev ens65 proto static scope link 
[root@ip-192-168-181-144 bin]# 
[root@ip-192-168-181-144 bin]# ip route show table 10201
default via 192.168.128.1 dev ens161 proto dhcp metric 713 
192.168.128.0/18 dev ens161 proto static scope link 
[root@ip-192-168-181-144 bin]# 
[root@ip-192-168-181-144 bin]# ip route show table 10301
default via 192.168.128.1 dev ens129 proto dhcp metric 813 
192.168.128.0/18 dev ens129 proto static scope link 

See this guide for recommended testing for PRs. Some tests may not apply. Completing tests and providing additional validation steps are not required, but it is recommended and may reduce review time and time to merge.

@cartermckinnon
Member

@M00nF1sh can you take a look at this?

@Pavani-Panakanti
Contributor

LGTM

@@ -89,11 +96,79 @@ func (a *networkingAspect) ensureEKSNetworkConfiguration(cfg *api.NodeConfig) er
	return nil
}

func (a *networkingAspect) ensureMulticardNetworkConfiguration(cfg *api.NodeConfig) error {
	var networkRestartRequired bool
	routeTableId := 1001
Member

@M00nF1sh M00nF1sh Nov 11, 2024

How was this routeTableId chosen to start at 1001 for the multi-card ENIs? Was it chosen to align with AL2? It may be better to align with AL2023's default behavior if there is no specific reason.

Contributor Author

I chose it to align with AL2.

Member

@M00nF1sh M00nF1sh Nov 13, 2024

I'm OK with the 1000 route table approach, but I prefer to align with AL2023's default behavior whenever possible. For example, what routes and route tables are set up for those ENIs when launching a plain AL2023 instance rather than an EKS one? I have no idea why 1000 was chosen for EKS AL2.

Member

I'd rather not have a hacky "1000" route table unless there is a reason for it. My major concern is that this hard-coded 1000 might conflict with other products that use secondary route tables (vpc-cni, for example). The closer this aligns with AL2023's default behavior, the fewer surprises for customers.

Contributor Author

As we discussed offline, I put the routes in separate route tables to align with what we have in AL2. However, the CNI now skips non-zero cards, so we'll add the routes for non-zero cards to the main routing table instead.

networkRestartRequired = true
}

if networkRestartRequired {
Member

Nit: maybe we should combine "reloadNetworkConfigurations" with ensureEKSNetworkConfiguration so that we reload only once.

Contributor Author

I can restart the network in the Setup function, after we configure both primary and multicard interfaces.
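One way to picture the single-reload idea discussed here is the sketch below. It is only an illustration under assumptions: the `step` type and `runSteps` helper are hypothetical and are not nodeadm's actual Setup function. Each configuration step reports whether it changed anything, and the reload runs at most once after all steps have finished.

```go
package main

import "fmt"

// step is a hypothetical configuration step that reports whether it changed
// any file on disk.
type step func() (changed bool, err error)

// runSteps runs every step in order and calls reload at most once, and only
// if at least one step reported a change.
func runSteps(reload func() error, steps ...step) error {
	reloadNeeded := false
	for _, s := range steps {
		changed, err := s()
		if err != nil {
			return err
		}
		reloadNeeded = reloadNeeded || changed
	}
	if !reloadNeeded {
		return nil
	}
	return reload()
}

func main() {
	reloads := 0
	reload := func() error { reloads++; return nil }

	// Stand-ins for the primary and multicard configuration steps.
	primary := func() (bool, error) { return true, nil }
	multicard := func() (bool, error) { return true, nil }

	if err := runSteps(reload, primary, multicard); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("reloads:", reloads)
}
```

Returning a "changed" flag from each step keeps the reload decision in one place, so adding another step later does not add another restart.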

continue
}

networkInterfaceConfName := fmt.Sprintf("80-card%d.network", card.CardIndex)
Member

@M00nF1sh M00nF1sh Nov 11, 2024

IIRC, it should be possible to have multiple ENIs per network card (with different deviceIndex values). Have we tested this behavior?
It seems this won't work if there are multiple ENIs on a network card.

Contributor Author

@nkvetsinski nkvetsinski Nov 13, 2024

I see your point. The problem is just the name of the file, right? I can add the deviceIndex to the name (80-card%d-%d.network), or we could use the MAC address. Do you have a preference?

Member

I'd like to avoid coupling to cardIndex and deviceIndex (it's complicated, see https://github.com/amazonlinux/amazon-ec2-net-utils/blob/e01f53f278eeb13bbdc856da921584944c825286/lib/lib.sh#L344).

I prefer to use 70-<eni-xxxx>.network, where eni-xxxx is the ENI ID; you can get it from ec2Metadata.
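To illustrate the naming concern raised in this thread, here is a small sketch comparing the two schemes. The `attachment` struct, the helper names, and the ENI IDs are hypothetical; the only assumption is that two ENIs can share a card index while differing in device index.

```go
package main

import "fmt"

// attachment is a hypothetical, simplified ENI attachment record.
type attachment struct {
	ENIID       string
	CardIndex   int
	DeviceIndex int
}

// nameByCard mirrors the card-index-based file name from the diff under review.
func nameByCard(a attachment) string {
	return fmt.Sprintf("80-card%d.network", a.CardIndex)
}

// nameByENI follows the ENI-ID-based naming suggested in this thread.
func nameByENI(a attachment) string {
	return fmt.Sprintf("70-%s.network", a.ENIID)
}

// collisions counts attachments whose file name was already produced by an
// earlier attachment in the slice.
func collisions(atts []attachment, name func(attachment) string) int {
	seen := make(map[string]bool)
	n := 0
	for _, a := range atts {
		k := name(a)
		if seen[k] {
			n++
		}
		seen[k] = true
	}
	return n
}

func main() {
	// Two made-up ENIs on the same network card with different device indexes.
	atts := []attachment{
		{ENIID: "eni-aaa", CardIndex: 1, DeviceIndex: 1},
		{ENIID: "eni-bbb", CardIndex: 1, DeviceIndex: 2},
	}
	fmt.Println("collisions by card index:", collisions(atts, nameByCard))
	fmt.Println("collisions by ENI ID:", collisions(atts, nameByENI))
}
```

With card-index naming, the second ENI would overwrite the first ENI's file; ENI IDs are unique, so the ENI-based names never collide.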
