zebra: Bring up 514 BGP neighbor sessions #18214

soumyar-roy · 2025-02-21T00:20:49Z

Issue:
When 514 inerfaces/neighbors are configured, it creates socket error, "Cannot allocate memory", when back to back V6 RA messages are tried to be sent over the socket. This prevents interface, to know its peer's link local address.

Fix:
This fix batches the RA messages using wheel timer, thus preventing socket operations overflooding.

Testing:
Monitor BGP session running sh bgp summary command

Before fix:
r1# sh bgp summary

IPv4 Unicast Summary:
BGP router identifier 192.168.1.1, local AS number 1001 VRF default vrf-id 0 BGP table version 0
RIB entries 0, using 0 bytes of memory
Peers 515, using 12 MiB of memory

Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt Desc
r1-eth0 4 1002 89 90 0 0 0 00:07:10 0 0 N/A
r1-eth1 4 1002 89 90 0 0 0 00:07:10 0 0 N/A
r1-eth2 4 1002 89 90 0 0 0 00:07:10 0 0 N/A
r1-eth3 4 1002 89 90 0 0 0 00:07:10 0 0 N/A
r1-eth4 4 1002 89 90 0 0 0 00:07:10 0 0 N/A
r1-eth5 4 1002 89 90 0 0 0 00:07:10 0 0 N/A

….....
r1-eth252 4 1002 31 29 0 0 0 00:02:08 0 0 N/A
r1-eth253 4 1002 31 29 0 0 0 00:02:08 0 0 N/A
r1-eth254 4 1002 31 29 0 0 0 00:02:08 0 0 N/A
r1-eth255 4 1002 31 29 0 0 0 00:02:08 0 0 N/A
r1-eth256 4 0 0 0 0 0 0 never Idle 0 N/A
r1-eth257 4 0 0 0 0 0 0 never Idle 0 N/A
r1-eth258 4 0 0 0 0 0 0 never Idle 0 N/A
r1-eth259 4 0 0 0 0 0 0 never Idle 0 N/A
r1-eth260 4 0 0 0 0 0 0 never Idle 0 N/A
……..…..
r1-eth511 4 0 0 0 0 0 0 never Idle 0 N/A
r1-eth512 4 0 0 0 0 0 0 never Idle 0 N/A
r1-eth513 4 0 0 0 0 0 0 never Idle 0 N/A
r1-eth514 4 0 0 0 0 0 0 never Idle 0 N/A
After fix:
r1# show bgp summary

IPv4 Unicast Summary:
BGP router identifier 192.168.1.1, local AS number 1001 VRF default vrf-id 0 BGP table version 0
RIB entries 0, using 0 bytes of memory
Peers 515, using 12 MiB of memory

Neighbor V r1-eth0 r1-eth1 r1-eth2 r1-eth3 r1-eth4 r1-eth5 r1-eth6 r1-eth7 r1-eth8 ....
r1-eth499 r1-eth500 r1-eth501 r1-eth502 r1-eth503 r1-eth504 r1-eth505 r1-eth506 r1-eth507 r1-eth508 r1-eth509 r1-eth510 r1-eth511 r1-eth512 r1-eth513 r1-eth514 AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt Desc
4 1002 87 87 0 0 0 00:07:04 0 0 N/A
4 1002 87 87 0 0 0 00:07:04 0 0 N/A
4 1002 87 87 0 0 0 00:07:04 0 0 N/A
4 1002 64 67 0 0 0 00:05:09 0 0 N/A
4 1002 87 87 0 0 0 00:07:04 0 0 N/A
4 1002 87 87 0 0 0 00:07:04 0 0 N/A
4 1002 67 70 0 0 0 00:05:22 0 0 N/A
4 1002 87 87 0 0 0 00:07:04 0 0 N/A
4 1002 87 87 0 0 0 00:07:04 0 0 N/A
4 1002 43 43 0 0 0 00:03:22 0 0 N/A
4 1002 43 43 0 0 0 00:03:22 0 0 N/A
4 1002 19 22 0 0 0 00:01:21 0 0 N/A
4 1002 43 43 0 0 0 00:03:22 0 0 N/A
4 1002 43 43 0 0 0 00:03:22 0 0 N/A
4 1002 20 23 0 0 0 00:01:30 0 0 N/A
4 1002 43 43 0 0 0 00:03:22 0 0 N/A
4 1002 43 43 0 0 0 00:03:22 0 0 N/A
4 1002 22 25 0 0 0 00:01:39 0 0 N/A
4 1002 43 43 0 0 0 00:03:22 0 0 N/A
4 1002 17 20 0 0 0 00:01:13 0 0 N/A
4 1002 43 43 0 0 0 00:03:22 0 0 N/A
4 1002 43 43 0 0 0 00:03:22 0 0 N/A
4 1002 19 22 0 0 0 00:01:22 0 0 N/A
4 1002 43 43 0 0 0 00:03:22 0 0 N/A
4 1002 43 43 0 0 0 00:03:22 0 0 N/A

routingrocks · 2025-02-21T03:12:05Z

zebra/rtadv.c

+
+		rtadv_send_packet(zvrf->rtadv.sock, ifp, RA_ENABLE);
+	} else {
+		zif->rtadv.AdvIntervalTimer -=


is it possible AdvintervalTimer will go negative, as the RTADV_TIMER_WHEEL_PERIOD_MS is larger?

Yes, it can. that time we reset it to zif->rtadv.MaxRtrAdvInterval. This is what it is now in exiting behavior. We are not really matching exact AdvIntervalTimer period, rather we are making sure, we have spent at least AdvIntervalTimer amount of time, before sending next RA.

Ok, can we prevent or validate the from going negative?

no we need to allow to go negative. This gives flexibility to send RA , say your configured RA interval(we cant control this) is not a multiple of timer expiry period. say configured RA interval is 1005 ms, timer expiry 10 ms, so we cant match 1005, say timer expires at 10, 20 ....1000 ms then 1010ms. We wont send RA upto 1000 ms. so when we check at 1010 ms, that time diff == 5 - 10 (or 1005 - 1010) == -5ms, it tells that we have waited enough, so now send RA. now if we had configured RA interval is 1000 ms, diff == 1000 - 1000 == 0, so that time also we send RA. Obviously we dont want to expire time at 1ms, where we could always find a perfect match , but it would have created too many timer events. this is the exiting logic.

got it, thanks

ton31337

As I understand the idea of this PR/proposal is to change AdvIntervalTimer (sort of random)? But maybe it's most like an OS issue and the OS should be tuned instead?

ton31337 · 2025-02-21T09:15:38Z

zebra/rtadv.c

@@ -36,6 +37,14 @@ extern struct zebra_privs_t zserv_privs;
 static uint32_t interfaces_configured_for_ra_from_bgp;
 #define RTADV_ADATA_SIZE 1024

+#define PROC_IGMP6 "/proc/net/igmp6"
+
+#define MAX_V6ADDR_LEN 33 //32 hex char + 1 null terminataor


Let's use /* */ style.

Why not INET6_ADDRSTRLEN?

Changed it to exactly 32. this is basically indicating 32 chars in V6 address hex string, say for 2001:db8:85a3::8a2e:370:7334
hex string is 20010db885a3000000008a2e03707334, which is 32 chars long

ton31337 · 2025-02-21T09:19:30Z

zebra/rtadv.c

+static void ipv6_to_raw_hex(const struct in6_addr *addr, char *output)
+{
+	for (int i = 0; i < 16; i++) {
+		sprintf(output + (i * 2), "%02x", addr->s6_addr[i]);


snprintfrr()...

zebra/zebra_router.c

zebra/zebra_router.h

zebra/rtadv.c

ton31337 · 2025-02-21T09:22:53Z

zebra/rtadv.c

+			continue;
+		}
+
+		if ((strncmp(ifname_in, ifname_found, ifname_in_len) == 0) &&


Shouldn't we do an exact match? strmatch().

strmatch is using strcmp, #define strmatch(a,b) (!strcmp((a), (b))). I was advised not to use strcmp rather use strncmp.

ton31337 · 2025-02-21T09:24:33Z

zebra/rtadv.c

+	struct zebra_vrf *zvrf = rtadv_interface_get_zvrf(ifp);
+
+	if (zif->rtadv.inFastRexmit && zif->rtadv.UseFastRexmit) {
+		if (--zif->rtadv.NumFastReXmitsRemain <= 0) {


Please drop {} where a branch is from a single statement.

ton31337 · 2025-02-21T09:32:46Z

zebra/rtadv.c

@@ -1261,7 +1325,8 @@ static void rtadv_start_interface_events(struct zebra_vrf *zvrf,
 	if (adv_if != NULL)
 		return; /* Already added */

-	if_join_all_router(zvrf->rtadv.sock, zif->ifp);
+	event_add_timer_msec(zrouter.master, start_icmpv6_join_timer, zif->ifp, 10,


10ms is something good/bad/enough?

Changed the logic a little. First we try to do ICMPv6 join (which is existing behavior), if fails then wait random amount time 1 ms to ICMPV6_JOIN_TIMER_EXP_MS(100 ms) to retry next run.

soumyar-roy · 2025-02-21T22:24:10Z

As I understand the idea of this PR/proposal is to change AdvIntervalTimer (sort of random)? But maybe it's most like an OS issue and the OS should be tuned instead?

"the idea of this PR/proposal is to change AdvIntervalTimer (sort of random)?" No, it has two issues , 1) try to send RA for all interfaces (again say all them came up now) 2) try to join ICMPv6 all routers for all interfaces(say all f them came up 1st time) in both cases we get socket errors. For issue 1) rather sending RA requests for all the interfaces in one shot, we batch them using wheel timer. Wheel timer has slots, to which interfaces are added based upon hashing of interface index. Based upon wheel timer expiry at time, we send RA for interfaces which are in current slot, next slot and its interfaces will be processed next timer expiry. This way batching for RA requests is done. For issue 2) when interface is Up, I try to join for ICMPv6 all router group(existing behavior), if it fails I wait random amount of time, so that not for all the interfaces we try ICMPv6 join same time, which causes socket error.

zebra/zebra_router.c

zebra/zebra_router.h

ton31337 · 2025-02-26T08:08:23Z

zebra/rtadv.c

+			continue;
+
+		if ((strncmp(ifname_in, ifname_found, ifname_in_len) == 0) &&
+		    (!memcmp(&mcast_addr_in_bin, &mcast_addr_found_bin, sizeof(struct in6_addr))))


IPV6_ADDR_CMP

donaldsharp · 2025-02-26T14:43:51Z

build is failing:

build	24-Feb-2025 19:02:31	In file included from ./lib/log.h:17,
build	24-Feb-2025 19:02:31	                 from ./lib/ipaddr.h:12,
build	24-Feb-2025 19:02:31	                 from ./lib/prefix.h:16,
build	24-Feb-2025 19:02:31	                 from ./lib/stream.h:14,
build	24-Feb-2025 19:02:31	                 from zebra/rtadv.c:15:
build	24-Feb-2025 19:02:31	zebra/rtadv.c: In function ‘v6_addr_hex_str_to_in6_addr’:
build	24-Feb-2025 19:02:31	./lib/zlog.h:109:34: error: format ‘%u’ expects argument of type ‘unsigned int’, but argument 3 has type ‘size_t’ {aka ‘long unsigned int’} [-Werror=format=]
build	24-Feb-2025 19:02:31	  109 |                 zlog_ref(&_xref, (msg), ##__VA_ARGS__);                        \
build	24-Feb-2025 19:02:31	      |                                  ^~~~~
build	24-Feb-2025 19:02:31	./lib/zlog.h:124:9: note: in expansion of macro ‘_zlog_ecref’
build	24-Feb-2025 19:02:31	  124 |         _zlog_ecref(ferr_id, LOG_ERR, format, ## __VA_ARGS__)
build	24-Feb-2025 19:02:31	      |         ^~~~~~~~~~~
build	24-Feb-2025 19:02:31	zebra/rtadv.c:1954:17: note: in expansion of macro ‘flog_err_sys’
build	24-Feb-2025 19:02:31	 1954 |                 flog_err_sys(EC_LIB_SYSTEM_CALL, "Invalid V6 addr hex len %u", str_len);
build	24-Feb-2025 19:02:31	      |                 ^~~~~~~~~~~~
build	24-Feb-2025 19:02:32	  CC       zebra/zebra_mlag_vty.o
build	24-Feb-2025 19:02:32	  CC       zebra/zebra_srv6_vty.o
build	24-Feb-2025 19:02:32	cc1: all warnings being treated as errors
build	24-Feb-2025 19:02:32	make[1]: *** [Makefile:10895: zebra/rtadv.o] Error 1```

Issue: When 514 inerfaces/neighbors are configured, it creates socket error, "Cannot allocate memory", when back to back V6 RA messages are tried to be sent over the socket. This prevents interface, to know its peer's link local address. Socket error comes when 1) try to join ICMPv6 all router multicast group, back to back for all interfaces 2)send back to back RA for all interfaces Fix: 1)For ICMPv6 join case, we check if the interface has already joined all router group, if not try to join. On failure, retry joining after random amount of time determined 1 ms to ICMPV6_JOIN_TIMER_EXP_MS(100 ms) 2) For RA issue case, batch sending of RA mesages using wheel timer Testing: Monitor BGP session running sh bgp summary command Before fix: r1# sh bgp summary IPv4 Unicast Summary: BGP router identifier 192.168.1.1, local AS number 1001 VRF default vrf-id 0 BGP table version 0 RIB entries 0, using 0 bytes of memory Peers 515, using 12 MiB of memory Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt Desc r1-eth0 4 1002 89 90 0 0 0 00:07:10 0 0 N/A r1-eth1 4 1002 89 90 0 0 0 00:07:10 0 0 N/A r1-eth2 4 1002 89 90 0 0 0 00:07:10 0 0 N/A r1-eth3 4 1002 89 90 0 0 0 00:07:10 0 0 N/A r1-eth4 4 1002 89 90 0 0 0 00:07:10 0 0 N/A r1-eth5 4 1002 89 90 0 0 0 00:07:10 0 0 N/A …..<snip>... r1-eth252 4 1002 31 29 0 0 0 00:02:08 0 0 N/A r1-eth253 4 1002 31 29 0 0 0 00:02:08 0 0 N/A r1-eth254 4 1002 31 29 0 0 0 00:02:08 0 0 N/A r1-eth255 4 1002 31 29 0 0 0 00:02:08 0 0 N/A r1-eth256 4 0 0 0 0 0 0 never Idle 0 N/A r1-eth257 4 0 0 0 0 0 0 never Idle 0 N/A r1-eth258 4 0 0 0 0 0 0 never Idle 0 N/A r1-eth259 4 0 0 0 0 0 0 never Idle 0 N/A r1-eth260 4 0 0 0 0 0 0 never Idle 0 N/A ……..<snip>….. r1-eth511 4 0 0 0 0 0 0 never Idle 0 N/A r1-eth512 4 0 0 0 0 0 0 never Idle 0 N/A r1-eth513 4 0 0 0 0 0 0 never Idle 0 N/A r1-eth514 4 0 0 0 0 0 0 never Idle 0 N/A After fix: r1# show bgp summary IPv4 Unicast Summary: BGP router identifier 192.168.1.1, local AS number 1001 VRF default vrf-id 0 BGP table version 0 RIB entries 0, using 0 bytes of memory Peers 515, using 12 MiB of memory Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt Desc r1-eth0 4 1002 87 87 0 0 0 00:07:04 0 0 N/A r1-eth1 4 1002 87 87 0 0 0 00:07:04 0 0 N/A r1-eth2 4 1002 87 87 0 0 0 00:07:04 0 0 N/A r1-eth3 4 1002 64 67 0 0 0 00:05:09 0 0 N/A r1-eth4 4 1002 87 87 0 0 0 00:07:04 0 0 N/A r1-eth5 4 1002 87 87 0 0 0 00:07:04 0 0 N/A r1-eth6 4 1002 67 70 0 0 0 00:05:22 0 0 N/A r1-eth7 4 1002 87 87 0 0 0 00:07:04 0 0 N/A r1-eth8 4 1002 87 87 0 0 0 00:07:04 0 0 N/A .... r1-eth499 4 1002 43 43 0 0 0 00:03:22 0 0 N/A r1-eth500 4 1002 43 43 0 0 0 00:03:22 0 0 N/A r1-eth501 4 1002 19 22 0 0 0 00:01:21 0 0 N/A r1-eth502 4 1002 43 43 0 0 0 00:03:22 0 0 N/A r1-eth503 4 1002 43 43 0 0 0 00:03:22 0 0 N/A r1-eth504 4 1002 20 23 0 0 0 00:01:30 0 0 N/A r1-eth505 4 1002 43 43 0 0 0 00:03:22 0 0 N/A r1-eth506 4 1002 43 43 0 0 0 00:03:22 0 0 N/A r1-eth507 4 1002 22 25 0 0 0 00:01:39 0 0 N/A r1-eth508 4 1002 43 43 0 0 0 00:03:22 0 0 N/A r1-eth509 4 1002 17 20 0 0 0 00:01:13 0 0 N/A r1-eth510 4 1002 43 43 0 0 0 00:03:22 0 0 N/A r1-eth511 4 1002 43 43 0 0 0 00:03:22 0 0 N/A r1-eth512 4 1002 19 22 0 0 0 00:01:22 0 0 N/A r1-eth513 4 1002 43 43 0 0 0 00:03:22 0 0 N/A r1-eth514 4 1002 43 43 0 0 0 00:03:22 0 0 N/A Signed-off-by: Soumya Roy <[email protected]>

Signed-off-by: Soumya Roy <[email protected]>

soumyar-roy · 2025-02-26T22:07:03Z

build is failing:

build	24-Feb-2025 19:02:31	In file included from ./lib/log.h:17,
build	24-Feb-2025 19:02:31	                 from ./lib/ipaddr.h:12,
build	24-Feb-2025 19:02:31	                 from ./lib/prefix.h:16,
build	24-Feb-2025 19:02:31	                 from ./lib/stream.h:14,
build	24-Feb-2025 19:02:31	                 from zebra/rtadv.c:15:
build	24-Feb-2025 19:02:31	zebra/rtadv.c: In function ‘v6_addr_hex_str_to_in6_addr’:
build	24-Feb-2025 19:02:31	./lib/zlog.h:109:34: error: format ‘%u’ expects argument of type ‘unsigned int’, but argument 3 has type ‘size_t’ {aka ‘long unsigned int’} [-Werror=format=]
build	24-Feb-2025 19:02:31	  109 |                 zlog_ref(&_xref, (msg), ##__VA_ARGS__);                        \
build	24-Feb-2025 19:02:31	      |                                  ^~~~~
build	24-Feb-2025 19:02:31	./lib/zlog.h:124:9: note: in expansion of macro ‘_zlog_ecref’
build	24-Feb-2025 19:02:31	  124 |         _zlog_ecref(ferr_id, LOG_ERR, format, ## __VA_ARGS__)
build	24-Feb-2025 19:02:31	      |         ^~~~~~~~~~~
build	24-Feb-2025 19:02:31	zebra/rtadv.c:1954:17: note: in expansion of macro ‘flog_err_sys’
build	24-Feb-2025 19:02:31	 1954 |                 flog_err_sys(EC_LIB_SYSTEM_CALL, "Invalid V6 addr hex len %u", str_len);
build	24-Feb-2025 19:02:31	      |                 ^~~~~~~~~~~~
build	24-Feb-2025 19:02:32	  CC       zebra/zebra_mlag_vty.o
build	24-Feb-2025 19:02:32	  CC       zebra/zebra_srv6_vty.o
build	24-Feb-2025 19:02:32	cc1: all warnings being treated as errors
build	24-Feb-2025 19:02:32	make[1]: *** [Makefile:10895: zebra/rtadv.o] Error 1```

Fixed this, chnaged %zu

frrbot bot added the zebra label Feb 21, 2025

github-actions bot added master size/L labels Feb 21, 2025

soumyar-roy marked this pull request as draft February 21, 2025 00:21

github-actions bot added the rebase PR needs rebase label Feb 21, 2025

routingrocks reviewed Feb 21, 2025

View reviewed changes

ton31337 reviewed Feb 21, 2025

View reviewed changes

soumyar-roy force-pushed the soumya/ra514nei branch from b806e97 to b2476ab Compare February 21, 2025 21:23

donaldsharp reviewed Feb 24, 2025

View reviewed changes

zebra/zebra_router.c Outdated Show resolved Hide resolved

donaldsharp reviewed Feb 24, 2025

View reviewed changes

zebra/zebra_router.h Outdated Show resolved Hide resolved

soumyar-roy force-pushed the soumya/ra514nei branch from b2476ab to 312cf54 Compare February 24, 2025 18:46

frrbot bot added the tests Topotests, make check, etc label Feb 24, 2025

github-actions bot added size/XL and removed size/L labels Feb 24, 2025

soumyar-roy marked this pull request as ready for review February 25, 2025 16:47

ton31337 reviewed Feb 26, 2025

View reviewed changes

Soumya Roy added 2 commits February 26, 2025 14:03

tests: add support for bringimg up 514 BGP neighbors

93b91d0

Signed-off-by: Soumya Roy <[email protected]>

soumyar-roy force-pushed the soumya/ra514nei branch from 4dbaa12 to 93b91d0 Compare February 26, 2025 22:03

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

zebra: Bring up 514 BGP neighbor sessions #18214

zebra: Bring up 514 BGP neighbor sessions #18214

soumyar-roy commented Feb 21, 2025

routingrocks Feb 21, 2025

soumyar-roy Feb 21, 2025

routingrocks Feb 21, 2025

soumyar-roy Feb 21, 2025

routingrocks Feb 25, 2025

ton31337 left a comment

ton31337 Feb 21, 2025

ton31337 Feb 21, 2025

soumyar-roy Feb 21, 2025

ton31337 Feb 21, 2025

ton31337 Feb 21, 2025

soumyar-roy Feb 21, 2025

ton31337 Feb 21, 2025

soumyar-roy Feb 21, 2025

ton31337 Feb 21, 2025

soumyar-roy Feb 21, 2025

soumyar-roy commented Feb 21, 2025

ton31337 Feb 26, 2025

soumyar-roy Feb 26, 2025

donaldsharp commented Feb 26, 2025

soumyar-roy commented Feb 26, 2025

zebra: Bring up 514 BGP neighbor sessions #18214

Are you sure you want to change the base?

zebra: Bring up 514 BGP neighbor sessions #18214

Conversation

soumyar-roy commented Feb 21, 2025

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ton31337 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

soumyar-roy commented Feb 21, 2025

Choose a reason for hiding this comment

Choose a reason for hiding this comment

donaldsharp commented Feb 26, 2025

soumyar-roy commented Feb 26, 2025