Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zebra: Bring up 514 BGP neighbor sessions #18214

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

soumyar-roy
Copy link

Issue:
When 514 inerfaces/neighbors are configured, it creates socket error, "Cannot allocate memory", when back to back V6 RA messages are tried to be sent over the socket. This prevents interface, to know its peer's link local address.

Fix:
This fix batches the RA messages using wheel timer, thus preventing socket operations overflooding.

Testing:
Monitor BGP session running sh bgp summary command

Before fix:
r1# sh bgp summary

IPv4 Unicast Summary:
BGP router identifier 192.168.1.1, local AS number 1001 VRF default vrf-id 0 BGP table version 0
RIB entries 0, using 0 bytes of memory
Peers 515, using 12 MiB of memory

Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt Desc
r1-eth0 4 1002 89 90 0 0 0 00:07:10 0 0 N/A
r1-eth1 4 1002 89 90 0 0 0 00:07:10 0 0 N/A
r1-eth2 4 1002 89 90 0 0 0 00:07:10 0 0 N/A
r1-eth3 4 1002 89 90 0 0 0 00:07:10 0 0 N/A
r1-eth4 4 1002 89 90 0 0 0 00:07:10 0 0 N/A
r1-eth5 4 1002 89 90 0 0 0 00:07:10 0 0 N/A

….....
r1-eth252 4 1002 31 29 0 0 0 00:02:08 0 0 N/A
r1-eth253 4 1002 31 29 0 0 0 00:02:08 0 0 N/A
r1-eth254 4 1002 31 29 0 0 0 00:02:08 0 0 N/A
r1-eth255 4 1002 31 29 0 0 0 00:02:08 0 0 N/A
r1-eth256 4 0 0 0 0 0 0 never Idle 0 N/A
r1-eth257 4 0 0 0 0 0 0 never Idle 0 N/A
r1-eth258 4 0 0 0 0 0 0 never Idle 0 N/A
r1-eth259 4 0 0 0 0 0 0 never Idle 0 N/A
r1-eth260 4 0 0 0 0 0 0 never Idle 0 N/A
……..…..
r1-eth511 4 0 0 0 0 0 0 never Idle 0 N/A
r1-eth512 4 0 0 0 0 0 0 never Idle 0 N/A
r1-eth513 4 0 0 0 0 0 0 never Idle 0 N/A
r1-eth514 4 0 0 0 0 0 0 never Idle 0 N/A
After fix:
r1# show bgp summary

IPv4 Unicast Summary:
BGP router identifier 192.168.1.1, local AS number 1001 VRF default vrf-id 0 BGP table version 0
RIB entries 0, using 0 bytes of memory
Peers 515, using 12 MiB of memory

Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt Desc
r1-eth0 4 1002 87 87 0 0 0 00:07:04 0 0 N/A
r1-eth1 4 1002 87 87 0 0 0 00:07:04 0 0 N/A
r1-eth2 4 1002 87 87 0 0 0 00:07:04 0 0 N/A
r1-eth3 4 1002 64 67 0 0 0 00:05:09 0 0 N/A
r1-eth4 4 1002 87 87 0 0 0 00:07:04 0 0 N/A
r1-eth5 4 1002 87 87 0 0 0 00:07:04 0 0 N/A
r1-eth6 4 1002 67 70 0 0 0 00:05:22 0 0 N/A
r1-eth7 4 1002 87 87 0 0 0 00:07:04 0 0 N/A
r1-eth8 4 1002 87 87 0 0 0 00:07:04 0 0 N/A
....
r1-eth499 4 1002 43 43 0 0 0 00:03:22 0 0 N/A
r1-eth500 4 1002 43 43 0 0 0 00:03:22 0 0 N/A
r1-eth501 4 1002 19 22 0 0 0 00:01:21 0 0 N/A
r1-eth502 4 1002 43 43 0 0 0 00:03:22 0 0 N/A
r1-eth503 4 1002 43 43 0 0 0 00:03:22 0 0 N/A
r1-eth504 4 1002 20 23 0 0 0 00:01:30 0 0 N/A
r1-eth505 4 1002 43 43 0 0 0 00:03:22 0 0 N/A
r1-eth506 4 1002 43 43 0 0 0 00:03:22 0 0 N/A
r1-eth507 4 1002 22 25 0 0 0 00:01:39 0 0 N/A
r1-eth508 4 1002 43 43 0 0 0 00:03:22 0 0 N/A
r1-eth509 4 1002 17 20 0 0 0 00:01:13 0 0 N/A
r1-eth510 4 1002 43 43 0 0 0 00:03:22 0 0 N/A
r1-eth511 4 1002 43 43 0 0 0 00:03:22 0 0 N/A
r1-eth512 4 1002 19 22 0 0 0 00:01:22 0 0 N/A
r1-eth513 4 1002 43 43 0 0 0 00:03:22 0 0 N/A
r1-eth514 4 1002 43 43 0 0 0 00:03:22 0 0 N/A

@frrbot frrbot bot added the zebra label Feb 21, 2025
@soumyar-roy soumyar-roy marked this pull request as draft February 21, 2025 00:21
@github-actions github-actions bot added the rebase PR needs rebase label Feb 21, 2025

rtadv_send_packet(zvrf->rtadv.sock, ifp, RA_ENABLE);
} else {
zif->rtadv.AdvIntervalTimer -=
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible AdvintervalTimer will go negative, as the RTADV_TIMER_WHEEL_PERIOD_MS is larger?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it can. that time we reset it to zif->rtadv.MaxRtrAdvInterval. This is what it is now in exiting behavior. We are not really matching exact AdvIntervalTimer period, rather we are making sure, we have spent at least AdvIntervalTimer amount of time, before sending next RA.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, can we prevent or validate the from going negative?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no we need to allow to go negative. This gives flexibility to send RA , say your configured RA interval(we cant control this) is not a multiple of timer expiry period. say configured RA interval is 1005 ms, timer expiry 10 ms, so we cant match 1005, say timer expires at 10, 20 ....1000 ms then 1010ms. We wont send RA upto 1000 ms. so when we check at 1010 ms, that time diff == 5 - 10 (or 1005 - 1010) == -5ms, it tells that we have waited enough, so now send RA. now if we had configured RA interval is 1000 ms, diff == 1000 - 1000 == 0, so that time also we send RA. Obviously we dont want to expire time at 1ms, where we could always find a perfect match , but it would have created too many timer events. this is the exiting logic.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it, thanks

Copy link
Member

@ton31337 ton31337 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understand the idea of this PR/proposal is to change AdvIntervalTimer (sort of random)? But maybe it's most like an OS issue and the OS should be tuned instead?

zebra/rtadv.c Outdated
@@ -36,6 +37,14 @@ extern struct zebra_privs_t zserv_privs;
static uint32_t interfaces_configured_for_ra_from_bgp;
#define RTADV_ADATA_SIZE 1024

#define PROC_IGMP6 "/proc/net/igmp6"

#define MAX_V6ADDR_LEN 33 //32 hex char + 1 null terminataor
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use /* */ style.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not INET6_ADDRSTRLEN?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed it to exactly 32. this is basically indicating 32 chars in V6 address hex string, say for 2001:db8:85a3::8a2e:370:7334
hex string is 20010db885a3000000008a2e03707334, which is 32 chars long

zebra/rtadv.c Outdated
static void ipv6_to_raw_hex(const struct in6_addr *addr, char *output)
{
for (int i = 0; i < 16; i++) {
sprintf(output + (i * 2), "%02x", addr->s6_addr[i]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

snprintfrr()...

zebra/rtadv.c Outdated
continue;
}

if ((strncmp(ifname_in, ifname_found, ifname_in_len) == 0) &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we do an exact match? strmatch().

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strmatch is using strcmp, #define strmatch(a,b) (!strcmp((a), (b))). I was advised not to use strcmp rather use strncmp.

zebra/rtadv.c Outdated
struct zebra_vrf *zvrf = rtadv_interface_get_zvrf(ifp);

if (zif->rtadv.inFastRexmit && zif->rtadv.UseFastRexmit) {
if (--zif->rtadv.NumFastReXmitsRemain <= 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please drop {} where a branch is from a single statement.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

zebra/rtadv.c Outdated
@@ -1261,7 +1325,8 @@ static void rtadv_start_interface_events(struct zebra_vrf *zvrf,
if (adv_if != NULL)
return; /* Already added */

if_join_all_router(zvrf->rtadv.sock, zif->ifp);
event_add_timer_msec(zrouter.master, start_icmpv6_join_timer, zif->ifp, 10,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

10ms is something good/bad/enough?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the logic a little. First we try to do ICMPv6 join (which is existing behavior), if fails then wait random amount time 1 ms to ICMPV6_JOIN_TIMER_EXP_MS(100 ms) to retry next run.

@soumyar-roy
Copy link
Author

As I understand the idea of this PR/proposal is to change AdvIntervalTimer (sort of random)? But maybe it's most like an OS issue and the OS should be tuned instead?

"the idea of this PR/proposal is to change AdvIntervalTimer (sort of random)?" No, it has two issues , 1) try to send RA for all interfaces (again say all them came up now) 2) try to join ICMPv6 all routers for all interfaces(say all f them came up 1st time) in both cases we get socket errors. For issue 1) rather sending RA requests for all the interfaces in one shot, we batch them using wheel timer. Wheel timer has slots, to which interfaces are added based upon hashing of interface index. Based upon wheel timer expiry at time, we send RA for interfaces which are in current slot, next slot and its interfaces will be processed next timer expiry. This way batching for RA requests is done. For issue 2) when interface is Up, I try to join for ICMPv6 all router group(existing behavior), if it fails I wait random amount of time, so that not for all the interfaces we try ICMPv6 join same time, which causes socket error.

@frrbot frrbot bot added the tests Topotests, make check, etc label Feb 24, 2025
@github-actions github-actions bot added size/XL and removed size/L labels Feb 24, 2025
@soumyar-roy soumyar-roy marked this pull request as ready for review February 25, 2025 16:47
zebra/rtadv.c Outdated
continue;

if ((strncmp(ifname_in, ifname_found, ifname_in_len) == 0) &&
(!memcmp(&mcast_addr_in_bin, &mcast_addr_found_bin, sizeof(struct in6_addr))))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IPV6_ADDR_CMP

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed.

@donaldsharp
Copy link
Member

build is failing:

build	24-Feb-2025 19:02:31	In file included from ./lib/log.h:17,
build	24-Feb-2025 19:02:31	                 from ./lib/ipaddr.h:12,
build	24-Feb-2025 19:02:31	                 from ./lib/prefix.h:16,
build	24-Feb-2025 19:02:31	                 from ./lib/stream.h:14,
build	24-Feb-2025 19:02:31	                 from zebra/rtadv.c:15:
build	24-Feb-2025 19:02:31	zebra/rtadv.c: In function ‘v6_addr_hex_str_to_in6_addr’:
build	24-Feb-2025 19:02:31	./lib/zlog.h:109:34: error: format ‘%u’ expects argument of type ‘unsigned int’, but argument 3 has type ‘size_t’ {aka ‘long unsigned int’} [-Werror=format=]
build	24-Feb-2025 19:02:31	  109 |                 zlog_ref(&_xref, (msg), ##__VA_ARGS__);                        \
build	24-Feb-2025 19:02:31	      |                                  ^~~~~
build	24-Feb-2025 19:02:31	./lib/zlog.h:124:9: note: in expansion of macro ‘_zlog_ecref’
build	24-Feb-2025 19:02:31	  124 |         _zlog_ecref(ferr_id, LOG_ERR, format, ## __VA_ARGS__)
build	24-Feb-2025 19:02:31	      |         ^~~~~~~~~~~
build	24-Feb-2025 19:02:31	zebra/rtadv.c:1954:17: note: in expansion of macro ‘flog_err_sys’
build	24-Feb-2025 19:02:31	 1954 |                 flog_err_sys(EC_LIB_SYSTEM_CALL, "Invalid V6 addr hex len %u", str_len);
build	24-Feb-2025 19:02:31	      |                 ^~~~~~~~~~~~
build	24-Feb-2025 19:02:32	  CC       zebra/zebra_mlag_vty.o
build	24-Feb-2025 19:02:32	  CC       zebra/zebra_srv6_vty.o
build	24-Feb-2025 19:02:32	cc1: all warnings being treated as errors
build	24-Feb-2025 19:02:32	make[1]: *** [Makefile:10895: zebra/rtadv.o] Error 1```

Soumya Roy added 2 commits February 26, 2025 14:03
Issue:
When 514 inerfaces/neighbors are configured, it creates socket error,
"Cannot allocate memory", when back to back V6 RA messages are tried
to be sent over the socket. This prevents interface, to know its peer's
link local address. Socket error comes when 1) try to join ICMPv6 all
router multicast group, back to back for all interfaces 2)send back to
back RA for all interfaces

Fix:
1)For ICMPv6 join case, we check if the interface has already joined
all router group, if not try to join. On failure, retry joining after
random amount of time determined 1 ms to ICMPV6_JOIN_TIMER_EXP_MS(100 ms)
2) For RA issue case, batch sending of RA mesages using wheel timer

Testing:
Monitor BGP session running sh bgp summary command

Before fix:
r1# sh bgp summary

IPv4 Unicast Summary:
BGP router identifier 192.168.1.1, local AS number 1001 VRF default vrf-id 0
BGP table version 0
RIB entries 0, using 0 bytes of memory
Peers 515, using 12 MiB of memory

Neighbor        V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt Desc
r1-eth0         4       1002        89        90        0    0    0 00:07:10            0        0 N/A
r1-eth1         4       1002        89        90        0    0    0 00:07:10            0        0 N/A
r1-eth2         4       1002        89        90        0    0    0 00:07:10            0        0 N/A
r1-eth3         4       1002        89        90        0    0    0 00:07:10            0        0 N/A
r1-eth4         4       1002        89        90        0    0    0 00:07:10            0        0 N/A
r1-eth5         4       1002        89        90        0    0    0 00:07:10            0        0 N/A

…..<snip>...
r1-eth252       4       1002        31        29        0    0    0 00:02:08            0        0 N/A
r1-eth253       4       1002        31        29        0    0    0 00:02:08            0        0 N/A
r1-eth254       4       1002        31        29        0    0    0 00:02:08            0        0 N/A
r1-eth255       4       1002        31        29        0    0    0 00:02:08            0        0 N/A
r1-eth256       4          0         0         0        0    0    0    never         Idle        0 N/A
r1-eth257       4          0         0         0        0    0    0    never         Idle        0 N/A
r1-eth258       4          0         0         0        0    0    0    never         Idle        0 N/A
r1-eth259       4          0         0         0        0    0    0    never         Idle        0 N/A
r1-eth260       4          0         0         0        0    0    0    never         Idle        0 N/A
……..<snip>…..
r1-eth511       4          0         0         0        0    0    0    never         Idle        0 N/A
r1-eth512       4          0         0         0        0    0    0    never         Idle        0 N/A
r1-eth513       4          0         0         0        0    0    0    never         Idle        0 N/A
r1-eth514       4          0         0         0        0    0    0    never         Idle        0 N/A
After fix:
r1# show bgp summary

IPv4 Unicast Summary:
BGP router identifier 192.168.1.1, local AS number 1001 VRF default vrf-id 0
BGP table version 0
RIB entries 0, using 0 bytes of memory
Peers 515, using 12 MiB of memory

Neighbor        V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt Desc
r1-eth0         4       1002        87        87        0    0    0 00:07:04            0        0 N/A
r1-eth1         4       1002        87        87        0    0    0 00:07:04            0        0 N/A
r1-eth2         4       1002        87        87        0    0    0 00:07:04            0        0 N/A
r1-eth3         4       1002        64        67        0    0    0 00:05:09            0        0 N/A
r1-eth4         4       1002        87        87        0    0    0 00:07:04            0        0 N/A
r1-eth5         4       1002        87        87        0    0    0 00:07:04            0        0 N/A
r1-eth6         4       1002        67        70        0    0    0 00:05:22            0        0 N/A
r1-eth7         4       1002        87        87        0    0    0 00:07:04            0        0 N/A
r1-eth8         4       1002        87        87        0    0    0 00:07:04            0        0 N/A
....
r1-eth499       4       1002        43        43        0    0    0 00:03:22            0        0 N/A
r1-eth500       4       1002        43        43        0    0    0 00:03:22            0        0 N/A
r1-eth501       4       1002        19        22        0    0    0 00:01:21            0        0 N/A
r1-eth502       4       1002        43        43        0    0    0 00:03:22            0        0 N/A
r1-eth503       4       1002        43        43        0    0    0 00:03:22            0        0 N/A
r1-eth504       4       1002        20        23        0    0    0 00:01:30            0        0 N/A
r1-eth505       4       1002        43        43        0    0    0 00:03:22            0        0 N/A
r1-eth506       4       1002        43        43        0    0    0 00:03:22            0        0 N/A
r1-eth507       4       1002        22        25        0    0    0 00:01:39            0        0 N/A
r1-eth508       4       1002        43        43        0    0    0 00:03:22            0        0 N/A
r1-eth509       4       1002        17        20        0    0    0 00:01:13            0        0 N/A
r1-eth510       4       1002        43        43        0    0    0 00:03:22            0        0 N/A
r1-eth511       4       1002        43        43        0    0    0 00:03:22            0        0 N/A
r1-eth512       4       1002        19        22        0    0    0 00:01:22            0        0 N/A
r1-eth513       4       1002        43        43        0    0    0 00:03:22            0        0 N/A
r1-eth514       4       1002        43        43        0    0    0 00:03:22            0        0 N/A

Signed-off-by: Soumya Roy <[email protected]>
@soumyar-roy
Copy link
Author

build is failing:

build	24-Feb-2025 19:02:31	In file included from ./lib/log.h:17,
build	24-Feb-2025 19:02:31	                 from ./lib/ipaddr.h:12,
build	24-Feb-2025 19:02:31	                 from ./lib/prefix.h:16,
build	24-Feb-2025 19:02:31	                 from ./lib/stream.h:14,
build	24-Feb-2025 19:02:31	                 from zebra/rtadv.c:15:
build	24-Feb-2025 19:02:31	zebra/rtadv.c: In function ‘v6_addr_hex_str_to_in6_addr’:
build	24-Feb-2025 19:02:31	./lib/zlog.h:109:34: error: format ‘%u’ expects argument of type ‘unsigned int’, but argument 3 has type ‘size_t’ {aka ‘long unsigned int’} [-Werror=format=]
build	24-Feb-2025 19:02:31	  109 |                 zlog_ref(&_xref, (msg), ##__VA_ARGS__);                        \
build	24-Feb-2025 19:02:31	      |                                  ^~~~~
build	24-Feb-2025 19:02:31	./lib/zlog.h:124:9: note: in expansion of macro ‘_zlog_ecref’
build	24-Feb-2025 19:02:31	  124 |         _zlog_ecref(ferr_id, LOG_ERR, format, ## __VA_ARGS__)
build	24-Feb-2025 19:02:31	      |         ^~~~~~~~~~~
build	24-Feb-2025 19:02:31	zebra/rtadv.c:1954:17: note: in expansion of macro ‘flog_err_sys’
build	24-Feb-2025 19:02:31	 1954 |                 flog_err_sys(EC_LIB_SYSTEM_CALL, "Invalid V6 addr hex len %u", str_len);
build	24-Feb-2025 19:02:31	      |                 ^~~~~~~~~~~~
build	24-Feb-2025 19:02:32	  CC       zebra/zebra_mlag_vty.o
build	24-Feb-2025 19:02:32	  CC       zebra/zebra_srv6_vty.o
build	24-Feb-2025 19:02:32	cc1: all warnings being treated as errors
build	24-Feb-2025 19:02:32	make[1]: *** [Makefile:10895: zebra/rtadv.o] Error 1```

Fixed this, chnaged %zu

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
master rebase PR needs rebase size/XL tests Topotests, make check, etc zebra
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants