-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
zebra: Bring up 514 BGP neighbor sessions #18214
base: master
Are you sure you want to change the base?
Conversation
|
||
rtadv_send_packet(zvrf->rtadv.sock, ifp, RA_ENABLE); | ||
} else { | ||
zif->rtadv.AdvIntervalTimer -= |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible AdvintervalTimer will go negative, as the RTADV_TIMER_WHEEL_PERIOD_MS is larger?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it can. that time we reset it to zif->rtadv.MaxRtrAdvInterval. This is what it is now in exiting behavior. We are not really matching exact AdvIntervalTimer period, rather we are making sure, we have spent at least AdvIntervalTimer amount of time, before sending next RA.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, can we prevent or validate the from going negative?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no we need to allow to go negative. This gives flexibility to send RA , say your configured RA interval(we cant control this) is not a multiple of timer expiry period. say configured RA interval is 1005 ms, timer expiry 10 ms, so we cant match 1005, say timer expires at 10, 20 ....1000 ms then 1010ms. We wont send RA upto 1000 ms. so when we check at 1010 ms, that time diff == 5 - 10 (or 1005 - 1010) == -5ms, it tells that we have waited enough, so now send RA. now if we had configured RA interval is 1000 ms, diff == 1000 - 1000 == 0, so that time also we send RA. Obviously we dont want to expire time at 1ms, where we could always find a perfect match , but it would have created too many timer events. this is the exiting logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it, thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I understand the idea of this PR/proposal is to change AdvIntervalTimer (sort of random)? But maybe it's most like an OS issue and the OS should be tuned instead?
zebra/rtadv.c
Outdated
@@ -36,6 +37,14 @@ extern struct zebra_privs_t zserv_privs; | |||
static uint32_t interfaces_configured_for_ra_from_bgp; | |||
#define RTADV_ADATA_SIZE 1024 | |||
|
|||
#define PROC_IGMP6 "/proc/net/igmp6" | |||
|
|||
#define MAX_V6ADDR_LEN 33 //32 hex char + 1 null terminataor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use /* */
style.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not INET6_ADDRSTRLEN?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed it to exactly 32. this is basically indicating 32 chars in V6 address hex string, say for 2001:db8:85a3::8a2e:370:7334
hex string is 20010db885a3000000008a2e03707334, which is 32 chars long
zebra/rtadv.c
Outdated
static void ipv6_to_raw_hex(const struct in6_addr *addr, char *output) | ||
{ | ||
for (int i = 0; i < 16; i++) { | ||
sprintf(output + (i * 2), "%02x", addr->s6_addr[i]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
snprintfrr()
...
zebra/rtadv.c
Outdated
continue; | ||
} | ||
|
||
if ((strncmp(ifname_in, ifname_found, ifname_in_len) == 0) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we do an exact match? strmatch()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
strmatch is using strcmp, #define strmatch(a,b) (!strcmp((a), (b))). I was advised not to use strcmp rather use strncmp.
zebra/rtadv.c
Outdated
struct zebra_vrf *zvrf = rtadv_interface_get_zvrf(ifp); | ||
|
||
if (zif->rtadv.inFastRexmit && zif->rtadv.UseFastRexmit) { | ||
if (--zif->rtadv.NumFastReXmitsRemain <= 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please drop {}
where a branch is from a single statement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
zebra/rtadv.c
Outdated
@@ -1261,7 +1325,8 @@ static void rtadv_start_interface_events(struct zebra_vrf *zvrf, | |||
if (adv_if != NULL) | |||
return; /* Already added */ | |||
|
|||
if_join_all_router(zvrf->rtadv.sock, zif->ifp); | |||
event_add_timer_msec(zrouter.master, start_icmpv6_join_timer, zif->ifp, 10, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
10ms is something good/bad/enough?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed the logic a little. First we try to do ICMPv6 join (which is existing behavior), if fails then wait random amount time 1 ms to ICMPV6_JOIN_TIMER_EXP_MS(100 ms) to retry next run.
b806e97
to
b2476ab
Compare
"the idea of this PR/proposal is to change AdvIntervalTimer (sort of random)?" No, it has two issues , 1) try to send RA for all interfaces (again say all them came up now) 2) try to join ICMPv6 all routers for all interfaces(say all f them came up 1st time) in both cases we get socket errors. For issue 1) rather sending RA requests for all the interfaces in one shot, we batch them using wheel timer. Wheel timer has slots, to which interfaces are added based upon hashing of interface index. Based upon wheel timer expiry at time, we send RA for interfaces which are in current slot, next slot and its interfaces will be processed next timer expiry. This way batching for RA requests is done. For issue 2) when interface is Up, I try to join for ICMPv6 all router group(existing behavior), if it fails I wait random amount of time, so that not for all the interfaces we try ICMPv6 join same time, which causes socket error. |
b2476ab
to
312cf54
Compare
zebra/rtadv.c
Outdated
continue; | ||
|
||
if ((strncmp(ifname_in, ifname_found, ifname_in_len) == 0) && | ||
(!memcmp(&mcast_addr_in_bin, &mcast_addr_found_bin, sizeof(struct in6_addr)))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IPV6_ADDR_CMP
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed.
build is failing:
|
Issue: When 514 inerfaces/neighbors are configured, it creates socket error, "Cannot allocate memory", when back to back V6 RA messages are tried to be sent over the socket. This prevents interface, to know its peer's link local address. Socket error comes when 1) try to join ICMPv6 all router multicast group, back to back for all interfaces 2)send back to back RA for all interfaces Fix: 1)For ICMPv6 join case, we check if the interface has already joined all router group, if not try to join. On failure, retry joining after random amount of time determined 1 ms to ICMPV6_JOIN_TIMER_EXP_MS(100 ms) 2) For RA issue case, batch sending of RA mesages using wheel timer Testing: Monitor BGP session running sh bgp summary command Before fix: r1# sh bgp summary IPv4 Unicast Summary: BGP router identifier 192.168.1.1, local AS number 1001 VRF default vrf-id 0 BGP table version 0 RIB entries 0, using 0 bytes of memory Peers 515, using 12 MiB of memory Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt Desc r1-eth0 4 1002 89 90 0 0 0 00:07:10 0 0 N/A r1-eth1 4 1002 89 90 0 0 0 00:07:10 0 0 N/A r1-eth2 4 1002 89 90 0 0 0 00:07:10 0 0 N/A r1-eth3 4 1002 89 90 0 0 0 00:07:10 0 0 N/A r1-eth4 4 1002 89 90 0 0 0 00:07:10 0 0 N/A r1-eth5 4 1002 89 90 0 0 0 00:07:10 0 0 N/A …..<snip>... r1-eth252 4 1002 31 29 0 0 0 00:02:08 0 0 N/A r1-eth253 4 1002 31 29 0 0 0 00:02:08 0 0 N/A r1-eth254 4 1002 31 29 0 0 0 00:02:08 0 0 N/A r1-eth255 4 1002 31 29 0 0 0 00:02:08 0 0 N/A r1-eth256 4 0 0 0 0 0 0 never Idle 0 N/A r1-eth257 4 0 0 0 0 0 0 never Idle 0 N/A r1-eth258 4 0 0 0 0 0 0 never Idle 0 N/A r1-eth259 4 0 0 0 0 0 0 never Idle 0 N/A r1-eth260 4 0 0 0 0 0 0 never Idle 0 N/A ……..<snip>….. r1-eth511 4 0 0 0 0 0 0 never Idle 0 N/A r1-eth512 4 0 0 0 0 0 0 never Idle 0 N/A r1-eth513 4 0 0 0 0 0 0 never Idle 0 N/A r1-eth514 4 0 0 0 0 0 0 never Idle 0 N/A After fix: r1# show bgp summary IPv4 Unicast Summary: BGP router identifier 192.168.1.1, local AS number 1001 VRF default vrf-id 0 BGP table version 0 RIB entries 0, using 0 bytes of memory Peers 515, using 12 MiB of memory Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt Desc r1-eth0 4 1002 87 87 0 0 0 00:07:04 0 0 N/A r1-eth1 4 1002 87 87 0 0 0 00:07:04 0 0 N/A r1-eth2 4 1002 87 87 0 0 0 00:07:04 0 0 N/A r1-eth3 4 1002 64 67 0 0 0 00:05:09 0 0 N/A r1-eth4 4 1002 87 87 0 0 0 00:07:04 0 0 N/A r1-eth5 4 1002 87 87 0 0 0 00:07:04 0 0 N/A r1-eth6 4 1002 67 70 0 0 0 00:05:22 0 0 N/A r1-eth7 4 1002 87 87 0 0 0 00:07:04 0 0 N/A r1-eth8 4 1002 87 87 0 0 0 00:07:04 0 0 N/A .... r1-eth499 4 1002 43 43 0 0 0 00:03:22 0 0 N/A r1-eth500 4 1002 43 43 0 0 0 00:03:22 0 0 N/A r1-eth501 4 1002 19 22 0 0 0 00:01:21 0 0 N/A r1-eth502 4 1002 43 43 0 0 0 00:03:22 0 0 N/A r1-eth503 4 1002 43 43 0 0 0 00:03:22 0 0 N/A r1-eth504 4 1002 20 23 0 0 0 00:01:30 0 0 N/A r1-eth505 4 1002 43 43 0 0 0 00:03:22 0 0 N/A r1-eth506 4 1002 43 43 0 0 0 00:03:22 0 0 N/A r1-eth507 4 1002 22 25 0 0 0 00:01:39 0 0 N/A r1-eth508 4 1002 43 43 0 0 0 00:03:22 0 0 N/A r1-eth509 4 1002 17 20 0 0 0 00:01:13 0 0 N/A r1-eth510 4 1002 43 43 0 0 0 00:03:22 0 0 N/A r1-eth511 4 1002 43 43 0 0 0 00:03:22 0 0 N/A r1-eth512 4 1002 19 22 0 0 0 00:01:22 0 0 N/A r1-eth513 4 1002 43 43 0 0 0 00:03:22 0 0 N/A r1-eth514 4 1002 43 43 0 0 0 00:03:22 0 0 N/A Signed-off-by: Soumya Roy <[email protected]>
Signed-off-by: Soumya Roy <[email protected]>
4dbaa12
to
93b91d0
Compare
Fixed this, chnaged %zu |
Issue:
When 514 inerfaces/neighbors are configured, it creates socket error, "Cannot allocate memory", when back to back V6 RA messages are tried to be sent over the socket. This prevents interface, to know its peer's link local address.
Fix:
This fix batches the RA messages using wheel timer, thus preventing socket operations overflooding.
Testing:
Monitor BGP session running sh bgp summary command
Before fix:
r1# sh bgp summary
IPv4 Unicast Summary:
BGP router identifier 192.168.1.1, local AS number 1001 VRF default vrf-id 0 BGP table version 0
RIB entries 0, using 0 bytes of memory
Peers 515, using 12 MiB of memory
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt Desc
r1-eth0 4 1002 89 90 0 0 0 00:07:10 0 0 N/A
r1-eth1 4 1002 89 90 0 0 0 00:07:10 0 0 N/A
r1-eth2 4 1002 89 90 0 0 0 00:07:10 0 0 N/A
r1-eth3 4 1002 89 90 0 0 0 00:07:10 0 0 N/A
r1-eth4 4 1002 89 90 0 0 0 00:07:10 0 0 N/A
r1-eth5 4 1002 89 90 0 0 0 00:07:10 0 0 N/A
….....
r1-eth252 4 1002 31 29 0 0 0 00:02:08 0 0 N/A
r1-eth253 4 1002 31 29 0 0 0 00:02:08 0 0 N/A
r1-eth254 4 1002 31 29 0 0 0 00:02:08 0 0 N/A
r1-eth255 4 1002 31 29 0 0 0 00:02:08 0 0 N/A
r1-eth256 4 0 0 0 0 0 0 never Idle 0 N/A
r1-eth257 4 0 0 0 0 0 0 never Idle 0 N/A
r1-eth258 4 0 0 0 0 0 0 never Idle 0 N/A
r1-eth259 4 0 0 0 0 0 0 never Idle 0 N/A
r1-eth260 4 0 0 0 0 0 0 never Idle 0 N/A
……..…..
r1-eth511 4 0 0 0 0 0 0 never Idle 0 N/A
r1-eth512 4 0 0 0 0 0 0 never Idle 0 N/A
r1-eth513 4 0 0 0 0 0 0 never Idle 0 N/A
r1-eth514 4 0 0 0 0 0 0 never Idle 0 N/A
After fix:
r1# show bgp summary
IPv4 Unicast Summary:
BGP router identifier 192.168.1.1, local AS number 1001 VRF default vrf-id 0 BGP table version 0
RIB entries 0, using 0 bytes of memory
Peers 515, using 12 MiB of memory
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt Desc
r1-eth0 4 1002 87 87 0 0 0 00:07:04 0 0 N/A
r1-eth1 4 1002 87 87 0 0 0 00:07:04 0 0 N/A
r1-eth2 4 1002 87 87 0 0 0 00:07:04 0 0 N/A
r1-eth3 4 1002 64 67 0 0 0 00:05:09 0 0 N/A
r1-eth4 4 1002 87 87 0 0 0 00:07:04 0 0 N/A
r1-eth5 4 1002 87 87 0 0 0 00:07:04 0 0 N/A
r1-eth6 4 1002 67 70 0 0 0 00:05:22 0 0 N/A
r1-eth7 4 1002 87 87 0 0 0 00:07:04 0 0 N/A
r1-eth8 4 1002 87 87 0 0 0 00:07:04 0 0 N/A
....
r1-eth499 4 1002 43 43 0 0 0 00:03:22 0 0 N/A
r1-eth500 4 1002 43 43 0 0 0 00:03:22 0 0 N/A
r1-eth501 4 1002 19 22 0 0 0 00:01:21 0 0 N/A
r1-eth502 4 1002 43 43 0 0 0 00:03:22 0 0 N/A
r1-eth503 4 1002 43 43 0 0 0 00:03:22 0 0 N/A
r1-eth504 4 1002 20 23 0 0 0 00:01:30 0 0 N/A
r1-eth505 4 1002 43 43 0 0 0 00:03:22 0 0 N/A
r1-eth506 4 1002 43 43 0 0 0 00:03:22 0 0 N/A
r1-eth507 4 1002 22 25 0 0 0 00:01:39 0 0 N/A
r1-eth508 4 1002 43 43 0 0 0 00:03:22 0 0 N/A
r1-eth509 4 1002 17 20 0 0 0 00:01:13 0 0 N/A
r1-eth510 4 1002 43 43 0 0 0 00:03:22 0 0 N/A
r1-eth511 4 1002 43 43 0 0 0 00:03:22 0 0 N/A
r1-eth512 4 1002 19 22 0 0 0 00:01:22 0 0 N/A
r1-eth513 4 1002 43 43 0 0 0 00:03:22 0 0 N/A
r1-eth514 4 1002 43 43 0 0 0 00:03:22 0 0 N/A