forked from sonic-net/sonic-mgmt
update from master #38
Open: bbinxie wants to merge 8,677 commits into SW-CSA:master from sonic-net:master
885053e to 8f9cb56
What is the motivation for this PR? If the 2-VLAN config is enabled in a topology file (such as ansible/vars/topo_t0-116.yml) by changing vlan_configs: default_vlan_config: one_vlan_a to vlan_configs: default_vlan_config: two_vlan_a, the VLAN name is no longer Vlan1000; it could be Vlan100 or Vlan200. Therefore https://github.com/sonic-net/sonic-mgmt/pull/9334/files, which sets the default VLAN name to Vlan1000 in pytest_generate_tests for T0, is not reasonable. How did you do it? In test_acl, for the T0 topology, still get the VLAN name from the config rather than from the vlan_name parameter, so that test_acl can pass. How did you verify/test it? Ran test_acl on a testbed with the 2-VLAN config.
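The following is a minimal sketch of reading VLAN names from the DUT config instead of hard-coding Vlan1000. It assumes a config_facts dictionary shaped like the config_db VLAN table, with entries keyed by name and carrying a vlanid field. vlan_names_from_config is a hypothetical helper and is not the actual test_acl code.

```python
def vlan_names_from_config(config_facts):
    """Return the VLAN names defined in the DUT config, sorted by VLAN ID.

    Assumption: config_facts mirrors config_db, with a "VLAN" table keyed
    by names such as "Vlan1000" whose entries carry a "vlanid" field.
    """
    vlans = config_facts.get("VLAN", {})
    # Sort numerically by VLAN ID so "Vlan100" comes before "Vlan200".
    return sorted(vlans, key=lambda name: int(vlans[name].get("vlanid", 0)))
```

With a two_vlan_a-style config, this returns every VLAN name rather than assuming a single Vlan1000. For example, {"VLAN": {"Vlan200": {"vlanid": "200"}, "Vlan100": {"vlanid": "100"}}} yields ["Vlan100", "Vlan200"].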
…16345) What is the motivation for this PR? In Impacted Area Based PR testing, we identify the feature folders modified by a PR. Previously, the code retrieved the same feature folder multiple times if several modified scripts were located within it, although each feature folder only needs to be retrieved once. This PR eliminates the redundant retrievals. How did you do it? Skip retrieval if the feature folder has already been processed. How did you verify/test it? I modified three scripts under the same feature folder and verified that the feature folder was retrieved only once.
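The deduplication described above can be sketched with a "seen" set. This version assumes that a feature folder is the first path component under tests/ (for example, tests/bgp/test_x.py maps to "bgp"). collect_impacted_features is a hypothetical name, and the real pipeline script may derive folders differently.

```python
def collect_impacted_features(modified_files):
    """Return each feature folder under tests/ once, in first-seen order.

    Hypothetical helper: assumes the feature folder is the path component
    directly below tests/, e.g. tests/bgp/test_x.py -> "bgp".
    """
    seen = set()
    features = []
    for path in modified_files:
        parts = path.split("/")
        if len(parts) < 3 or parts[0] != "tests":
            continue  # not a script inside a feature folder
        feature = parts[1]
        if feature in seen:
            continue  # already processed: skip the redundant retrieval
        seen.add(feature)
        features.append(feature)
    return features
```

Three modified scripts under tests/bgp produce a single "bgp" entry, which matches the verification described in the commit.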
What is the motivation for this PR? Recently, test cases have been failing with the syslog error: 2024 Dec 30 15:13:59.746932 str3-7260cx3-acs-14 ERR ntpd[64974]: CLOCK: leapsecond file ('/usr/share/zoneinfo/leap-seconds.list'): expired less than 3 days ago How did you do it? Ignore this syslog error for now.
Summary: T2 Snappi-based automation test cases did not support different types of upstream neighbors. This PR adds that support. It also fixes a bug where t2_uplink_fanout_info was not parsed correctly, and skips the Po members flap test if num_po_members < 2.
…st (#16157) What is the motivation for this PR? Elastictest performs well at distributing PR tests across multiple KVMs, which allows us to add more test scripts to the PR checker. However, some traffic tests cannot run on the KVM platform, so we need to skip traffic tests where needed. How did you do it? This PR adds qos tests to the KVM-based PR test framework with the following scope and modifications: Excludes fanout switch-related configurations, which are not applicable in the KVM test environment. Traffic tests are intentionally skipped due to the limitations of running traffic in the KVM environment. How did you verify/test it?
What is the motivation for this PR? In our Impacted Area Based PR testing, we noticed some failures caused by the error Could not get lock /var/lib/dpkg/lock-frontend. It is held by process 5166 (dpkg). How did you do it? Add a timeout for acquiring the dpkg lock to avoid this issue. How did you verify/test it?
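One common way to wait for the dpkg lock is APT's DPkg::Lock::Timeout configuration option, which is available in newer APT releases. The PR text does not say which mechanism it used, so the helper below is a hypothetical illustration, and the 600-second default is an arbitrary assumption.

```python
import shlex


def apt_install_cmd(packages, lock_timeout_sec=600):
    """Build an apt-get install command that waits for the dpkg lock.

    DPkg::Lock::Timeout makes apt wait up to lock_timeout_sec seconds for
    /var/lib/dpkg/lock-frontend instead of failing immediately. The helper
    name and default timeout are illustrative assumptions.
    """
    args = [
        "apt-get",
        "-o", "DPkg::Lock::Timeout={}".format(int(lock_timeout_sec)),
        "install", "-y",
    ] + list(packages)
    # Quote each argument so package names are safe to pass to a shell.
    return " ".join(shlex.quote(arg) for arg in args)
```

For example, apt_install_cmd(["curl"]) produces apt-get -o DPkg::Lock::Timeout=600 install -y curl.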
The JR2 tuning values were copied over from the J2c+ tuning values here: #13660. However, the JR2 tunings are slightly different because the shared buffer pool differs between JR2 and J2c+. J2c+:
) We are seeing an UnboundLocalError when running sonic-mgmt tests against a single-ASIC linecard: ``` UnboundLocalError: local variable 'dst_sys_port_id' referenced before assignment ``` Further investigation showed that this happens because a previous attempt to fix this issue (PR #13700) omitted the ASIC prefix entirely, but the SYSTEM_PORT entries in config_db do have an Asic0 prefix even on a single-ASIC DUT. Resolve this by explicitly adding the Asic0 prefix for a single-ASIC T2 DUT instead of leaving the prefix out. Tested by manually running qos tests on a T2 single-ASIC DUT with these changes.
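Here is a sketch of the lookup with the Asic0 prefix kept for single-ASIC DUTs. It assumes SYSTEM_PORT keys of the form hostname|AsicN|EthernetX with a system_port_id field. lookup_system_port_id is hypothetical. It also raises an explicit KeyError instead of leaving a variable unassigned, which is the root cause of the UnboundLocalError.

```python
def lookup_system_port_id(system_ports, hostname, port, asic_index=None):
    """Return the system_port_id for hostname/port from a SYSTEM_PORT table.

    Assumption: keys look like "<hostname>|Asic<N>|<port>". A single-ASIC
    DUT (asic_index None) still uses the Asic0 prefix instead of no prefix.
    """
    asic = "Asic{}".format(0 if asic_index is None else asic_index)
    key = "{}|{}|{}".format(hostname, asic, port)
    if key not in system_ports:
        # Fail loudly rather than falling through with an unset variable.
        raise KeyError("no SYSTEM_PORT entry for " + key)
    return system_ports[key]["system_port_id"]
```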
Create /etc/tacacs folder on PTF when it's missing Why I did it The TACACS test failed because the /etc/tacacs+ folder does not exist; it has recently been missing from some versions of the PTF container image. Error message: E msg = Destination directory /etc/tacacs+ does not exist How I did it Create /etc/tacacs folder on PTF when it's missing How to verify it All test cases pass. Description for the changelog Create /etc/tacacs folder on PTF when it's missing
…ailed. (#16373) Remove golden config file and revert config when load golden config failed. Why I did it Fix bug #16338: the TACACS test case failed to reload the golden config, and because of a code bug the golden config was not removed, so all subsequent test cases failed because login failed. How I did it Add try/finally to make sure the golden config is always removed after reloading the config. How to verify it All test cases pass. Description for the changelog Remove golden config file and revert config when load golden config failed.
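The try/finally pattern described above can be sketched as follows. reload_config is a caller-supplied stand-in for the real config reload, and the file name used below is an assumption. The point is that the golden config file is removed even when the reload raises.

```python
import os


def reload_with_golden_config(golden_path, config_text, reload_config):
    """Write a golden config, reload with it, and always remove the file.

    reload_config is a caller-supplied callable standing in for the real
    config reload; all names here are illustrative assumptions.
    """
    with open(golden_path, "w") as f:
        f.write(config_text)
    try:
        reload_config(golden_path)
    finally:
        # Runs even if the reload raises, so a failed reload cannot leave
        # a stale golden config behind to break later test cases.
        if os.path.exists(golden_path):
            os.remove(golden_path)
```

The exception from a failed reload still propagates to the caller. Only the cleanup is guaranteed.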
Description of PR Optimize bgp/test_reliable_tsa.py with multithreading to reduce the running time. Summary: Fixes # (issue) Approach What is the motivation for this PR? The bgp/test_reliable_tsa.py takes a very long time to finish on T2 chassis (5.5h ~ 6h), so we wanted to optimize it using multithreading to reduce the running time. After the optimization, the running time is reduced to ~3.5h. How did you do it? How did you verify/test it? I ran the updated code and can confirm it's working as expected. Elastictest link with flaky test case re-run link co-authorized by: [email protected]
…e missing (#16357) What is the motivation for this PR? Stress testing sometimes leaves exabgp in the PTF in an incorrect state, so restart exabgp before re-announcing routes in the sanity check. How did you do it? Restart exabgp before re-announcing routes, and add try/except handling for failed re-announcements. How did you verify/test it? Ran tests with the sanity check.
…IPv6 neighbor addresses on KVM testbeds. (#16371) Temporarily skipping test_arp_update_for_failed_standby_neighbor for IPv6 neighbor addresses on KVM testbeds. Signed-off-by: Mahdi Ramezani <[email protected]>
In #8149 the multi-asic and multi-dut variants were added to test_qos_sai.py. This required updating calls to dynamically_compensate_leakout to specify either the src_client or the dst_client, but a couple of calls in PGSharedWatermarkTest passed the wrong client. For more details on the failure this causes, see #16167. Summary: Fixes #16167
Fix test ipfwd/test_nhop_group.py for Arista 7050CX3 SKU
… updated. (#16396) Signed-off-by: Mahdi Ramezani <[email protected]>
Summary: Covers the following test gap on VOQ chassis: [Test Gap]Chassis-VOQ: ECMP hashing tests when member goes down/UP #14985 Generic all-platform ECMP hashing test with member flap trigger Type of change Bug fix Testbed and Framework(new/improvement) Test case(new/improvement) Back port request 202012 202205 202305 202311 202405 Approach What is the motivation for this PR? Currently no test case covers ECMP hashing upon a member flap trigger. This gap also exists on pizza box DUTs. In addition, on a VOQ chassis an ECMP member flap on one linecard needs to be synced to the remote linecards, and no test case currently covers this. How did you do it? Wrote a new test case, test_ecmp_group_member_flap (common to all topology types), which uses the existing fib_test infrastructure. At a high level, it verifies an ECMP member flap on the default route; the test is skipped if the DUT has no default route or fewer than 2 paths. The test first verifies traffic forwarding and ECMP hashing for the default route. Then one of the member ports is brought down and the traffic and ECMP hashing tests are run again. Finally, the member port is brought back up and the traffic/hash test is run once more. For a VOQ chassis, I pass an additional parameter (skip_src_ports) to ensure that incoming PTF traffic lands on a remote linecard, so that ECMP member down/up handling on remote linecards is exercised. How did you verify/test it? Verified the test on a T2 chassis and on a T1 DUT.
What is the motivation for this PR? GNMI needs to use roles in API operations; for example, a read-only user can't call the write API. How did you do it? Modified the current end-to-end test so that the default role is rw. How did you verify/test it? Ran the gnmi end-to-end test.
1. Use sonic-ubuntu-1c instead of sonic-common. 2. Fix docker run command to reuse agent.
Align acl test for t0-isolated-d16u16s1
What is the motivation for this PR? Dualtor I/O tests are flaky because the DUT gets stuck in heartbeat suspension, so it cannot react to any failure scenario. This causes nightly failures such as the following. The heartbeat suspension leaves the mux port in an unhealthy state: E Failed: Database states don't match expected state standby,incorrect STATE_DB values { E "MUX_LINKMGR_TABLE|Ethernet48": { E "state": "unhealthy" E } E } If a failure is triggered during the heartbeat suspension, the DUT fails to toggle to the correct state: E Failed: Database states don't match expected state active,incorrect APP_DB values { E "HW_MUX_CABLE_TABLE:Ethernet60": { E "state": "standby" E }, E "MUX_CABLE_TABLE:Ethernet60": { E "state": "standby" E } E } Why can't the DUT react to failure scenarios when it is stuck in heartbeat suspension? When icmp_responder is not running, the active side is stuck in the following loop waiting for the peer to take over, and the heartbeat suspension backs off with a maximum timeout of 128 * 100ms = 51s. If the icmp_responder is started and the test case continues with a failure scenario such as a link drop during this period, the DUT cannot detect the link drop because the active side is still in heartbeat suspension.
# (unknown, active, up) ----------------------------> (unknown, wait, up)
#           ^            suspend timeout, probe mux             |
#           |                                                   |
#           |                                                   |
#           |                                                   |
#           +---------------------------------------------------+
#                mux active, suspend heartbeat with backoff
Why do KVM PR tests hit this issue more often? The issue is more easily reproduced when the active side reaches the maximum suspension timeout of 51s. This happens easily in the KVM PR tests because the dualtor I/O test cases run right after the pretests (when no icmp_responder is running), so the DUT can back off the suspension timeout to the maximum. On a physical testbed, the dualtor I/O test cases are interleaved with other test cases that might start the icmp_responder. How did you do it? Restart linkmgrd in the toggle fixture so that all mux ports come out of the heartbeat suspension state. How did you verify/test it? dualtor_io/test_heartbeat_failure.py::test_active_tor_heartbeat_failure_upstream[active-standby] PASSED [100%] Signed-off-by: Longxiang Lyu <[email protected]>
What is the motivation for this PR? The backplane port is necessary in our srv6 tests, but it causes random failures in other tests. When the test packet's destination IP matches an IP prefix advertised by exabgp, the PTF backplane interface receives the test packet from the neighbor VM, because exabgp advertises the routes to the VM through the PTF backplane interface. Methods like verify_packet_any_port() validate not only that the packet is received on the expected ports, but also that it is not received on unexpected ports. How did you do it? Added configuration options in the PTF initialization to limit backplane port configuration to the srv6 tests only. How did you verify/test it? We tested it via the daily Jenkins run. Signed-off-by: linsongnan <[email protected]>
Co-authored-by: yatishkoul <[email protected]>
Co-authored-by: yatishkoul <[email protected]>
* Skip NVGRE hash tests for Broadcom SKUs * Review comments
…eration.py file (#16749) Description of PR Summary: The metric 'Rx_L1_rate_Gbps' is taken from the traffic item statistics. However, in 10.80 this no longer works and causes an index error, so switch to Rx_L1_rate_bps in the statistics file. Fixes # (issue) #16731 Approach What is the motivation for this PR? Resolve the index error in the traffic item statistics: instead of capturing Gbps rates, which no longer exist, switch to bps rates. How did you do it? Single line of code: stats[metric_name+'_Rx_L1_Rate_bps'] = float(metric['Rx L1 Rate (bps)']) How did you verify/test it? Local clone. co-authorized by: [email protected]
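The single-line change above can be wrapped into a small, testable function. This sketch assumes that each traffic item statistics row is a dict mapping column names to string values. Both function names are hypothetical, and bps_to_gbps is only for callers that still want a Gbps figure.

```python
def record_rx_l1_rate(stats, metric_name, metric):
    """Store the Rx L1 rate of one traffic item under a bps key.

    metric is one row of traffic item statistics as a dict of column
    name -> string value (an assumed shape). The bps column is read
    because, per the change above, the Gbps column no longer exists.
    """
    stats[metric_name + '_Rx_L1_Rate_bps'] = float(metric['Rx L1 Rate (bps)'])
    return stats


def bps_to_gbps(bps):
    """Convert bits per second to gigabits per second."""
    return bps / 1e9
```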
Description of PR Summary: Fixes # (issue) After the database service is ready, we observe that redis is not yet connectable and the show command fails with RuntimeError: Unable to connect to redis - Connection refused(1): Cannot assign requested address. Therefore, add a delay to ensure redis is available before running the show command. Approach What is the motivation for this PR? Described above How did you do it? Added a sleep of 1 minute to ensure the database is ready to accept connections. How did you verify/test it? Verified on T2 Signed-off-by: Austin Pham <[email protected]>
What is the motivation for this PR? Skip generic_config_updater/test_dynamic_acl.py on Q200 platforms, as it's not supported on Q200. How did you do it? Added a skip condition for generic_config_updater/test_dynamic_acl.py for Q200 platforms in tests/common/plugins/conditional_mark/tests_mark_conditions.yaml Type of change -Test modification Back port request -202311 -202405 How did you verify/test it? Made sure that the TC is skipped on Q200 platforms. co-authorized by: [email protected]
…2 devices get test_power_off_reboot.py fix (#16899) Description of PR Summary: Fixes #16898 Approach What is the motivation for this PR? PR #16736 reintroduced a bug that we previously fixed in #16313, by applying the fix only to Cisco chassis. How did you do it? Let duthosts be passed to reboot_and_check for all T2/chassis devices. How did you verify/test it? Ran on a non-Cisco T2 device. Signed-off-by: Javier Tan [email protected]
… area detection. (#16917) What is the motivation for this PR? In the Get Impacted Area stage, we use regular expressions to determine whether the changes in a PR are related to a specific feature. How did you do it? This PR refines the regular expression to reduce false matches and improve accuracy. How did you verify/test it? A string like ansible/roles/files/ptftests does not match the pattern, while strings starting with tests/ do.
Description of PR Add a module-level fixture for temporarily disabling route check for a test module Summary: Fixes # (issue) Microsoft ADO 31326413 Approach What is the motivation for this PR? In our recent Cisco T2 Nightly run, we observed that we would get the following error syslog during some test modules: E Failed: Processes "['analyze_logs--<MultiAsicSonicHost dut-lc1-1>']" failed with exit code "1" E Exception: E match: 1 E expected_match: 0 E expected_missing_match: 0 E E Match Messages: E 2025 Feb 3 03:03:29.550827 svcstr2-8800-lc1-1 ERR monit[914]: 'routeCheck' status failed (255) -- Failure results: {{#12 "asic1": {#12 "Unaccounted_ROUTE_ENTRY_TABLE_entries": [#12 "100.1.0.22/32",#12 After discussion, we decided to add a fixture so users can disable route check for a test module if they think that test tends to have such error syslog. How did you do it? How did you verify/test it? I ran the updated code and can confirm it's working well. co-authorized by: [email protected]
This PR adds qos tests to the KVM-based PR test framework with the following scope and modifications: * Update kvm qos config template * In kvm qos sai test, read kvm qos config from template json * Skip traffic test in qos sai test
…16897) What is the motivation for this PR? We need to handle regex1 too. How did you do it? Following the previous PR, modify regex1. How did you verify/test it?
>>> regex = re.compile(r'\b[A-Za-z]{3}\s{1,2}\d{1,2} \d{2}:\d{2}:\d{2}\.\d{6}\b')
>>> syslog_msg_1 = "May 29 02:47:40.345257 str3-dut INFO"
>>> syslog_msg_2 = "May 9 02:47:40.345257 str3-dut INFO"
>>> syslog_msg_3 = "May  9 02:47:40.345257 str3-dut INFO"
>>> syslog_msg_4 = "2024 May 29 02:47:40.345257 str3-dut INFO"
>>> syslog_msg_5 = "2024 May 9 02:47:40.345257 str3-dut INFO"
>>> syslog_msg_6 = "2024 May  9 02:47:40.345257 str3-dut INFO"
>>> search_string = regex.search(syslog_msg_1)
>>> search_string
<re.Match object; span=(0, 22), match='May 29 02:47:40.345257'>
>>> search_string.group()
'May 29 02:47:40.345257'
>>> search_string = regex.search(syslog_msg_2)
>>> search_string
<re.Match object; span=(0, 21), match='May 9 02:47:40.345257'>
>>> search_string.group()
'May 9 02:47:40.345257'
>>> search_string = regex.search(syslog_msg_3)
>>> search_string
<re.Match object; span=(0, 22), match='May  9 02:47:40.345257'>
>>> search_string.group()
'May  9 02:47:40.345257'
>>> search_string = regex.search(syslog_msg_4)
>>> search_string
<re.Match object; span=(5, 27), match='May 29 02:47:40.345257'>
>>> search_string.group()
'May 29 02:47:40.345257'
>>> search_string = regex.search(syslog_msg_5)
>>> search_string
<re.Match object; span=(5, 26), match='May 9 02:47:40.345257'>
>>> search_string.group()
'May 9 02:47:40.345257'
>>> search_string = regex.search(syslog_msg_6)
>>> search_string
<re.Match object; span=(5, 27), match='May  9 02:47:40.345257'>
>>> search_string.group()
'May  9 02:47:40.345257'
>>>
admin@bjw-can-7060-1:~$
admin@bjw-can-7060-1:~$ date -d 'May 29 02:47:40.345257' +%s%3N
1748486860345
admin@bjw-can-7060-1:~$ date -d 'May 9 02:47:40.345257' +%s%3N
1746758860345
admin@bjw-can-7060-1:~$ date -d 'May 9 02:47:40.345257' +%s%3N
1746758860345
admin@bjw-can-7060-1:~$ date -d 'May 29 02:47:40.345257' +%s%3N
1748486860345
admin@bjw-can-7060-1:~$ date -d 'May 9 02:47:40.345257' +%s%3N
1746758860345
admin@bjw-can-7060-1:~$ date -d 'May 9 02:47:40.345257' +%s%3N
1746758860345
admin@bjw-can-7060-1:~$
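The regex from this commit can also be exercised outside the REPL. The sketch below reuses the quoted pattern and, purely as an illustration, converts the match to a datetime. extract_timestamp and its year argument are hypothetical additions: the matched text carries no year, so the caller must supply one.

```python
import re
from datetime import datetime

# The regex quoted in the transcript above: 3-letter month, one or two
# spaces, 1-2 digit day, then HH:MM:SS.microseconds.
SYSLOG_TS = re.compile(r'\b[A-Za-z]{3}\s{1,2}\d{1,2} \d{2}:\d{2}:\d{2}\.\d{6}\b')


def extract_timestamp(line, year):
    """Return the first syslog timestamp in line as a datetime, or None.

    Hypothetical helper: the matched text has no year, so the caller
    supplies one; any "2024 " prefix in the line is skipped, not parsed.
    """
    match = SYSLOG_TS.search(line)
    if match is None:
        return None
    # Normalize "May  9" (space-padded day) to "May 9" before parsing.
    text = " ".join(match.group().split())
    return datetime.strptime("{} {}".format(year, text), "%Y %b %d %H:%M:%S.%f")
```

Lines with or without a leading year, and with one or two spaces before a single-digit day, all yield the same kind of datetime. This mirrors the span checks in the transcript.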
Increased timeout fixes the “Failed: Not all routes flushed from nexthop 10.0.0.25 on asic 0 on cmp210-4” errors seen in: pc/test_po_update.py::test_po_update::test_po_update_io_no_loss pc/test_po_voq.py::test_po_voq::test_voq_po_member_update
What is the motivation for this PR? In some cases gnmi_get is called before the server is fully ready after rotation. This is a timing issue that lasts a few seconds, and retrying the gnmi_get call succeeds, so add wait_until to retry the client calls for up to 30 seconds. How did you do it? Add wait_until to retry for 30 seconds. How did you verify/test it? Manual test/pipeline
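The retry described above can be sketched with a simplified polling helper. The wait_until below is a local stand-in so the block runs on its own. The real sonic-mgmt helper of the same name may have a different signature, so check it before reusing this. FlakyGnmiClient is a hypothetical client that succeeds only after a few calls, which mimics a server that is briefly not ready after rotation.

```python
import time


def wait_until(timeout, interval, condition, *args, **kwargs):
    """Poll condition(*args, **kwargs) every interval seconds.

    Returns True as soon as the condition holds, or False once timeout
    seconds have elapsed. Simplified local stand-in, not the sonic-mgmt
    helper itself.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition(*args, **kwargs):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


class FlakyGnmiClient:
    """Hypothetical client whose get() fails until the server is ready."""

    def __init__(self, ready_after):
        self.calls = 0
        self.ready_after = ready_after

    def get(self):
        self.calls += 1
        return self.calls >= self.ready_after


# Retry for up to 30 seconds, as in the change above.
client = FlakyGnmiClient(ready_after=3)
ok = wait_until(30, 0.1, client.get)
```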
TSA-TSB service Testcases: Adjust the testcases to adhere to new behavior of config_reload
…16870) * Retry from the first provided password if none of the passwords work In the DeviceConnection class, if none of the passwords appear to work, try again from the first provided password. The reason is that a device might initially need a password from the alternate passwords list but later need an earlier-specified password (because of some config change). In that case, connection attempts fail until a new DeviceConnection object is instantiated. Instead, work around this by trying the passwords in a loop (i.e. starting from the beginning again). This also means the class no longer needs to track which password is the "primary" one and which are alternates. Signed-off-by: Saikrishna Arcot <[email protected]> * Remove extra self Signed-off-by: Saikrishna Arcot <[email protected]> --------- Signed-off-by: Saikrishna Arcot <[email protected]>
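The wrap-around retry can be sketched as a small class that remembers the last working password and cycles back to the start of the list. PasswordCycler and the try_login callable are hypothetical stand-ins, not the DeviceConnection implementation.

```python
class PasswordCycler:
    """Try passwords starting from the last one that worked, wrapping around.

    Hypothetical sketch: every password is treated equally, so there is no
    separate "primary" vs. "alternate" bookkeeping.
    """

    def __init__(self, passwords):
        self.passwords = list(passwords)
        self.index = 0  # position of the last password that worked

    def connect(self, try_login):
        """try_login(password) -> bool; returns the password that worked."""
        n = len(self.passwords)
        for offset in range(n):
            i = (self.index + offset) % n  # wrap back to the first password
            if try_login(self.passwords[i]):
                self.index = i
                return self.passwords[i]
        raise RuntimeError("none of the provided passwords worked")
```

If the device first accepts only the alternate password and later, after a config change, only the first one, the same object still connects without being re-created.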
Description of PR Summary: We need to call collect_data again to refresh the stale state checked by wait_until. Fixes # (issue) Approach What is the motivation for this PR? How did you do it? Call the same collect_data function again. Signed-off-by: Austin Pham <[email protected]>
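The stale-state problem arises when data is collected once, outside the polled condition. The following is a minimal sketch, assuming collect_data returns a dict with a "ready" flag. make_condition is hypothetical and does not reproduce the PR's code. It calls collect_data inside the condition, so each wait_until poll sees fresh data.

```python
def make_condition(collect_data):
    """Build a wait_until-style condition that re-collects data on each poll.

    Hypothetical: collect_data returns a dict with a "ready" flag. Calling
    it inside the condition (not once up front) keeps the check from
    evaluating a stale snapshot.
    """
    def condition():
        data = collect_data()  # fresh data on every retry
        return bool(data.get("ready"))
    return condition


# Simulated device state that becomes ready on the third read.
states = iter([{"ready": False}, {"ready": False}, {"ready": True}])
fresh = make_condition(lambda: next(states))
```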
Description of PR Enable T2 auto health check and add running container info to the returned data. Summary: Fixes # (issue) Microsoft ADO 30293537 Approach What is the motivation for this PR? We want to enable T2 auto health check now that T2 auto recover is supported. co-authorized by: [email protected]
Description of PR Disable parallel run for the pc/test_po_cleanup.py test. Summary: Fixes # (issue) Microsoft ADO 31368079 Approach What is the motivation for this PR? pc/test_po_cleanup.py performs a config reload with wait_for_bgp=True. When parallel run is enabled, the LCs do the config reload at different times, and we cannot ensure that all iBGP sessions on one LC are established within the given time limit after the config reload. co-authorized by: [email protected]
…logy (#15905) Run ptfhost and ptfrunner based tests with a macsec-enabled topology. Enable sending and receiving macsec-encrypted frames by overloading the testutils.send_packet and testutils.dp_poll APIs.
What is the motivation for this PR? With the 2-VLAN config on a testbed such as t0-118, the default IPv4 range for Vlan1000 is 192.168.0.1/25 and the IPv4 range for Vlan2000 is 192.168.0.129/25, so set the increment to 65. Otherwise, incrementing by 129 produces 192.168.0.129, which overlaps with the second VLAN's IP range.
two_vlan_a:
  Vlan1000:
    id: 1000
    intfs: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62]
    prefix: 192.168.0.1/25
    prefix_v6: fc02:1000::1/64
    tag: 1000
  Vlan2000:
    id: 2000
    intfs: [63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117]
    prefix: 192.168.0.129/25
    prefix_v6: fc02:1000:0:1::1/64
    tag: 2000
Also get the IPv4 or IPv6 address for the specific VLAN interface from the configuration. How did you do it? For Vlan1000, intf_ipv4_addr.network_address is 192.168.0.0; after incrementing by 129 it becomes 192.168.0.129, which is the same as the IP address of the second VLAN interface: ptf_intf_ipv4_addr = increment_ipv4_addr(intf_ipv4_addr.network_address, incr=129) Parse the IPv4 or IPv6 address from config_facts['VLAN_INTERFACE'][vlan_name]; different vlan_name values yield different IP addresses. How did you verify/test it? Ran tests/generic_config_updater/test_dynamic_acl.py:
live log sessionfinish
10:29:19 __init__.pytest_terminal_summary L0067 INFO | Can not get Allure report URL. Please check logs
24 passed, 236 warnings in 4379.67s (1:12:59)
DEBUG:tests.conftest:[log_custom_msg] item: <Function test_gcu_acl_nonexistent_table_removal[default-Vlan1000]>
INFO:root:Can not get Allure report URL. Please check logs
Any platform specific information?
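The address arithmetic in this commit can be checked with Python's standard ipaddress module. The sketch uses the two_vlan_a prefixes from the commit and rejects any offset that leaves the VLAN's own subnet or lands on a VLAN interface address. pick_ptf_ipv4 is a hypothetical helper, not increment_ipv4_addr from the test suite.

```python
import ipaddress

# Prefixes from the two_vlan_a config in this commit.
VLAN_INTERFACES = {
    "Vlan1000": ipaddress.ip_interface("192.168.0.1/25"),
    "Vlan2000": ipaddress.ip_interface("192.168.0.129/25"),
}


def pick_ptf_ipv4(vlan_name, incr):
    """Offset from the VLAN's network address and validate the result.

    Hypothetical helper: raises ValueError if the address falls outside
    the VLAN's subnet or collides with any VLAN interface address.
    """
    intf = VLAN_INTERFACES[vlan_name]
    addr = intf.network.network_address + incr
    interface_ips = {i.ip for i in VLAN_INTERFACES.values()}
    if addr not in intf.network or addr in interface_ips:
        raise ValueError("{} is outside {} or overlaps a VLAN interface".format(addr, intf.network))
    return addr
```

An increment of 65 gives 192.168.0.65 for Vlan1000 and 192.168.0.193 for Vlan2000. An increment of 129 from 192.168.0.0 lands on 192.168.0.129, which is rejected.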
Description of PR Split the bgp/test_reliable_tsa.py test into two parts. Summary: Fixes # (issue) Microsoft ADO Approach What is the motivation for this PR? We observed that the bgp/test_reliable_tsa.py test is taking too long to finish and there are some test cases that are very flaky. Therefore, we decided to split this test module into two parts: one for those stable test cases, and the other one for those flaky test cases. This will also make it easier to retry and debug the test cases when some of them failed. How did you do it? How did you verify/test it? I ran the split test modules and can confirm they are working well: https://elastictest.org/scheduler/testplan/67ac78446f7ee067ea75fe16?leftSideViewMode=detail co-authorized by: [email protected]
Summary: Created 2 new topology files for T2 Snappi-based convergence tests: topo_tgen_t2_2lc_masic_route_conv.yml for multi-ASIC testbeds and topo_tgen_t2_2lc_route_conv.yml for single-ASIC testbeds.
Description of PR Disable route checker for crm/test_crm.py::test_crm_nexthop_group test case. Summary: Fixes # (issue) Microsoft ADO 31326413 Approach What is the motivation for this PR? Recently, we found that we would also get the routeCheck log analyzer error in crm/test_crm.py::test_crm_nexthop_group test case, so we decided to disable route checker for it as well. How did you do it? How did you verify/test it? I ran the updated code and can confirm it's working well. co-authorized by: [email protected]
…buildimage#21201 (#16416) What is the motivation for this PR? Skip test_syslog_config_work_after_reboot How did you do it? when dut_mgmt_network is a sub_network of forced_mgmt_routes, skip it How did you verify/test it? run test_syslog_config_work_after_reboot Any platform specific information? no
…dward after port toggle (#16959) …dward after port toggle Description of PR Summary: Fixes flaky nhop_group failures on chassis. Approach What is the motivation for this PR? We observe flaky nhop_group failures on chassis devices. We suspect the route is not yet programmed into hardware, so add an extra sleep to make sure the route is in hardware. How did you do it? Add extra waiting time for chassis device port toggle tests. How did you verify/test it? Ran in PR test, and the physical test passes: ipfwd/test_nhop_group.py::test_nhop_group_member_count PASSED [ 20%] ipfwd/test_nhop_group.py::test_nhop_group_member_order_capability[str3-xx-1-2] SKIPPED (Order ECMP is not configured so skipping the test-case) [ 40%] ipfwd/test_nhop_group.py::test_nhop_group_interface_flap[str3-xx-1-2] PASSED [ 60%] ipfwd/test_nhop_group.py::test_nhop_group_member_order_capability[str3-xx-1-0] SKIPPED (Order ECMP is not configured so skipping the test-case) [ 80%] ipfwd/test_nhop_group.py::test_nhop_group_interface_flap[str3-xx-1-2] PASSED [100%] authorized by: [email protected]
Description of PR Summary: Fixes #16436, caused by the BGP check added in #15936, which doesn't account for the time T2 BGP takes to come up. Approach What is the motivation for this PR? The wait_bgp_sessions timeout is too short for T2. It fails in the test_mgmt_ipv6_only test suite, causing a fixture to error wrongly and not tear down properly, which leaves the testbed in a bad state without an IPv4 management IP. How did you do it? Increase the timeout from 120s to 900s if the duthost being checked is a supervisor. How did you verify/test it? Ran locally on T2. See the passing test: 17/01/2025 07:34:39 utilities.wait_until L0153 DEBUG | check_bgp_session_state_all_asics is False, wait 10 seconds and check again 17/01/2025 07:34:49 utilities.wait_until L0135 DEBUG | Time elapsed: 164.073392 seconds This confirms it needs more than 120 seconds. Signed-off-by: Javier Tan [email protected] Co-authored-by: Jianquan Ye <[email protected]>
Summary: The previous test scripts did not print any message when an assertion error was triggered, which made it difficult for people unfamiliar with the SRv6 stack to triage errors. I added several assertion error messages to help people understand the errors.
Description of PR
Summary:
Fixes # (issue)
Type of change
Approach
How did you do it?
How did you verify/test it?
Any platform specific information?
Supported testbed topology if it's a new test case?
Documentation