[xcvrd] [cmis manager] CMIS manager cannot automatically select correct host lane count when selecting module application #19336

Open
Junchao-Mellanox opened this issue Jun 18, 2024 · 11 comments · May be fixed by sonic-net/sonic-platform-daemons#507
Labels: MSFT, Triaged

Comments

@Junchao-Mellanox
Collaborator

Description

When CMIS manager is enabled, the following configuration causes the port link to go down:

speed: 100G
lanes: 0,1,2,3,4,5,6,7
module supported application: 100GAUI-2, 400GAUI-8

CMIS manager deduces a host lane count of 8 from "0,1,2,3,4,5,6,7" and then looks for an application that matches speed 100G and host lane count 8. It cannot find one, because the module's only 100G application, 100GAUI-2, uses 2 host lanes.
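The lookup described above can be sketched as follows. This is a simplified model and not the actual xcvrd code: the `ADVERTISED_APPS` table, the function names, and the application numbering are all hypothetical, and speeds are in Mb/s to match the `host_speed 100000` value in the error log.

```python
from typing import Dict, Optional, Tuple

# Hypothetical, simplified model of a module's advertised applications:
# application number -> (name, speed in Mb/s, host lane count).
ADVERTISED_APPS: Dict[int, Tuple[str, int, int]] = {
    1: ("400GAUI-8", 400000, 8),
    2: ("100GAUI-2", 100000, 2),
}


def lane_count_from_config(lanes: str) -> int:
    """Derive the host lane count from a CONFIG_DB-style lanes string."""
    return len(lanes.split(","))


def find_app_strict(apps, speed: int, host_lane_count: int) -> Optional[int]:
    """Return the first application matching BOTH speed and lane count."""
    for app_id, (_name, app_speed, app_lanes) in apps.items():
        if app_speed == speed and app_lanes == host_lane_count:
            return app_id
    return None


lane_count = lane_count_from_config("0,1,2,3,4,5,6,7")
# Speed 100G with 8 lanes matches neither advertised application.
print(lane_count, find_app_strict(ADVERTISED_APPS, 100000, lane_count))
```

With the lanes reduced to "0,1", the same lookup would find 100GAUI-2, which is why the workaround works.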

CMIS manager should be smart enough to choose 100GAUI-2 automatically, by matching 100G on 2 host lanes.

A workaround for this issue is to set lanes to:

lanes: 0,1

But there is no CLI to set port lanes.

Steps to reproduce the issue:

  1. Start with a port at speed=400G and lanes="0,1,2,3,4,5,6,7"; the link is up.
  2. Change the port speed to 100G: config interface speed EthernetX 100G
  3. The link goes down.

Describe the results you received:

The link is down, and this error is logged:

Jun 18 11:06:25.350031 sonic ERR pmon#xcvrd: CMIS: Ethernet240: no suitable app for the port appl None host_lane_count 8 host_speed 100000

Describe the results you expected:

xcvrd should be able to automatically choose the best application possible by using the current speed and a subset of the lanes.
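One possible reading of "a subset of the lanes" is sketched below. It is an illustration only: the advertisement table and function name are hypothetical, and the choice of the leading lanes as the subset is an assumption that the issue does not confirm.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical advertisement: name -> (speed in Mb/s, host lane count).
ADVERTISED_APPS: Dict[str, Tuple[int, int]] = {
    "400GAUI-8": (400000, 8),
    "100GAUI-2": (100000, 2),
}


def select_app_on_lane_subset(
    apps: Dict[str, Tuple[int, int]], speed: int, lanes: List[int]
) -> Optional[Tuple[str, List[int]]]:
    """Return (application name, lanes used), or None if nothing fits."""
    # Keep applications at the requested speed that fit in the configured lanes.
    candidates = [
        (lane_count, name)
        for name, (app_speed, lane_count) in apps.items()
        if app_speed == speed and lane_count <= len(lanes)
    ]
    if not candidates:
        return None
    lane_count, name = max(candidates)  # widest application that fits
    # Assumption: the leading configured lanes are the ones used.
    return name, lanes[:lane_count]


print(select_app_on_lane_subset(ADVERTISED_APPS, 100000, list(range(8))))
# ('100GAUI-2', [0, 1])
```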


@ishidawataru
Collaborator

If we have a port and a corresponding transceiver with the following configuration and capability, which app should xcvrd choose for the port?

speed: 100G
lanes: 0,1,2,3,4,5,6,7
module supported application: 100GAUI-2, 100GAUI-4

@Junchao-Mellanox
Collaborator Author

This is a good question to discuss. From my POV, we should try lane number 8->4->2->1. So I would prefer 100GAUI-4 in this case. @prgeor, @mihirpat1, what do you think?
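The 8->4->2->1 order suggested above could look roughly like this. It is a sketch of the proposal, not the code in the linked PR: the function names and the advertisement layout are hypothetical, and it assumes the candidates are produced by repeatedly halving the configured lane count.

```python
from typing import Dict, Iterator, Optional, Tuple


def lane_count_candidates(configured: int) -> Iterator[int]:
    """Yield the configured lane count, then successive halvings down to 1."""
    count = configured
    while count >= 1:
        yield count
        count //= 2


def select_app_by_halving(
    apps: Dict[str, Tuple[int, int]], speed: int, configured_lanes: int
) -> Optional[str]:
    """Try each candidate lane count in order; return the first matching app."""
    for lane_count in lane_count_candidates(configured_lanes):
        for name, (app_speed, app_lanes) in apps.items():
            if app_speed == speed and app_lanes == lane_count:
                return name
    return None


# The example from the question: name -> (speed in Mb/s, host lane count).
apps = {"100GAUI-2": (100000, 2), "100GAUI-4": (100000, 4)}
print(list(lane_count_candidates(8)))          # [8, 4, 2, 1]
print(select_app_by_halving(apps, 100000, 8))  # 100GAUI-4
```

Because 4 is tried before 2, the wider application wins when both are advertised.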

@ishidawataru
Collaborator

Does configuring the breakout setting solve the problem?

As for the example you showed,

speed: 100G
lanes: 0,1,2,3,4,5,6,7
module supported application: 100GAUI-2, 400GAUI-8

How about setting the breakout configuration like below?

$ config interface breakout Ethernet240 "4x100G"

@prgeor added the Triaged and MSFT labels on Jul 3, 2024
@prgeor self-assigned this on Jul 3, 2024
@prgeor
Contributor

prgeor commented Jul 5, 2024

@ishidawataru CMIS manager selects the application based on what is in the CONFIG_DB PORT table. Please share your CONFIG_DB PORT table dump for the 100G speed here.

@ishidawataru
Collaborator

@prgeor The current CMIS manager implementation searches for a module application that matches both the speed and the host lane count. @Junchao-Mellanox is pointing out that this behavior brings the port link down with the following configuration and needs improvement.

When CMIS manager is enabled, the following configuration causes the port link to go down:

speed: 100G
lanes: 0,1,2,3,4,5,6,7
module supported application: 100GAUI-2, 400GAUI-8

I initially agreed and implemented sonic-net/sonic-platform-daemons#507 as a draft PR.
However, I later realized that we can change the host lane count by configuring Dynamic Port Breakout (DPB), which should also fix the problem without any modification to the current implementation.

Currently, I'm waiting for @Junchao-Mellanox's response.

@Junchao-Mellanox
Collaborator Author

Hi @ishidawataru, DPB is not a perfect solution for this. As far as I know, DPB has many limitations; for example, it cannot automatically adjust other port-related configuration when the breakout changes. It is also not user friendly to ask SONiC users to do an extra DPB configuration when they hit this.

@ishidawataru
Collaborator

ishidawataru commented Jul 9, 2024

@Junchao-Mellanox Does the SAI require any modification to support this? For example, what happens when the port is configured as 100G with lanes 0,1,2,3,4,5,6,7 on a platform with 50G per lane? Will lanes 0 and 1 be used in this case?

@Junchao-Mellanox
Collaborator Author

Hi @ishidawataru, it depends on how the vendor implements this. Currently, I don't see a problem on the NVIDIA platform regarding SAI.

@ishidawataru
Collaborator

ishidawataru commented Jul 10, 2024

@Junchao-Mellanox How does the NVIDIA SAI choose which lanes to use for a speed configuration when there are multiple choices?
When the switch ASIC supports multiple lane speeds, I think the ASIC has the same problem that I mentioned above.

If we have a port and a corresponding transceiver with the following configuration and capability, which app should xcvrd choose for the port?

speed: 100G
lanes: 0,1,2,3,4,5,6,7
module supported application: 100GAUI-2, 100GAUI-4

Does the NVIDIA SAI choose the lane counts as you mentioned?

From my POV, we should try lane number 8->4->2->1.

If that is the case, does it make sense to spec this behavior in SAI so that the xcvrd implementation can work with non-NVIDIA SAI?

@Junchao-Mellanox
Collaborator Author

Hi @ishidawataru, this is not a problem for the ASIC-side configuration. The user can choose how many lanes the ASIC uses. Here is the SONiC config:

config interface type Ethernet0 CR2 (2 means 2 lanes)

@ishidawataru
Collaborator

@Junchao-Mellanox I see. In that case, can xcvrd use that configuration as a hint when choosing the module application?
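A rough sketch of that idea follows. It is hypothetical throughout: the function names are invented for illustration, and it assumes that a trailing number in the interface type (as in CR2) is the lane count, that a type without a number means one lane, and that a hint larger than the configured lane count should be ignored.

```python
import re
from typing import Optional


def lane_count_from_interface_type(interface_type: str) -> Optional[int]:
    """Assumption: trailing digits give the lane count; no digits means 1 lane."""
    if interface_type.lower() == "none":
        return None
    match = re.fullmatch(r"([A-Za-z]+)(\d*)", interface_type)
    if match is None:
        return None
    digits = match.group(2)
    return int(digits) if digits else 1


def effective_host_lane_count(lanes: str, interface_type: Optional[str]) -> int:
    """Prefer the interface-type hint; otherwise use the configured lane count."""
    configured = len(lanes.split(","))
    hint = lane_count_from_interface_type(interface_type) if interface_type else None
    if hint is not None and hint <= configured:
        return hint
    return configured


print(effective_host_lane_count("0,1,2,3,4,5,6,7", "CR2"))  # 2
print(effective_host_lane_count("0,1,2,3,4,5,6,7", None))   # 8
```

The resulting lane count could then be fed into the existing speed and lane count match, so the lookup itself would not need to change.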

3 participants