Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Devlink Error message #413

Merged
merged 5 commits into from
Mar 2, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion pkg/utils/utils.go
Original file line number Diff line number Diff line change
Expand Up @@ -71,7 +71,7 @@ func GetPfName(pciAddr string) (string, error) {
if pfEswitchMode == "" {
// If device doesn't support eswitch mode query or doesn't have sriov enabled,
// fall back to the default implementation
if err == nil || strings.Contains(strings.ToLower(fmt.Sprint(err)), "no such device") {
if err == nil || strings.Contains(strings.ToLower(fmt.Sprint(err)), "error getting devlink device attributes for net device") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we are intentionally matching on the internal error returned by netlink.

+@almaslennikov

@Eoghan1232 are you hitting an issue with intel NICs not returning "no such device" error when trying to get devlink device attributes ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adrianchiris "no such device" maybe gets returned in real environment but in a test environment with mock sysfs you won't necessarily get this exact error.
For example, in some external code that uses this function and have unit test with mock sysfs they might get "no such file or directory" instead. In that case, it breaks the non-devlink detection logic here.

Since GetPfEswitchMode() returns error string from GetDevLinkDeviceEswitchAttrs() it's safe to rely on that error message only if the underlying netlink fails to find such device in a system - real or mock.

Copy link
Contributor

@adrianchiris adrianchiris Feb 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adrianchiris "no such device" maybe gets returned in real environment but in a test environment with mock sysfs you won't necessarily get this exact error.

if there is a mock sysfs ,cant you also mock GetDevLinkDeviceEswitchAttrs ?

we do not want to fallback to the regular flow if netlink failed for reason other than

  • device doesnt support eswitch mode query
  • device doesnt have sriov enabled

since in this case, if device is in switchdev mode but failed for a reason other than the above, we would want to fail else we risk returning the wrong PF name.

I can think of two things:

  1. adjust tests (netlink provider is an interface)
  2. change this to a list of expected strings and check if one is a substring of the returned error

do you have in house e2e tests that just use mock sysfs ? there are other calls to netlink in general are you not calling them ? (i.e GetLinkAttr)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adrianchiris "no such device" maybe gets returned in real environment but in a test environment with mock sysfs you won't necessarily get this exact error.

if there is a mock sysfs ,cant you also mock GetDevLinkDeviceEswitchAttrs ?

we do not want to fallback to the regular flow if netlink failed for reason other than

  • device doesnt support eswitch mode query
  • device doesnt have sriov enabled

This is where the problem is. It does not come to the 2nd option at all because the first option return the error as with GetPfEswitchMode(pciAddr) returned ("", err). So it evaluates error string comparison and it evaluates to false and return GetPfName() to caller with error. Never get a chance to check sriov enable part. Please review the flow again.

since in this case, if device is in switchdev mode but failed for a reason other than the above, we would want to fail else we risk returning the wrong PF name.

I can think of two things:

  1. adjust tests (netlink provider is an interface)
  2. change this to a list of expected strings and check if one is a substring of the returned error

do you have in house e2e tests that just use mock sysfs ? there are other calls to netlink in general are you not calling them ? (i.e GetLinkAttr)

The NetlinkProvider interface is internal to this package only. External client of this package cannot and should not have to mock any internal interfaces of this package. The utils.GetPfName() function should be testable with a minimal mock sysfs as it used to be.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where the problem is. It does not come to the 2nd option at all because the first option return the error as with GetPfEswitchMode(pciAddr) returned ("", err). So it evaluates error string comparison and it evaluates to false and return GetPfName() to caller with error. Never get a chance to check sriov enable part. Please review the flow again.

Do you mean the call to GetPfAddr() fails in GetPfEswitchMode() and returns error to GetPfName() ?
if GetPfAddr() fails, it means it failed to read symlink /sys/bus/pci/devices/<pci addr>/physfn
so in this case we would fail in GetPfName() even if we fallback to the previous flow since we try to read from the same base path. (unless in the mocked environment /sys/bus/pci/devices/<pci addr>/physfn is not a symlink)

or do you mean, GetPfAddr() call in GetPfEswitchMode() returns the PF addr of the VF (or its own address in case of PF) however the devlink call which follows this fails as the isnt actually a device ?

can you walk me through the flow to help me understand ? what does GetPfName gets for input ? what is the mocked sysfs ?

The NetlinkProvider interface is internal to this package only.

actually its public and you can get and set the implementation, defaulting to defaultNetlinkProvider which implements the NetlinkProvider interface, you can set a mock for it.

function should be testable with a minimal mock sysfs as it used to be.

I do not agree with this statement, i dont want to impose such constraints on implementation.
there are other places where we want to use devlink/netlink or other external libraries.

#399
#387
#306

Copy link
Contributor

@adrianchiris adrianchiris Mar 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Eoghan1232 could you rebase the PR ? there are some changes introduced with #410

i think we can simplify the whole flow flow here:

	pfEswitchMode, err := GetPfEswitchMode(pciAddr)
	if pfEswitchMode == "" || err != nil {
			glog.Infof("Devlink query for eswitch mode is not supported for device %s. err: %v", pciAddr, err)
	} else if pfEswitchMode == eswitchModeSwitchdev {
		name, err := GetSriovnetProvider().GetUplinkRepresentor(pciAddr)
		if err != nil {
			return "", err
		}

		return name, nil
	}

	path := filepath.Join(sysBusPci, pciAddr, "physfn", "net")

which should work with a mocked sysfs test env

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR rebased.

glog.Infof("Devlink query for eswitch mode is not supported for device %s. %v", pciAddr, err)
} else {
return "", err
Expand Down
2 changes: 1 addition & 1 deletion pkg/utils/utils_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -447,7 +447,7 @@ var _ = Describe("In the utils package", func() {
On("GetDevLinkDeviceEswitchAttrs", "devlinkDeviceSwitchdev").
Return(&nl.DevlinkDevEswitchAttr{Mode: "switchdev"}, nil).
On("GetDevLinkDeviceEswitchAttrs", "nonDevlinkDevice").
Return(nil, fmt.Errorf("devlink error: no such device")).
Return(nil, fmt.Errorf("error getting devlink device attributes for net device")).
On("GetDevLinkDeviceEswitchAttrs", "nonSriovDevice").
Return(&nl.DevlinkDevEswitchAttr{Mode: ""}, nil).
On("GetDevLinkDeviceEswitchAttrs", "unknownDevice").
Expand Down