Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support GetPreferredAllocation #443

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
27 changes: 19 additions & 8 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -51,6 +51,7 @@ which are available on a Kubernetes host
- Supports devices with both Kernel and userspace (UIO and VFIO) drivers
- Allows resource grouping using "selector(s)"
- User configurable resourceName
- User configurable policy for preferred device allocation
- Detects Kubelet restarts and auto-re-register
- Detects Link status (for Linux network devices) and updates associated VFs health accordingly
- Extensible to support new device types with minimal effort if not already supported
Expand Down Expand Up @@ -269,17 +270,27 @@ This plugin creates device plugin endpoints based on the configurations given in

`"resourceList"` should contain a list of config objects. Each config object may consist of following fields:

| Field | Required | Description | Type/Defaults | Example/Accepted values |
|-------------------|----------|----------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------|------------------------------------------------------------------------|
| "resourceName" | Y | Endpoint resource name. Should not contain special characters including hyphens and must be unique in the scope of the resource prefix | string | "sriov_net_A" |
| "resourcePrefix" | N | Endpoint resource prefix name override. Should not contain special characters | string Default : "intel.com" | "yourcompany.com" |
| "deviceType" | N | Device Type for a resource pool. | string value of supported types. Default: "netDevice" | Currently supported values: "accelerator", "netDevice", "auxNetDevice" |
| "excludeTopology" | N | Exclude advertising of device's NUMA topology | bool Default: "false" | "excludeTopology": true |
| "selectors" | N | Either a single device selector map or a list of maps. The list syntax is preferred. The "deviceType" value determines the device selector options. | json list of objects or json object. Default: null | Example: "selectors": [{"vendors": ["8086"],"devices": ["154c"]}] |
| "additionalInfo" | N | A map of map to add additional information to the pod via environment variables to devices | json object as string Default: null | Example: "additionalInfo": {"*": {"token": "3e49019f-412f-4f02-824e-4cd195944205"}} |
| Field | Required | Description | Type/Defaults | Example/Accepted values |
|--------------------|----------|----------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------|------------------------------------------------------------------------|
| "resourceName" | Y | Endpoint resource name. Should not contain special characters including hyphens and must be unique in the scope of the resource prefix | string | "sriov_net_A" |
| "resourcePrefix" | N | Endpoint resource prefix name override. Should not contain special characters | string Default : "intel.com" | "yourcompany.com" |
| "deviceType" | N | Device Type for a resource pool. | string value of supported types. Default: "netDevice" | Currently supported values: "accelerator", "netDevice", "auxNetDevice" |
| "allocationPolicy" | N | Preferred device allocation policy for a resource pool | string value of supported allocation policy. Default: "" | Currently supported values: "", "packed" |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: allocationPolicy -> preferredAllocationPolicy ?

the latter name might be too long but it emphasized that its just a preference, thoughts ?

| "excludeTopology" | N | Exclude advertising of device's NUMA topology | bool Default: "false" | "excludeTopology": true |
| "selectors" | N | Either a single device selector map or a list of maps. The list syntax is preferred. The "deviceType" value determines the device selector options. | json list of objects or json object. Default: null | Example: "selectors": [{"vendors": ["8086"],"devices": ["154c"]}] |
| "additionalInfo" | N | A map of map to add additional information to the pod via environment variables to devices | json object as string Default: null | Example: "additionalInfo": {"*": {"token": "3e49019f-412f-4f02-824e-4cd195944205"}} |

Note: "resourceName" must be unique only in the scope of a given prefix, including the one specified globally in the CLI params, e.g. "example.com/10G", "acme.com/10G" and "acme.com/40G" are perfectly valid names.

#### Allocation Policy

The "allocationPolicy" value determines which device in a resource pool is allocated. Each policy acts as following:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets add some additional info that allocation policy is best effort, since it just advertises to kubelet which devices are preferred to be allocated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is , we need to say what happens in case a policy cannot be satisfied for an allocation.

e.g say we have 3 devices to be allocated but only 2 can be allocated from the same "parent"


| Policy | Description |
|----------|---------------------------------------------------|
| "" | Disable preferred device allocation functionality |
| "packed" | Try to allocate VFs from same PF |

#### Device selectors

The "selectors" field accepts both a single object and a list of selector objects. While both formats are supported, the list syntax is preferred. When using the list syntax, each selector object is evaluated in the order present in the list. For example, a single object would look like:
Expand Down
19 changes: 18 additions & 1 deletion pkg/factory/factory.go
Original file line number Diff line number Diff line change
Expand Up @@ -56,7 +56,12 @@ func (rf *resourceFactory) GetResourceServer(rp types.ResourcePool) (types.Resou
if prefixOverride := rp.GetResourcePrefix(); prefixOverride != "" {
prefix = prefixOverride
}
return resources.NewResourceServer(prefix, rf.endPointSuffix, rf.pluginWatch, rp), nil
policy := rp.GetAllocationPolicy()
allocator, err := rf.GetAllocator(policy)
if err != nil {
return nil, err
}
return resources.NewResourceServer(prefix, rf.endPointSuffix, rf.pluginWatch, rp, allocator), nil
}
return nil, fmt.Errorf("factory: unable to get resource pool object")
}
Expand Down Expand Up @@ -221,3 +226,15 @@ func (rf *resourceFactory) GetDeviceFilter(rc *types.ResourceConfig) ([]interfac
func (rf *resourceFactory) GetNadUtils() types.NadUtils {
return netdevice.NewNadUtils()
}

// GetAllocator returns an instance of Allocator using preferredAllocationPolicy
func (rf *resourceFactory) GetAllocator(policy string) (types.Allocator, error) {
switch policy {
case "":
return nil, nil
case "packed":
return resources.NewPackedAllocator(), nil
default:
return nil, fmt.Errorf("GetAllocator(): invalid policy %s", policy)
}
}
25 changes: 24 additions & 1 deletion pkg/factory/factory_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -624,12 +624,35 @@ var _ = Describe("Factory", func() {
f := factory.NewResourceFactory("fake", "fake", true)
rp := mocks.ResourcePool{}
rp.On("GetResourcePrefix").Return("overridden").
On("GetResourceName").Return("fake")
On("GetResourceName").Return("fake").
On("GetAllocationPolicy").Return("")
rs, e := f.GetResourceServer(&rp)
It("should not fail", func() {
Expect(e).NotTo(HaveOccurred())
Expect(rs).NotTo(BeNil())
})
})
})
DescribeTable("getting allocator",
func(policy string, shouldSucceed bool, expected reflect.Type) {
f := factory.NewResourceFactory("fake", "fake", true)
allocator, error := f.GetAllocator(policy)

if shouldSucceed {
Expect(error).NotTo(HaveOccurred())
} else {
Expect(allocator).To(BeNil())
}

// if statement below because gomega refuses to do "nil == nil" assertions
if expected != nil {
Expect(reflect.TypeOf(allocator)).To(Equal(expected))
} else {
Expect(reflect.TypeOf(allocator)).To(BeNil())
}
},
Entry("packed", "packed", true, reflect.TypeOf(resources.NewPackedAllocator())),
Entry("empty", "", true, reflect.TypeOf(nil)),
Entry("invalid value", "invalid policy", false, reflect.TypeOf(nil)),
)
})
112 changes: 112 additions & 0 deletions pkg/resources/allocator.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,112 @@
// Copyright 2022 Intel Corp. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package resources

import (
"sort"

"github.com/golang/glog"

pluginapi "k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1"

"github.com/k8snetworkplumbingwg/sriov-network-device-plugin/pkg/types"
)

// DeviceSet is to hold and manipulate a set of HostDevice
type DeviceSet map[string]types.HostDevice
adrianchiris marked this conversation as resolved.
Show resolved Hide resolved

// PackedAllocator implements the Allocator interface
type PackedAllocator struct {
}

// NewPackedAllocator create instance of PackedAllocator
func NewPackedAllocator() *PackedAllocator {
return &PackedAllocator{}
}

// NewDeviceSet is to create an empty DeviceSet
func NewDeviceSet() DeviceSet {
set := make(DeviceSet)
return set
}

// Insert is to add a HostDevice in DeviceSet
func (ds *DeviceSet) Insert(pciAddr string, device types.HostDevice) {
(*ds)[pciAddr] = device
}

// Delete is to delete a HostDevice in DeviceSet
func (ds *DeviceSet) Delete(pciAddr string) {
delete(*ds, pciAddr)
}

// AsSortedStrings is to sort the DeviceSet and return the sorted keys
func (ds *DeviceSet) AsSortedStrings() []string {
keys := make([]string, 0, len(*ds))
for k := range *ds {
keys = append(keys, k)
}
sort.Strings(keys)
return keys
}

// Allocate return the preferred allocation
func (pa *PackedAllocator) Allocate(rqt *pluginapi.ContainerPreferredAllocationRequest, rp types.ResourcePool) []string {
size := rqt.AllocationSize
preferredDevices := make([]string, 0)

if size <= 0 {
glog.Warningf("Allocator(): requested number of devices are negative. requested: %d", size)
return []string{}
}

if len(rqt.AvailableDeviceIDs) < int(size) {
glog.Warningf("Allocator(): not enough number of devices were available. available: %d, requested: %d", len(rqt.AvailableDeviceIDs), size)
return []string{}
}

if len(rqt.MustIncludeDeviceIDs) > int(size) {
glog.Warningf("Allocator(): allocated number of devices exceeded the number of requested devices. allocated: %d, requested: %d",
len(rqt.MustIncludeDeviceIDs),
size)
}

availableSet := NewDeviceSet()
for _, available := range rqt.AvailableDeviceIDs {
dev, ok := rp.GetDevicePool()[available]
if ok {
availableSet.Insert(available, dev)
} else {
glog.Warningf("Allocator(): not available device id was specified: %s", available)
return []string{}
}
}
for _, required := range rqt.MustIncludeDeviceIDs {
_, ok := rp.GetDevicePool()[required]
if ok {
availableSet.Delete(required)
} else {
glog.Warningf("Allocator(): not available device was included: %s", required)
return []string{}
}
}
sortedAvailableSet := availableSet.AsSortedStrings()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will work if device to be allocated is a PCI device and device ID is a PCI device ID.

This is not the case for AuxiliaryDevices of type SF (subfunction).
the ID/Name of such device is as follows: mlx5_core.sf.4 (4 being sort of device/instance index)

so allocator will not work as expected in this case unless the auxiliary device was created in certain order.

to achieve correctness in such case, best approach is to have a separate allocator for auxiliaryNetDevices where it would perform the check of being allocated from the same parent PCI device. based on auxiliaryNetDevice pfAddr field.

(have buckets of auxiliaryNetDevice by their parent PCI device then allocate from the same bucket)

Alternative is to block creation of a resource pool if the type is AuxNetDevice and non default policy is provided.
Then deal with that as a followup.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is something that needs to brought up in our weekly community meetings to get an agreement on which direction its preferrable to go.

technically i think we can use the same allocator for all if implemented in a different way than whats proposed in this PR


preferredDevices = append(preferredDevices, rqt.MustIncludeDeviceIDs...)
if len(preferredDevices) < int(size) {
preferredDevices = append(preferredDevices, sortedAvailableSet[:int(size)-len(preferredDevices)]...)
}
return preferredDevices
}
Loading
Loading