-
Notifications
You must be signed in to change notification settings - Fork 176
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support GetPreferredAllocation #443
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -51,6 +51,7 @@ which are available on a Kubernetes host | |
- Supports devices with both Kernel and userspace (UIO and VFIO) drivers | ||
- Allows resource grouping using "selector(s)" | ||
- User configurable resourceName | ||
- User configurable policy for preferred device allocation | ||
- Detects Kubelet restarts and auto-re-register | ||
- Detects Link status (for Linux network devices) and updates associated VFs health accordingly | ||
- Extensible to support new device types with minimal effort if not already supported | ||
|
@@ -269,17 +270,27 @@ This plugin creates device plugin endpoints based on the configurations given in | |
|
||
`"resourceList"` should contain a list of config objects. Each config object may consist of following fields: | ||
|
||
| Field | Required | Description | Type/Defaults | Example/Accepted values | | ||
|-------------------|----------|----------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------|------------------------------------------------------------------------| | ||
| "resourceName" | Y | Endpoint resource name. Should not contain special characters including hyphens and must be unique in the scope of the resource prefix | string | "sriov_net_A" | | ||
| "resourcePrefix" | N | Endpoint resource prefix name override. Should not contain special characters | string Default : "intel.com" | "yourcompany.com" | | ||
| "deviceType" | N | Device Type for a resource pool. | string value of supported types. Default: "netDevice" | Currently supported values: "accelerator", "netDevice", "auxNetDevice" | | ||
| "excludeTopology" | N | Exclude advertising of device's NUMA topology | bool Default: "false" | "excludeTopology": true | | ||
| "selectors" | N | Either a single device selector map or a list of maps. The list syntax is preferred. The "deviceType" value determines the device selector options. | json list of objects or json object. Default: null | Example: "selectors": [{"vendors": ["8086"],"devices": ["154c"]}] | | ||
| "additionalInfo" | N | A map of map to add additional information to the pod via environment variables to devices | json object as string Default: null | Example: "additionalInfo": {"*": {"token": "3e49019f-412f-4f02-824e-4cd195944205"}} | | ||
| Field | Required | Description | Type/Defaults | Example/Accepted values | | ||
|--------------------|----------|----------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------|------------------------------------------------------------------------| | ||
| "resourceName" | Y | Endpoint resource name. Should not contain special characters including hyphens and must be unique in the scope of the resource prefix | string | "sriov_net_A" | | ||
| "resourcePrefix" | N | Endpoint resource prefix name override. Should not contain special characters | string Default : "intel.com" | "yourcompany.com" | | ||
| "deviceType" | N | Device Type for a resource pool. | string value of supported types. Default: "netDevice" | Currently supported values: "accelerator", "netDevice", "auxNetDevice" | | ||
| "allocationPolicy" | N | Preferred device allocation policy for a resource pool | string value of supported allocation policy. Default: "" | Currently supported values: "", "packed" | | ||
| "excludeTopology" | N | Exclude advertising of device's NUMA topology | bool Default: "false" | "excludeTopology": true | | ||
| "selectors" | N | Either a single device selector map or a list of maps. The list syntax is preferred. The "deviceType" value determines the device selector options. | json list of objects or json object. Default: null | Example: "selectors": [{"vendors": ["8086"],"devices": ["154c"]}] | | ||
| "additionalInfo" | N | A map of map to add additional information to the pod via environment variables to devices | json object as string Default: null | Example: "additionalInfo": {"*": {"token": "3e49019f-412f-4f02-824e-4cd195944205"}} | | ||
|
||
Note: "resourceName" must be unique only in the scope of a given prefix, including the one specified globally in the CLI params, e.g. "example.com/10G", "acme.com/10G" and "acme.com/40G" are perfectly valid names. | ||
|
||
#### Allocation Policy | ||
|
||
The "allocationPolicy" value determines which device in a resource pool is allocated. Each policy acts as following: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. lets add some additional info that allocation policy is best effort, since it just advertises to kubelet which devices are preferred to be allocated. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That is , we need to say what happens in case a policy cannot be satisfied for an allocation. e.g say we have 3 devices to be allocated but only 2 can be allocated from the same "parent" |
||
|
||
| Policy | Description | | ||
|----------|---------------------------------------------------| | ||
| "" | Disable preferred device allocation functionality | | ||
| "packed" | Try to allocate VFs from same PF | | ||
|
||
#### Device selectors | ||
|
||
The "selectors" field accepts both a single object and a list of selector objects. While both formats are supported, the list syntax is preferred. When using the list syntax, each selector object is evaluated in the order present in the list. For example, a single object would look like: | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,112 @@ | ||
// Copyright 2022 Intel Corp. All Rights Reserved. | ||
// | ||
// Licensed under the Apache License, Version 2.0 (the "License"); | ||
// you may not use this file except in compliance with the License. | ||
// You may obtain a copy of the License at | ||
// | ||
// http://www.apache.org/licenses/LICENSE-2.0 | ||
// | ||
// Unless required by applicable law or agreed to in writing, software | ||
// distributed under the License is distributed on an "AS IS" BASIS, | ||
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
// See the License for the specific language governing permissions and | ||
// limitations under the License. | ||
|
||
package resources | ||
|
||
import ( | ||
"sort" | ||
|
||
"github.com/golang/glog" | ||
|
||
pluginapi "k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1" | ||
|
||
"github.com/k8snetworkplumbingwg/sriov-network-device-plugin/pkg/types" | ||
) | ||
|
||
// DeviceSet is to hold and manipulate a set of HostDevice | ||
type DeviceSet map[string]types.HostDevice | ||
adrianchiris marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
// PackedAllocator implements the Allocator interface | ||
type PackedAllocator struct { | ||
} | ||
|
||
// NewPackedAllocator create instance of PackedAllocator | ||
func NewPackedAllocator() *PackedAllocator { | ||
return &PackedAllocator{} | ||
} | ||
|
||
// NewDeviceSet is to create an empty DeviceSet | ||
func NewDeviceSet() DeviceSet { | ||
set := make(DeviceSet) | ||
return set | ||
} | ||
|
||
// Insert is to add a HostDevice in DeviceSet | ||
func (ds *DeviceSet) Insert(pciAddr string, device types.HostDevice) { | ||
(*ds)[pciAddr] = device | ||
} | ||
|
||
// Delete is to delete a HostDevice in DeviceSet | ||
func (ds *DeviceSet) Delete(pciAddr string) { | ||
delete(*ds, pciAddr) | ||
} | ||
|
||
// AsSortedStrings is to sort the DeviceSet and return the sorted keys | ||
func (ds *DeviceSet) AsSortedStrings() []string { | ||
keys := make([]string, 0, len(*ds)) | ||
for k := range *ds { | ||
keys = append(keys, k) | ||
} | ||
sort.Strings(keys) | ||
return keys | ||
} | ||
|
||
// Allocate return the preferred allocation | ||
func (pa *PackedAllocator) Allocate(rqt *pluginapi.ContainerPreferredAllocationRequest, rp types.ResourcePool) []string { | ||
size := rqt.AllocationSize | ||
preferredDevices := make([]string, 0) | ||
|
||
if size <= 0 { | ||
glog.Warningf("Allocator(): requested number of devices are negative. requested: %d", size) | ||
return []string{} | ||
} | ||
|
||
if len(rqt.AvailableDeviceIDs) < int(size) { | ||
glog.Warningf("Allocator(): not enough number of devices were available. available: %d, requested: %d", len(rqt.AvailableDeviceIDs), size) | ||
return []string{} | ||
} | ||
|
||
if len(rqt.MustIncludeDeviceIDs) > int(size) { | ||
glog.Warningf("Allocator(): allocated number of devices exceeded the number of requested devices. allocated: %d, requested: %d", | ||
len(rqt.MustIncludeDeviceIDs), | ||
size) | ||
} | ||
|
||
availableSet := NewDeviceSet() | ||
for _, available := range rqt.AvailableDeviceIDs { | ||
dev, ok := rp.GetDevicePool()[available] | ||
if ok { | ||
availableSet.Insert(available, dev) | ||
} else { | ||
glog.Warningf("Allocator(): not available device id was specified: %s", available) | ||
return []string{} | ||
} | ||
} | ||
for _, required := range rqt.MustIncludeDeviceIDs { | ||
_, ok := rp.GetDevicePool()[required] | ||
if ok { | ||
availableSet.Delete(required) | ||
} else { | ||
glog.Warningf("Allocator(): not available device was included: %s", required) | ||
return []string{} | ||
} | ||
} | ||
sortedAvailableSet := availableSet.AsSortedStrings() | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This will work if device to be allocated is a PCI device and device ID is a PCI device ID. This is not the case for AuxiliaryDevices of type SF (subfunction). so allocator will not work as expected in this case unless the auxiliary device was created in certain order. to achieve correctness in such case, best approach is to have a separate allocator for auxiliaryNetDevices where it would perform the check of being allocated from the same parent PCI device. based on auxiliaryNetDevice pfAddr field. (have buckets of auxiliaryNetDevice by their parent PCI device then allocate from the same bucket) Alternative is to block creation of a resource pool if the type is AuxNetDevice and non default policy is provided. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is something that needs to brought up in our weekly community meetings to get an agreement on which direction its preferrable to go. technically i think we can use the same allocator for all if implemented in a different way than whats proposed in this PR |
||
|
||
preferredDevices = append(preferredDevices, rqt.MustIncludeDeviceIDs...) | ||
if len(preferredDevices) < int(size) { | ||
preferredDevices = append(preferredDevices, sortedAvailableSet[:int(size)-len(preferredDevices)]...) | ||
} | ||
return preferredDevices | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: allocationPolicy -> preferredAllocationPolicy ?
the latter name might be too long but it emphasized that its just a preference, thoughts ?