Describe the bug
Sometimes, the agent creates multiple device plugin sockets (under /var/lib/kubelet/device-plugins) for a single discovered device.
Notice how both /var/lib/kubelet/device-plugins/udev-video-9d8a82-1674249624.sock and /var/lib/kubelet/device-plugins/udev-video-9d8a82-1674249625.sock are created for instance udev-video-9d8a82.
kagold@kagold-ThinkPad-X1-Carbon-6th:~/projects/akri-notes$ sudo ls /var/lib/kubelet/device-plugins
kubelet_internal_checkpoint udev-video-9d8a82-1674249624.sock udev-video-d804b0-1674249623.sock
kubelet.sock udev-video-9d8a82-1674249625.sock udev-video-d804b0-1674249625.sock
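The numeric suffix on each socket appears to be the Unix time, in seconds, at which the plugin server was created: 1674249624 is 2023-01-20T21:20:24Z, the same second as the agent's `serve` log line for udev-video-9d8a82. The sketch below assumes the agent names sockets as `{instance}-{unix_seconds}.sock`. That scheme is inferred from this listing, not confirmed from Akri's source, and `socket_name`, `socket_name_now` and `utc_time_of_day` are hypothetical helpers. It illustrates why a second build one second later produces a new file instead of replacing the first.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical reconstruction of the naming scheme seen in the listing:
/// `{instance_name}-{seconds since the Unix epoch}.sock`.
fn socket_name(instance_name: &str, unix_secs: u64) -> String {
    format!("{}-{}.sock", instance_name, unix_secs)
}

/// Same scheme using the current time; a rebuild in a later second gets a
/// different name, so the earlier socket file is left in place.
fn socket_name_now(instance_name: &str) -> String {
    let secs = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is before the Unix epoch")
        .as_secs();
    socket_name(instance_name, secs)
}

/// UTC time of day (hours, minutes, seconds) for a Unix timestamp, to line a
/// socket suffix up with the agent's log timestamps.
fn utc_time_of_day(unix_secs: u64) -> (u64, u64, u64) {
    let s = unix_secs % 86_400;
    (s / 3600, (s % 3600) / 60, s % 60)
}

fn main() {
    let first = socket_name("udev-video-9d8a82", 1674249624);
    let second = socket_name("udev-video-9d8a82", 1674249625);
    // Two builds in different seconds yield two distinct socket files.
    assert_ne!(first, second);
    println!("{}\n{}", first, second);
    println!("{:?}", utc_time_of_day(1674249624)); // (21, 20, 24)
    println!("{}", socket_name_now("udev-video-9d8a82"));
}
```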
This means that when the Akri Configuration is later deleted, the extra sockets still persist:
It appears that the discovery handler re-sends the discovered devices before the DiscoveryOperator has finished creating the device plugins and before the corresponding instances have been created. Notice that handle_discovery_results is called twice and reports each device as a new instance both times.
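The suspected race is a check-then-act problem: the operator checks whether an instance is already known, but an instance only becomes known after kubelet has called back into its device plugin. The sketch below is a hypothetical, single-threaded model of that ordering, not Akri code; `Model`, `pending`, `kubelet_callbacks` and the fake clock are all illustrative assumptions. It compares a re-sent discovery result arriving before kubelet calls back with one arriving after.

```rust
use std::collections::HashSet;

/// Hypothetical model of the suspected race. An instance is recorded in
/// `instance_map` only after kubelet calls back, so a discovery round that
/// arrives before then still treats the device as new.
struct Model {
    instance_map: HashSet<String>, // instances the operator already knows about
    pending: Vec<String>,          // plugins built but not yet confirmed by kubelet
    sockets: Vec<String>,          // every socket file that was created
    clock: u64,                    // fake Unix seconds used in socket names
}

impl Model {
    fn new(start_secs: u64) -> Self {
        Model { instance_map: HashSet::new(), pending: Vec::new(), sockets: Vec::new(), clock: start_secs }
    }

    /// Check-then-act: the check ignores plugins that are still being set up.
    fn handle_discovery_results(&mut self, devices: &[&str]) {
        for &device in devices {
            if !self.instance_map.contains(device) {
                self.sockets.push(format!("{}-{}.sock", device, self.clock));
                self.pending.push(device.to_string());
            }
        }
        self.clock += 1;
    }

    /// Kubelet finishes registration; only now does the instance enter the map.
    fn kubelet_callbacks(&mut self) {
        for device in self.pending.drain(..) {
            self.instance_map.insert(device);
        }
    }

    /// Number of sockets created for one device.
    fn sockets_for(&self, device: &str) -> usize {
        let prefix = format!("{}-", device);
        self.sockets.iter().filter(|s| s.starts_with(&prefix)).count()
    }
}

fn main() {
    let devices = ["udev-video-d804b0", "udev-video-9d8a82"];

    // Results re-sent before kubelet has called back: two sockets per device.
    let mut racy = Model::new(1674249624);
    racy.handle_discovery_results(&devices);
    racy.handle_discovery_results(&devices);
    racy.kubelet_callbacks();
    println!("racy: {:?}", racy.sockets);

    // Kubelet calls back between the two rounds: one socket per device.
    let mut ordered = Model::new(1674249624);
    ordered.handle_discovery_results(&devices);
    ordered.kubelet_callbacks();
    ordered.handle_discovery_results(&devices);
    println!("ordered: {:?}", ordered.sockets);
}
```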
Agent Logs
The agent logs the creation of both /var/lib/kubelet/device-plugins/udev-video-9d8a82-1674249624.sock and /var/lib/kubelet/device-plugins/udev-video-9d8a82-1674249625.sock for instance udev-video-9d8a82.
[2023-01-20T21:20:23Z TRACE agent::util::discovery_operator] internal_do_discover - got discovery results [Device { id: "/dev/video0", properties: {"UDEV_DEVNODE": "/dev/video0"}, mounts: [Mount { container_path: "/dev/video0", host_path: "/dev/video0", read_only: true }], device_specs: [] }, Device { id: "/dev/video2", properties: {"UDEV_DEVNODE": "/dev/video2"}, mounts: [Mount { container_path: "/dev/video2", host_path: "/dev/video2", read_only: true }], device_specs: [] }]
[2023-01-20T21:20:23Z TRACE agent::util::discovery_operator] handle_discovery_results - for config udev-video with discovery results [Device { id: "/dev/video0", properties: {"UDEV_DEVNODE": "/dev/video0"}, mounts: [Mount { container_path: "/dev/video0", host_path: "/dev/video0", read_only: true }], device_specs: [] }, Device { id: "/dev/video2", properties: {"UDEV_DEVNODE": "/dev/video2"}, mounts: [Mount { container_path: "/dev/video2", host_path: "/dev/video2", read_only: true }], device_specs: [] }]
[2023-01-20T21:20:23Z TRACE agent::util::discovery_operator] handle_discovery_results - new instance udev-video-d804b0 came online
[2023-01-20T21:20:23Z INFO agent::util::device_plugin_builder] build_device_plugin - entered for device udev-video-d804b0
[2023-01-20T21:20:23Z INFO agent::util::device_plugin_builder] serve - creating a device plugin server that will listen at: /var/lib/kubelet/device-plugins/udev-video-d804b0-1674249623.sock
[2023-01-20T21:20:24Z TRACE agent::util::discovery_operator] internal_do_discover - got discovery results [Device { id: "/dev/video2", properties: {"UDEV_DEVNODE": "/dev/video2"}, mounts: [Mount { container_path: "/dev/video2", host_path: "/dev/video2", read_only: true }], device_specs: [] }, Device { id: "/dev/video0", properties: {"UDEV_DEVNODE": "/dev/video0"}, mounts: [Mount { container_path: "/dev/video0", host_path: "/dev/video0", read_only: true }], device_specs: [] }]
[2023-01-20T21:20:24Z TRACE agent::util::discovery_operator] handle_discovery_results - for config udev-video with discovery results [Device { id: "/dev/video2", properties: {"UDEV_DEVNODE": "/dev/video2"}, mounts: [Mount { container_path: "/dev/video2", host_path: "/dev/video2", read_only: true }], device_specs: [] }, Device { id: "/dev/video0", properties: {"UDEV_DEVNODE": "/dev/video0"}, mounts: [Mount { container_path: "/dev/video0", host_path: "/dev/video0", read_only: true }], device_specs: [] }]
[2023-01-20T21:20:24Z TRACE agent::util::discovery_operator] handle_discovery_results - new instance udev-video-9d8a82 came online
[2023-01-20T21:20:24Z INFO agent::util::device_plugin_builder] build_device_plugin - entered for device udev-video-9d8a82
[2023-01-20T21:20:24Z INFO agent::util::device_plugin_builder] serve - creating a device plugin server that will listen at: /var/lib/kubelet/device-plugins/udev-video-9d8a82-1674249624.sock
[2023-01-20T21:20:25Z INFO agent::util::device_plugin_builder] register - entered for Instance akri.sh/udev-video-d804b0 and socket_name: udev-video-d804b0-1674249623.sock
[2023-01-20T21:20:25Z TRACE agent::util::device_plugin_builder] register - before call to register with the kubelet at socket /var/lib/kubelet/device-plugins/kubelet.sock
[2023-01-20T21:20:25Z TRACE agent::util::device_plugin_service] get_device_plugin_options - kubelet called get_device_plugin_options
[2023-01-20T21:20:25Z INFO agent::util::device_plugin_builder] register - entered for Instance akri.sh/udev-video-9d8a82 and socket_name: udev-video-9d8a82-1674249624.sock
[2023-01-20T21:20:25Z TRACE agent::util::device_plugin_builder] register - before call to register with the kubelet at socket /var/lib/kubelet/device-plugins/kubelet.sock
[2023-01-20T21:20:25Z TRACE agent::util::discovery_operator] handle_discovery_results - new instance udev-video-9d8a82 came online
[2023-01-20T21:20:25Z INFO agent::util::device_plugin_builder] build_device_plugin - entered for device udev-video-9d8a82
[2023-01-20T21:20:25Z INFO agent::util::device_plugin_builder] serve - creating a device plugin server that will listen at: /var/lib/kubelet/device-plugins/udev-video-9d8a82-1674249625.sock
[2023-01-20T21:20:25Z INFO agent::util::device_plugin_service] list_and_watch - kubelet called list_and_watch for instance udev-video-d804b0
[2023-01-20T21:20:25Z TRACE agent::util::device_plugin_service] get_device_plugin_options - kubelet called get_device_plugin_options
[2023-01-20T21:20:25Z TRACE agent::util::discovery_operator] handle_discovery_results - new instance udev-video-d804b0 came online
[2023-01-20T21:20:25Z INFO agent::util::device_plugin_builder] build_device_plugin - entered for device udev-video-d804b0
[2023-01-20T21:20:25Z INFO agent::util::device_plugin_service] list_and_watch - kubelet called list_and_watch for instance udev-video-9d8a82
[2023-01-20T21:20:25Z INFO agent::util::device_plugin_builder] serve - creating a device plugin server that will listen at: /var/lib/kubelet/device-plugins/udev-video-d804b0-1674249625.sock
[2023-01-20T21:20:26Z INFO agent::util::device_plugin_builder] register - entered for Instance akri.sh/udev-video-d804b0 and socket_name: udev-video-d804b0-1674249625.sock
[2023-01-20T21:20:26Z INFO agent::util::device_plugin_builder] register - entered for Instance akri.sh/udev-video-9d8a82 and socket_name: udev-video-9d8a82-1674249625.sock
[2023-01-20T21:20:26Z TRACE agent::util::device_plugin_builder] register - before call to register with the kubelet at socket /var/lib/kubelet/device-plugins/kubelet.sock
[2023-01-20T21:20:26Z TRACE agent::util::device_plugin_builder] register - before call to register with the kubelet at socket /var/lib/kubelet/device-plugins/kubelet.sock
[2023-01-20T21:20:26Z TRACE agent::util::device_plugin_service] get_device_plugin_options - kubelet called get_device_plugin_options
[2023-01-20T21:20:26Z INFO agent::util::device_plugin_service] list_and_watch - kubelet called list_and_watch for instance udev-video-d804b0
[2023-01-20T21:20:27Z TRACE agent::util::device_plugin_service] get_device_plugin_options - kubelet called get_device_plugin_options
[2023-01-20T21:20:27Z INFO agent::util::device_plugin_service] list_and_watch - kubelet called list_and_watch for instance udev-video-9d8a82
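In longer logs the duplicate is easy to miss. One option is to group the `serve - creating a device plugin server` lines by instance. This is a minimal sketch that assumes the log format shown above and the inferred `{instance}-{unix_seconds}.sock` naming; `instance_from_socket` and `sockets_per_instance` are hypothetical helpers, not part of Akri.

```rust
use std::collections::BTreeMap;

/// Marker text taken from the agent's `serve` log lines above.
const SERVE_MARKER: &str = "will listen at: ";

/// Strips the directory and the `-<digits>.sock` suffix from a socket path,
/// e.g. ".../udev-video-9d8a82-1674249624.sock" -> "udev-video-9d8a82".
fn instance_from_socket(path: &str) -> Option<&str> {
    let file = path.rsplit('/').next()?;
    let stem = file.strip_suffix(".sock")?;
    let (instance, suffix) = stem.rsplit_once('-')?;
    if !suffix.is_empty() && suffix.chars().all(|c| c.is_ascii_digit()) {
        Some(instance)
    } else {
        None // e.g. kubelet.sock has no timestamp suffix
    }
}

/// Maps each instance to every socket path the agent logged for it.
fn sockets_per_instance<'a>(log: &'a str) -> BTreeMap<&'a str, Vec<&'a str>> {
    let mut map: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for line in log.lines() {
        if let Some(idx) = line.find(SERVE_MARKER) {
            let path = line[idx + SERVE_MARKER.len()..].trim();
            if let Some(instance) = instance_from_socket(path) {
                map.entry(instance).or_default().push(path);
            }
        }
    }
    map
}

fn main() {
    // The four `serve` lines from the agent logs above.
    let log = [
        "[2023-01-20T21:20:23Z INFO agent::util::device_plugin_builder] serve - creating a device plugin server that will listen at: /var/lib/kubelet/device-plugins/udev-video-d804b0-1674249623.sock",
        "[2023-01-20T21:20:24Z INFO agent::util::device_plugin_builder] serve - creating a device plugin server that will listen at: /var/lib/kubelet/device-plugins/udev-video-9d8a82-1674249624.sock",
        "[2023-01-20T21:20:25Z INFO agent::util::device_plugin_builder] serve - creating a device plugin server that will listen at: /var/lib/kubelet/device-plugins/udev-video-9d8a82-1674249625.sock",
        "[2023-01-20T21:20:25Z INFO agent::util::device_plugin_builder] serve - creating a device plugin server that will listen at: /var/lib/kubelet/device-plugins/udev-video-d804b0-1674249625.sock",
    ]
    .join("\n");

    // Report every instance that got more than one socket.
    for (instance, paths) in sockets_per_instance(&log) {
        if paths.len() > 1 {
            println!("{} has {} sockets: {:?}", instance, paths.len(), paths);
        }
    }
}
```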
Potential solution
To avoid the race condition in which a device plugin is recreated while its creation is still in progress, we should add the instance to the instance map before calling build_device_plugin here, with its status set to a new InstanceConnectivityStatus::Connecting variant. Then, once kubelet has called the DevicePluginService and the instance has been created, the existing entry's status can be updated rather than a new entry being inserted here. This may require wrapping the list_and_watch_message_sender field of InstanceInfo in an Option.
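A minimal sketch of the proposed ordering, using simplified stand-ins rather than the agent's real types: `InstanceConnectivityStatus` here has only `Connecting` and `Online`, `std::sync::Mutex` and `std::sync::mpsc::Sender<()>` stand in for whatever lock and message channel the agent actually uses, and `reserve_instance`/`mark_online` are hypothetical names. The instance is reserved as `Connecting` before the plugin is built, and the sender is an `Option` until kubelet calls `list_and_watch`.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Sender};
use std::sync::Mutex;

/// Simplified stand-in for the status enum, including the proposed
/// `Connecting` variant. Any other variants of the real enum are omitted.
#[derive(Debug, Clone, PartialEq)]
enum InstanceConnectivityStatus {
    Connecting, // proposed: plugin is being built and registered
    Online,
}

/// Simplified `InstanceInfo`: the sender is an `Option` because no
/// `list_and_watch` stream exists while the instance is still `Connecting`.
struct InstanceInfo {
    list_and_watch_message_sender: Option<Sender<()>>,
    connectivity_status: InstanceConnectivityStatus,
}

type InstanceMap = HashMap<String, InstanceInfo>;

/// Reserve the instance *before* building its device plugin. The lookup and
/// the insert happen under a single lock acquisition, so a re-sent discovery
/// result sees the `Connecting` entry and returns `false` instead of
/// triggering a second build.
fn reserve_instance(map: &Mutex<InstanceMap>, name: &str) -> bool {
    let mut guard = map.lock().unwrap();
    if guard.contains_key(name) {
        return false; // already Connecting or Online
    }
    guard.insert(
        name.to_string(),
        InstanceInfo {
            list_and_watch_message_sender: None,
            connectivity_status: InstanceConnectivityStatus::Connecting,
        },
    );
    true
}

/// Once kubelet has called the device plugin service, update the existing
/// entry rather than inserting a new one.
fn mark_online(map: &Mutex<InstanceMap>, name: &str, sender: Sender<()>) {
    if let Some(info) = map.lock().unwrap().get_mut(name) {
        info.list_and_watch_message_sender = Some(sender);
        info.connectivity_status = InstanceConnectivityStatus::Online;
    }
}

fn main() {
    let map = Mutex::new(InstanceMap::new());
    let name = "udev-video-9d8a82";

    // First discovery result: reserve, then (hypothetically) build the plugin.
    assert!(reserve_instance(&map, name));
    // Re-sent result arrives before kubelet calls back: no second build.
    assert!(!reserve_instance(&map, name));

    // Kubelet calls list_and_watch: attach the sender and mark Online.
    let (tx, _rx) = channel::<()>();
    mark_online(&map, name, tx);
    let guard = map.lock().unwrap();
    println!("{:?}", guard[name].connectivity_status); // Online
}
```

Because the instance map is shared across tasks, the same lock has to cover both the lookup and the insert; checking under one acquisition and inserting under another would reopen the window. A build that fails would presumably also need to remove its `Connecting` entry so that a later discovery round can retry.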
Issue has been automatically marked as stale due to inactivity for 90 days. Update the issue to remove label, otherwise it will be automatically closed.