Talos shutdown gets stuck on mlx4_shutdown #10143

Open
paraddise opened this issue Jan 16, 2025 · 3 comments

Comments

@paraddise

Bug Report

Description

We attach a PCI device via SR-IOV and use it for network communication (Cilium and DRBD), i.e. we run IP over InfiniBand.
When I reboot nodes with a light load on this device, such as control plane or small ingress nodes, they reboot normally.
But when I try to reboot a worker node with several DRBD resources connected over the IB device, Talos gets stuck at "mlx4_shutdown was called".

Logs

Can't copy as text.

Screenshot 2025-01-16 at 11 35 15

Environment

  • Talos version:
Client:
        Tag:         v1.8.2
        SHA:         88f861a0
        Built:       
        Go version:  go1.22.8
        OS/Arch:     darwin/amd64
Server:
        NODE:        10.21.20.11
        Tag:         v1.9.1
        SHA:         348472f9
        Built:       
        Go version:  go1.23.4
        OS/Arch:     linux/amd64
        Enabled:     RBAC
  • Kubernetes version: 1.30.3
  • Platform: Proxmox 8.2.2
@frezbo
Member

frezbo commented Jan 16, 2025

Was this just a normal talosctl reboot, which does a kexec? You'd need to use talosctl reboot --mode=powercycle, since some drivers are not designed to shut down properly on kexec.
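
For reference, the power-cycle reboot suggested here could be invoked roughly as sketched below. The node address is the one from the Environment section. The helper name `powercycle_reboot` and the `DRY_RUN` variable are hypothetical and exist only to keep the sketch from rebooting anything by accident.

```shell
#!/bin/sh
# Sketch: reboot a Talos node with a full power cycle instead of kexec.
# powercycle_reboot and DRY_RUN are illustrative names, not part of talosctl.
powercycle_reboot() {
    node="$1"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Default: only print the command, so nothing is rebooted by accident.
        echo "talosctl reboot --mode=powercycle --nodes $node"
    else
        # Real invocation: skips kexec and goes through a full power cycle.
        talosctl reboot --mode=powercycle --nodes "$node"
    fi
}

# Prints the command for the node listed in the Environment section.
powercycle_reboot 10.21.20.11
```

Setting DRY_RUN=0 runs the command for real. A plain talosctl reboot without --mode uses the default mode, which is the kexec path described above.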

@paraddise
Author

Thanks, I'll try it next time.

@paraddise
Author

No success, same error.
