Cannot unbind driver without killing the GUI #18

Open
midi1996 opened this issue Oct 7, 2021 · 5 comments


midi1996 commented Oct 7, 2021

This has been an issue for me for a long time. Whenever the driver is unbound, either by the script or manually, the unbind does not complete and hangs at 100% CPU usage; at worst, the kernel panics and the whole system freezes. The only solutions I have found are:

  • Method 1:
    • Kill the GUI (stop the display manager or isolate multi-user.target) from tty3
    • Unbind manually or run the script (inside screen to keep it running in the background)
    • Start the GUI again
    • Reattach to the screen session
  • Method 2:
    • Just blacklist the driver
My goal is to run the VM with the GPU passed through to it, and to get the GPU back on the host when the VM isn't running, which I think is also one of the features of this script. So far I have tried both master and unattended-win-install (which I think is the branch being worked on). I don't like the first method because it's a pain to close all apps and reopen them every time the VM starts or shuts down.

Setup:

  • Lenovo ThinkPad P50
  • Intel HD P530 and Nvidia Quadro M2000M
  • OS: Fedora 34 and 35
  • kernel: 5.14.9-300.fc35.x86_64 (F35)
  • nvidia drivers tested: nouveau and the proprietary one.

T-vK (Owner) commented Oct 7, 2021

I think this happens to me occasionally as well, although I haven't checked the CPU usage and I have an AMD GPU in that laptop.
Unfortunately I have not been able to figure out what is causing this issue yet.
It's nice to hear that you found a workaround other than rebooting, though. I guess that lets us rule out that the kernel is at fault.

unattended-win-install is indeed what you should be using. I just haven't merged it into master because Ubuntu is not fully supported in that branch yet.

midi1996 (Author) commented Oct 8, 2021

It seems that on Ubuntu you can unload both nouveau and nvidia as long as you kill the GUI. On Fedora, however, I get a kernel panic when unloading nouveau, while unloading nvidia works fine. It's still a bummer to have to do this (kill the GUI, then go back to it).

T-vK (Owner) commented Oct 8, 2021

It might be worth trying different kernel versions, maybe the latest 5.15-rc4 or an older version.
I must say, however, that I haven't tested mbpt on Fedora 35 at all yet, even though it should theoretically work.


mauza commented Nov 5, 2021

I'm experiencing something similar. I can't get past unbinding the nvidia driver:

> Using a virtual OS drive...
> Warning: Bumblebee is not available or doesn't work properly. Continuing anyway...
> Retrieving and parsing DGPU IDs...
> Loading vfio-pci kernel module...
> Using Looking Glass...
> Calculating required buffer size for 1920x1080 for Looking Glass...
> Looking Glass buffer size set to: 32M
> Not using DGPU vBIOS override...
> Not using DGPU vBIOS override...
> Not using SMB share...
> Using dGPU passthrough...
> Unbinding dGPU from nvidia driver...

It just hangs there. I've tried killing it at that point, and it leaves this command running, which I can't kill:

sudo bash -c echo '0000:01:00.0' > '/sys/bus/pci/drivers/nvidia/unbind'
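
Before attempting an unbind like the one above, it can help to check which driver currently owns the device and whether any process still has the GPU device nodes open. The following is a minimal sketch, not part of the script: `bound_driver` and `gpu_holders` are hypothetical helper names, `SYSFS_ROOT` is an override that exists only so the function can be tried against a fake directory tree, and the idea that open device nodes are what blocks the unbind is an assumption suggested by the "kill the GUI" workaround, not something confirmed in this thread.

```shell
# bound_driver <pci-address>
# Prints the name of the driver bound to the device, or "none".
# SYSFS_ROOT is a hypothetical override used only for testing.
bound_driver() {
    local link="${SYSFS_ROOT:-/sys}/bus/pci/devices/$1/driver"
    if [ -L "$link" ]; then
        # The "driver" entry is a symlink into /sys/bus/pci/drivers/<name>.
        basename "$(readlink "$link")"
    else
        echo none
    fi
}

# gpu_holders
# Lists processes that have common GPU device nodes open.
# Assumes lsof is installed; without root it only sees your own processes.
gpu_holders() {
    lsof /dev/nvidia* /dev/dri/card* /dev/dri/renderD* 2>/dev/null
}
```

For example, `bound_driver 0000:01:00.0` reports the driver for the address shown in the hanging command, and `sudo bash -c "$(declare -f gpu_holders); gpu_holders"` is one way to run the second helper as root under bash.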

I have a Dell Precision 5760 with an A3000 GPU running Fedora 34.


mauza commented Nov 5, 2021

I followed this guide: https://forum.level1techs.com/t/fedora-33-ultimiate-vfio-guide-for-2020-2021-wip/163814. I blacklisted the drivers and set the GPU to bind only to vfio at boot. I commented out the binding code in the start-VM script. I also had to supply a MAC address, so I just put in a random one. After that I was able to start the VM and got past the failed unbind. I don't care if the dGPU is never usable on the host; if I need it for something, I can use it in another Linux VM. Now I'm stuck on networking, but I'll try to figure it out and start a new thread if I need help. Thanks so much for creating this, it is great.
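
The blacklist-and-vfio-at-boot approach described above is usually expressed as a modprobe.d fragment. The sketch below only prints such a fragment; it is not taken from the linked guide or from the script. `vfio_modprobe_conf` is a hypothetical helper name, and the vendor:device ID in the usage note is a placeholder, not the real ID of any GPU mentioned in this thread.

```shell
# vfio_modprobe_conf <vendor:device>[,<vendor:device>...]
# (hypothetical helper) Prints a modprobe.d fragment to stdout.
# It does not write any file or load any module.
vfio_modprobe_conf() {
    local ids="$1"
    # Keep the host GPU drivers from being auto-loaded at boot.
    printf 'blacklist nouveau\n'
    printf 'blacklist nvidia\n'
    # Ask vfio-pci to claim devices with the given IDs.
    printf 'options vfio-pci ids=%s\n' "$ids"
}
```

A possible use is `vfio_modprobe_conf 10de:1234 | sudo tee /etc/modprobe.d/vfio.conf`, where `10de:1234` stands in for the IDs reported by `lspci -nn` and the file name is arbitrary. Depending on the distribution, the initramfs may also need to be regenerated before the change takes effect at boot.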
