NVIDIA proprietary drivers fail on Wayland with the latest Linux kernel in NixOS unstable (24.11) #343774

Closed
ccicnce113424 opened this issue Sep 22, 2024 · 13 comments · Fixed by #344460
Labels
0.kind: bug Something is broken

Comments

@ccicnce113424

Describe the bug

I'm encountering an issue with the NVIDIA proprietary drivers when using the latest Linux kernel on NixOS 24.11 (Vicuna) (version: 24.11.20240919.c04d565). When configured to use the latest kernel (boot.kernelPackages = pkgs.linuxPackages_latest;), SDDM fails to start on Wayland and results in a black screen. Occasionally, a frozen cursor appears at a low resolution on top of the black screen, and moving the mouse has no effect.

Additionally, when booting with the latest kernel, the system remains stuck on the Plymouth boot screen for a noticeably longer time compared to the stable kernel.

A similar issue has been reported in #323396.

Steps To Reproduce

Steps to reproduce the behavior:

  1. Add the following NixOS configuration:
    boot.kernelPackages = pkgs.linuxPackages_latest;
  2. Rebuild and restart the system.
  3. Attempt to run SDDM on Wayland or launch a KDE session on Wayland from X11.
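For readers who want to reproduce this, a configuration along the following lines should give the setup described in the steps above. It is only a sketch: the reporter's actual configuration is not shown in this issue, so every option other than boot.kernelPackages is an assumption about a typical Plasma 6 + SDDM + NVIDIA setup on NixOS unstable.

```nix
{ pkgs, ... }:
{
  # From the report: track the newest kernel packaged in nixpkgs
  # (6.11 at the time this issue was opened).
  boot.kernelPackages = pkgs.linuxPackages_latest;

  # Assumed: the usual way to enable the NVIDIA driver with kernel modesetting.
  services.xserver.videoDrivers = [ "nvidia" ];
  hardware.nvidia.modesetting.enable = true;

  # Assumed value: the report says both the open and the proprietary
  # kernel modules show the problem, so either setting should reproduce it.
  hardware.nvidia.open = false;

  # Assumed: SDDM running on Wayland, with a Plasma 6 session.
  services.displayManager.sddm.enable = true;
  services.displayManager.sddm.wayland.enable = true;
  services.desktopManager.plasma6.enable = true;
}
```

After nixos-rebuild switch and a reboot, the symptom to look for is the black screen at the SDDM greeter.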

Expected behavior

SDDM and KDE should run on Wayland with the latest Linux kernel and the NVIDIA proprietary driver, without black screen or frozen cursor issues.

Screenshots

N/A

Additional context

I have tested both the stable and beta versions of the NVIDIA drivers, as well as using both open and proprietary kernel modules, but the problem persists in all cases. Only reverting to the stable kernel resolves the issue. This issue occurs every time I use the latest kernel.

The multi-user value in the metadata is set to no because I had to run the nix-info command from the command-line interface.

Notify maintainers

@Kiskae
@edwtjo

Metadata

 - system: "x86_64-linux"
 - host os: Linux 6.11.0, NixOS, 24.11 (Vicuna), 24.11.20240919.c04d565
 - multi-user?: no
 - sandbox: yes
 - version: nix-env (Nix) 2.18.5
 - nixpkgs: /nix/store/hiasfhl8f5yy88hcfbr3s8s4bm63wsjw-source

Add a 👍 reaction to issues you find important.

@ccicnce113424 ccicnce113424 added the 0.kind: bug Something is broken label Sep 22, 2024
@ccicnce113424 ccicnce113424 changed the title NVIDIA proprietary drivers fail on Wayland with the latest Linux kernel in NixOS 24.11 NVIDIA proprietary drivers fail on Wayland with the latest Linux kernel in NixOS unstable (24.11) Sep 22, 2024
@Kiskae
Contributor

Kiskae commented Sep 22, 2024

Could you use journalctl -b -<n> (where <n> is the number of boots to go back) to check the system logs of previous boots and see if there is anything related to sddm in there? Unless there is an error logged somewhere, I'm not sure where to start looking.

@ccicnce113424
Author

9月 22 22:49:03 ccic-desktop sddm-helper-start-wayland[1619]: Starting Wayland process "/nix/store/yxy38krm4jpq9f4xbb3i31bszyp5dvv3-kwin-6.1.5/bin/kwin_wayland --no-global-shortcuts --no-kactivities --no-lockscreen --locale1" "sddm"
9月 22 22:49:03 ccic-desktop sddm-helper-start-wayland[1619]: started succesfully "/nix/store/yxy38krm4jpq9f4xbb3i31bszyp5dvv3-kwin-6.1.5/bin/kwin_wayland --no-global-shortcuts --no-kactivities --no-lockscreen --locale1"
9月 22 22:49:03 ccic-desktop sddm-helper-start-wayland[1619]: "No backend specified, automatically choosing drm\n"
9月 22 22:49:03 ccic-desktop sddm-helper-start-wayland[1619]: Directory "/run/user/175" has changed, checking for Wayland socket
9月 22 22:49:03 ccic-desktop sddm-helper-start-wayland[1619]: Found Wayland socket "/run/user/175/wayland-0"
9月 22 22:49:03 ccic-desktop sddm-helper-start-wayland[1619]: "Accepting client connections on sockets: QList(\"wayland-0\")\n"
9月 22 22:49:03 ccic-desktop sddm-helper-start-wayland[1619]: "kwin_scene_opengl: No render nodes have been found, falling back to primary node\n"
9月 22 22:49:04 ccic-desktop sddm-helper-start-wayland[1619]: "kwin_scene_opengl: 0x502: GL_INVALID_OPERATION error generated. <image> and <target> are incompatible\n"
9月 22 22:49:04 ccic-desktop sddm-helper-start-wayland[1619]: "kwin_wayland_drm: Failed to create framebuffer: Invalid argument\n"
9月 22 22:49:05 ccic-desktop sddm-helper-start-wayland[1619]: "kwin_wayland_drm: Failed to create framebuffer: Invalid argument\n"
9月 22 22:49:08 ccic-desktop sddm-helper-start-wayland[1619]: "kwin_wayland_drm: Presentation failed! Invalid argument\n"
9月 22 22:49:09 ccic-desktop sddm-helper-start-wayland[1619]: "kwin_core: Applying output config failed!\n"
9月 22 22:49:09 ccic-desktop sddm-helper-start-wayland[1619]: "kwin_wayland_drm: Failed to create framebuffer: Invalid argument\n"
9月 22 22:49:09 ccic-desktop sddm-helper-start-wayland[1619]: "kwin_wayland_drm: Presentation failed! Permission denied\n"

@inclyc
Member

inclyc commented Sep 23, 2024

Same issue here, I found workaround trick here.

@Kiskae
Contributor

Kiskae commented Sep 23, 2024

Same issue here, I found workaround trick here.

That specifically appears to be about the GL_FRAMEBUFFER_INCOMPLETE_MISSING_ATTACHMENT + GL_FRAMEBUFFER_INCOMPLETE_ATTACHMENT errors.

The OP issue meanwhile looks like the wayland server and the compositor disagreeing about which drm node is the render node.

If the OP still has the issue, I've got some more diagnostics I could use:

  1. Anything in the journal related to OpenGL. Judging by similar issues, kwin_scene_opengl prints a lot of information about the driver it is using when it hits an error.
  2. While experiencing the issue, press Ctrl+Alt+F2 to switch to a virtual console, log in, and run nix shell nixpkgs#libdrm^bin -c drmdevice to get some information about the current DRM nodes.
    2a. At this point you might also want to try sudo systemctl restart graphical.target to manually restart the GUI and see if it starts working.

@ccicnce113424
Author

Same issue here, I found workaround trick here.

I'd like to try this trick, but I am a beginner with NixOS, and I don't know how to apply patches to the open kernel module.

@inclyc
Member

inclyc commented Sep 24, 2024

Hi @ccicnce113424,

I'd like to try this trick, but I am a beginner with NixOS, and I don't know how to apply patches to the open kernel module.

I added the kernel params according to pbo's reply to my configuration.

using initcall_blacklist=simpledrm_platform_driver_init : simpledrm isn't loaded, tty is black with [drm] User-defined mode not supported: "1920x1080", but if I enter login, password and launch Hyprland blindly it works.

boot.kernelParams = [
  "initcall_blacklist=simpledrm_platform_driver_init"
];

And I can confirm that the tty is black (sad) but the desktop environment (kde-wayland for me) works.
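Since the thread only reports the problem on kernel 6.11, one way to keep this workaround from affecting older kernels is to add the parameter conditionally. The following is a hedged sketch, not something posted in this issue: the "6.11" threshold is an assumption taken from the kernel version in the report, and the module arguments assume a standard NixOS configuration module.

```nix
{ config, lib, ... }:
{
  # Only blacklist simpledrm when the configured kernel is 6.11 or newer
  # (assumed threshold, based on the kernel version in this report).
  boot.kernelParams = lib.optionals
    (lib.versionAtLeast config.boot.kernelPackages.kernel.version "6.11")
    [ "initcall_blacklist=simpledrm_platform_driver_init" ];
}
```

Because boot.kernelParams is a list option, this merges with parameters set elsewhere in the configuration. The trade-off reported above still applies: the tty stays black while the workaround is active.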

@ccicnce113424
Author

Hi @ccicnce113424,

I'd like to try this trick, but I am a beginner with NixOS, and I don't know how to apply patches to the open kernel module.

I added the kernel params according to pbo's reply to my configuration.

using initcall_blacklist=simpledrm_platform_driver_init : simpledrm isn't loaded, tty is black with [drm] User-defined mode not supported: "1920x1080", but if I enter login, password and launch Hyprland blindly it works.

boot.kernelParams = [
  "initcall_blacklist=simpledrm_platform_driver_init"
];

And I can confirm that the tty is black (sad) but the desktop environment (kde-wayland for me) works.

I tried it and the result was exactly the same. So this should be an error in the NVIDIA kernel module, unrelated to SDDM, KDE, and KWin.

@ccicnce113424
Author

ccicnce113424 commented Sep 24, 2024

Same issue here, I found workaround trick here.

I'd like to try this trick, but I am a beginner with NixOS, and I don't know how to apply patches to the open kernel module.

I created a patch using the following commands from the pull request submitted by leigh123linux:

git clone https://github.com/leigh123linux/open-gpu-kernel-modules.git -b 611_drm_change 
cd open-gpu-kernel-modules 
git diff HEAD^1 > kernel-modules.patch

Then I applied the patch with the following settings:

hardware.nvidia.package = config.boot.kernelPackages.nvidiaPackages.beta.overrideAttrs {
  open = config.boot.kernelPackages.nvidiaPackages.beta.open.overrideAttrs {
    patches = [ ./kernel-modules.patch ];
  };
};

This change did not have any effect; SDDM still does not work properly.

@blakeashleyjr

blakeashleyjr commented Sep 24, 2024

For those looking for a working system until this is fixed without going back too far in kernel versions, pinning to kernel version 6.10.11 resolves the issue for me:

boot.kernelPackages = pkgs.linuxPackagesFor (pkgs.linux_6_10.override {
  argsOverride = rec {
    src = pkgs.fetchurl {
      url = "mirror://kernel/linux/kernel/v6.x/linux-${version}.tar.xz";
      sha256 = "+02gRvjBhRWfRTfe2IejCsxp2RxVWg/3+rxFIPWaMJY=";
    };
    version = "6.10.11";
    modDirVersion = "6.10.11";
  };
});
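If the exact patch level does not matter, a shorter variant may be enough. This is a sketch under the assumption that the nixpkgs revision in use still ships a 6.10 kernel set under the name pkgs.linuxPackages_6_10; it follows whatever 6.10.x release that revision packages rather than pinning 6.10.11, and the attribute disappears once nixpkgs drops the 6.10 series.

```nix
{ pkgs, ... }:
{
  # Assumes pkgs.linuxPackages_6_10 exists in your nixpkgs revision.
  # This tracks the packaged 6.10.x kernel instead of a hand-pinned tarball,
  # so no source hash has to be maintained.
  boot.kernelPackages = pkgs.linuxPackages_6_10;
}
```

The override approach above remains the option to use when a specific point release such as 6.10.11 is required.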

@Binary-Eater
Member

For Linux kernel 6.11, we released a fix in our production branch release
550.120, which uses drm_fbdev_ttm_setup in place of drm_fbdev_generic_setup for
kernels 6.11 and above. A future release in the new feature branch will contain
this fix as well but we do not have a plan to make a release for this branch in
the near future. For reference, please feel free to extract production branch
release 550.120 and apply the changes to nvidia-drm as you see fit.

Our forum post detailing this: https://forums.developer.nvidia.com/t/drm-fbdev-wayland-presentation-support-with-linux-kernel-6-11-and-above/307920.

@Kiskae
Contributor

Kiskae commented Sep 27, 2024

^ when that lands and you still experience the issue, make sure you're using the open driver. The proprietary driver does not get the patch.

@BenA0

BenA0 commented Sep 27, 2024

^ when that lands and you still experience the issue, make sure you're using the open driver. The proprietary driver does not get the patch.

I assume this means Pascal (1000 series) and earlier GPUs aren't covered by this patch, since the open modules only support Turing and above, and that those cards are stuck on 6.10 kernels until NVIDIA addresses it in the next major release (565?).

@fpletz
Member

fpletz commented Sep 28, 2024

You can maybe try to get the patch to apply for the proprietary modules as @Kiskae mentioned in the PR but I didn't want to invest more time. PRs are welcome though. Until then we have to wait for Nvidia to fix it. Note that the production version has been fixed by Nvidia and has been merged into nixpkgs in #344524.
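Putting the last few comments together, a configuration that selects the production driver branch might look like the sketch below. It is an illustration under assumptions rather than a confirmed fix: whether nvidiaPackages.production is already at 550.120 or later depends on the nixpkgs revision in use, and the open setting only applies to Turing or newer GPUs, as noted above.

```nix
{ config, ... }:
{
  services.xserver.videoDrivers = [ "nvidia" ];

  hardware.nvidia = {
    # Production branch; per the comments above, 550.120 carries
    # NVIDIA's fix for kernels 6.11 and above.
    package = config.boot.kernelPackages.nvidiaPackages.production;

    # Assumed setting: use the open kernel modules (Turing or newer only).
    # Per the thread, the nixpkgs patch for the newer branches only
    # covers the open modules.
    open = true;

    modesetting.enable = true;
  };
}
```

On Pascal and older cards, open has to stay false, in which case the production branch is the only one the thread reports as fixed upstream.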

7 participants