My 3rd Intel server does not seem to work. #23

Open
1 of 2 tasks
cswjl opened this issue Nov 2, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@cswjl

cswjl commented Nov 2, 2024

NVIDIA Open GPU Kernel Modules Version

550.90.07-p2p

Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.

  • I confirm that this does not happen with the proprietary driver package.

Operating System and Version

Ubuntu 22.04.4 LTS

Kernel Release

Linux ubuntu 6.5.0-26-generic

Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.

  • I am running on a stable kernel release.

Hardware: GPU

NVIDIA GeForce RTX 4090

Describe the bug

After running ./install.sh, I still cannot use P2P. The following diagram shows the architecture of my server:
[Image: server architecture diagram]

nvidia-smi topo -p2p rw
        GPU0  GPU1  GPU2  GPU3  GPU4  GPU5  GPU6  GPU7
GPU0    X     CNS   CNS   CNS   CNS   CNS   CNS   CNS
GPU1    CNS   X     CNS   CNS   CNS   CNS   CNS   CNS
GPU2    CNS   CNS   X     CNS   CNS   CNS   CNS   CNS
GPU3    CNS   CNS   CNS   X     CNS   CNS   CNS   CNS
GPU4    CNS   CNS   CNS   CNS   X     CNS   CNS   CNS
GPU5    CNS   CNS   CNS   CNS   CNS   X     CNS   CNS
GPU6    CNS   CNS   CNS   CNS   CNS   CNS   X     CNS
GPU7    CNS   CNS   CNS   CNS   CNS   CNS   CNS   X

Legend:

X = Self
OK = Status Ok
CNS = Chipset not supported
GNS = GPU not supported
TNS = Topology not supported
NS = Not supported
U = Unknown
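With eight GPUs the matrix is easy to misread. Below is a minimal Python sketch (not part of nvidia-smi or this project) that tallies the status code for every GPU pair, assuming the whitespace-separated layout pasted above; the script name p2p_summary.py is hypothetical. For the output above, all 56 off-diagonal pairs (8 × 7) report CNS, so no pair is reported as OK.

```python
from __future__ import annotations

import sys
from collections import Counter


def parse_p2p_matrix(text: str) -> dict[tuple[str, str], str]:
    """Map (src, dst) GPU pairs to their status code (OK, CNS, GNS, ...).

    Assumes the first line whose first token starts with "GPU" is the column
    header and every later "GPU..." line is a matrix row; legend lines and the
    command line itself are ignored.
    """
    rows = [tokens for tokens in (line.split() for line in text.splitlines())
            if tokens and tokens[0].startswith("GPU")]
    if not rows:
        raise ValueError("no GPU rows found")
    header, body = rows[0], rows[1:]
    pairs: dict[tuple[str, str], str] = {}
    for row in body:
        src, statuses = row[0], row[1:]
        if len(statuses) != len(header):
            raise ValueError(f"malformed row: {row}")
        for dst, status in zip(header, statuses):
            if status != "X":  # "X" marks a GPU paired with itself
                pairs[(src, dst)] = status
    return pairs


def summarize(text: str) -> Counter:
    """Count how many GPU pairs report each status code."""
    return Counter(parse_p2p_matrix(text).values())


if __name__ == "__main__":
    # Usage: nvidia-smi topo -p2p rw | python3 p2p_summary.py
    counts = summarize(sys.stdin.read())
    print(dict(counts))
    if counts.get("OK", 0) == 0:
        print("No GPU pair reports P2P read/write as OK.")
```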

To Reproduce

I just ran ./install.sh.

Bug Incidence

Always

nvidia-bug-report.log.gz

n/a

More Info

No response

@cswjl cswjl added the bug Something isn't working label Nov 2, 2024
@cswjl
Author

cswjl commented Nov 2, 2024

This is the output of ./install.sh:
make -C src/nvidia
make -C src/nvidia-modeset
make[1]: Entering directory '/home/ubuntu/Desktop/open-gpu-kernel-modules-550.90.07-p2p/src/nvidia'
make[1]: Entering directory '/home/ubuntu/Desktop/open-gpu-kernel-modules-550.90.07-p2p/src/nvidia-modeset'
make[1]: Nothing to be done for 'default'.
make[1]: Leaving directory '/home/ubuntu/Desktop/open-gpu-kernel-modules-550.90.07-p2p/src/nvidia-modeset'
cd kernel-open/nvidia-modeset/ && ln -sf ../../src/nvidia-modeset/_out/Linux_x86_64/nv-modeset-kernel.o nv-modeset-kernel.o_binary
make[1]: Nothing to be done for 'default'.
make[1]: Leaving directory '/home/ubuntu/Desktop/open-gpu-kernel-modules-550.90.07-p2p/src/nvidia'
cd kernel-open/nvidia/ && ln -sf ../../src/nvidia/_out/Linux_x86_64/nv-kernel.o nv-kernel.o_binary
make -C kernel-open modules
make[1]: Entering directory '/home/ubuntu/Desktop/open-gpu-kernel-modules-550.90.07-p2p/kernel-open'
make[2]: Entering directory '/usr/src/linux-headers-6.5.0-26-generic'
warning: the compiler differs from the one used to build the kernel
The kernel was built by: x86_64-linux-gnu-gcc-12 (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0
You are using: cc (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0
make[2]: Leaving directory '/usr/src/linux-headers-6.5.0-26-generic'
make[1]: Leaving directory '/home/ubuntu/Desktop/open-gpu-kernel-modules-550.90.07-p2p/kernel-open'
make -C kernel-open modules_install
make[1]: Entering directory '/home/ubuntu/Desktop/open-gpu-kernel-modules-550.90.07-p2p/kernel-open'
make[2]: Entering directory '/usr/src/linux-headers-6.5.0-26-generic'
INSTALL /lib/modules/6.5.0-26-generic/kernel/drivers/video/nvidia.ko
INSTALL /lib/modules/6.5.0-26-generic/kernel/drivers/video/nvidia-uvm.ko
INSTALL /lib/modules/6.5.0-26-generic/kernel/drivers/video/nvidia-modeset.ko
INSTALL /lib/modules/6.5.0-26-generic/kernel/drivers/video/nvidia-drm.ko
INSTALL /lib/modules/6.5.0-26-generic/kernel/drivers/video/nvidia-peermem.ko
SIGN /lib/modules/6.5.0-26-generic/kernel/drivers/video/nvidia-peermem.ko
SIGN /lib/modules/6.5.0-26-generic/kernel/drivers/video/nvidia-drm.ko
SIGN /lib/modules/6.5.0-26-generic/kernel/drivers/video/nvidia-modeset.ko
SIGN /lib/modules/6.5.0-26-generic/kernel/drivers/video/nvidia.ko
SIGN /lib/modules/6.5.0-26-generic/kernel/drivers/video/nvidia-uvm.ko
DEPMOD /lib/modules/6.5.0-26-generic
Warning: modules_install: missing 'System.map' file. Skipping depmod.
make[2]: Leaving directory '/usr/src/linux-headers-6.5.0-26-generic'
make[1]: Leaving directory '/home/ubuntu/Desktop/open-gpu-kernel-modules-550.90.07-p2p/kernel-open'
Sat Nov 2 21:41:39 2024

@mylesgoose

DEPMOD /lib/modules/6.5.0-26-generic
Warning: modules_install: missing 'System.map' file. Skipping depmod.

You will have to run depmod manually:
sudo depmod
sudo update-initramfs -u
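Because depmod was skipped, modprobe may still resolve `nvidia` through an older entry in the module index. `modinfo -n nvidia` prints the file modprobe would load. The Python sketch below (a hypothetical helper, not part of the project) reads the index directly and assumes the usual /lib/modules/<release>/modules.dep layout, where each line is "<path>: <dependencies>". If the nvidia.ko entry it prints is not kernel/drivers/video/nvidia.ko (where ./install.sh wrote it), a different build, for example a DKMS copy under updates/, is probably taking precedence.

```python
from __future__ import annotations

import os
import platform
import sys


def nvidia_entries(modules_dep_text: str) -> list[str]:
    """Return the module paths in modules.dep whose file name starts with 'nvidia'.

    Each modules.dep line has the form "<path>: <dependency paths...>", with
    paths relative to /lib/modules/<release>/.
    """
    paths = []
    for line in modules_dep_text.splitlines():
        path, sep, _deps = line.partition(":")
        if sep and os.path.basename(path).startswith("nvidia"):
            paths.append(path)
    return paths


if __name__ == "__main__":
    # Release defaults to the running kernel, like `uname -r`.
    release = sys.argv[1] if len(sys.argv) > 1 else platform.release()
    dep_file = os.path.join("/lib/modules", release, "modules.dep")
    with open(dep_file) as f:
        for path in nvidia_entries(f.read()):
            print(path)
```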

@cswjl
Author

cswjl commented Nov 3, 2024

DEPMOD /lib/modules/6.5.0-26-generic Warning: modules_install: missing 'System.map' file. Skipping depmod. You will have to run depmod manually: sudo depmod, then sudo update-initramfs -u.

Thank you for your patient reply. After running "sudo depmod" and "sudo update-initramfs -u", P2P still does not work. Is there any additional information I can provide?

@mylesgoose

Which driver do you have loaded? Search your root file system for nvidia.ko and give me the file size and location of each copy. Have you tried backing up the 4 currently loaded modules, replacing them with the ones you compiled, and rebooting?
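The search requested above can be done with `find / -xdev -name 'nvidia.ko*' -exec ls -l {} +`. A minimal Python equivalent is sketched below for illustration (the script name find_nvidia_ko.py is hypothetical). Like `-xdev`, it stays on the root file system, so run it again with another ROOT (for example /home) if that is a separate partition. Any copy outside /lib/modules/6.5.0-26-generic/kernel/drivers/video/ and the build tree may be the module that is actually being loaded.

```python
from __future__ import annotations

import os
import sys


def _same_fs(path: str, dev: int) -> bool:
    """True if path (not followed if it is a symlink) lives on device dev."""
    try:
        return os.lstat(path).st_dev == dev
    except OSError:
        return False


def find_nvidia_ko(root: str = "/") -> list[tuple[str, int]]:
    """Return sorted (path, size_in_bytes) for every nvidia.ko* file under root.

    Matches "nvidia.ko" plus names such as "nvidia.ko.zst" or "nvidia.ko.bak".
    Does not descend into other file systems (/proc, /sys, other mounts).
    """
    root_dev = os.stat(root).st_dev
    hits = []
    for dirpath, dirnames, filenames in os.walk(root, onerror=lambda err: None):
        # Prune subdirectories on other devices so the walk stays on one file system.
        dirnames[:] = [d for d in dirnames
                       if _same_fs(os.path.join(dirpath, d), root_dev)]
        for name in filenames:
            if name == "nvidia.ko" or name.startswith("nvidia.ko."):
                full = os.path.join(dirpath, name)
                try:
                    hits.append((full, os.stat(full).st_size))
                except OSError:
                    pass  # broken symlink or file removed during the walk
    return sorted(hits)


if __name__ == "__main__":
    # Usage: sudo python3 find_nvidia_ko.py [ROOT]   (ROOT defaults to /)
    for path, size in find_nvidia_ko(sys.argv[1] if len(sys.argv) > 1 else "/"):
        print(f"{size:>12}  {path}")
```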
