tar: *.ko: Cannot stat: No such file or directory #1

Open
DroiSystem opened this issue Sep 29, 2016 · 13 comments

@DroiSystem

$ ./build.sh 367.27 alpha 1097.0.0

I am following your instructions, but during the ./build.sh step I get the following message:

++ basename -a 'pkg/run_files/1097.0.0/NVIDIA-Linux-x86_64-367.27/kernel/*.ko'
+ tar -C pkg/run_files/1097.0.0/NVIDIA-Linux-x86_64-367.27/kernel -cvj '*.ko'
tar: *.ko: Cannot stat: No such file or directory
tar: Exiting with failure status due to previous errors

Did I miss something?
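
A note on the trace above: the quoted '*.ko' means the glob matched nothing, so the shell handed the pattern to basename and tar unexpanded. No kernel modules were built, and the tar error is a symptom rather than the cause. A minimal sketch of a guard that would report this directly; `list_modules` is a hypothetical helper and not something build.sh is known to contain:

```shell
#!/bin/sh
# Hypothetical helper (not part of build.sh): print the names of the .ko
# files in a directory, and fail with a clear message when there are none.
list_modules() {
  local dir=$1 found=0 f
  for f in "$dir"/*.ko; do
    # An unmatched glob is left as the literal string "*.ko"; skip it.
    [ -e "$f" ] || continue
    found=1
    printf '%s\n' "${f##*/}"
  done
  if [ "$found" -eq 0 ]; then
    echo "no .ko files in $dir; the module build probably failed" >&2
    return 1
  fi
}
```

The output of `list_modules` could then be passed to tar in place of the raw glob, so an empty kernel directory stops the script before tar runs.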

@jpap

jpap commented Dec 10, 2016

I get something similar:

coreos coreos-nvidia # ./build.sh 275.09.07 stable 1185.5.0 
Downloading CoreOS stable developer image 1185.5.0
Decompressing
Downloading NVIDIA Linux drivers version 275.09.07
/home/core/coreos-nvidia/pkg/run_files/1185.5.0 /home/core/coreos-nvidia
Creating directory NVIDIA-Linux-x86_64-275.09.07
Verifying archive integrity... OK
Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 275.09.07.............................................................................................................................................
/home/core/coreos-nvidia
Spawning container coreos_developer_container.bin.1185.5.0 on /home/core/coreos-nvidia/coreos_developer_container.bin.1185.5.0.
Press ^] three times within 1s to kill container.
+ VERSION=275.09.07
+ echo Building 275.09.07
Building 275.09.07
+ emerge-gitclone
>>> Cloning repository 'portage-stable' from 'https://github.com/coreos/portage-stable.git'...
>>> Starting git clone in /var/lib/portage/portage-stable
Cloning into '/var/lib/portage/portage-stable'...
remote: Counting objects: 65997, done.
remote: Compressing objects: 100% (69/69), done.
remote: Total 65997 (delta 31), reused 0 (delta 0), pack-reused 65927
Receiving objects: 100% (65997/65997), 41.55 MiB | 9.87 MiB/s, done.
Resolving deltas: 100% (27864/27864), done.
Checking connectivity... done.
>>> Git clone in /var/lib/portage/portage-stable successful
>>> Cloning repository 'coreos' from 'https://github.com/coreos/coreos-overlay.git'...
>>> Starting git clone in /var/lib/portage/coreos-overlay
Cloning into '/var/lib/portage/coreos-overlay'...
remote: Counting objects: 31678, done.
remote: Compressing objects: 100% (29/29), done.
remote: Total 31678 (delta 4), reused 0 (delta 0), pack-reused 31649
Receiving objects: 100% (31678/31678), 10.77 MiB | 7.41 MiB/s, done.
Resolving deltas: 100% (15268/15268), done.
Checking connectivity... done.
>>> Git clone in /var/lib/portage/coreos-overlay successful
Container coreos_developer_container.bin.1185.5.0 terminated by signal KILL.
+ ARTIFACT_DIR=pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07
+ VERSION=275.09.07
+ COMBINED_VERSION=1185.5.0-275.09.07
+ TOOLS='nvidia-debugdump nvidia-cuda-mps-control nvidia-xconfig nvidia-modprobe nvidia-smi nvidia-cuda-mps-server
nvidia-persistenced nvidia-settings'
++ basename -a pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07/libGL.so.275.09.07 pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07/libOpenCL.so.1.0.0 pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07/libXvMCNVIDIA.so.275.09.07 pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07/libcuda.so.275.09.07 pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07/libglx.so.275.09.07 pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07/libnvcuvid.so.275.09.07 pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07/libnvidia-cfg.so.275.09.07 pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07/libnvidia-compiler.so.275.09.07 pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07/libnvidia-glcore.so.275.09.07 pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07/libnvidia-ml.so.275.09.07 pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07/libnvidia-tls.so.275.09.07 pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07/libnvidia-wfb.so.275.09.07 pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07/libvdpau.so.275.09.07 pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07/libvdpau_nvidia.so.275.09.07 pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07/libvdpau_trace.so.275.09.07
+ tar -C pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07 -cvj libGL.so.275.09.07 libOpenCL.so.1.0.0 libXvMCNVIDIA.so.275.09.07 libcuda.so.275.09.07 libglx.so.275.09.07 libnvcuvid.so.275.09.07 libnvidia-cfg.so.275.09.07 libnvidia-compiler.so.275.09.07 libnvidia-glcore.so.275.09.07 libnvidia-ml.so.275.09.07 libnvidia-tls.so.275.09.07 libnvidia-wfb.so.275.09.07 libvdpau.so.275.09.07 libvdpau_nvidia.so.275.09.07 libvdpau_trace.so.275.09.07
libGL.so.275.09.07
libOpenCL.so.1.0.0
libXvMCNVIDIA.so.275.09.07
libcuda.so.275.09.07
libglx.so.275.09.07
libnvcuvid.so.275.09.07
libnvidia-cfg.so.275.09.07
libnvidia-compiler.so.275.09.07
libnvidia-glcore.so.275.09.07
libnvidia-ml.so.275.09.07
libnvidia-tls.so.275.09.07
libnvidia-wfb.so.275.09.07
libvdpau.so.275.09.07
libvdpau_nvidia.so.275.09.07
libvdpau_trace.so.275.09.07
+ tar -C pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07 -cvj nvidia-debugdump nvidia-cuda-mps-control nvidia-xconfig nvidia-modprobe nvidia-smi nvidia-cuda-mps-server nvidia-persistenced nvidia-settings
tar: nvidia-debugdump: Cannot stat: No such file or directory
tar: nvidia-cuda-mps-control: Cannot stat: No such file or directory
nvidia-xconfig
tar: nvidia-modprobe: Cannot stat: No such file or directory
nvidia-smi
tar: nvidia-cuda-mps-server: Cannot stat: No such file or directory
tar: nvidia-persistenced: Cannot stat: No such file or directory
nvidia-settings
tar: Exiting with failure status due to previous errors
++ basename -a 'pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07/kernel/*.ko'
+ tar -C pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07/kernel -cvj '*.ko'
tar: *.ko: Cannot stat: No such file or directory
tar: Exiting with failure status due to previous errors

@jpap

jpap commented Dec 10, 2016

Looks like we're exiting the container after emerge-gitclone in the /build.sh script.

@therc
Contributor

therc commented Dec 10, 2016

Sorry, I missed this. I now have a filter to highlight these issues.

Which systemd version are you running on the machine building the driver? I saw similar problems with older releases. That's why the README says "tested on version 229, there might be issues with <= 225".

Another thing worth checking... are you running low on disk?
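
The version question can be turned into a pre-flight check. This is only a sketch: the function names are made up for illustration, and the 225 threshold is simply the figure quoted from the README above.

```shell
#!/bin/sh
# Sketch: extract the major version from `systemctl --version` output and
# compare it against the README's "<= 225 may have issues" note.
systemd_major() {
  # The first line looks like "systemd 231"; print its second field.
  printf '%s\n' "$1" | awk 'NR == 1 { print $2; exit }'
}

systemd_new_enough() {
  local major
  major=$(systemd_major "$1")
  case $major in
    ''|*[!0-9]*) echo "could not parse systemd version" >&2; return 2 ;;
  esac
  [ "$major" -gt 225 ]
}
```

It would be called as `systemd_new_enough "$(systemctl --version)"` before spawning the container.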

@therc
Contributor

therc commented Dec 10, 2016

One measure that I could take right away is stopping execution as soon as a step fails. There is no point in continuing; it only obscures the real issue.

When I last debugged this, I traced it to the main event loop in nspawn exiting in an unexpected fashion (the KILL signal came from nspawn itself). Upgrading systemd made the problem go away, so I didn't investigate further.
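
For illustration, stopping early could look roughly like the following. `run_step` and `build` are hypothetical names, not functions from build.sh, and the real script may well do this differently.

```shell
#!/bin/sh
# Sketch: abort at the first failing step instead of carrying on to tar.
run_step() {
  local label=$1
  shift
  # Run the remaining arguments as a command and report which step failed.
  if ! "$@"; then
    echo "build step failed: $label" >&2
    return 1
  fi
}

build() {
  # "$@" stands in for the command that runs the container build.
  run_step "container build" "$@" || return 1
  echo "packaging artifacts"
}
```

With this shape, a container that dies with SIGKILL produces a single "build step failed" line, and the packaging step is never reached.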

@jpap

jpap commented Dec 17, 2016

> Which systemd version are you running on the machine building the driver? I saw similar problems with older releases. That's why the README says "tested on version 229, there might be issues with <= 225".

Version 231, running under 4.7.3-coreos-r2:

# systemctl --version
systemd 231
+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK -SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT -GNUTLS -ACL +XZ -LZ4 +SECCOMP +BLKID -ELFUTILS +KMOD -IDN

I do agree that this issue is a bug in systemd, and not your scripts.

> Another thing worth checking... are you running low on disk?

No, I went to the trouble of expanding the image first with dd, then gdisk and e2size.

> One measure that I could take right away is stopping execution as soon as a step fails. There is no point in continuing; it only obscures the real issue.

Good idea!

@therc
Contributor

therc commented Dec 22, 2016

Speaking of -coreos-r2... I found and fixed a silly bug that affected 1185.2.0 and up. Can you retry with the latest commit?

@therc
Contributor

therc commented Jan 24, 2017

I think I found the culprit. Running systemd-nspawn with --share-system increases the chances of a SIGKILL. Please try the latest version; I have just pushed a number of fixes. With the latest CoreOS security fix, you might need to add the --emerge-sources flag, since CoreOS engineers do not seem to have uploaded portage binary packages for those versions. To compensate for the larger number of packages built now, the scripts build 4 of them at a time.

@dashesy

dashesy commented Mar 6, 2017

I tried the latest version, got the same error:

$ ./build.sh 367.57 stable 1298.5.0

tar: *.ko: Cannot stat: No such file or directory
tar: Exiting with failure status due to previous errors

# systemctl --version
systemd 231

@therc
Contributor

therc commented Mar 6, 2017

1298.5.0 is never going to work with 367.57.

Newer versions of the driver (375, etc.) support Linux 4.9.9 (which is what the latest CoreOS uses), but 367.57 dates back to last October and does not support it (get_user_pages() has a different signature).

Why are you using it instead of a more recent version? Is it because you have old GRID cards and Nvidia docs tell you to use v367, as that's the last version to support them? E.g. from http://us.download.nvidia.com/XFree86/Linux-x86_64/378.13/README/supportedchips.html

"Below are the legacy GPUs that are no longer supported in the unified driver. These GPUs will continue to be maintained through the special legacy NVIDIA GPU driver releases."

If that's the case, you need to ask Nvidia to make one of the special legacy releases they have promised. An example here:

https://devtalk.nvidia.com/default/topic/997603/linux/newer-367-driver-for-grid-k520-/
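
A rough sketch of a guard for this mismatch. The thresholds (kernel 4.9.9, driver 375) are only the two data points mentioned in this comment, not an authoritative compatibility table, and the function names are hypothetical. It relies on `sort -V` from GNU coreutils.

```shell
#!/bin/sh
# Sketch: refuse driver/kernel pairs known from this thread not to build.
version_ge() {
  # True when $1 >= $2 under version ordering (GNU sort -V).
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

driver_supports_kernel() {
  local driver=$1 kernel=$2
  # Assumption taken from the thread: kernels from 4.9.9 on need driver 375+.
  if version_ge "$kernel" 4.9.9 && ! version_ge "$driver" 375; then
    echo "driver $driver is too old for kernel $kernel" >&2
    return 1
  fi
}
```

For example, `driver_supports_kernel 367.57 4.9.9` fails, while `driver_supports_kernel 375.20 4.9.9` passes.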

@dashesy

dashesy commented Mar 6, 2017

Yes, I also looked at the logs and the error was "error: too many arguments to function 'get_user_pages'"

No particular reason! Will try v375 now, thanks.
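
For anyone else hitting the tar error, a quick way to surface the underlying compile error from a build log might look like this. The log location is not stated in this thread, so the path is left as a parameter, and `show_compile_errors` is a made-up name.

```shell
#!/bin/sh
# Sketch: print compiler errors from a build log, with log line numbers.
show_compile_errors() {
  local log=$1
  # -n prefixes each matching line with its line number in the log.
  grep -n 'error:' "$log"
}
```

grep exits non-zero when nothing matches, so the function can also be used as a test for whether the log contains any errors at all.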

@dashesy

dashesy commented Mar 6, 2017

$ ./build.sh 375.20 stable 1298.5.0

succeeded. Thanks.

@dashesy

dashesy commented Apr 2, 2017

I had another problem: when I used --keep and ran ./build.sh twice, emerge complained about not enough space and did not build the modules. With a fresh run it worked. I did not try to reproduce it.

Otherwise this issue is resolved.
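
A free-space check before the build might catch the out-of-space case early. This is a sketch only: the function names are hypothetical, and how much space emerge actually needs is not something this thread states, so the minimum is a parameter.

```shell
#!/bin/sh
# Sketch: check available disk space on the filesystem holding a path.
free_kb() {
  # df -P prints a header line, then: filesystem, 1024-blocks, used, available, ...
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

require_free_kb() {
  local avail
  avail=$(free_kb "$1")
  if [ "$avail" -lt "$2" ]; then
    echo "only $avail KiB free in $1, need $2" >&2
    return 1
  fi
}
```

It would be called as `require_free_kb <dir> <min-KiB>` against whatever filesystem holds the developer image.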

@ckome

ckome commented Jul 17, 2018

apt install systemd-container
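
On Debian and Ubuntu that package is what provides systemd-nspawn. A small sketch of a check that would catch its absence before the build starts; `require_cmd` is a hypothetical helper:

```shell
#!/bin/sh
# Sketch: fail early, with an install hint, when a required command is missing.
require_cmd() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "missing command: $1 (${2:-no install hint})" >&2
    return 1
  fi
}
```

For example: `require_cmd systemd-nspawn "apt install systemd-container"`.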
