alternative method to rebuild & shipping full CUDA runtime #356
- bot/build.sh
  - first runs EESSI-determine-rebuilds.sh to determine which software package directories have to be removed
  - then processes its output and creates writable lower directories
  - finally passes these lower directories as an additional parameter when running EESSI-remove-software.sh
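How build.sh processes the output of EESSI-determine-rebuilds.sh is not spelled out here. As a minimal sketch, assuming (hypothetically) that the script prints one software directory per line, the list could be normalised like this before any lower directories are created. `collect_removals` is a made-up name, not a function from bot/build.sh:

```bash
#!/usr/bin/env bash
# Hypothetical helper: normalise the output of EESSI-determine-rebuilds.sh,
# ASSUMING it prints one software directory path per line.
collect_removals() {
    # Strip trailing slashes, then drop blank/whitespace-only lines,
    # then sort and de-duplicate the remaining paths.
    sed -e 's:/*$::' -e '/^[[:space:]]*$/d' | sort -u
}
```

Its output would then feed the step that creates the writable lower directories.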
Just a test on a single architecture first:
bot: build inst:eX3-NESSI repo:nessi-2023.06-swl-deb10 arch:aarch64/generic
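The bot commands in this thread share one shape: an action (`build`) followed by `key:value` filters selecting the bot instance (`inst`), the target repository (`repo`) and the architecture (`arch`). The hypothetical function below only illustrates that syntax; it is not how the bot actually parses comments, and the `action` key is a label invented for the sketch:

```bash
#!/usr/bin/env bash
# Illustration only (not the bot's real parser): turn a command such as
#   bot: build inst:eX3-NESSI repo:nessi-2023.06-swl-deb10 arch:aarch64/generic
# into key=value lines, one per word.
parse_bot_filters() {
    local word
    local -a words
    # Drop the leading "bot:" and split on whitespace without globbing.
    read -r -a words <<< "${1#bot:}"
    for word in "${words[@]}"; do
        case "$word" in
            *:*) printf '%s=%s\n' "${word%%:*}" "${word#*:}" ;;  # key:value filter
            *)   printf 'action=%s\n' "$word" ;;                 # e.g. "build" (hypothetical key)
        esac
    done
}
```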
Next test after fixing some lower dir logic:
bot: build inst:eX3-NESSI repo:nessi-2023.06-swl-deb10 arch:aarch64/generic
Next test after creating the full directory tree under the lower dir:
bot: build inst:eX3-NESSI repo:nessi-2023.06-swl-deb10 arch:aarch64/generic
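This iteration creates the full directory tree under the lower dir. A minimal sketch of that step, assuming the paths are given relative to the software prefix; `make_lower_tree` is a hypothetical helper, not code from bot/build.sh:

```bash
#!/usr/bin/env bash
# Hypothetical helper: recreate the full directory tree of each given path
# (relative to the software prefix) underneath a lower directory.
make_lower_tree() {
    local lower="$1" path
    shift
    for path in "$@"; do
        # -p creates every missing parent, i.e. the full tree down to $path;
        # a plain mkdir would fail if any parent did not exist yet.
        mkdir -p -- "$lower/$path" || return 1
        printf '%s\n' "$lower/$path"
    done
}
```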
Finally (using brute force)?
bot: build inst:eX3-NESSI repo:nessi-2023.06-swl-deb10 arch:aarch64/generic
Try all:
bot: build inst:Fram-NESSI repo:nessi-2023.06-swl-deb11 arch:x86_64/intel/broadwell
Checklist before starting deployment (setting
Before we can ingest the tarballs, we need to manually remove the software packages and module files on the Stratum-0. The first step of that procedure is to check whether the software is still available:

cd /cvmfs/pilot.nessi.no/versions/2023.06/software/linux ; \
ls -l \
x86_64/{generic,amd/zen2,intel/[bsc]*}/{software,modules/all}/CUDA/12.1.1* \
aarch64/generic/{software,modules/all}/CUDA/12.1.1* ; \
cd -

log
-rw-r--r--. 1 cvmfs cvmfs 2120 May 4 20:09 aarch64/generic/modules/all/CUDA/12.1.1.lua
-rw-rw-r--. 1 cvmfs cvmfs 2120 May 4 20:14 x86_64/amd/zen2/modules/all/CUDA/12.1.1.lua
-rw-rw-r--. 1 cvmfs cvmfs 2116 May 4 20:14 x86_64/generic/modules/all/CUDA/12.1.1.lua
-rw-rw-r--. 1 cvmfs cvmfs 2148 May 4 20:11 x86_64/intel/broadwell/modules/all/CUDA/12.1.1.lua
-rw-rw-r--. 1 cvmfs cvmfs 2168 May 4 20:10 x86_64/intel/skylake_avx512/modules/all/CUDA/12.1.1.lua
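The same availability check can be expressed as a per-architecture report. A minimal sketch, where `check_cuda_modules` is a hypothetical helper and the architecture list is taken from the log above; it only tests for the module file, not for the installation directory:

```bash
#!/usr/bin/env bash
# Hypothetical check: report, per architecture, whether the CUDA module file
# for a given version is still present under a software prefix.
check_cuda_modules() {
    local prefix="$1" version="$2" arch
    shift 2
    for arch in "$@"; do
        if [[ -f "$prefix/$arch/modules/all/CUDA/$version.lua" ]]; then
            printf '%s present\n' "$arch"
        else
            printf '%s missing\n' "$arch"
        fi
    done
}
```

For example, `check_cuda_modules /cvmfs/pilot.nessi.no/versions/2023.06/software/linux 12.1.1 aarch64/generic x86_64/generic x86_64/amd/zen2 x86_64/intel/broadwell x86_64/intel/skylake_avx512` should report `present` for all five architectures before the removal and `missing` afterwards.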
In particular, check the missing runtime (currently only a symbolic link):

cd /cvmfs/pilot.nessi.no/versions/2023.06/software/linux ; \
ls -l \
x86_64/{generic,amd/zen2,intel/[bs]*}/software/CUDA/12.1.1/lib/libcudart* \
aarch64/generic/software/CUDA/12.1.1/lib/libcudart* ; \
cd -

log
lrwxrwxrwx. 1 cvmfs cvmfs 15 May 4 20:07 aarch64/generic/software/CUDA/12.1.1/lib/libcudart.so -> libcudart.so.12
lrwxrwxrwx. 1 cvmfs cvmfs 21 May 4 20:08 aarch64/generic/software/CUDA/12.1.1/lib/libcudart.so.12 -> libcudart.so.12.1.105
lrwxrwxrwx. 1 cvmfs cvmfs 142 May 4 20:09 aarch64/generic/software/CUDA/12.1.1/lib/libcudart.so.12.1.105 -> /cvmfs/pilot.nessi.no/host_injections/2023.06/software/linux/aarch64/generic/software/CUDA/12.1.1/targets/sbsa-linux/lib/libcudart.so.12.1.105
-r--r--r--. 1 cvmfs cvmfs 1068368 May 4 20:07 aarch64/generic/software/CUDA/12.1.1/lib/libcudart_static.a
lrwxrwxrwx. 1 cvmfs cvmfs 15 May 4 20:13 x86_64/amd/zen2/software/CUDA/12.1.1/lib/libcudart.so -> libcudart.so.12
lrwxrwxrwx. 1 cvmfs cvmfs 21 May 4 20:13 x86_64/amd/zen2/software/CUDA/12.1.1/lib/libcudart.so.12 -> libcudart.so.12.1.105
lrwxrwxrwx. 1 cvmfs cvmfs 144 May 4 20:14 x86_64/amd/zen2/software/CUDA/12.1.1/lib/libcudart.so.12.1.105 -> /cvmfs/pilot.nessi.no/host_injections/2023.06/software/linux/x86_64/amd/zen2/software/CUDA/12.1.1/targets/x86_64-linux/lib/libcudart.so.12.1.105
-r--r--r--. 1 cvmfs cvmfs 1176672 May 4 20:13 x86_64/amd/zen2/software/CUDA/12.1.1/lib/libcudart_static.a
lrwxrwxrwx. 1 cvmfs cvmfs 15 May 4 20:12 x86_64/generic/software/CUDA/12.1.1/lib/libcudart.so -> libcudart.so.12
lrwxrwxrwx. 1 cvmfs cvmfs 21 May 4 20:13 x86_64/generic/software/CUDA/12.1.1/lib/libcudart.so.12 -> libcudart.so.12.1.105
lrwxrwxrwx. 1 cvmfs cvmfs 143 May 4 20:14 x86_64/generic/software/CUDA/12.1.1/lib/libcudart.so.12.1.105 -> /cvmfs/pilot.nessi.no/host_injections/2023.06/software/linux/x86_64/generic/software/CUDA/12.1.1/targets/x86_64-linux/lib/libcudart.so.12.1.105
-r--r--r--. 1 cvmfs cvmfs 1176672 May 4 20:12 x86_64/generic/software/CUDA/12.1.1/lib/libcudart_static.a
lrwxrwxrwx. 1 cvmfs cvmfs 15 May 4 20:08 x86_64/intel/broadwell/software/CUDA/12.1.1/lib/libcudart.so -> libcudart.so.12
lrwxrwxrwx. 1 cvmfs cvmfs 21 May 4 20:09 x86_64/intel/broadwell/software/CUDA/12.1.1/lib/libcudart.so.12 -> libcudart.so.12.1.105
lrwxrwxrwx. 1 cvmfs cvmfs 151 May 4 20:11 x86_64/intel/broadwell/software/CUDA/12.1.1/lib/libcudart.so.12.1.105 -> /cvmfs/pilot.nessi.no/host_injections/2023.06/software/linux/x86_64/intel/broadwell/software/CUDA/12.1.1/targets/x86_64-linux/lib/libcudart.so.12.1.105
-r--r--r--. 1 cvmfs cvmfs 1176672 May 4 20:08 x86_64/intel/broadwell/software/CUDA/12.1.1/lib/libcudart_static.a
lrwxrwxrwx. 1 cvmfs cvmfs 15 May 4 20:09 x86_64/intel/skylake_avx512/software/CUDA/12.1.1/lib/libcudart.so -> libcudart.so.12
lrwxrwxrwx. 1 cvmfs cvmfs 21 May 4 20:09 x86_64/intel/skylake_avx512/software/CUDA/12.1.1/lib/libcudart.so.12 -> libcudart.so.12.1.105
lrwxrwxrwx. 1 cvmfs cvmfs 156 May 4 20:10 x86_64/intel/skylake_avx512/software/CUDA/12.1.1/lib/libcudart.so.12.1.105 -> /cvmfs/pilot.nessi.no/host_injections/2023.06/software/linux/x86_64/intel/skylake_avx512/software/CUDA/12.1.1/targets/x86_64-linux/lib/libcudart.so.12.1.105
-r--r--r--. 1 cvmfs cvmfs 1176672 May 4 20:09 x86_64/intel/skylake_avx512/software/CUDA/12.1.1/lib/libcudart_static.a
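In this listing, libcudart.so.12.1.105 is a symbolic link into host_injections on every architecture, whereas libcudart_static.a is a regular file. A sketch of a check that makes this distinction explicit, for instance to re-check the libraries once the rebuilt packages with the full runtime have been ingested; `classify_lib` is a hypothetical name:

```bash
#!/usr/bin/env bash
# Hypothetical check: classify a library path as a symlink into
# host_injections (runtime not shipped), another symlink, a regular file,
# or missing. Dangling symlinks still count as symlinks.
classify_lib() {
    local lib="$1"
    if [[ -L "$lib" ]]; then
        case "$(readlink -- "$lib")" in
            */host_injections/*) echo "host_injections-symlink" ;;
            *)                   echo "symlink" ;;
        esac
    elif [[ -f "$lib" ]]; then
        echo "file"
    else
        echo "missing"
    fi
}
```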
Next, we can remove it in a transaction:

cd /cvmfs/pilot.nessi.no/versions/2023.06/software/linux ; \
rm -rf \
x86_64/{generic,amd/zen2,intel/[bsc]*}/{software,modules/all}/CUDA/12.1.1* \
aarch64/generic/{software,modules/all}/CUDA/12.1.1* ; \
cd -
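Writes to a CernVM-FS repository on the Stratum-0 only happen inside a transaction. A sketch under stated assumptions: `remove_cuda` is a hypothetical wrapper around the same globs as the command above, taking the software prefix and the version as arguments, and the repository name pilot.nessi.no is inferred from the /cvmfs path. The `cvmfs_server transaction`/`publish` calls are shown only as comments:

```bash
#!/usr/bin/env bash
# Hypothetical helper: remove CUDA install dirs and module files for one
# version below a software prefix, using the same globs as the command above.
# The body runs in a subshell, so enabling nullglob does not leak to the caller.
remove_cuda() (
    shopt -s nullglob                 # unmatched globs expand to nothing
    prefix="$1" version="$2"
    targets=( "$prefix"/x86_64/{generic,amd/zen2,intel/[bsc]*}/{software,modules/all}/CUDA/"$version"*
              "$prefix"/aarch64/generic/{software,modules/all}/CUDA/"$version"* )
    if (( ${#targets[@]} > 0 )); then
        rm -rf -- "${targets[@]}"
    fi
    printf '%d\n' "${#targets[@]}"    # number of paths that matched
)

# On the Stratum-0 this would run inside a transaction, e.g.:
#   cvmfs_server transaction pilot.nessi.no
#   remove_cuda /cvmfs/pilot.nessi.no/versions/2023.06/software/linux 12.1.1
#   cvmfs_server publish pilot.nessi.no
```

Enabling nullglob is a design choice of the sketch: patterns that match nothing (for example an architecture without that CUDA version) simply drop out, so the printed count reflects what was actually removed.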
Checklist for deployment/ingestion
Target architectures | command & log | command (particularly checking for previously missing
log
looks good
This PR serves two purposes:
- using the lowerdir option of fuse-overlayfs to make certain directories writable (those directories contain software that is to be removed)
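For reference, overlayfs (and fuse-overlayfs) accept several lower directories as one colon-separated lowerdir= value, where the left-most directory is the top-most layer. A minimal sketch for assembling that value; `lowerdir_option` is hypothetical and not taken from the PR:

```bash
#!/usr/bin/env bash
# Hypothetical helper: join directories into a single overlayfs
# "lowerdir=" option. The first argument becomes the top-most lower layer.
lowerdir_option() {
    local IFS=':'                     # "$*" joins arguments with ':'
    printf 'lowerdir=%s\n' "$*"
}
```

It could then be combined with the other mount options, for example `fuse-overlayfs -o "$(lowerdir_option "$LOWER" /cvmfs/pilot.nessi.no),upperdir=$UPPER,workdir=$WORK" "$MOUNTPOINT"`, where LOWER, UPPER, WORK and MOUNTPOINT are placeholders.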