{2023.06}[foss/2023a] cuDNN 8.9.2.26 w/ CUDA 12.1.1 (part 1) #772
…-layer into 2023.06-software.eessi.io-cuDNN-8.9.2.26-part-1
bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80
Use post sanity-check hook...
bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80
Rebuilding after having reverted back to using a post postproc hook...
bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80
…-layer into 2023.06-software.eessi.io-cuDNN-8.9.2.26-part-1
Rebuilding after pulling in changes from EESSI/software-layer...
bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80
This looks good to me.
- It correctly installs the full CUDA and cuDNN in `host_injections` with `install_cuda_and_libraries.sh`.
- It also skips those installations if they are already present.
- It correctly replaces anything that is not redistributable (i.e. anything that is not a `.so`) with symlinks.
To prove the latter, this was in the tarball for build job 23586
[casparvl@login1 23586]$ ls -al 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/software/cuDNN/8.9.2.26-CUDA-12.1.1/include/
total 4
dr-xr-sr-x. 2 casparvl def-users 4096 Oct 16 08:50 .
dr-xr-sr-x. 5 casparvl def-users 101 Oct 16 08:50 ..
lrwxrwxrwx. 1 casparvl def-users 141 Oct 16 08:50 cudnn_adv_infer.h -> /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen3/software/cuDNN/8.9.2.26-CUDA-12.1.1/include/cudnn_adv_infer.h
lrwxrwxrwx. 1 casparvl def-users 144 Oct 16 08:50 cudnn_adv_infer_v8.h -> /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen3/software/cuDNN/8.9.2.26-CUDA-12.1.1/include/cudnn_adv_infer_v8.h
lrwxrwxrwx. 1 casparvl def-users 141 Oct 16 08:50 cudnn_adv_train.h -> /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen3/software/cuDNN/8.9.2.26-CUDA-12.1.1/include/cudnn_adv_train.h
lrwxrwxrwx. 1 casparvl def-users 144 Oct 16 08:50 cudnn_adv_train_v8.h -> /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen3/software/cuDNN/8.9.2.26-CUDA-12.1.1/include/cudnn_adv_train_v8.h
lrwxrwxrwx. 1 casparvl def-users 139 Oct 16 08:50 cudnn_backend.h -> /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen3/software/cuDNN/8.9.2.26-CUDA-12.1.1/include/cudnn_backend.h
lrwxrwxrwx. 1 casparvl def-users 142 Oct 16 08:50 cudnn_backend_v8.h -> /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen3/software/cuDNN/8.9.2.26-CUDA-12.1.1/include/cudnn_backend_v8.h
lrwxrwxrwx. 1 casparvl def-users 141 Oct 16 08:50 cudnn_cnn_infer.h -> /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen3/software/cuDNN/8.9.2.26-CUDA-12.1.1/include/cudnn_cnn_infer.h
lrwxrwxrwx. 1 casparvl def-users 144 Oct 16 08:50 cudnn_cnn_infer_v8.h -> /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen3/software/cuDNN/8.9.2.26-CUDA-12.1.1/include/cudnn_cnn_infer_v8.h
lrwxrwxrwx. 1 casparvl def-users 141 Oct 16 08:50 cudnn_cnn_train.h -> /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen3/software/cuDNN/8.9.2.26-CUDA-12.1.1/include/cudnn_cnn_train.h
lrwxrwxrwx. 1 casparvl def-users 144 Oct 16 08:50 cudnn_cnn_train_v8.h -> /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen3/software/cuDNN/8.9.2.26-CUDA-12.1.1/include/cudnn_cnn_train_v8.h
lrwxrwxrwx. 1 casparvl def-users 131 Oct 16 08:50 cudnn.h -> /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen3/software/cuDNN/8.9.2.26-CUDA-12.1.1/include/cudnn.h
lrwxrwxrwx. 1 casparvl def-users 141 Oct 16 08:50 cudnn_ops_infer.h -> /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen3/software/cuDNN/8.9.2.26-CUDA-12.1.1/include/cudnn_ops_infer.h
lrwxrwxrwx. 1 casparvl def-users 144 Oct 16 08:50 cudnn_ops_infer_v8.h -> /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen3/software/cuDNN/8.9.2.26-CUDA-12.1.1/include/cudnn_ops_infer_v8.h
lrwxrwxrwx. 1 casparvl def-users 141 Oct 16 08:50 cudnn_ops_train.h -> /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen3/software/cuDNN/8.9.2.26-CUDA-12.1.1/include/cudnn_ops_train.h
lrwxrwxrwx. 1 casparvl def-users 144 Oct 16 08:50 cudnn_ops_train_v8.h -> /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen3/software/cuDNN/8.9.2.26-CUDA-12.1.1/include/cudnn_ops_train_v8.h
lrwxrwxrwx. 1 casparvl def-users 134 Oct 16 08:50 cudnn_v8.h -> /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen3/software/cuDNN/8.9.2.26-CUDA-12.1.1/include/cudnn_v8.h
lrwxrwxrwx. 1 casparvl def-users 139 Oct 16 08:50 cudnn_version.h -> /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen3/software/cuDNN/8.9.2.26-CUDA-12.1.1/include/cudnn_version.h
lrwxrwxrwx. 1 casparvl def-users 142 Oct 16 08:50 cudnn_version_v8.h -> /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen3/software/cuDNN/8.9.2.26-CUDA-12.1.1/include/cudnn_version_v8.h
[casparvl@login1 23586]$ ls -al 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/software/cuDNN/8.9.2.26-CUDA-12.1.1/lib
lib/ lib64/
[casparvl@login1 23586]$ ls -al 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/software/cuDNN/8.9.2.26-CUDA-12.1.1/lib/
total 1568152
dr-xr-sr-x. 2 casparvl def-users 4096 Oct 16 08:50 .
dr-xr-sr-x. 5 casparvl def-users 101 Oct 16 08:50 ..
lrwxrwxrwx. 1 casparvl def-users 23 May 31 2023 libcudnn_adv_infer.so -> libcudnn_adv_infer.so.8
lrwxrwxrwx. 1 casparvl def-users 27 May 31 2023 libcudnn_adv_infer.so.8 -> libcudnn_adv_infer.so.8.9.2
-r-xr-xr-x. 1 casparvl def-users 125081680 May 31 2023 libcudnn_adv_infer.so.8.9.2
lrwxrwxrwx. 1 casparvl def-users 147 Oct 16 08:50 libcudnn_adv_infer_static.a -> /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen3/software/cuDNN/8.9.2.26-CUDA-12.1.1/lib/libcudnn_adv_infer_static.a
lrwxrwxrwx. 1 casparvl def-users 27 May 31 2023 libcudnn_adv_infer_static_v8.a -> libcudnn_adv_infer_static.a
lrwxrwxrwx. 1 casparvl def-users 23 May 31 2023 libcudnn_adv_train.so -> libcudnn_adv_train.so.8
lrwxrwxrwx. 1 casparvl def-users 27 May 31 2023 libcudnn_adv_train.so.8 -> libcudnn_adv_train.so.8.9.2
-r-xr-xr-x. 1 casparvl def-users 116204496 May 31 2023 libcudnn_adv_train.so.8.9.2
lrwxrwxrwx. 1 casparvl def-users 147 Oct 16 08:50 libcudnn_adv_train_static.a -> /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen3/software/cuDNN/8.9.2.26-CUDA-12.1.1/lib/libcudnn_adv_train_static.a
lrwxrwxrwx. 1 casparvl def-users 27 May 31 2023 libcudnn_adv_train_static_v8.a -> libcudnn_adv_train_static.a
lrwxrwxrwx. 1 casparvl def-users 23 May 31 2023 libcudnn_cnn_infer.so -> libcudnn_cnn_infer.so.8
lrwxrwxrwx. 1 casparvl def-users 27 May 31 2023 libcudnn_cnn_infer.so.8 -> libcudnn_cnn_infer.so.8.9.2
-r-xr-xr-x. 1 casparvl def-users 626039624 May 31 2023 libcudnn_cnn_infer.so.8.9.2
lrwxrwxrwx. 1 casparvl def-users 147 Oct 16 08:50 libcudnn_cnn_infer_static.a -> /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen3/software/cuDNN/8.9.2.26-CUDA-12.1.1/lib/libcudnn_cnn_infer_static.a
lrwxrwxrwx. 1 casparvl def-users 27 May 31 2023 libcudnn_cnn_infer_static_v8.a -> libcudnn_cnn_infer_static.a
lrwxrwxrwx. 1 casparvl def-users 23 May 31 2023 libcudnn_cnn_train.so -> libcudnn_cnn_train.so.8
lrwxrwxrwx. 1 casparvl def-users 27 May 31 2023 libcudnn_cnn_train.so.8 -> libcudnn_cnn_train.so.8.9.2
-r-xr-xr-x. 1 casparvl def-users 132383440 May 31 2023 libcudnn_cnn_train.so.8.9.2
lrwxrwxrwx. 1 casparvl def-users 147 Oct 16 08:50 libcudnn_cnn_train_static.a -> /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen3/software/cuDNN/8.9.2.26-CUDA-12.1.1/lib/libcudnn_cnn_train_static.a
lrwxrwxrwx. 1 casparvl def-users 27 May 31 2023 libcudnn_cnn_train_static_v8.a -> libcudnn_cnn_train_static.a
lrwxrwxrwx. 1 casparvl def-users 23 May 31 2023 libcudnn_ops_infer.so -> libcudnn_ops_infer.so.8
lrwxrwxrwx. 1 casparvl def-users 27 May 31 2023 libcudnn_ops_infer.so.8 -> libcudnn_ops_infer.so.8.9.2
-r-xr-xr-x. 1 casparvl def-users 90517920 May 31 2023 libcudnn_ops_infer.so.8.9.2
lrwxrwxrwx. 1 casparvl def-users 147 Oct 16 08:50 libcudnn_ops_infer_static.a -> /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen3/software/cuDNN/8.9.2.26-CUDA-12.1.1/lib/libcudnn_ops_infer_static.a
lrwxrwxrwx. 1 casparvl def-users 27 May 31 2023 libcudnn_ops_infer_static_v8.a -> libcudnn_ops_infer_static.a
lrwxrwxrwx. 1 casparvl def-users 23 May 31 2023 libcudnn_ops_train.so -> libcudnn_ops_train.so.8
lrwxrwxrwx. 1 casparvl def-users 27 May 31 2023 libcudnn_ops_train.so.8 -> libcudnn_ops_train.so.8.9.2
-r-xr-xr-x. 1 casparvl def-users 70897912 May 31 2023 libcudnn_ops_train.so.8.9.2
lrwxrwxrwx. 1 casparvl def-users 147 Oct 16 08:50 libcudnn_ops_train_static.a -> /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen3/software/cuDNN/8.9.2.26-CUDA-12.1.1/lib/libcudnn_ops_train_static.a
lrwxrwxrwx. 1 casparvl def-users 27 May 31 2023 libcudnn_ops_train_static_v8.a -> libcudnn_ops_train_static.a
lrwxrwxrwx. 1 casparvl def-users 13 May 31 2023 libcudnn.so -> libcudnn.so.8
lrwxrwxrwx. 1 casparvl def-users 17 May 31 2023 libcudnn.so.8 -> libcudnn.so.8.9.2
-r-xr-xr-x. 1 casparvl def-users 150200 May 31 2023 libcudnn.so.8.9.2
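The same check can be repeated on another tarball by listing any regular file that is not a shared library; going by the listing above, the result should be empty. This is only a minimal sketch and not part of the PR: the helper name `list_non_redistributable` and the variable `CUDNN_INSTALL_DIR` are hypothetical, and it assumes that "redistributable" means exactly the `*.so` and `*.so.*` files, as stated in the review comment.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): print every regular file under the
# given directory that is NOT a shared library (*.so or *.so.*).
# Symlinks are skipped because `-type f` only matches regular files.
list_non_redistributable() {
    find "$1" -type f ! -name '*.so' ! -name '*.so.*'
}

# Example usage (CUDNN_INSTALL_DIR is a placeholder for the unpacked
# cuDNN installation directory from the tarball):
# list_non_redistributable "${CUDNN_INSTALL_DIR}"
```

An empty result means only shared libraries were shipped as real files; everything else is either a symlink or absent.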
Co-authored-by: ocaisa <[email protected]>
EESSI-install-software.sh
# $EESSI_SILENT - don't print any messages
# $EESSI_BASIC_ENV - give a basic set of environment variables
this doesn't agree with what's going on below?
Reverted it back to sourcing it silently. However, we need the full environment to be initialised at this stage, otherwise some required environment variables are not set (in particular, `EESSI_SITE_SOFTWARE_PATH`). Improved the comments. Will repeat tests (removing host-injections, building, ...).
See 77f3bc9
install_cuda_host_injections.sh link_nvidia_host_libraries.sh | ||
eessi-2023.06-cuda-and-libraries.yml | ||
install_cuda_and_libraries.sh | ||
install_cuda_host_injections.sh |
follow-up on this via #789
scripts/gpu_support/nvidia/easystacks/eessi-2023.06-eb-4.9.4-2023a-CUDA-host-injections.yml
Co-authored-by: Kenneth Hoste <[email protected]>
EESSI_SILENT=1 EESSI_BASIC_ENV=1 source $TOPDIR/init/eessi_environment_variables
# $EESSI_SILENT - don't print any messages if set (use 'unset EESSI_SILENT' to let script show messages)
# $EESSI_BASIC_ENV - give a basic set of environment variables if set (use 'EESSI_BASIC_ENV=' to let script initialise a full environment)
EESSI_SILENT=1 EESSI_BASIC_ENV= source $TOPDIR/init/eessi_environment_variables
Wait, now I'm confused. I think @boegel's statement was more about adapting the comments (or removing them), wasn't it? Because `EESSI_BASIC_ENV=` will set `EESSI_BASIC_ENV`, which will result in a failure because we were then missing one of the environment variables (`EESSI_SOFTWARE_PATH` or something? I don't remember).
Yeah, I was only pointing out that the comments didn't agree with the code
We check with `[ ! -z $EESSI_BASIC_ENV ]`, which is false if `$EESSI_BASIC_ENV` is defined but empty (or undefined):
bash-3.2$ EESSI_BASIC_ENV=; if [ ! -z $EESSI_BASIC_ENV ]; then echo "EESSI_BASIC_ENV is set: '$EESSI_BASIC_ENV'"; fi
bash-3.2$ EESSI_BASIC_ENV=1; if [ ! -z $EESSI_BASIC_ENV ]; then echo "EESSI_BASIC_ENV is set: '$EESSI_BASIC_ENV'"; fi
EESSI_BASIC_ENV is set: '1'
bash-3.2$ unset EESSI_BASIC_ENV; if [ ! -z $EESSI_BASIC_ENV ]; then echo "EESSI_BASIC_ENV is set: '$EESSI_BASIC_ENV'"; fi
bash-3.2$
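The session above shows that `[ ! -z $EESSI_BASIC_ENV ]` treats an empty variable and an unset one the same way. If that distinction ever needs to be made explicit, the standard `${VAR+x}` expansion can do it. The following is a minimal sketch; `basic_env_state` is a hypothetical name and not a function from the EESSI scripts.

```shell
#!/usr/bin/env bash
# Sketch: classify EESSI_BASIC_ENV as unset, empty, or non-empty.
# basic_env_state is a hypothetical helper, not part of the EESSI scripts.
basic_env_state() {
    if [ -z "${EESSI_BASIC_ENV+x}" ]; then
        # ${VAR+x} expands to nothing only when VAR is unset
        echo "unset"
    elif [ -z "${EESSI_BASIC_ENV}" ]; then
        # set, but holding the empty string (e.g. after 'EESSI_BASIC_ENV=')
        echo "empty"
    else
        echo "non-empty"
    fi
}

unset EESSI_BASIC_ENV; basic_env_state   # prints: unset
EESSI_BASIC_ENV=;      basic_env_state   # prints: empty
EESSI_BASIC_ENV=1;     basic_env_state   # prints: non-empty
```

Quoting the expansion inside `[ ... ]`, as done here, also avoids relying on how `test` behaves when the unquoted variable expands to nothing.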
Rebuilding after changes to bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80 |
)
copy_files_by_list ${TOPDIR}/scripts/gpu_support/nvidia ${INSTALL_PREFIX}/scripts/gpu_support/nvidia "${nvidia_files[@]}"

# Easystacks to be used to install software in host injections
host_injections_easystacks=(
    eessi-2023.06-eb-4.9.4-2023a-CUDA-host-injections.yml
Happy to follow up on this in a future PR, but I wonder if we need a hardcoded list here. Can't we use a glob like `eessi-*-CUDA-host-injections.yml`, so we don't need to remember to update this list whenever an additional easystack file is added under `scripts/gpu_support/nvidia/easystacks`?
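One possible shape for the glob suggestion is sketched below. It is an illustration only, not code from this PR: `list_host_injections_easystacks` is a hypothetical helper that prints the matching file names, one per line, and could be used to populate a list such as `host_injections_easystacks`.

```shell
#!/usr/bin/env bash
# Sketch of the glob idea; list_host_injections_easystacks is a hypothetical
# helper, not code from this PR.
#   $1: directory holding the easystack files
list_host_injections_easystacks() {
    for easystack in "$1"/eessi-*-CUDA-host-injections.yml; do
        # If nothing matches, the pattern is left unexpanded; skip it.
        [ -e "${easystack}" ] || continue
        basename "${easystack}"
    done
}

# Example usage, assuming TOPDIR is set as in EESSI-install-software.sh:
# list_host_injections_easystacks "${TOPDIR}/scripts/gpu_support/nvidia/easystacks"
```

The explicit `-e` check avoids depending on `nullglob`, so the helper prints nothing when the directory contains no matching easystack file.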
lgtm
staging PRs merged
Can't wait for part 2 😆
PR merged! Moved
Edit by casparvl:
EESSI-extend to use `EESSI_SITE_SOFTWARE_PATH` #778, since we need `$EESSI_SITE_INSTALL_PATH`
This adds cuDNN 8.9.2.26 to EESSI. It's based on #581, but only implements the parts of #581 that install cuDNN in EESSI (replacing files that cannot be shipped with symlinks to `host_injections`, and adjusting the module footer) and install cuDNN under `host_injections`.
Similarly to #581, it attempts to generalise (parts of) some functions to avoid duplicate code.
After this PR, we have to implement the remaining part of #581, particularly `Sitepackage.lua` (via `create_lmodsitepackage.py`).
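To illustrate the idea of replacing non-shippable files with symlinks into `host_injections`, here is a rough shell sketch. It is not the implementation used in this PR: `replace_with_symlinks` is a hypothetical helper, and it assumes that every regular file that is not a `*.so` or `*.so.*` shared library counts as non-redistributable.

```shell
#!/usr/bin/env bash
# Illustration only; replace_with_symlinks is a hypothetical helper and not
# the code used in this PR.
#   $1: installation directory that will be shipped (no trailing slash)
#   $2: matching directory under host_injections holding the full installation
replace_with_symlinks() {
    install_dir="$1"
    host_dir="$2"
    # Assumption: regular files that are not shared libraries must not be shipped.
    find "${install_dir}" -type f ! -name '*.so' ! -name '*.so.*' |
    while IFS= read -r file; do
        # Path of the file relative to the installation directory
        rel=${file#"${install_dir}"/}
        # -f replaces the existing regular file with the symlink
        ln -sf "${host_dir}/${rel}" "${file}"
    done
}
```

The symlinks stay dangling until the full installation is present under `host_injections`, which matches the intent that those files are provided on the host rather than shipped.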