
{2023.06}[foss/2023a] cuDNN 8.9.2.26 w/ CUDA 12.1.1 (part 1) #772


Collaborator

trz42 commented Oct 2, 2024

Edit by casparvl:

This adds cuDNN 8.9.2.26 to EESSI. It is based on #581, but only implements the parts of #581 that install cuDNN in EESSI (replacing files that cannot be shipped with symlinks to host_injections, and adjusting the module footer) and that install cuDNN under host_injections.

Like #581, it attempts to generalise some of the functions (or parts of them) to avoid duplicate code.
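A minimal Python sketch of the symlink step described above, assuming that only shared libraries (`.so` or `.so.<version>`) may be shipped and that every other regular file is replaced by an absolute symlink to the same relative path under host_injections. Existing relative symlinks (such as `libcudnn_adv_infer.so -> libcudnn_adv_infer.so.8`) are left alone, which matches the directory listing in the review further down. The function name `symlink_non_redistributable` is hypothetical; this is not the code in `eb_hooks.py`.

```python
import os
import re
from pathlib import Path

# Assumption: shared libraries are redistributable, everything else is not.
# Matches e.g. "libcudnn.so", "libcudnn.so.8", "libcudnn.so.8.9.2".
SHARED_LIB_RE = re.compile(r"\.so(\.\d+)*$")


def symlink_non_redistributable(install_dir, host_injections_dir):
    """Hypothetical helper: replace each regular, non-shared-library file under
    install_dir with an absolute symlink to the same relative path under
    host_injections_dir. Returns the sorted relative paths that were replaced."""
    install_dir = Path(install_dir)
    replaced = []
    for root, _dirs, files in os.walk(install_dir):
        for name in files:
            path = Path(root) / name
            # keep existing symlinks and redistributable shared libraries
            if path.is_symlink() or SHARED_LIB_RE.search(name):
                continue
            rel = path.relative_to(install_dir)
            path.unlink()
            path.symlink_to(Path(host_injections_dir) / rel)
            replaced.append(str(rel))
    return sorted(replaced)
```

In the real setup the symlink targets only resolve once the full package has been installed under host_injections on the client side.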

After this PR, the remaining part of #581 still has to be implemented, in particular:

  • updated Lmod hooks in SitePackage.lua (via create_lmodsitepackage.py)

trz42 added the labels 2023.06-software.eessi.io (2023.06 version of software.eessi.io) and accel:nvidia on Oct 2, 2024

eessi-bot bot commented Oct 2, 2024

Instance eessi-bot-mc-aws is configured to build for:

  • architectures: x86_64/generic, x86_64/intel/haswell, x86_64/intel/skylake_avx512, x86_64/amd/zen2, x86_64/amd/zen3, aarch64/generic, aarch64/neoverse_n1, aarch64/neoverse_v1
  • repositories: eessi.io-2023.06-compat, eessi-hpc.org-2023.06-software, eessi-hpc.org-2023.06-compat, eessi.io-2023.06-software

Instance boegel-bot-deucalion is configured to build for:

  • architectures: aarch64/a64fx
  • repositories: eessi.io-2023.06-software

eessi-bot bot commented Oct 2, 2024

Instance eessi-bot-mc-azure is configured to build for:

  • architectures: x86_64/amd/zen4
  • repositories: eessi-hpc.org-2023.06-software, eessi-hpc.org-2023.06-compat, eessi.io-2023.06-software, eessi.io-2023.06-compat

Collaborator Author

trz42 commented Oct 2, 2024

bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80
bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3 accel:nvidia/cc80

Updates by the bot instance boegel-bot-deucalion
  • account trz42 has NO permission to send commands to the bot

eessi-bot bot commented Oct 2, 2024

Updates by the bot instance eessi-bot-mc-aws
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80 from trz42

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3 accel:nvidia/cc80 from trz42

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80 resulted in:

  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80 resulted in:
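The bot echoes each command back in an expanded format, replacing the short keys `repo`, `arch` and `accel` with `repository`, `architecture` and `accelerator`. The Python sketch below reproduces only that expansion; the function name and its error handling are assumptions, not the bot's actual parser.

```python
# Short argument keys accepted in bot commands and their expanded names,
# as shown in the bot's "expanded format" replies.
SHORT_TO_LONG = {"repo": "repository", "arch": "architecture", "accel": "accelerator"}


def expand_bot_command(line):
    """Hypothetical helper: turn 'bot: build repo:X arch:Y accel:Z' into
    'build repository:X architecture:Y accelerator:Z'."""
    if not line.startswith("bot:"):
        raise ValueError(f"not a bot command: {line!r}")
    action, *args = line[len("bot:"):].split()
    expanded = []
    for arg in args:
        key, sep, value = arg.partition(":")
        if not sep:
            raise ValueError(f"malformed argument: {arg!r}")
        # unknown keys are passed through unchanged
        expanded.append(f"{SHORT_TO_LONG.get(key, key)}:{value}")
    return " ".join([action] + expanded)
```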

eessi-bot bot commented Oct 2, 2024

Updates by the bot instance eessi-bot-mc-azure
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80 from trz42

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3 accel:nvidia/cc80 from trz42

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80 resulted in:

    • no jobs were submitted
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80 resulted in:

    • no jobs were submitted

eessi-bot bot commented Oct 2, 2024

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-amd-zen2 and accelerator nvidia/cc80 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.10/pr_772/21002

date job status comment
Oct 02 20:50:22 UTC 2024 submitted job id 21002 awaits release by job manager
Oct 02 20:50:34 UTC 2024 released job awaits launch by Slurm scheduler
Oct 02 20:55:39 UTC 2024 running job 21002 is running
Oct 02 21:21:32 UTC 2024 finished
😁 SUCCESS
Details
✅ job output file slurm-21002.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen2-1727902984.tar.gz size: 698 MiB (732485973 bytes)
entries: 72
modules under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80/modules/all
cuDNN/8.9.2.26-CUDA-12.1.1.lua
software under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80/software
cuDNN/8.9.2.26-CUDA-12.1.1
other under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80
2023.06/init/easybuild/eb_hooks.py
Oct 02 21:21:32 UTC 2024 test result
😁 SUCCESS
ReFrame Summary
[ PASSED ] Ran 10/10 test case
Details
✅ job output file slurm-21002.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
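Each build job is judged by searching its `slurm-<jobid>.out` file for the messages listed under Details. A minimal sketch of that logic, assuming plain substring matching (the bot's real patterns and matching rules may differ, and `evaluate_build_log` is a hypothetical name):

```python
# Assumed checks, copied from the "Details" list of a build job report:
# each entry is (message, must_be_present).
BUILD_CHECKS = [
    ("ERROR:", False),
    ("FAILED:", False),
    ("required modules missing:", False),
    ("No missing installations", True),
    (".tar.gz created!", True),
]


def evaluate_build_log(text):
    """Return (overall_success, [(message, passed), ...]) for a job output file."""
    results = [
        (message, (message in text) == must_be_present)
        for message, must_be_present in BUILD_CHECKS
    ]
    return all(passed for _, passed in results), results


if __name__ == "__main__":
    ok, details = evaluate_build_log("No missing installations\n... .tar.gz created!\n")
    for message, passed in details:
        print("OK  " if passed else "FAIL", message)
    print("SUCCESS" if ok else "FAILURE")
```

Because every check must pass, a job can be reported as a FAILURE even though a tarball was created, as happens with job 21003 below.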

eessi-bot bot commented Oct 2, 2024

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-amd-zen3 and accelerator nvidia/cc80 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.10/pr_772/21003

date job status comment
Oct 02 20:50:26 UTC 2024 submitted job id 21003 awaits release by job manager
Oct 02 20:50:36 UTC 2024 released job awaits launch by Slurm scheduler
Oct 02 20:55:40 UTC 2024 running job 21003 is running
Oct 02 21:13:23 UTC 2024 finished
😢 FAILURE
Details
✅ job output file slurm-21003.out
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen3-1727902836.tar.gz size: 0 MiB (12045 bytes)
entries: 1
modules under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/modules/all
no module files in tarball
software under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/software
no software packages in tarball
other under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80
2023.06/init/easybuild/eb_hooks.py
Oct 02 21:13:23 UTC 2024 test result
😁 SUCCESS
ReFrame Summary
[ PASSED ] Ran 10/10 test case
Details
✅ job output file slurm-21003.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
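The Artefacts section condenses the tarball contents. The failed zen3 job above produced a tarball of only 12045 bytes with a single entry and no module files or software directories. The sizes in these reports are consistent with truncating to whole MiB: 732485973 bytes is about 698.55 MiB and is shown as 698 MiB. The sketch below is a hypothetical reconstruction of such a summary from a list of tarball member file paths; `format_size` and `classify_entries` are illustrative names, not the bot's code.

```python
def format_size(size_bytes):
    """Whole MiB (truncated, as the reports appear to do) plus the exact byte count."""
    return "%d MiB (%d bytes)" % (size_bytes // 2**20, size_bytes)


def classify_entries(entries, prefix):
    """Split tarball member file paths into module files and software
    installations (<name>/<version>) below `prefix`, and everything else."""
    modules_dir = prefix + "/modules/all/"
    software_dir = prefix + "/software/"
    modules, software, other = set(), set(), set()
    for entry in entries:
        if entry.startswith(modules_dir) and entry.endswith(".lua"):
            modules.add(entry[len(modules_dir):])
        elif entry.startswith(software_dir):
            # keep only the <name>/<version> part of the path
            parts = entry[len(software_dir):].split("/")
            if len(parts) >= 2 and parts[1]:
                software.add(parts[0] + "/" + parts[1])
        else:
            other.add(entry)
    return sorted(modules), sorted(software), sorted(other)
```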

Collaborator Author

trz42 commented Oct 2, 2024

Now using a post sanity-check hook...

bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80
bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3 accel:nvidia/cc80

eessi-bot bot commented Oct 2, 2024

Updates by the bot instance eessi-bot-mc-aws
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80 from trz42

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3 accel:nvidia/cc80 from trz42

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80 resulted in:

  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80 resulted in:

Updates by the bot instance boegel-bot-deucalion
  • account trz42 has NO permission to send commands to the bot

eessi-bot bot commented Oct 2, 2024

Updates by the bot instance eessi-bot-mc-azure
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80 from trz42

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3 accel:nvidia/cc80 from trz42

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80 resulted in:

    • no jobs were submitted
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80 resulted in:

    • no jobs were submitted

eessi-bot bot commented Oct 2, 2024

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-amd-zen2 and accelerator nvidia/cc80 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.10/pr_772/21006

date job status comment
Oct 02 21:28:25 UTC 2024 submitted job id 21006 awaits release by job manager
Oct 02 21:28:37 UTC 2024 released job awaits launch by Slurm scheduler
Oct 02 21:29:43 UTC 2024 running job 21006 is running
Oct 02 21:54:35 UTC 2024 finished
😁 SUCCESS
Details
✅ job output file slurm-21006.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen2-1727904975.tar.gz size: 698 MiB (732487105 bytes)
entries: 72
modules under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80/modules/all
cuDNN/8.9.2.26-CUDA-12.1.1.lua
software under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80/software
cuDNN/8.9.2.26-CUDA-12.1.1
other under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80
2023.06/init/easybuild/eb_hooks.py
Oct 02 21:54:35 UTC 2024 test result
😁 SUCCESS
ReFrame Summary
[ PASSED ] Ran 10/10 test case
Details
✅ job output file slurm-21006.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
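The status table shows where the time of a job goes. As a small worked example, the Python below computes the time between consecutive status changes of job 21006, parsing the timestamps exactly as printed: the job waited 12 s to be released, 66 s for Slurm to start it, and ran for 1492 s (24 min 52 s). The helper name `phase_durations` is illustrative.

```python
from datetime import datetime

# Timestamp format used in the bot's status table, e.g. "Oct 02 21:28:25 UTC 2024".
# %b expects English month abbreviations (C/English locale).
TIMESTAMP_FORMAT = "%b %d %H:%M:%S UTC %Y"


def phase_durations(events):
    """Given [(timestamp, status), ...] in chronological order, return the
    seconds spent between consecutive status changes, keyed 'from->to'."""
    parsed = [(datetime.strptime(ts, TIMESTAMP_FORMAT), status) for ts, status in events]
    return {
        f"{s1}->{s2}": (t2 - t1).total_seconds()
        for (t1, s1), (t2, s2) in zip(parsed, parsed[1:])
    }


job_21006 = [
    ("Oct 02 21:28:25 UTC 2024", "submitted"),
    ("Oct 02 21:28:37 UTC 2024", "released"),
    ("Oct 02 21:29:43 UTC 2024", "running"),
    ("Oct 02 21:54:35 UTC 2024", "finished"),
]
print(phase_durations(job_21006))
```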

eessi-bot bot commented Oct 2, 2024

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-amd-zen3 and accelerator nvidia/cc80 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.10/pr_772/21007

date job status comment
Oct 02 21:28:29 UTC 2024 submitted job id 21007 awaits release by job manager
Oct 02 21:28:39 UTC 2024 released job awaits launch by Slurm scheduler
Oct 02 21:29:45 UTC 2024 running job 21007 is running
Oct 02 21:50:30 UTC 2024 finished
😁 SUCCESS
Details
✅ job output file slurm-21007.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen3-1727904886.tar.gz size: 698 MiB (732488652 bytes)
entries: 72
modules under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/modules/all
cuDNN/8.9.2.26-CUDA-12.1.1.lua
software under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/software
cuDNN/8.9.2.26-CUDA-12.1.1
other under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80
2023.06/init/easybuild/eb_hooks.py
Oct 02 21:50:30 UTC 2024 test result
😁 SUCCESS
ReFrame Summary
[ PASSED ] Ran 10/10 test case
Details
✅ job output file slurm-21007.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

Collaborator Author

trz42 commented Oct 3, 2024

Rebuilding after switching back to a post-postproc hook...

bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80
bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3 accel:nvidia/cc80
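EasyBuild lets a hooks file define functions named after build steps, such as `post_postproc_hook` or `post_sanitycheck_hook`, which are called with the EasyBlock instance of the software being installed; the postproc step runs before the sanity check step. A minimal sketch of restricting such a hook to cuDNN via a per-software dispatch table. `POST_POSTPROC_HOOKS`, `post_postproc_cudnn` and the `processed` list are hypothetical names for illustration, not taken from `eb_hooks.py`.

```python
# Hypothetical EasyBuild hooks-file fragment. EasyBuild calls post_postproc_hook
# with the EasyBlock instance as `self`; only self.name and self.installdir are used.

processed = []  # records which installations were handled (illustration only)


def post_postproc_cudnn(self, *args, **kwargs):
    """Placeholder for the cuDNN-specific work, e.g. replacing
    non-redistributable files in self.installdir with symlinks."""
    processed.append((self.name, self.installdir))


# Map software names to the handler that should run for them.
POST_POSTPROC_HOOKS = {
    "cuDNN": post_postproc_cudnn,
}


def post_postproc_hook(self, *args, **kwargs):
    """Dispatch to a software-specific handler, if one is registered."""
    handler = POST_POSTPROC_HOOKS.get(self.name)
    if handler is not None:
        handler(self, *args, **kwargs)
```

A dispatch table keeps the generic hook short and makes it easy to register the same kind of handler for other NVIDIA packages later.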

eessi-bot bot commented Oct 3, 2024

Updates by the bot instance eessi-bot-mc-aws
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80 from trz42

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3 accel:nvidia/cc80 from trz42

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80 resulted in:

  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80 resulted in:

Updates by the bot instance boegel-bot-deucalion
  • account trz42 has NO permission to send commands to the bot

eessi-bot bot commented Oct 3, 2024

Updates by the bot instance eessi-bot-mc-azure
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80 from trz42

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3 accel:nvidia/cc80 from trz42

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80 resulted in:

    • no jobs were submitted
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80 resulted in:

    • no jobs were submitted

eessi-bot bot commented Oct 3, 2024

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-amd-zen2 and accelerator nvidia/cc80 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.10/pr_772/21169

date job status comment
Oct 03 13:02:46 UTC 2024 submitted job id 21169 awaits release by job manager
Oct 03 13:03:40 UTC 2024 released job awaits launch by Slurm scheduler
Oct 03 13:08:51 UTC 2024 running job 21169 is running
Oct 03 13:39:23 UTC 2024 finished
😁 SUCCESS
Details
✅ job output file slurm-21169.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen2-1727961670.tar.gz size: 698 MiB (732484453 bytes)
entries: 76
modules under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80/modules/all
cuDNN/8.9.2.26-CUDA-12.1.1.lua
software under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80/software
cuDNN/8.9.2.26-CUDA-12.1.1
other under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80
2023.06/init/easybuild/eb_hooks.py
2023.06/scripts/gpu_support/nvidia/eessi-2023.06-cuda-and-libraries.yml
2023.06/scripts/gpu_support/nvidia/install_cuda_and_libraries.sh
2023.06/software/linux/x86_64/amd/zen2/.lmod/lmodrc.lua
2023.06/software/linux/x86_64/amd/zen2/.lmod/SitePackage.lua
Oct 03 13:39:23 UTC 2024 test result
😁 SUCCESS
ReFrame Summary
[ PASSED ] Ran 10/10 test case
Details
✅ job output file slurm-21169.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

eessi-bot bot commented Oct 3, 2024

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-amd-zen3 and accelerator nvidia/cc80 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.10/pr_772/21170

date job status comment
Oct 03 13:02:50 UTC 2024 submitted job id 21170 awaits release by job manager
Oct 03 13:03:42 UTC 2024 released job awaits launch by Slurm scheduler
Oct 03 13:08:53 UTC 2024 running job 21170 is running
Oct 03 13:34:50 UTC 2024 finished
😁 SUCCESS
Details
✅ job output file slurm-21170.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen3-1727961581.tar.gz size: 698 MiB (732490113 bytes)
entries: 76
modules under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/modules/all
cuDNN/8.9.2.26-CUDA-12.1.1.lua
software under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/software
cuDNN/8.9.2.26-CUDA-12.1.1
other under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80
2023.06/init/easybuild/eb_hooks.py
2023.06/scripts/gpu_support/nvidia/eessi-2023.06-cuda-and-libraries.yml
2023.06/scripts/gpu_support/nvidia/install_cuda_and_libraries.sh
2023.06/software/linux/x86_64/amd/zen3/.lmod/lmodrc.lua
2023.06/software/linux/x86_64/amd/zen3/.lmod/SitePackage.lua
Oct 03 13:34:50 UTC 2024 test result
😁 SUCCESS
ReFrame Summary
[ PASSED ] Ran 10/10 test case
Details
✅ job output file slurm-21170.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

…-layer into 2023.06-software.eessi.io-cuDNN-8.9.2.26-part-1
Collaborator Author

trz42 commented Oct 3, 2024

Rebuilding after pulling in changes from EESSI/software-layer...

bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80
bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3 accel:nvidia/cc80

eessi-bot bot commented Oct 16, 2024

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-amd-zen2 and accelerator nvidia/cc80 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.10/pr_772/23585

date job status comment
Oct 16 08:43:17 UTC 2024 submitted job id 23585 awaits release by job manager
Oct 16 08:43:58 UTC 2024 released job awaits launch by Slurm scheduler
Oct 16 08:49:09 UTC 2024 running job 23585 is running
Oct 16 09:15:18 UTC 2024 finished
😁 SUCCESS
Details
✅ job output file slurm-23585.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen2-1729069029.tar.gz size: 698 MiB (732487880 bytes)
entries: 75
modules under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80/modules/all
cuDNN/8.9.2.26-CUDA-12.1.1.lua
software under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80/software
cuDNN/8.9.2.26-CUDA-12.1.1
other under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80
2023.06/init/easybuild/eb_hooks.py
2023.06/init/eessi_environment_variables
2023.06/scripts/gpu_support/nvidia/easystacks/eessi-2023.06-eb-4.9.4-2023a-CUDA-host-injections.yml
2023.06/scripts/gpu_support/nvidia/install_cuda_and_libraries.sh
Oct 16 09:15:18 UTC 2024 test result
😁 SUCCESS
ReFrame Summary
[ OK ] ( 1/10) EESSI_LAMMPS_lj %scale=1_node %device_type=cpu %module_name=LAMMPS/29Aug2024-foss-2023b-kokkos /aeb2d9df @BotBuildTests:x86-64-amd-zen2-node+default
P: perf: 435.749 timesteps/s (r:0, l:None, u:None)
[ OK ] ( 2/10) EESSI_LAMMPS_lj %scale=1_node %device_type=cpu %module_name=LAMMPS/2Aug2023_update2-foss-2023a-kokkos /04ff9ece @BotBuildTests:x86-64-amd-zen2-node+default
P: perf: 443.858 timesteps/s (r:0, l:None, u:None)
[ OK ] ( 3/10) EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_allreduce %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /31ac6ab9 @BotBuildTests:x86-64-amd-zen2-node+default
P: latency: 4.78 us (r:0, l:None, u:None)
[ OK ] ( 4/10) EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_allreduce %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /f3be40a2 @BotBuildTests:x86-64-amd-zen2-node+default
P: latency: 4.65 us (r:0, l:None, u:None)
[ OK ] ( 5/10) EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_alltoall %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /10e66fba @BotBuildTests:x86-64-amd-zen2-node+default
P: latency: 9.03 us (r:0, l:None, u:None)
[ OK ] ( 6/10) EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_alltoall %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /5be57ae7 @BotBuildTests:x86-64-amd-zen2-node+default
P: latency: 8.5 us (r:0, l:None, u:None)
[ OK ] ( 7/10) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_latency %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /c8c9aff5 @BotBuildTests:x86-64-amd-zen2-node+default
P: latency: 0.34 us (r:0, l:None, u:None)
[ OK ] ( 8/10) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_latency %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /9795e491 @BotBuildTests:x86-64-amd-zen2-node+default
P: latency: 0.31 us (r:0, l:None, u:None)
[ OK ] ( 9/10) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_bw %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /48da21c5 @BotBuildTests:x86-64-amd-zen2-node+default
P: bandwidth: 7822.49 MB/s (r:0, l:None, u:None)
[ OK ] (10/10) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_bw %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /1b8c1ca2 @BotBuildTests:x86-64-amd-zen2-node+default
P: bandwidth: 7805.14 MB/s (r:0, l:None, u:None)
[ PASSED ] Ran 10/10 test case(s) from 10 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-23585.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
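The full ReFrame summary line carries the counts needed to decide whether the test step passed. A sketch of extracting them with a regular expression, assuming the line format shown above (`parse_reframe_summary` is a hypothetical helper). The truncated form in the earlier reports ("Ran 10/10 test case") does not match and yields `None`.

```python
import re

# Assumed format: "[ PASSED ] Ran 10/10 test case(s) from 10 check(s)
#                  (0 failure(s), 0 skipped, 0 aborted)"
SUMMARY_RE = re.compile(
    r"\[\s*(?P<status>PASSED|FAILED)\s*\] Ran (?P<ran>\d+)/(?P<total>\d+) test case\(s\) "
    r"from (?P<checks>\d+) check\(s\) \((?P<failures>\d+) failure\(s\), "
    r"(?P<skipped>\d+) skipped, (?P<aborted>\d+) aborted\)"
)


def parse_reframe_summary(line):
    """Return a dict with the status and integer counts, or None if the line does not match."""
    match = SUMMARY_RE.search(line)
    if match is None:
        return None
    result = {key: int(value) for key, value in match.groupdict().items() if key != "status"}
    result["status"] = match["status"]
    return result
```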

eessi-bot bot commented Oct 16, 2024

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-amd-zen3 and accelerator nvidia/cc80 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.10/pr_772/23586

date job status comment
Oct 16 08:43:22 UTC 2024 submitted job id 23586 awaits release by job manager
Oct 16 08:44:02 UTC 2024 released job awaits launch by Slurm scheduler
Oct 16 08:49:12 UTC 2024 running job 23586 is running
Oct 16 09:11:07 UTC 2024 finished
😁 SUCCESS
Details
✅ job output file slurm-23586.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen3-1729068945.tar.gz size: 698 MiB (732491129 bytes)
entries: 75
modules under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/modules/all
cuDNN/8.9.2.26-CUDA-12.1.1.lua
software under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/software
cuDNN/8.9.2.26-CUDA-12.1.1
other under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80
2023.06/init/easybuild/eb_hooks.py
2023.06/init/eessi_environment_variables
2023.06/scripts/gpu_support/nvidia/easystacks/eessi-2023.06-eb-4.9.4-2023a-CUDA-host-injections.yml
2023.06/scripts/gpu_support/nvidia/install_cuda_and_libraries.sh
Oct 16 09:11:07 UTC 2024 test result
😁 SUCCESS
ReFrame Summary
[ OK ] ( 1/10) EESSI_LAMMPS_lj %scale=1_node %device_type=cpu %module_name=LAMMPS/29Aug2024-foss-2023b-kokkos /aeb2d9df @BotBuildTests:x86-64-amd-zen3-node+default
P: perf: 526.319 timesteps/s (r:0, l:None, u:None)
[ OK ] ( 2/10) EESSI_LAMMPS_lj %scale=1_node %device_type=cpu %module_name=LAMMPS/2Aug2023_update2-foss-2023a-kokkos /04ff9ece @BotBuildTests:x86-64-amd-zen3-node+default
P: perf: 502.45 timesteps/s (r:0, l:None, u:None)
[ OK ] ( 3/10) EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_allreduce %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /31ac6ab9 @BotBuildTests:x86-64-amd-zen3-node+default
P: latency: 2.43 us (r:0, l:None, u:None)
[ OK ] ( 4/10) EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_allreduce %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /f3be40a2 @BotBuildTests:x86-64-amd-zen3-node+default
P: latency: 2.34 us (r:0, l:None, u:None)
[ OK ] ( 5/10) EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_alltoall %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /10e66fba @BotBuildTests:x86-64-amd-zen3-node+default
P: latency: 5.56 us (r:0, l:None, u:None)
[ OK ] ( 6/10) EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_alltoall %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /5be57ae7 @BotBuildTests:x86-64-amd-zen3-node+default
P: latency: 5.39 us (r:0, l:None, u:None)
[ OK ] ( 7/10) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_latency %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /c8c9aff5 @BotBuildTests:x86-64-amd-zen3-node+default
P: latency: 0.25 us (r:0, l:None, u:None)
[ OK ] ( 8/10) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_latency %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /9795e491 @BotBuildTests:x86-64-amd-zen3-node+default
P: latency: 0.22 us (r:0, l:None, u:None)
[ OK ] ( 9/10) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_bw %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /48da21c5 @BotBuildTests:x86-64-amd-zen3-node+default
P: bandwidth: 14325.79 MB/s (r:0, l:None, u:None)
[ OK ] (10/10) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_bw %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /1b8c1ca2 @BotBuildTests:x86-64-amd-zen3-node+default
P: bandwidth: 14318.9 MB/s (r:0, l:None, u:None)
[ PASSED ] Ran 10/10 test case(s) from 10 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-23586.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
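Each `P:` line reports a performance variable, its value and unit, and in parentheses the reference value and the lower/upper thresholds. With `l:None, u:None` no bounds appear to be enforced, so these numbers are informational. A sketch that parses such lines (hypothetical helper `parse_perf_line`), used here to compare the OSU-Micro-Benchmarks/7.2-gompi-2023b `osu_bw` results of the zen2 and zen3 builds:

```python
import re

# Assumed format: "P: <name>: <value> <unit> (r:<ref>, l:<lower>, u:<upper>)"
PERF_RE = re.compile(
    r"P: (?P<var>\w+): (?P<value>[-+\d.eE]+) (?P<unit>\S+) "
    r"\(r:(?P<ref>[^,]+), l:(?P<lower>[^,]+), u:(?P<upper>[^)]+)\)"
)


def parse_perf_line(line):
    """Return the performance variable as a dict, or None if the line does not match."""
    match = PERF_RE.search(line)
    if match is None:
        return None

    def optional_float(text):
        return None if text == "None" else float(text)

    return {
        "var": match["var"],
        "value": float(match["value"]),
        "unit": match["unit"],
        "ref": optional_float(match["ref"]),
        "lower": optional_float(match["lower"]),
        "upper": optional_float(match["upper"]),
    }


zen2 = parse_perf_line("P: bandwidth: 7822.49 MB/s (r:0, l:None, u:None)")
zen3 = parse_perf_line("P: bandwidth: 14325.79 MB/s (r:0, l:None, u:None)")
print(f"zen3/zen2 osu_bw ratio: {zen3['value'] / zen2['value']:.2f}")
```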

casparvl previously approved these changes Oct 16, 2024
Collaborator

casparvl left a comment

This looks good to me.

  • It correctly installs the full CUDA and cuDNN in host_injections with install_cuda_and_libraries.sh.
  • It also skips those installations if they are already present.
  • It correctly replaces anything that is not redistributable (i.e. anything that is not a .so file) with a symlink into host_injections.

To demonstrate the latter, this is what the tarball for build job 23586 contained:

[casparvl@login1 23586]$ ls -al 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/software/cuDNN/8.9.2.26-CUDA-12.1.1/include/
total 4
dr-xr-sr-x. 2 casparvl def-users 4096 Oct 16 08:50 .
dr-xr-sr-x. 5 casparvl def-users  101 Oct 16 08:50 ..
lrwxrwxrwx. 1 casparvl def-users  141 Oct 16 08:50 cudnn_adv_infer.h -> /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen3/software/cuDNN/8.9.2.26-CUDA-12.1.1/include/cudnn_adv_infer.h
lrwxrwxrwx. 1 casparvl def-users  144 Oct 16 08:50 cudnn_adv_infer_v8.h -> /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen3/software/cuDNN/8.9.2.26-CUDA-12.1.1/include/cudnn_adv_infer_v8.h
lrwxrwxrwx. 1 casparvl def-users  141 Oct 16 08:50 cudnn_adv_train.h -> /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen3/software/cuDNN/8.9.2.26-CUDA-12.1.1/include/cudnn_adv_train.h
lrwxrwxrwx. 1 casparvl def-users  144 Oct 16 08:50 cudnn_adv_train_v8.h -> /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen3/software/cuDNN/8.9.2.26-CUDA-12.1.1/include/cudnn_adv_train_v8.h
lrwxrwxrwx. 1 casparvl def-users  139 Oct 16 08:50 cudnn_backend.h -> /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen3/software/cuDNN/8.9.2.26-CUDA-12.1.1/include/cudnn_backend.h
lrwxrwxrwx. 1 casparvl def-users  142 Oct 16 08:50 cudnn_backend_v8.h -> /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen3/software/cuDNN/8.9.2.26-CUDA-12.1.1/include/cudnn_backend_v8.h
lrwxrwxrwx. 1 casparvl def-users  141 Oct 16 08:50 cudnn_cnn_infer.h -> /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen3/software/cuDNN/8.9.2.26-CUDA-12.1.1/include/cudnn_cnn_infer.h
lrwxrwxrwx. 1 casparvl def-users  144 Oct 16 08:50 cudnn_cnn_infer_v8.h -> /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen3/software/cuDNN/8.9.2.26-CUDA-12.1.1/include/cudnn_cnn_infer_v8.h
lrwxrwxrwx. 1 casparvl def-users  141 Oct 16 08:50 cudnn_cnn_train.h -> /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen3/software/cuDNN/8.9.2.26-CUDA-12.1.1/include/cudnn_cnn_train.h
lrwxrwxrwx. 1 casparvl def-users  144 Oct 16 08:50 cudnn_cnn_train_v8.h -> /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen3/software/cuDNN/8.9.2.26-CUDA-12.1.1/include/cudnn_cnn_train_v8.h
lrwxrwxrwx. 1 casparvl def-users  131 Oct 16 08:50 cudnn.h -> /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen3/software/cuDNN/8.9.2.26-CUDA-12.1.1/include/cudnn.h
lrwxrwxrwx. 1 casparvl def-users  141 Oct 16 08:50 cudnn_ops_infer.h -> /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen3/software/cuDNN/8.9.2.26-CUDA-12.1.1/include/cudnn_ops_infer.h
lrwxrwxrwx. 1 casparvl def-users  144 Oct 16 08:50 cudnn_ops_infer_v8.h -> /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen3/software/cuDNN/8.9.2.26-CUDA-12.1.1/include/cudnn_ops_infer_v8.h
lrwxrwxrwx. 1 casparvl def-users  141 Oct 16 08:50 cudnn_ops_train.h -> /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen3/software/cuDNN/8.9.2.26-CUDA-12.1.1/include/cudnn_ops_train.h
lrwxrwxrwx. 1 casparvl def-users  144 Oct 16 08:50 cudnn_ops_train_v8.h -> /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen3/software/cuDNN/8.9.2.26-CUDA-12.1.1/include/cudnn_ops_train_v8.h
lrwxrwxrwx. 1 casparvl def-users  134 Oct 16 08:50 cudnn_v8.h -> /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen3/software/cuDNN/8.9.2.26-CUDA-12.1.1/include/cudnn_v8.h
lrwxrwxrwx. 1 casparvl def-users  139 Oct 16 08:50 cudnn_version.h -> /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen3/software/cuDNN/8.9.2.26-CUDA-12.1.1/include/cudnn_version.h
lrwxrwxrwx. 1 casparvl def-users  142 Oct 16 08:50 cudnn_version_v8.h -> /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen3/software/cuDNN/8.9.2.26-CUDA-12.1.1/include/cudnn_version_v8.h
[casparvl@login1 23586]$ ls -al 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/software/cuDNN/8.9.2.26-CUDA-12.1.1/lib
lib/   lib64/
[casparvl@login1 23586]$ ls -al 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/software/cuDNN/8.9.2.26-CUDA-12.1.1/lib/
total 1568152
dr-xr-sr-x. 2 casparvl def-users      4096 Oct 16 08:50 .
dr-xr-sr-x. 5 casparvl def-users       101 Oct 16 08:50 ..
lrwxrwxrwx. 1 casparvl def-users        23 May 31  2023 libcudnn_adv_infer.so -> libcudnn_adv_infer.so.8
lrwxrwxrwx. 1 casparvl def-users        27 May 31  2023 libcudnn_adv_infer.so.8 -> libcudnn_adv_infer.so.8.9.2
-r-xr-xr-x. 1 casparvl def-users 125081680 May 31  2023 libcudnn_adv_infer.so.8.9.2
lrwxrwxrwx. 1 casparvl def-users       147 Oct 16 08:50 libcudnn_adv_infer_static.a -> /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen3/software/cuDNN/8.9.2.26-CUDA-12.1.1/lib/libcudnn_adv_infer_static.a
lrwxrwxrwx. 1 casparvl def-users        27 May 31  2023 libcudnn_adv_infer_static_v8.a -> libcudnn_adv_infer_static.a
lrwxrwxrwx. 1 casparvl def-users        23 May 31  2023 libcudnn_adv_train.so -> libcudnn_adv_train.so.8
lrwxrwxrwx. 1 casparvl def-users        27 May 31  2023 libcudnn_adv_train.so.8 -> libcudnn_adv_train.so.8.9.2
-r-xr-xr-x. 1 casparvl def-users 116204496 May 31  2023 libcudnn_adv_train.so.8.9.2
lrwxrwxrwx. 1 casparvl def-users       147 Oct 16 08:50 libcudnn_adv_train_static.a -> /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen3/software/cuDNN/8.9.2.26-CUDA-12.1.1/lib/libcudnn_adv_train_static.a
lrwxrwxrwx. 1 casparvl def-users        27 May 31  2023 libcudnn_adv_train_static_v8.a -> libcudnn_adv_train_static.a
lrwxrwxrwx. 1 casparvl def-users        23 May 31  2023 libcudnn_cnn_infer.so -> libcudnn_cnn_infer.so.8
lrwxrwxrwx. 1 casparvl def-users        27 May 31  2023 libcudnn_cnn_infer.so.8 -> libcudnn_cnn_infer.so.8.9.2
-r-xr-xr-x. 1 casparvl def-users 626039624 May 31  2023 libcudnn_cnn_infer.so.8.9.2
lrwxrwxrwx. 1 casparvl def-users       147 Oct 16 08:50 libcudnn_cnn_infer_static.a -> /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen3/software/cuDNN/8.9.2.26-CUDA-12.1.1/lib/libcudnn_cnn_infer_static.a
lrwxrwxrwx. 1 casparvl def-users        27 May 31  2023 libcudnn_cnn_infer_static_v8.a -> libcudnn_cnn_infer_static.a
lrwxrwxrwx. 1 casparvl def-users        23 May 31  2023 libcudnn_cnn_train.so -> libcudnn_cnn_train.so.8
lrwxrwxrwx. 1 casparvl def-users        27 May 31  2023 libcudnn_cnn_train.so.8 -> libcudnn_cnn_train.so.8.9.2
-r-xr-xr-x. 1 casparvl def-users 132383440 May 31  2023 libcudnn_cnn_train.so.8.9.2
lrwxrwxrwx. 1 casparvl def-users       147 Oct 16 08:50 libcudnn_cnn_train_static.a -> /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen3/software/cuDNN/8.9.2.26-CUDA-12.1.1/lib/libcudnn_cnn_train_static.a
lrwxrwxrwx. 1 casparvl def-users        27 May 31  2023 libcudnn_cnn_train_static_v8.a -> libcudnn_cnn_train_static.a
lrwxrwxrwx. 1 casparvl def-users        23 May 31  2023 libcudnn_ops_infer.so -> libcudnn_ops_infer.so.8
lrwxrwxrwx. 1 casparvl def-users        27 May 31  2023 libcudnn_ops_infer.so.8 -> libcudnn_ops_infer.so.8.9.2
-r-xr-xr-x. 1 casparvl def-users  90517920 May 31  2023 libcudnn_ops_infer.so.8.9.2
lrwxrwxrwx. 1 casparvl def-users       147 Oct 16 08:50 libcudnn_ops_infer_static.a -> /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen3/software/cuDNN/8.9.2.26-CUDA-12.1.1/lib/libcudnn_ops_infer_static.a
lrwxrwxrwx. 1 casparvl def-users        27 May 31  2023 libcudnn_ops_infer_static_v8.a -> libcudnn_ops_infer_static.a
lrwxrwxrwx. 1 casparvl def-users        23 May 31  2023 libcudnn_ops_train.so -> libcudnn_ops_train.so.8
lrwxrwxrwx. 1 casparvl def-users        27 May 31  2023 libcudnn_ops_train.so.8 -> libcudnn_ops_train.so.8.9.2
-r-xr-xr-x. 1 casparvl def-users  70897912 May 31  2023 libcudnn_ops_train.so.8.9.2
lrwxrwxrwx. 1 casparvl def-users       147 Oct 16 08:50 libcudnn_ops_train_static.a -> /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen3/software/cuDNN/8.9.2.26-CUDA-12.1.1/lib/libcudnn_ops_train_static.a
lrwxrwxrwx. 1 casparvl def-users        27 May 31  2023 libcudnn_ops_train_static_v8.a -> libcudnn_ops_train_static.a
lrwxrwxrwx. 1 casparvl def-users        13 May 31  2023 libcudnn.so -> libcudnn.so.8
lrwxrwxrwx. 1 casparvl def-users        17 May 31  2023 libcudnn.so.8 -> libcudnn.so.8.9.2
-r-xr-xr-x. 1 casparvl def-users    150200 May 31  2023 libcudnn.so.8.9.2
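In the listing above, the shared libraries are shipped as regular files. The static libraries and the headers are symlinks into host_injections, and the *_static_v8.a names are relative symlinks to those. A quick way to see which files a site still has to provide is to look for symlinks whose target does not exist. A minimal bash sketch; list_dangling_symlinks is a hypothetical helper, not part of the EESSI scripts:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every symlink under a directory whose target
# does not (yet) exist, e.g. because host_injections is still empty.
list_dangling_symlinks() {
    local dir="$1"
    local link
    # -type l matches the symlinks themselves; [ -e ] follows the whole
    # chain of links, so it fails when the final target is missing.
    find "$dir" -type l | while IFS= read -r link; do
        if [ ! -e "$link" ]; then
            printf '%s\n' "$link"
        fi
    done
}

# Example, using the installation directory from the listing above:
# list_dangling_symlinks 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/software/cuDNN/8.9.2.26-CUDA-12.1.1
```

Because [ -e ] resolves the whole chain, an alias such as libcudnn_adv_infer_static_v8.a is reported as well when the host_injections file it ultimately points to is missing.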

ocaisa previously requested changes Oct 16, 2024
eb_hooks.py
Comment on lines 176 to 177
# $EESSI_SILENT - don't print any messages
# $EESSI_BASIC_ENV - give a basic set of environment variables
Contributor

this doesn't agree with what's going on below?

Collaborator Author

Reverted to sourcing it silently. However, we need the full environment to be initialised at this stage; otherwise a required environment variable (in particular EESSI_SITE_SOFTWARE_PATH) is not set. Improved the comments. Will repeat the tests (removing host-injections, building, ...).

See 77f3bc9
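Since the script only works when the full environment has been initialised, one option is to fail early with a clear message if a required variable such as EESSI_SITE_SOFTWARE_PATH is missing. A minimal bash sketch; require_env is a hypothetical helper and is not taken from commit 77f3bc9:

```bash
#!/usr/bin/env bash
# Hypothetical helper: return non-zero (with a message on stderr) if any of
# the named environment variables is unset or empty.
require_env() {
    local var
    for var in "$@"; do
        # ${!var} is bash indirect expansion: the value of the variable
        # whose name is stored in $var.
        if [ -z "${!var}" ]; then
            echo "ERROR: \$${var} is not set; was init/eessi_environment_variables sourced with a full environment?" >&2
            return 1
        fi
    done
}

# Example:
# require_env EESSI_SITE_SOFTWARE_PATH || exit 1
```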

install_cuda_host_injections.sh link_nvidia_host_libraries.sh
eessi-2023.06-cuda-and-libraries.yml
install_cuda_and_libraries.sh
install_cuda_host_injections.sh
Contributor

follow-up on this via #789

EESSI_SILENT=1 EESSI_BASIC_ENV=1 source $TOPDIR/init/eessi_environment_variables
# $EESSI_SILENT - don't print any messages if set (use 'unset EESSI_SILENT' to let script show messages)
# $EESSI_BASIC_ENV - give a basic set of environment variables if set (use 'EESSI_BASIC_ENV=' to let script initialise a full environment)
EESSI_SILENT=1 EESSI_BASIC_ENV= source $TOPDIR/init/eessi_environment_variables
Collaborator

Wait, now I'm confused. I think @boegel's point was more about adapting the comments (or removing them), wasn't it? Because EESSI_BASIC_ENV= will set EESSI_BASIC_ENV, which results in a failure because one of the environment variables was then missing (EESSI_SOFTWARE_PATH or something? I don't remember).

Contributor

Yeah, I was only pointing out that the comments didn't agree with the code

Contributor

We check with -z $EESSI_BASIC_ENV, so an EESSI_BASIC_ENV that is defined but empty is treated the same as an undefined one (the negated test ! -z is false in both cases):

bash-3.2$ EESSI_BASIC_ENV=; if [ ! -z $EESSI_BASIC_ENV ]; then echo "EESSI_BASIC_ENV is set: '$EESSI_BASIC_ENV'"; fi
bash-3.2$ EESSI_BASIC_ENV=1; if [ ! -z $EESSI_BASIC_ENV ]; then echo "EESSI_BASIC_ENV is set: '$EESSI_BASIC_ENV'"; fi
EESSI_BASIC_ENV is set: '1'
bash-3.2$ unset EESSI_BASIC_ENV; if [ ! -z $EESSI_BASIC_ENV ]; then echo "EESSI_BASIC_ENV is set: '$EESSI_BASIC_ENV'"; fi
bash-3.2$
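The unquoted test above gives the intended result, but only by accident: with an empty or unset variable, [ ! -z $EESSI_BASIC_ENV ] becomes [ ! -z ], where -z is treated as a plain non-empty string rather than as an operator. Quoting the expansion makes the intent explicit, and ${VAR+x} can tell "unset" apart from "set but empty" if that ever matters. A hedged sketch; describe_basic_env is a hypothetical helper, not code from init/eessi_environment_variables, and it assumes (per the comments in the diff above) that a non-empty value requests the basic environment:

```bash
#!/usr/bin/env bash
# Hypothetical helper: report how EESSI_BASIC_ENV is set.
describe_basic_env() {
    if [ -n "${EESSI_BASIC_ENV}" ]; then
        # Non-empty value: a basic environment would be requested.
        echo "basic"
    elif [ -n "${EESSI_BASIC_ENV+x}" ]; then
        # ${VAR+x} expands to "x" whenever VAR is set, even to "".
        echo "full (set but empty)"
    else
        echo "full (unset)"
    fi
}

EESSI_BASIC_ENV=1; describe_basic_env      # prints: basic
EESSI_BASIC_ENV=;  describe_basic_env      # prints: full (set but empty)
unset EESSI_BASIC_ENV; describe_basic_env  # prints: full (unset)
```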

Collaborator Author

trz42 commented Oct 16, 2024

Rebuilding after the changes to EESSI_SILENT and EESSI_BASIC_ENV when sourcing init/eessi_environment_variables. Also removed the contents of the host-injections directory.

bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80
bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3 accel:nvidia/cc80

eessi-bot bot commented Oct 16, 2024

Updates by the bot instance eessi-bot-mc-aws
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80 from trz42

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3 accel:nvidia/cc80 from trz42

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80 resulted in:

  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80 resulted in:

eessi-bot bot commented Oct 16, 2024

Updates by the bot instance eessi-bot-mc-azure
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80 from trz42

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3 accel:nvidia/cc80 from trz42

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80 resulted in:

    • no jobs were submitted
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80 resulted in:

    • no jobs were submitted

Updates by the bot instance boegel-bot-deucalion
  • account trz42 has NO permission to send commands to the bot

eessi-bot bot commented Oct 16, 2024

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-amd-zen2 and accelerator nvidia/cc80 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.10/pr_772/23590

date job status comment
Oct 16 10:12:57 UTC 2024 submitted job id 23590 awaits release by job manager
Oct 16 10:13:47 UTC 2024 released job awaits launch by Slurm scheduler
Oct 16 10:14:55 UTC 2024 running job 23590 is running
Oct 16 10:45:11 UTC 2024 finished
😁 SUCCESS
Details
✅ job output file slurm-23590.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen2-1729074398.tar.gz
size: 698 MiB (732485706 bytes)
entries: 75
modules under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80/modules/all
cuDNN/8.9.2.26-CUDA-12.1.1.lua
software under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80/software
cuDNN/8.9.2.26-CUDA-12.1.1
other under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80
2023.06/init/easybuild/eb_hooks.py
2023.06/init/eessi_environment_variables
2023.06/scripts/gpu_support/nvidia/easystacks/eessi-2023.06-eb-4.9.4-2023a-CUDA-host-injections.yml
2023.06/scripts/gpu_support/nvidia/install_cuda_and_libraries.sh
Oct 16 10:45:11 UTC 2024 test result
😁 SUCCESS
ReFrame Summary
[ OK ] ( 1/10) EESSI_LAMMPS_lj %scale=1_node %device_type=cpu %module_name=LAMMPS/29Aug2024-foss-2023b-kokkos /aeb2d9df @BotBuildTests:x86-64-amd-zen2-node+default
P: perf: 336.866 timesteps/s (r:0, l:None, u:None)
[ OK ] ( 2/10) EESSI_LAMMPS_lj %scale=1_node %device_type=cpu %module_name=LAMMPS/2Aug2023_update2-foss-2023a-kokkos /04ff9ece @BotBuildTests:x86-64-amd-zen2-node+default
P: perf: 434.873 timesteps/s (r:0, l:None, u:None)
[ OK ] ( 3/10) EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_allreduce %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /31ac6ab9 @BotBuildTests:x86-64-amd-zen2-node+default
P: latency: 4.74 us (r:0, l:None, u:None)
[ OK ] ( 4/10) EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_allreduce %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /f3be40a2 @BotBuildTests:x86-64-amd-zen2-node+default
P: latency: 4.65 us (r:0, l:None, u:None)
[ OK ] ( 5/10) EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_alltoall %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /10e66fba @BotBuildTests:x86-64-amd-zen2-node+default
P: latency: 8.81 us (r:0, l:None, u:None)
[ OK ] ( 6/10) EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_alltoall %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /5be57ae7 @BotBuildTests:x86-64-amd-zen2-node+default
P: latency: 8.45 us (r:0, l:None, u:None)
[ OK ] ( 7/10) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_latency %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /c8c9aff5 @BotBuildTests:x86-64-amd-zen2-node+default
P: latency: 0.29 us (r:0, l:None, u:None)
[ OK ] ( 8/10) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_latency %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /9795e491 @BotBuildTests:x86-64-amd-zen2-node+default
P: latency: 0.3 us (r:0, l:None, u:None)
[ OK ] ( 9/10) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_bw %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /48da21c5 @BotBuildTests:x86-64-amd-zen2-node+default
P: bandwidth: 7737.77 MB/s (r:0, l:None, u:None)
[ OK ] (10/10) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_bw %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /1b8c1ca2 @BotBuildTests:x86-64-amd-zen2-node+default
P: bandwidth: 7782.66 MB/s (r:0, l:None, u:None)
[ PASSED ] Ran 10/10 test case(s) from 10 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-23590.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
Oct 16 11:41:11 UTC 2024 uploaded transfer of eessi-2023.06-software-linux-x86_64-amd-zen2-1729074398.tar.gz to S3 bucket succeeded

eessi-bot bot commented Oct 16, 2024

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-amd-zen3 and accelerator nvidia/cc80 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.10/pr_772/23591

date job status comment
Oct 16 10:13:01 UTC 2024 submitted job id 23591 awaits release by job manager
Oct 16 10:13:51 UTC 2024 released job awaits launch by Slurm scheduler
Oct 16 10:20:11 UTC 2024 running job 23591 is running
Oct 16 10:46:13 UTC 2024 finished
😁 SUCCESS
Details
✅ job output file slurm-23591.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen3-1729074646.tar.gz
size: 698 MiB (732490152 bytes)
entries: 75
modules under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/modules/all
cuDNN/8.9.2.26-CUDA-12.1.1.lua
software under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/software
cuDNN/8.9.2.26-CUDA-12.1.1
other under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80
2023.06/init/easybuild/eb_hooks.py
2023.06/init/eessi_environment_variables
2023.06/scripts/gpu_support/nvidia/easystacks/eessi-2023.06-eb-4.9.4-2023a-CUDA-host-injections.yml
2023.06/scripts/gpu_support/nvidia/install_cuda_and_libraries.sh
Oct 16 10:46:13 UTC 2024 test result
😁 SUCCESS
ReFrame Summary
[ OK ] ( 1/10) EESSI_LAMMPS_lj %scale=1_node %device_type=cpu %module_name=LAMMPS/29Aug2024-foss-2023b-kokkos /aeb2d9df @BotBuildTests:x86-64-amd-zen3-node+default
P: perf: 518.316 timesteps/s (r:0, l:None, u:None)
[ OK ] ( 2/10) EESSI_LAMMPS_lj %scale=1_node %device_type=cpu %module_name=LAMMPS/2Aug2023_update2-foss-2023a-kokkos /04ff9ece @BotBuildTests:x86-64-amd-zen3-node+default
P: perf: 530.578 timesteps/s (r:0, l:None, u:None)
[ OK ] ( 3/10) EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_allreduce %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /31ac6ab9 @BotBuildTests:x86-64-amd-zen3-node+default
P: latency: 2.54 us (r:0, l:None, u:None)
[ OK ] ( 4/10) EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_allreduce %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /f3be40a2 @BotBuildTests:x86-64-amd-zen3-node+default
P: latency: 2.33 us (r:0, l:None, u:None)
[ OK ] ( 5/10) EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_alltoall %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /10e66fba @BotBuildTests:x86-64-amd-zen3-node+default
P: latency: 5.58 us (r:0, l:None, u:None)
[ OK ] ( 6/10) EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_alltoall %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /5be57ae7 @BotBuildTests:x86-64-amd-zen3-node+default
P: latency: 5.4 us (r:0, l:None, u:None)
[ OK ] ( 7/10) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_latency %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /c8c9aff5 @BotBuildTests:x86-64-amd-zen3-node+default
P: latency: 0.26 us (r:0, l:None, u:None)
[ OK ] ( 8/10) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_latency %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /9795e491 @BotBuildTests:x86-64-amd-zen3-node+default
P: latency: 0.22 us (r:0, l:None, u:None)
[ OK ] ( 9/10) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_bw %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /48da21c5 @BotBuildTests:x86-64-amd-zen3-node+default
P: bandwidth: 14183.13 MB/s (r:0, l:None, u:None)
[ OK ] (10/10) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_bw %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /1b8c1ca2 @BotBuildTests:x86-64-amd-zen3-node+default
P: bandwidth: 14326.05 MB/s (r:0, l:None, u:None)
[ PASSED ] Ran 10/10 test case(s) from 10 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-23591.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
Oct 16 11:41:36 UTC 2024 uploaded transfer of eessi-2023.06-software-linux-x86_64-amd-zen3-1729074646.tar.gz to S3 bucket succeeded

)
copy_files_by_list ${TOPDIR}/scripts/gpu_support/nvidia ${INSTALL_PREFIX}/scripts/gpu_support/nvidia "${nvidia_files[@]}"

# Easystacks to be used to install software in host injections
host_injections_easystacks=(
eessi-2023.06-eb-4.9.4-2023a-CUDA-host-injections.yml
Contributor

Happy to follow up on this in a future PR, but do we need a hardcoded list here? Can't we use a glob like eessi-*-CUDA-host-injections.yml, so we don't need to remember to update this list whenever an additional easystack file is added under scripts/gpu_support/nvidia/easystacks?
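The glob suggested above could look roughly like this. A minimal bash sketch, assuming the easystacks live in scripts/gpu_support/nvidia/easystacks under ${TOPDIR} as the comment says; find_host_injections_easystacks is a hypothetical helper. With nullglob enabled, an unmatched pattern expands to nothing instead of to the literal pattern, so an empty directory yields an empty list:

```bash
#!/usr/bin/env bash
# Hypothetical helper: collect the host-injections easystacks by glob instead
# of a hardcoded list. Prints one file name (without directory) per line.
find_host_injections_easystacks() {
    local easystacks_dir="$1"
    local -a matches
    local restore_nullglob f
    # Remember the current nullglob setting so it can be restored afterwards
    # (shopt -p returns non-zero when the option is off, hence || true).
    restore_nullglob=$(shopt -p nullglob) || true
    shopt -s nullglob
    matches=( "${easystacks_dir}"/eessi-*-CUDA-host-injections.yml )
    eval "${restore_nullglob}"
    for f in "${matches[@]}"; do
        basename "$f"
    done
}

# Usage (TOPDIR as in the script above):
# mapfile -t host_injections_easystacks < <(find_host_injections_easystacks "${TOPDIR}/scripts/gpu_support/nvidia/easystacks")
```

One trade-off compared to the hardcoded list: any file matching the pattern is picked up, including one added by mistake.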

boegel added the bot:deploy label (Ask bot to deploy missing software installations to EESSI) Oct 16, 2024
Contributor

boegel left a comment

lgtm

Contributor

boegel commented Oct 16, 2024

staging PRs merged

boegel dismissed ocaisa’s stale review October 16, 2024 13:21

review taken into account

Contributor

boegel commented Oct 16, 2024

Can't wait for part 2 😆

boegel merged commit 490a9ee into EESSI:2023.06-software.eessi.io Oct 16, 2024
43 checks passed
eessi-bot bot commented Oct 16, 2024

PR merged! Moved ['/project/def-users/SHARED/jobs/2024.10/pr_772/23387', '/project/def-users/SHARED/jobs/2024.10/pr_772/23388', '/project/def-users/SHARED/jobs/2024.10/pr_772/23389', '/project/def-users/SHARED/jobs/2024.10/pr_772/23390', '/project/def-users/SHARED/jobs/2024.10/pr_772/23393', '/project/def-users/SHARED/jobs/2024.10/pr_772/23394', '/project/def-users/SHARED/jobs/2024.10/pr_772/23395', '/project/def-users/SHARED/jobs/2024.10/pr_772/23396', '/project/def-users/SHARED/jobs/2024.10/pr_772/23401', '/project/def-users/SHARED/jobs/2024.10/pr_772/23403', '/project/def-users/SHARED/jobs/2024.10/pr_772/23404', '/project/def-users/SHARED/jobs/2024.10/pr_772/23405', '/project/def-users/SHARED/jobs/2024.10/pr_772/23407', '/project/def-users/SHARED/jobs/2024.10/pr_772/23585', '/project/def-users/SHARED/jobs/2024.10/pr_772/23586', '/project/def-users/SHARED/jobs/2024.10/pr_772/23590', '/project/def-users/SHARED/jobs/2024.10/pr_772/23591'] to /project/def-users/SHARED/trash_bin/EESSI/software-layer/2024.10.16

PR merged! Moved [] to /home/kehoste/project_dir/bot/trash-bin #$HOME/trash_bin/EESSI/software-layer/2024.10.16

eessi-bot bot commented Oct 16, 2024

PR merged! Moved [] to /project/def-users/SHARED/trash_bin/EESSI/software-layer/2024.10.16

Labels
2023.06-software.eessi.io 2023.06 version of software.eessi.io accel:nvidia bot:deploy Ask bot to deploy missing software installations to EESSI
5 participants