Alternative method to rebuild & ship full CUDA runtime #356

Merged

trz42
Collaborator

@trz42 trz42 commented May 6, 2024

This PR serves two purposes:

  • It implements an alternative method that uses the lowerdir option of fuse-overlayfs to make certain directories writable (the directories containing software that is to be removed).
  • It attempts to rebuild CUDA because the previously built module lacks the runtime library.

truib added 4 commits May 6, 2024 22:03
- bot/build.sh
  - first runs EESSI-determine-rebuilds.sh to determine which software
    package directories have to be removed
  - it then processes the output and creates lower directories which are
    writable
  - finally it passes these lower directories as an additional parameter when running
    EESSI-remove-software.sh
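
The flow described in this commit message can be sketched roughly as follows. This is not the code in bot/build.sh: the function names, the example path, and the assumption that the directories to remove arrive as one path per line are all hypothetical. The sketch only illustrates recreating each to-be-removed directory as a writable directory under a scratch location, and joining locations into the colon-separated form used for overlay lowerdir lists.

```python
import os
import tempfile


def make_writable_lower_dirs(software_dirs, lower_root):
    """Mirror each directory path under lower_root as a writable directory.

    software_dirs: absolute paths of installations to be removed
    (hypothetical input, not the real script output format).
    Returns the list of directories that were created.
    """
    created = []
    for path in software_dirs:
        # Drop the leading '/' so the full tree is recreated below lower_root.
        mirrored = os.path.join(lower_root, path.lstrip("/"))
        os.makedirs(mirrored, exist_ok=True)
        created.append(mirrored)
    return created


def lowerdir_option(lower_dirs):
    """Join lower directories into a 'lowerdir=a:b' mount option string."""
    return "lowerdir=" + ":".join(lower_dirs)


if __name__ == "__main__":
    # Hypothetical listing of directories to rebuild, one per line.
    listing = "/cvmfs/example.org/software/CUDA/12.1.1\n"
    dirs = [line for line in listing.splitlines() if line.strip()]
    with tempfile.TemporaryDirectory() as scratch:
        make_writable_lower_dirs(dirs, scratch)
        print(lowerdir_option([scratch]))
```

The resulting string would be combined with the usual upperdir and workdir options when mounting; how the PR actually wires it into the removal step is not shown in this thread.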
@trz42 trz42 added the enhancement (New feature or request), NESSI/2023.06, and GPU labels May 6, 2024

nessi-bot bot commented May 6, 2024

Instance AWS-MC-NESSI is configured to build:

  • arch x86_64/generic for repo nessi-2023.06-cl
  • arch x86_64/generic for repo nessi-2023.06-swl-deb10
  • arch x86_64/generic for repo nessi-2023.06-swl-deb11
  • arch x86_64/intel/skylake_avx512 for repo nessi-2023.06-cl
  • arch x86_64/intel/skylake_avx512 for repo nessi-2023.06-swl-deb10
  • arch x86_64/intel/skylake_avx512 for repo nessi-2023.06-swl-deb11
  • arch x86_64/amd/zen2 for repo nessi-2023.06-cl
  • arch x86_64/amd/zen2 for repo nessi-2023.06-swl-deb10
  • arch x86_64/amd/zen2 for repo nessi-2023.06-swl-deb11
  • arch aarch64/generic for repo nessi-2023.06-cl
  • arch aarch64/generic for repo nessi-2023.06-swl-deb10
  • arch aarch64/generic for repo nessi-2023.06-swl-deb11

nessi-bot bot commented May 6, 2024

Instance eX3-NESSI is configured to build:

  • arch x86_64/amd/zen2 for repo nessi-2023.06-cl
  • arch x86_64/amd/zen2 for repo nessi-2023.06-swl-deb10
  • arch x86_64/amd/zen2 for repo nessi-2023.06-swl-deb11
  • arch aarch64/generic for repo nessi-2023.06-cl
  • arch aarch64/generic for repo nessi-2023.06-swl-deb10
  • arch aarch64/generic for repo nessi-2023.06-swl-deb11

nessi-bot bot commented May 6, 2024

Instance Fram-NESSI is configured to build:

  • arch x86_64/generic for repo nessi-2023.06-cl
  • arch x86_64/generic for repo nessi-2023.06-swl-deb10
  • arch x86_64/generic for repo nessi-2023.06-swl-deb11
  • arch x86_64/intel/broadwell for repo nessi-2023.06-cl
  • arch x86_64/intel/broadwell for repo nessi-2023.06-swl-deb10
  • arch x86_64/intel/broadwell for repo nessi-2023.06-swl-deb11

nessi-bot bot commented May 6, 2024

Instance Saga-NESSI is configured to build:

  • arch x86_64/intel/skylake_avx512 for repo nessi-2023.06-cl
  • arch x86_64/intel/skylake_avx512 for repo nessi-2023.06-swl-deb10
  • arch x86_64/intel/skylake_avx512 for repo nessi-2023.06-swl-deb11
  • arch x86_64/intel/broadwell for repo nessi-2023.06-cl
  • arch x86_64/intel/broadwell for repo nessi-2023.06-swl-deb10
  • arch x86_64/intel/broadwell for repo nessi-2023.06-swl-deb11
  • arch x86_64/generic for repo nessi-2023.06-cl
  • arch x86_64/generic for repo nessi-2023.06-swl-deb10
  • arch x86_64/generic for repo nessi-2023.06-swl-deb11

Collaborator Author

trz42 commented May 6, 2024

Just a test on a single architecture first

bot: build inst:eX3-NESSI repo:nessi-2023.06-swl-deb10 arch:aarch64/generic

nessi-bot bot commented May 6, 2024

Updates by the bot instance AWS-MC-NESSI (click for details)
  • received bot command build inst:eX3-NESSI repo:nessi-2023.06-swl-deb10 arch:aarch64/generic from trz42

    • expanded format: build instance:eX3-NESSI repository:nessi-2023.06-swl-deb10 architecture:aarch64/generic
  • handling command build instance:eX3-NESSI repository:nessi-2023.06-swl-deb10 architecture:aarch64/generic resulted in:

    • no jobs were submitted
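
The short and expanded command forms shown in these bot updates suggest a simple keyword expansion. A minimal sketch of that step, assuming only the three abbreviations visible in this thread (inst, repo, arch); the function names are hypothetical and this is not the bot's actual parser. The instance check reflects an inference from the thread, where only the instance named in the command created a job.

```python
# Short-to-long keyword mapping inferred from the "expanded format" lines.
ALIASES = {"inst": "instance", "repo": "repository", "arch": "architecture"}


def expand_bot_command(line):
    """Expand 'bot: build inst:X repo:Y arch:Z' into the long form."""
    prefix = "bot:"
    if not line.startswith(prefix):
        raise ValueError("not a bot command: %r" % line)
    action, *args = line[len(prefix):].split()
    expanded = []
    for arg in args:
        # Split only on the first ':' so values may contain further colons.
        key, _, value = arg.partition(":")
        expanded.append(f"{ALIASES.get(key, key)}:{value}")
    return " ".join([action, *expanded])


def targets_instance(expanded, instance_name):
    """Return True if the expanded command addresses the given bot instance."""
    return f"instance:{instance_name}" in expanded.split()
```

For example, expanding the command from the comment above and testing it against AWS-MC-NESSI yields False, which matches the "no jobs were submitted" replies from the instances that were not addressed.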

nessi-bot bot commented May 6, 2024

Updates by the bot instance eX3-NESSI (click for details)
  • received bot command build inst:eX3-NESSI repo:nessi-2023.06-swl-deb10 arch:aarch64/generic from trz42

    • expanded format: build instance:eX3-NESSI repository:nessi-2023.06-swl-deb10 architecture:aarch64/generic
  • handling command build instance:eX3-NESSI repository:nessi-2023.06-swl-deb10 architecture:aarch64/generic resulted in:

nessi-bot bot commented May 6, 2024

Updates by the bot instance Saga-NESSI (click for details)
  • received bot command build inst:eX3-NESSI repo:nessi-2023.06-swl-deb10 arch:aarch64/generic from trz42

    • expanded format: build instance:eX3-NESSI repository:nessi-2023.06-swl-deb10 architecture:aarch64/generic
  • handling command build instance:eX3-NESSI repository:nessi-2023.06-swl-deb10 architecture:aarch64/generic resulted in:

    • no jobs were submitted

nessi-bot bot commented May 6, 2024

Updates by the bot instance Fram-NESSI (click for details)
  • received bot command build inst:eX3-NESSI repo:nessi-2023.06-swl-deb10 arch:aarch64/generic from trz42

    • expanded format: build instance:eX3-NESSI repository:nessi-2023.06-swl-deb10 architecture:aarch64/generic
  • handling command build instance:eX3-NESSI repository:nessi-2023.06-swl-deb10 architecture:aarch64/generic resulted in:

    • no jobs were submitted

nessi-bot bot commented May 6, 2024

New job on instance eX3-NESSI for architecture aarch64-generic for repository nessi-2023.06-swl-deb10 in job dir /home/thomarob/pilot.nessi.no/jobs/2024.05/pr_356/188997

date job status comment
May 06 09:04:50 PM UTC 2024 submitted job id 188997 awaits release by job manager
May 06 09:05:24 PM UTC 2024 released job awaits launch by Slurm scheduler
May 06 09:06:26 PM UTC 2024 running job 188997 is running
May 06 09:26:48 PM UTC 2024 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-188997.out
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
No artefacts were created or found.
May 06 09:26:48 PM UTC 2024 test result
🤷 UNKNOWN (click triangle for details)
  • Job test file _bot_job188997.test does not exist in job directory, or parsing it failed.
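
The check list in this job report reads like a set of substring tests against the Slurm output file. A minimal sketch under that assumption: the pattern strings are copied from the report, while the function name and the rule that every check must pass for SUCCESS are hypothetical and may differ from how the bot really decides the status.

```python
# Strings copied from the check list in the job report.
MUST_BE_ABSENT = ["ERROR:", "FAILED:", "required modules missing:"]
MUST_BE_PRESENT = ["No missing installations", ".tar.gz created!"]


def check_job_output(text):
    """Return (status, details) for the text of a job output file.

    details is a list of (passed, description) tuples, one per check.
    """
    details = []
    ok = True
    for needle in MUST_BE_ABSENT:
        found = needle in text
        ok = ok and not found
        label = "found" if found else "no"
        details.append((not found, f"{label} message matching {needle}"))
    for needle in MUST_BE_PRESENT:
        found = needle in text
        ok = ok and found
        label = "found" if found else "no"
        details.append((found, f"{label} message matching {needle}"))
    return ("SUCCESS" if ok else "FAILURE"), details
```

Applied to the text of a file such as slurm-188997.out, a single ERROR: line is enough to turn the overall status into FAILURE, even if a tarball was created.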

Collaborator Author

trz42 commented May 6, 2024

Next test after fixing some lower dir logic

bot: build inst:eX3-NESSI repo:nessi-2023.06-swl-deb10 arch:aarch64/generic

nessi-bot bot commented May 6, 2024

Updates by the bot instance AWS-MC-NESSI (click for details)
  • received bot command build inst:eX3-NESSI repo:nessi-2023.06-swl-deb10 arch:aarch64/generic from trz42

    • expanded format: build instance:eX3-NESSI repository:nessi-2023.06-swl-deb10 architecture:aarch64/generic
  • handling command build instance:eX3-NESSI repository:nessi-2023.06-swl-deb10 architecture:aarch64/generic resulted in:

    • no jobs were submitted

nessi-bot bot commented May 6, 2024

Updates by the bot instance eX3-NESSI (click for details)
  • received bot command build inst:eX3-NESSI repo:nessi-2023.06-swl-deb10 arch:aarch64/generic from trz42

    • expanded format: build instance:eX3-NESSI repository:nessi-2023.06-swl-deb10 architecture:aarch64/generic
  • handling command build instance:eX3-NESSI repository:nessi-2023.06-swl-deb10 architecture:aarch64/generic resulted in:

nessi-bot bot commented May 6, 2024

Updates by the bot instance Fram-NESSI (click for details)
  • received bot command build inst:eX3-NESSI repo:nessi-2023.06-swl-deb10 arch:aarch64/generic from trz42

    • expanded format: build instance:eX3-NESSI repository:nessi-2023.06-swl-deb10 architecture:aarch64/generic
  • handling command build instance:eX3-NESSI repository:nessi-2023.06-swl-deb10 architecture:aarch64/generic resulted in:

    • no jobs were submitted

nessi-bot bot commented May 6, 2024

Updates by the bot instance Saga-NESSI (click for details)
  • received bot command build inst:eX3-NESSI repo:nessi-2023.06-swl-deb10 arch:aarch64/generic from trz42

    • expanded format: build instance:eX3-NESSI repository:nessi-2023.06-swl-deb10 architecture:aarch64/generic
  • handling command build instance:eX3-NESSI repository:nessi-2023.06-swl-deb10 architecture:aarch64/generic resulted in:

    • no jobs were submitted

nessi-bot bot commented May 6, 2024

New job on instance eX3-NESSI for architecture aarch64-generic for repository nessi-2023.06-swl-deb10 in job dir /home/thomarob/pilot.nessi.no/jobs/2024.05/pr_356/189003

date job status comment
May 06 09:27:59 PM UTC 2024 submitted job id 189003 awaits release by job manager
May 06 09:28:52 PM UTC 2024 released job awaits launch by Slurm scheduler
May 06 09:29:54 PM UTC 2024 running job 189003 is running
May 06 09:31:56 PM UTC 2024 finished
🤷 UNKNOWN (click triangle for details)
  • Job results file _bot_job189003.result does not exist in job directory, or parsing it failed.
  • No artefacts were found/reported.
May 06 09:31:56 PM UTC 2024 test result
🤷 UNKNOWN (click triangle for details)
  • Job test file _bot_job189003.test does not exist in job directory, or parsing it failed.

Collaborator Author

trz42 commented May 6, 2024

Next test after creating full directory tree under lower dir

bot: build inst:eX3-NESSI repo:nessi-2023.06-swl-deb10 arch:aarch64/generic

nessi-bot bot commented May 6, 2024

Updates by the bot instance AWS-MC-NESSI (click for details)
  • received bot command build inst:eX3-NESSI repo:nessi-2023.06-swl-deb10 arch:aarch64/generic from trz42

    • expanded format: build instance:eX3-NESSI repository:nessi-2023.06-swl-deb10 architecture:aarch64/generic
  • handling command build instance:eX3-NESSI repository:nessi-2023.06-swl-deb10 architecture:aarch64/generic resulted in:

    • no jobs were submitted

nessi-bot bot commented May 6, 2024

Updates by the bot instance eX3-NESSI (click for details)
  • received bot command build inst:eX3-NESSI repo:nessi-2023.06-swl-deb10 arch:aarch64/generic from trz42

    • expanded format: build instance:eX3-NESSI repository:nessi-2023.06-swl-deb10 architecture:aarch64/generic
  • handling command build instance:eX3-NESSI repository:nessi-2023.06-swl-deb10 architecture:aarch64/generic resulted in:

nessi-bot bot commented May 6, 2024

Updates by the bot instance Fram-NESSI (click for details)
  • received bot command build inst:eX3-NESSI repo:nessi-2023.06-swl-deb10 arch:aarch64/generic from trz42

    • expanded format: build instance:eX3-NESSI repository:nessi-2023.06-swl-deb10 architecture:aarch64/generic
  • handling command build instance:eX3-NESSI repository:nessi-2023.06-swl-deb10 architecture:aarch64/generic resulted in:

    • no jobs were submitted

nessi-bot bot commented May 6, 2024

Updates by the bot instance Saga-NESSI (click for details)
  • received bot command build inst:eX3-NESSI repo:nessi-2023.06-swl-deb10 arch:aarch64/generic from trz42

    • expanded format: build instance:eX3-NESSI repository:nessi-2023.06-swl-deb10 architecture:aarch64/generic
  • handling command build instance:eX3-NESSI repository:nessi-2023.06-swl-deb10 architecture:aarch64/generic resulted in:

    • no jobs were submitted

nessi-bot bot commented May 6, 2024

Updates by the bot instance Fram-NESSI (click for details)
  • received bot command build inst:eX3-NESSI repo:nessi-2023.06-swl-deb10 arch:aarch64/generic from trz42

    • expanded format: build instance:eX3-NESSI repository:nessi-2023.06-swl-deb10 architecture:aarch64/generic
  • handling command build instance:eX3-NESSI repository:nessi-2023.06-swl-deb10 architecture:aarch64/generic resulted in:

    • no jobs were submitted

nessi-bot bot commented May 6, 2024

Updates by the bot instance eX3-NESSI (click for details)
  • received bot command build inst:eX3-NESSI repo:nessi-2023.06-swl-deb10 arch:aarch64/generic from trz42

    • expanded format: build instance:eX3-NESSI repository:nessi-2023.06-swl-deb10 architecture:aarch64/generic
  • handling command build instance:eX3-NESSI repository:nessi-2023.06-swl-deb10 architecture:aarch64/generic resulted in:

nessi-bot bot commented May 6, 2024

Updates by the bot instance Saga-NESSI (click for details)
  • received bot command build inst:eX3-NESSI repo:nessi-2023.06-swl-deb10 arch:aarch64/generic from trz42

    • expanded format: build instance:eX3-NESSI repository:nessi-2023.06-swl-deb10 architecture:aarch64/generic
  • handling command build instance:eX3-NESSI repository:nessi-2023.06-swl-deb10 architecture:aarch64/generic resulted in:

    • no jobs were submitted

nessi-bot bot commented May 6, 2024

New job on instance eX3-NESSI for architecture aarch64-generic for repository nessi-2023.06-swl-deb10 in job dir /home/thomarob/pilot.nessi.no/jobs/2024.05/pr_356/189014

date job status comment
May 06 11:16:21 PM UTC 2024 submitted job id 189014 awaits release by job manager
May 06 11:16:25 PM UTC 2024 released job awaits launch by Slurm scheduler
May 06 11:17:28 PM UTC 2024 running job 189014 is running
May 06 11:20:33 PM UTC 2024 finished
🤷 UNKNOWN (click triangle for details)
  • Job results file _bot_job189014.result does not exist in job directory, or parsing it failed.
  • No artefacts were found/reported.
May 06 11:20:33 PM UTC 2024 test result
🤷 UNKNOWN (click triangle for details)
  • Job test file _bot_job189014.test does not exist in job directory, or parsing it failed.

Collaborator Author

trz42 commented May 6, 2024

Finally (using brute force)?

bot: build inst:eX3-NESSI repo:nessi-2023.06-swl-deb10 arch:aarch64/generic

nessi-bot bot commented May 6, 2024

Updates by the bot instance AWS-MC-NESSI (click for details)
  • received bot command build inst:eX3-NESSI repo:nessi-2023.06-swl-deb10 arch:aarch64/generic from trz42

    • expanded format: build instance:eX3-NESSI repository:nessi-2023.06-swl-deb10 architecture:aarch64/generic
  • handling command build instance:eX3-NESSI repository:nessi-2023.06-swl-deb10 architecture:aarch64/generic resulted in:

    • no jobs were submitted

nessi-bot bot commented May 6, 2024

Updates by the bot instance eX3-NESSI (click for details)
  • received bot command build inst:eX3-NESSI repo:nessi-2023.06-swl-deb10 arch:aarch64/generic from trz42

    • expanded format: build instance:eX3-NESSI repository:nessi-2023.06-swl-deb10 architecture:aarch64/generic
  • handling command build instance:eX3-NESSI repository:nessi-2023.06-swl-deb10 architecture:aarch64/generic resulted in:

nessi-bot bot commented May 6, 2024

Updates by the bot instance Fram-NESSI (click for details)
  • received bot command build inst:eX3-NESSI repo:nessi-2023.06-swl-deb10 arch:aarch64/generic from trz42

    • expanded format: build instance:eX3-NESSI repository:nessi-2023.06-swl-deb10 architecture:aarch64/generic
  • handling command build instance:eX3-NESSI repository:nessi-2023.06-swl-deb10 architecture:aarch64/generic resulted in:

    • no jobs were submitted

nessi-bot bot commented May 6, 2024

Updates by the bot instance Saga-NESSI (click for details)
  • received bot command build inst:eX3-NESSI repo:nessi-2023.06-swl-deb10 arch:aarch64/generic from trz42

    • expanded format: build instance:eX3-NESSI repository:nessi-2023.06-swl-deb10 architecture:aarch64/generic
  • handling command build instance:eX3-NESSI repository:nessi-2023.06-swl-deb10 architecture:aarch64/generic resulted in:

    • no jobs were submitted

nessi-bot bot commented May 6, 2024

New job on instance eX3-NESSI for architecture aarch64-generic for repository nessi-2023.06-swl-deb10 in job dir /home/thomarob/pilot.nessi.no/jobs/2024.05/pr_356/189016

date job status comment
May 06 11:26:09 PM UTC 2024 submitted job id 189016 awaits release by job manager
May 06 11:26:37 PM UTC 2024 released job awaits launch by Slurm scheduler
May 06 11:27:40 PM UTC 2024 running job 189016 is running
May 07 12:07:18 AM UTC 2024 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-189016.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-aarch64-generic-1715038420.tar.gz
  size: 2010 MiB (2107801514 bytes)
  entries: 3777
  modules under 2023.06/software/linux/aarch64/generic/modules/all
    CUDA/12.1.1.lua
  software under 2023.06/software/linux/aarch64/generic/software
    CUDA/12.1.1
  other under 2023.06/software/linux/aarch64/generic
    no other files in tarball
May 07 12:07:18 AM UTC 2024 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 7/7 test case(s) from 7 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-189016.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
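
The artefact summary in this report (entry count, modules, software) could be reproduced from a tarball along these lines. This is a sketch with Python's standard tarfile module; the function name and the grouping rules (module files below modules/all, and the first two path components below software) are assumptions made for illustration, not the bot's implementation.

```python
import io
import tarfile


def summarize_tarball(fileobj, prefix):
    """Count entries and list module files and software dirs below prefix."""
    modules, software = set(), set()
    # mode "r:*" lets tarfile detect the compression transparently.
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        members = tar.getmembers()
    mod_root = prefix + "/modules/all/"
    sw_root = prefix + "/software/"
    for member in members:
        name = member.name
        if name.startswith(mod_root) and member.isfile():
            modules.add(name[len(mod_root):])
        elif name.startswith(sw_root):
            parts = name[len(sw_root):].split("/")
            if len(parts) >= 2:
                # Keep only <name>/<version>, e.g. CUDA/12.1.1.
                software.add("/".join(parts[:2]))
    return {
        "entries": len(members),
        "modules": sorted(modules),
        "software": sorted(software),
    }
```

With a file opened in binary mode and prefix set to 2023.06/software/linux/aarch64/generic, the returned dictionary would hold the same kind of information as the Artefacts section above.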

Collaborator Author

trz42 commented May 6, 2024

Try all

bot: build inst:Fram-NESSI repo:nessi-2023.06-swl-deb11 arch:x86_64/intel/broadwell
bot: build inst:Saga-NESSI repo:nessi-2023.06-swl-deb11 arch:x86_64/intel/skylake
bot: build inst:eX3-NESSI repo:nessi-2023.06-swl-deb10 arch:aarch64/generic
bot: build inst:AWS-MC-NESSI repo:nessi-2023.06-swl-deb11 arch:zen2
bot: build inst:AWS-MC-NESSI repo:nessi-2023.06-swl-deb11 arch:x86_64/generic

nessi-bot bot commented May 6, 2024

Updates by the bot instance AWS-MC-NESSI (click for details)
  • received bot command build inst:Fram-NESSI repo:nessi-2023.06-swl-deb11 arch:x86_64/intel/broadwell from trz42

    • expanded format: build instance:Fram-NESSI repository:nessi-2023.06-swl-deb11 architecture:x86_64/intel/broadwell
  • received bot command build inst:Saga-NESSI repo:nessi-2023.06-swl-deb11 arch:x86_64/intel/skylake from trz42

    • expanded format: build instance:Saga-NESSI repository:nessi-2023.06-swl-deb11 architecture:x86_64/intel/skylake
  • received bot command build inst:eX3-NESSI repo:nessi-2023.06-swl-deb10 arch:aarch64/generic from trz42

    • expanded format: build instance:eX3-NESSI repository:nessi-2023.06-swl-deb10 architecture:aarch64/generic
  • received bot command build inst:AWS-MC-NESSI repo:nessi-2023.06-swl-deb11 arch:zen2 from trz42

    • expanded format: build instance:AWS-MC-NESSI repository:nessi-2023.06-swl-deb11 architecture:zen2
  • received bot command build inst:AWS-MC-NESSI repo:nessi-2023.06-swl-deb11 arch:x86_64/generic from trz42

    • expanded format: build instance:AWS-MC-NESSI repository:nessi-2023.06-swl-deb11 architecture:x86_64/generic
  • handling command build instance:Fram-NESSI repository:nessi-2023.06-swl-deb11 architecture:x86_64/intel/broadwell resulted in:

    • no jobs were submitted
  • handling command build instance:Saga-NESSI repository:nessi-2023.06-swl-deb11 architecture:x86_64/intel/skylake resulted in:

    • no jobs were submitted
  • handling command build instance:eX3-NESSI repository:nessi-2023.06-swl-deb10 architecture:aarch64/generic resulted in:

    • no jobs were submitted
  • handling command build instance:AWS-MC-NESSI repository:nessi-2023.06-swl-deb11 architecture:zen2 resulted in:

  • handling command build instance:AWS-MC-NESSI repository:nessi-2023.06-swl-deb11 architecture:x86_64/generic resulted in:

nessi-bot bot commented May 6, 2024

Updates by the bot instance eX3-NESSI (click for details)
  • received bot command build inst:Fram-NESSI repo:nessi-2023.06-swl-deb11 arch:x86_64/intel/broadwell from trz42

    • expanded format: build instance:Fram-NESSI repository:nessi-2023.06-swl-deb11 architecture:x86_64/intel/broadwell
  • received bot command build inst:Saga-NESSI repo:nessi-2023.06-swl-deb11 arch:x86_64/intel/skylake from trz42

    • expanded format: build instance:Saga-NESSI repository:nessi-2023.06-swl-deb11 architecture:x86_64/intel/skylake
  • received bot command build inst:eX3-NESSI repo:nessi-2023.06-swl-deb10 arch:aarch64/generic from trz42

    • expanded format: build instance:eX3-NESSI repository:nessi-2023.06-swl-deb10 architecture:aarch64/generic
  • received bot command build inst:AWS-MC-NESSI repo:nessi-2023.06-swl-deb11 arch:zen2 from trz42

    • expanded format: build instance:AWS-MC-NESSI repository:nessi-2023.06-swl-deb11 architecture:zen2
  • received bot command build inst:AWS-MC-NESSI repo:nessi-2023.06-swl-deb11 arch:x86_64/generic from trz42

    • expanded format: build instance:AWS-MC-NESSI repository:nessi-2023.06-swl-deb11 architecture:x86_64/generic
  • handling command build instance:Fram-NESSI repository:nessi-2023.06-swl-deb11 architecture:x86_64/intel/broadwell resulted in:

    • no jobs were submitted
  • handling command build instance:Saga-NESSI repository:nessi-2023.06-swl-deb11 architecture:x86_64/intel/skylake resulted in:

    • no jobs were submitted
  • handling command build instance:eX3-NESSI repository:nessi-2023.06-swl-deb10 architecture:aarch64/generic resulted in:

  • handling command build instance:AWS-MC-NESSI repository:nessi-2023.06-swl-deb11 architecture:zen2 resulted in:

    • no jobs were submitted
  • handling command build instance:AWS-MC-NESSI repository:nessi-2023.06-swl-deb11 architecture:x86_64/generic resulted in:

    • no jobs were submitted

nessi-bot bot commented May 6, 2024

Updates by the bot instance Fram-NESSI (click for details)
  • received bot command build inst:Fram-NESSI repo:nessi-2023.06-swl-deb11 arch:x86_64/intel/broadwell from trz42

    • expanded format: build instance:Fram-NESSI repository:nessi-2023.06-swl-deb11 architecture:x86_64/intel/broadwell
  • received bot command build inst:Saga-NESSI repo:nessi-2023.06-swl-deb11 arch:x86_64/intel/skylake from trz42

    • expanded format: build instance:Saga-NESSI repository:nessi-2023.06-swl-deb11 architecture:x86_64/intel/skylake
  • received bot command build inst:eX3-NESSI repo:nessi-2023.06-swl-deb10 arch:aarch64/generic from trz42

    • expanded format: build instance:eX3-NESSI repository:nessi-2023.06-swl-deb10 architecture:aarch64/generic
  • received bot command build inst:AWS-MC-NESSI repo:nessi-2023.06-swl-deb11 arch:zen2 from trz42

    • expanded format: build instance:AWS-MC-NESSI repository:nessi-2023.06-swl-deb11 architecture:zen2
  • received bot command build inst:AWS-MC-NESSI repo:nessi-2023.06-swl-deb11 arch:x86_64/generic from trz42

    • expanded format: build instance:AWS-MC-NESSI repository:nessi-2023.06-swl-deb11 architecture:x86_64/generic
  • handling command build instance:Fram-NESSI repository:nessi-2023.06-swl-deb11 architecture:x86_64/intel/broadwell resulted in:

  • handling command build instance:Saga-NESSI repository:nessi-2023.06-swl-deb11 architecture:x86_64/intel/skylake resulted in:

    • no jobs were submitted
  • handling command build instance:eX3-NESSI repository:nessi-2023.06-swl-deb10 architecture:aarch64/generic resulted in:

    • no jobs were submitted
  • handling command build instance:AWS-MC-NESSI repository:nessi-2023.06-swl-deb11 architecture:zen2 resulted in:

    • no jobs were submitted
  • handling command build instance:AWS-MC-NESSI repository:nessi-2023.06-swl-deb11 architecture:x86_64/generic resulted in:

    • no jobs were submitted

nessi-bot bot commented May 6, 2024

Updates by the bot instance Saga-NESSI (click for details)
  • received bot command build inst:Fram-NESSI repo:nessi-2023.06-swl-deb11 arch:x86_64/intel/broadwell from trz42

    • expanded format: build instance:Fram-NESSI repository:nessi-2023.06-swl-deb11 architecture:x86_64/intel/broadwell
  • received bot command build inst:Saga-NESSI repo:nessi-2023.06-swl-deb11 arch:x86_64/intel/skylake from trz42

    • expanded format: build instance:Saga-NESSI repository:nessi-2023.06-swl-deb11 architecture:x86_64/intel/skylake
  • received bot command build inst:eX3-NESSI repo:nessi-2023.06-swl-deb10 arch:aarch64/generic from trz42

    • expanded format: build instance:eX3-NESSI repository:nessi-2023.06-swl-deb10 architecture:aarch64/generic
  • received bot command build inst:AWS-MC-NESSI repo:nessi-2023.06-swl-deb11 arch:zen2 from trz42

    • expanded format: build instance:AWS-MC-NESSI repository:nessi-2023.06-swl-deb11 architecture:zen2
  • received bot command build inst:AWS-MC-NESSI repo:nessi-2023.06-swl-deb11 arch:x86_64/generic from trz42

    • expanded format: build instance:AWS-MC-NESSI repository:nessi-2023.06-swl-deb11 architecture:x86_64/generic
  • handling command build instance:Fram-NESSI repository:nessi-2023.06-swl-deb11 architecture:x86_64/intel/broadwell resulted in:

    • no jobs were submitted
  • handling command build instance:Saga-NESSI repository:nessi-2023.06-swl-deb11 architecture:x86_64/intel/skylake resulted in:

  • handling command build instance:eX3-NESSI repository:nessi-2023.06-swl-deb10 architecture:aarch64/generic resulted in:

    • no jobs were submitted
  • handling command build instance:AWS-MC-NESSI repository:nessi-2023.06-swl-deb11 architecture:zen2 resulted in:

    • no jobs were submitted
  • handling command build instance:AWS-MC-NESSI repository:nessi-2023.06-swl-deb11 architecture:x86_64/generic resulted in:

    • no jobs were submitted

nessi-bot bot commented May 6, 2024

New job on instance Fram-NESSI for architecture x86_64-intel-broadwell for repository nessi-2023.06-swl-deb11 in job dir /cluster/projects/nn9992k/pilot.nessi.no/jobs/2024.05/pr_356/5792197

date job status comment
May 06 23:49:39 UTC 2024 submitted job id 5792197 awaits release by job manager
May 06 23:49:52 UTC 2024 released job awaits launch by Slurm scheduler
May 06 23:50:55 UTC 2024 running job 5792197 is running
May 07 00:12:39 UTC 2024 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-5792197.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-intel-broadwell-1715039791.tar.gz
  size: 2067 MiB (2167671285 bytes)
  entries: 5518
  modules under 2023.06/software/linux/x86_64/intel/broadwell/modules/all
    CUDA/12.1.1.lua
  software under 2023.06/software/linux/x86_64/intel/broadwell/software
    CUDA/12.1.1
  other under 2023.06/software/linux/x86_64/intel/broadwell
    no other files in tarball
May 07 00:12:39 UTC 2024 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 7/7 test case(s) from 7 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-5792197.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
May 07 04:15:34 UTC 2024 uploaded transfer of eessi-2023.06-software-linux-x86_64-intel-broadwell-1715039791.tar.gz to S3 bucket succeeded
May 07 04:22:09 AM UTC 2024 staged tarball eessi-2023.06-software-linux-x86_64-intel-broadwell-1715039791.tar.gz downloaded to Stratum-0
May 07 04:25:54 AM UTC 2024 pr_opened merge PR https://github.com/NorESSI/2024-staging/pull/288 to approve ingest
May 07 04:33:33 AM UTC 2024 approved 👍 tarball eessi-2023.06-software-linux-x86_64-intel-broadwell-1715039791.tar.gz approved, see PR https://github.com/NorESSI/2024-staging/pull/288
May 07 04:59:31 AM UTC 2024 ingested 🎉 tarball eessi-2023.06-software-linux-x86_64-intel-broadwell-1715039791.tar.gz successfully ingested at 2023.06/software/linux/x86_64/intel/broadwell/

nessi-bot bot commented May 6, 2024

New job on instance eX3-NESSI for architecture aarch64-generic for repository nessi-2023.06-swl-deb10 in job dir /home/thomarob/pilot.nessi.no/jobs/2024.05/pr_356/189018

date job status comment
May 06 11:49:40 PM UTC 2024 submitted job id 189018 awaits release by job manager
May 06 11:50:20 PM UTC 2024 released job awaits launch by Slurm scheduler
May 06 11:51:26 PM UTC 2024 running job 189018 is running
May 07 12:31:00 AM UTC 2024 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-189018.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-aarch64-generic-1715039869.tar.gz
  size: 2010 MiB (2107799041 bytes)
  entries: 3777
  modules under 2023.06/software/linux/aarch64/generic/modules/all
    CUDA/12.1.1.lua
  software under 2023.06/software/linux/aarch64/generic/software
    CUDA/12.1.1
  other under 2023.06/software/linux/aarch64/generic
    no other files in tarball
May 07 12:31:00 AM UTC 2024 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 7/7 test case(s) from 7 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-189018.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
May 07 04:15:26 AM UTC 2024 uploaded transfer of eessi-2023.06-software-linux-aarch64-generic-1715039869.tar.gz to S3 bucket succeeded
May 07 04:21:27 AM UTC 2024 staged tarball eessi-2023.06-software-linux-aarch64-generic-1715039869.tar.gz downloaded to Stratum-0
May 07 04:23:17 AM UTC 2024 pr_opened merge PR https://github.com/NorESSI/2024-staging/pull/285 to approve ingest
May 07 04:33:24 AM UTC 2024 approved 👍 tarball eessi-2023.06-software-linux-aarch64-generic-1715039869.tar.gz approved, see PR https://github.com/NorESSI/2024-staging/pull/285
May 07 04:40:05 AM UTC 2024 ingested 🎉 tarball eessi-2023.06-software-linux-aarch64-generic-1715039869.tar.gz successfully ingested at 2023.06/software/linux/aarch64/generic/

nessi-bot bot commented May 6, 2024

New job on instance AWS-MC-NESSI for architecture x86_64-amd-zen2 for repository nessi-2023.06-swl-deb11 in job dir /project/def-nessi/SHARED/jobs/2024.05/pr_356/10244

date job status comment
May 06 23:49:41 UTC 2024 submitted job id 10244 awaits release by job manager
May 06 23:50:15 UTC 2024 released job awaits launch by Slurm scheduler
May 06 23:54:20 UTC 2024 running job 10244 is running
May 07 00:30:12 UTC 2024 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-10244.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen2-1715040369.tar.gz
  size: 2067 MiB (2167685563 bytes)
  entries: 5518
  modules under 2023.06/software/linux/x86_64/amd/zen2/modules/all
    CUDA/12.1.1.lua
  software under 2023.06/software/linux/x86_64/amd/zen2/software
    CUDA/12.1.1
  other under 2023.06/software/linux/x86_64/amd/zen2
    no other files in tarball
May 07 00:30:12 UTC 2024 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 7/7 test case(s) from 7 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-10244.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
May 07 04:15:30 UTC 2024 uploaded transfer of eessi-2023.06-software-linux-x86_64-amd-zen2-1715040369.tar.gz to S3 bucket succeeded
May 07 04:21:42 AM UTC 2024 staged tarball eessi-2023.06-software-linux-x86_64-amd-zen2-1715040369.tar.gz downloaded to Stratum-0
May 07 04:24:09 AM UTC 2024 pr_opened merge PR https://github.com/NorESSI/2024-staging/pull/286 to approve ingest
May 07 04:33:27 AM UTC 2024 approved 👍 tarball eessi-2023.06-software-linux-x86_64-amd-zen2-1715040369.tar.gz approved, see PR https://github.com/NorESSI/2024-staging/pull/286
May 07 04:47:10 AM UTC 2024 ingested 🎉 tarball eessi-2023.06-software-linux-x86_64-amd-zen2-1715040369.tar.gz successfully ingested at 2023.06/software/linux/x86_64/amd/zen2/

@nessi-bot

nessi-bot bot commented May 6, 2024

New job on instance AWS-MC-NESSI for architecture x86_64-generic for repository nessi-2023.06-swl-deb11 in job dir /project/def-nessi/SHARED/jobs/2024.05/pr_356/10245

date job status comment
May 06 23:49:44 UTC 2024 submitted job id 10245 awaits release by job manager
May 06 23:50:18 UTC 2024 released job awaits launch by Slurm scheduler
May 06 23:55:25 UTC 2024 running job 10245 is running
May 07 00:40:29 UTC 2024 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-10245.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-generic-1715040434.tar.gz
size: 2067 MiB (2167684460 bytes)
entries: 5518
modules under 2023.06/software/linux/x86_64/generic/modules/all
CUDA/12.1.1.lua
software under 2023.06/software/linux/x86_64/generic/software
CUDA/12.1.1
other under 2023.06/software/linux/x86_64/generic
no other files in tarball
May 07 00:40:29 UTC 2024 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 7/7 test case(s) from 7 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-10245.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
May 07 04:16:32 UTC 2024 uploaded transfer of eessi-2023.06-software-linux-x86_64-generic-1715040434.tar.gz to S3 bucket succeeded
May 07 04:21:55 AM UTC 2024 staged tarball eessi-2023.06-software-linux-x86_64-generic-1715040434.tar.gz downloaded to Stratum-0
May 07 04:25:03 AM UTC 2024 pr_opened merge PR https://github.com/NorESSI/2024-staging/pull/287 to approve ingest
May 07 04:33:30 AM UTC 2024 approved 👍 tarball eessi-2023.06-software-linux-x86_64-generic-1715040434.tar.gz approved, see PR https://github.com/NorESSI/2024-staging/pull/287
May 07 04:53:13 AM UTC 2024 ingested 🎉 tarball eessi-2023.06-software-linux-x86_64-generic-1715040434.tar.gz successfully ingested at 2023.06/software/linux/x86_64/generic/

@nessi-bot

nessi-bot bot commented May 6, 2024

New job on instance Saga-NESSI for architecture x86_64-intel-skylake_avx512 for repository nessi-2023.06-swl-deb11 in job dir /cluster/projects/nn9992k/pilot.nessi.no/jobs/2024.05/pr_356/11415397

date job status comment
May 06 11:49:54 PM UTC 2024 submitted job id 11415397 awaits release by job manager
May 06 11:50:17 PM UTC 2024 released job awaits launch by Slurm scheduler
May 06 11:52:20 PM UTC 2024 running job 11415397 is running
May 07 12:38:51 AM UTC 2024 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-11415397.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-intel-skylake_avx512-1715040025.tar.gz
size: 2067 MiB (2167686409 bytes)
entries: 5518
modules under 2023.06/software/linux/x86_64/intel/skylake_avx512/modules/all
CUDA/12.1.1.lua
software under 2023.06/software/linux/x86_64/intel/skylake_avx512/software
CUDA/12.1.1
other under 2023.06/software/linux/x86_64/intel/skylake_avx512
no other files in tarball
May 07 12:38:51 AM UTC 2024 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 7/7 test case(s) from 7 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-11415397.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
May 07 04:15:32 AM UTC 2024 uploaded transfer of eessi-2023.06-software-linux-x86_64-intel-skylake_avx512-1715040025.tar.gz to S3 bucket succeeded
May 07 04:22:24 AM UTC 2024 staged tarball eessi-2023.06-software-linux-x86_64-intel-skylake_avx512-1715040025.tar.gz downloaded to Stratum-0
May 07 04:26:49 AM UTC 2024 pr_opened merge PR https://github.com/NorESSI/2024-staging/pull/289 to approve ingest
May 07 04:33:36 AM UTC 2024 approved 👍 tarball eessi-2023.06-software-linux-x86_64-intel-skylake_avx512-1715040025.tar.gz approved, see PR https://github.com/NorESSI/2024-staging/pull/289
May 07 05:05:57 AM UTC 2024 ingested 🎉 tarball eessi-2023.06-software-linux-x86_64-intel-skylake_avx512-1715040025.tar.gz successfully ingested at 2023.06/software/linux/x86_64/intel/skylake_avx512/

@trz42
Collaborator Author

trz42 commented May 7, 2024

Checklist before starting deployment (setting bot:deploy label):

  • Check if the SPDX license identifier is provided
    • only a rebuild
  • Check whether builds for all required architectures succeed (SUCCESS message + reasonably sized tarball)
  • Check if the PR is up-to-date with the target branch nessi.no-2023.06 in the repository (if not what are the differences)
  • Assess if all requested changes are sound (checking files changed on GitHub.com)
  • Verify that all easyconfig/s being built are included with the EB version used (if not why not)
  • Review changes (if any) needed to get the build(s) to succeed (common changes for all architectures, changes for a single architecture, changes because of build environment specifics, etc.)
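
For the "reasonably sized tarball" check, it can also help to confirm that the tarball ships the CUDA runtime as a real file and not only as a symlink. The following is a minimal sketch under assumptions: the function name `cudart_kind_in_tarball` is hypothetical, and it relies only on the `ls`-like listing printed by `tar -tv`, where the first character of each line is the entry type.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the bot or the PR): report whether a
# tarball contains libcudart.so.<major>.<minor>.<patch> as a regular file
# ("file"), only as a symlink ("symlink"), or not at all ("missing").
cudart_kind_in_tarball() {
    tarball="$1"
    # tar -tv prints one ls-like line per entry; '-' marks a regular file,
    # 'l' marks a symlink (whose line ends in "path -> target").
    tar -tvzf "$tarball" | awk '
        /\/libcudart\.so\.[0-9]+\.[0-9]+\.[0-9]+( -> .*)?$/ {
            if (substr($0, 1, 1) == "-") found_file = 1
            else found_link = 1
        }
        END {
            if (found_file) print "file"
            else if (found_link) print "symlink"
            else print "missing"
        }'
}
```

For example, `cudart_kind_in_tarball eessi-2023.06-software-linux-x86_64-generic-1715040434.tar.gz` would be expected to print `file` for a rebuilt tarball that includes the runtime library.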

@trz42
Collaborator Author

trz42 commented May 7, 2024

Before we can ingest the tarballs, we need to manually remove the software packages and module files on the Stratum-0. The first step of that procedure is to test if the software is still available:

cd /cvmfs/pilot.nessi.no/versions/2023.06/software/linux ; \
  ls -l \
        x86_64/{generic,amd/zen2,intel/[bsc]*}/{software,modules/all}/CUDA/12.1.1* \
        aarch64/generic/{software,modules/all}/CUDA/12.1.1* ; \
cd -

log

-rw-r--r--. 1 cvmfs cvmfs 2120 May 4 20:09 aarch64/generic/modules/all/CUDA/12.1.1.lua
-rw-rw-r--. 1 cvmfs cvmfs 2120 May 4 20:14 x86_64/amd/zen2/modules/all/CUDA/12.1.1.lua
-rw-rw-r--. 1 cvmfs cvmfs 2116 May 4 20:14 x86_64/generic/modules/all/CUDA/12.1.1.lua
-rw-rw-r--. 1 cvmfs cvmfs 2148 May 4 20:11 x86_64/intel/broadwell/modules/all/CUDA/12.1.1.lua
-rw-rw-r--. 1 cvmfs cvmfs 2168 May 4 20:10 x86_64/intel/skylake_avx512/modules/all/CUDA/12.1.1.lua

aarch64/generic/software/CUDA/12.1.1:
total 122
dr-xr-xr-x. 3 cvmfs cvmfs 4096 May 4 20:09 bin
dr-xr-xr-x. 4 cvmfs cvmfs 4096 May 4 20:09 compute-sanitizer
lrwxrwxrwx. 1 cvmfs cvmfs 102 May 4 20:09 DOCS -> /cvmfs/pilot.nessi.no/host_injections/2023.06/software/linux/aarch64/generic/software/CUDA/12.1.1/DOCS
dr-xr-xr-x. 3 cvmfs cvmfs 4096 May 4 20:09 easybuild
-r--r--r--. 1 cvmfs cvmfs 61498 May 4 20:09 EULA.txt
dr-xr-xr-x. 4 cvmfs cvmfs 4096 May 4 20:08 extras
lrwxrwxrwx. 1 cvmfs cvmfs 26 May 4 20:09 include -> targets/sbsa-linux/include
lrwxrwxrwx. 1 cvmfs cvmfs 5 May 4 20:09 lib -> lib64
lrwxrwxrwx. 1 cvmfs cvmfs 22 May 4 20:09 lib64 -> targets/sbsa-linux/lib
dr-xr-xr-x. 7 cvmfs cvmfs 4096 May 4 20:09 nsight-compute-2023.1.1
dr-xr-xr-x. 6 cvmfs cvmfs 4096 May 4 20:09 nsight-systems-2023.1.2
dr-xr-xr-x. 3 cvmfs cvmfs 4096 May 4 20:07 nvml
dr-xr-xr-x. 7 cvmfs cvmfs 4096 May 4 20:09 nvvm
dr-xr-xr-x. 2 cvmfs cvmfs 4096 May 4 20:09 pkgconfig
-r--r--r--. 1 cvmfs cvmfs 524 May 4 20:09 README
dr-xr-xr-x. 3 cvmfs cvmfs 4096 May 4 20:08 share
dr-xr-xr-x. 3 cvmfs cvmfs 4096 May 4 20:09 stubs
dr-xr-xr-x. 3 cvmfs cvmfs 4096 May 4 20:07 targets
dr-xr-xr-x. 2 cvmfs cvmfs 4096 May 4 20:09 tools

x86_64/amd/zen2/software/CUDA/12.1.1:
total 145
dr-xr-xr-x. 3 cvmfs cvmfs 4096 May 4 20:14 bin
dr-xr-xr-x. 5 cvmfs cvmfs 4096 May 4 20:14 compute-sanitizer
lrwxrwxrwx. 1 cvmfs cvmfs 102 May 4 20:14 DOCS -> /cvmfs/pilot.nessi.no/host_injections/2023.06/software/linux/x86_64/amd/zen2/software/CUDA/12.1.1/DOCS
dr-xr-xr-x. 3 cvmfs cvmfs 4096 May 4 20:14 easybuild
-r--r--r--. 1 cvmfs cvmfs 61498 May 4 20:14 EULA.txt
dr-xr-xr-x. 5 cvmfs cvmfs 4096 May 4 20:14 extras
dr-xr-xr-x. 6 cvmfs cvmfs 4096 May 4 20:13 gds
dr-xr-xr-x. 2 cvmfs cvmfs 4096 May 4 20:14 gds-12.1
lrwxrwxrwx. 1 cvmfs cvmfs 28 May 4 20:14 include -> targets/x86_64-linux/include
lrwxrwxrwx. 1 cvmfs cvmfs 5 May 4 20:14 lib -> lib64
lrwxrwxrwx. 1 cvmfs cvmfs 24 May 4 20:14 lib64 -> targets/x86_64-linux/lib
dr-xr-xr-x. 7 cvmfs cvmfs 4096 May 4 20:14 libnvvp
dr-xr-xr-x. 7 cvmfs cvmfs 4096 May 4 20:14 nsight-compute-2023.1.1
dr-xr-xr-x. 2 cvmfs cvmfs 4096 May 4 20:14 nsightee_plugins
dr-xr-xr-x. 6 cvmfs cvmfs 4096 May 4 20:13 nsight-systems-2023.1.2
dr-xr-xr-x. 3 cvmfs cvmfs 4096 May 4 20:13 nvml
dr-xr-xr-x. 7 cvmfs cvmfs 4096 May 4 20:14 nvvm
dr-xr-xr-x. 2 cvmfs cvmfs 4096 May 4 20:14 pkgconfig
-r--r--r--. 1 cvmfs cvmfs 524 May 4 20:14 README
dr-xr-xr-x. 3 cvmfs cvmfs 4096 May 4 20:13 share
dr-xr-xr-x. 2 cvmfs cvmfs 4096 May 4 20:14 src
dr-xr-xr-x. 3 cvmfs cvmfs 4096 May 4 20:14 stubs
dr-xr-xr-x. 3 cvmfs cvmfs 4096 May 4 20:13 targets
dr-xr-xr-x. 2 cvmfs cvmfs 4096 May 4 20:14 tools
lrwxrwxrwx. 1 cvmfs cvmfs 110 May 4 20:14 version.json -> /cvmfs/pilot.nessi.no/host_injections/2023.06/software/linux/x86_64/amd/zen2/software/CUDA/12.1.1/version.json

x86_64/generic/software/CUDA/12.1.1:
total 145
dr-xr-xr-x. 3 cvmfs cvmfs 4096 May 4 20:14 bin
dr-xr-xr-x. 5 cvmfs cvmfs 4096 May 4 20:14 compute-sanitizer
lrwxrwxrwx. 1 cvmfs cvmfs 101 May 4 20:14 DOCS -> /cvmfs/pilot.nessi.no/host_injections/2023.06/software/linux/x86_64/generic/software/CUDA/12.1.1/DOCS
dr-xr-xr-x. 3 cvmfs cvmfs 4096 May 4 20:14 easybuild
-r--r--r--. 1 cvmfs cvmfs 61498 May 4 20:14 EULA.txt
dr-xr-xr-x. 5 cvmfs cvmfs 4096 May 4 20:14 extras
dr-xr-xr-x. 6 cvmfs cvmfs 4096 May 4 20:13 gds
dr-xr-xr-x. 2 cvmfs cvmfs 4096 May 4 20:14 gds-12.1
lrwxrwxrwx. 1 cvmfs cvmfs 28 May 4 20:14 include -> targets/x86_64-linux/include
lrwxrwxrwx. 1 cvmfs cvmfs 5 May 4 20:14 lib -> lib64
lrwxrwxrwx. 1 cvmfs cvmfs 24 May 4 20:14 lib64 -> targets/x86_64-linux/lib
dr-xr-xr-x. 7 cvmfs cvmfs 4096 May 4 20:14 libnvvp
dr-xr-xr-x. 7 cvmfs cvmfs 4096 May 4 20:14 nsight-compute-2023.1.1
dr-xr-xr-x. 2 cvmfs cvmfs 4096 May 4 20:14 nsightee_plugins
dr-xr-xr-x. 6 cvmfs cvmfs 4096 May 4 20:13 nsight-systems-2023.1.2
dr-xr-xr-x. 3 cvmfs cvmfs 4096 May 4 20:12 nvml
dr-xr-xr-x. 7 cvmfs cvmfs 4096 May 4 20:14 nvvm
dr-xr-xr-x. 2 cvmfs cvmfs 4096 May 4 20:14 pkgconfig
-r--r--r--. 1 cvmfs cvmfs 524 May 4 20:14 README
dr-xr-xr-x. 3 cvmfs cvmfs 4096 May 4 20:13 share
dr-xr-xr-x. 2 cvmfs cvmfs 4096 May 4 20:14 src
dr-xr-xr-x. 3 cvmfs cvmfs 4096 May 4 20:14 stubs
dr-xr-xr-x. 3 cvmfs cvmfs 4096 May 4 20:12 targets
dr-xr-xr-x. 2 cvmfs cvmfs 4096 May 4 20:14 tools
lrwxrwxrwx. 1 cvmfs cvmfs 109 May 4 20:14 version.json -> /cvmfs/pilot.nessi.no/host_injections/2023.06/software/linux/x86_64/generic/software/CUDA/12.1.1/version.json

x86_64/intel/broadwell/software/CUDA/12.1.1:
total 145
dr-xr-xr-x. 3 cvmfs cvmfs 4096 May 4 20:11 bin
dr-xr-xr-x. 5 cvmfs cvmfs 4096 May 4 20:10 compute-sanitizer
lrwxrwxrwx. 1 cvmfs cvmfs 109 May 4 20:10 DOCS -> /cvmfs/pilot.nessi.no/host_injections/2023.06/software/linux/x86_64/intel/broadwell/software/CUDA/12.1.1/DOCS
dr-xr-xr-x. 3 cvmfs cvmfs 4096 May 4 20:11 easybuild
-r--r--r--. 1 cvmfs cvmfs 61498 May 4 20:10 EULA.txt
dr-xr-xr-x. 5 cvmfs cvmfs 4096 May 4 20:10 extras
dr-xr-xr-x. 6 cvmfs cvmfs 4096 May 4 20:09 gds
dr-xr-xr-x. 2 cvmfs cvmfs 4096 May 4 20:11 gds-12.1
lrwxrwxrwx. 1 cvmfs cvmfs 28 May 4 20:10 include -> targets/x86_64-linux/include
lrwxrwxrwx. 1 cvmfs cvmfs 5 May 4 20:10 lib -> lib64
lrwxrwxrwx. 1 cvmfs cvmfs 24 May 4 20:10 lib64 -> targets/x86_64-linux/lib
dr-xr-xr-x. 7 cvmfs cvmfs 4096 May 4 20:11 libnvvp
dr-xr-xr-x. 7 cvmfs cvmfs 4096 May 4 20:10 nsight-compute-2023.1.1
dr-xr-xr-x. 2 cvmfs cvmfs 4096 May 4 20:11 nsightee_plugins
dr-xr-xr-x. 6 cvmfs cvmfs 4096 May 4 20:10 nsight-systems-2023.1.2
dr-xr-xr-x. 3 cvmfs cvmfs 4096 May 4 20:08 nvml
dr-xr-xr-x. 7 cvmfs cvmfs 4096 May 4 20:10 nvvm
dr-xr-xr-x. 2 cvmfs cvmfs 4096 May 4 20:11 pkgconfig
-r--r--r--. 1 cvmfs cvmfs 524 May 4 20:10 README
dr-xr-xr-x. 3 cvmfs cvmfs 4096 May 4 20:10 share
dr-xr-xr-x. 2 cvmfs cvmfs 4096 May 4 20:11 src
dr-xr-xr-x. 3 cvmfs cvmfs 4096 May 4 20:10 stubs
dr-xr-xr-x. 3 cvmfs cvmfs 4096 May 4 20:08 targets
dr-xr-xr-x. 2 cvmfs cvmfs 4096 May 4 20:10 tools
lrwxrwxrwx. 1 cvmfs cvmfs 117 May 4 20:10 version.json -> /cvmfs/pilot.nessi.no/host_injections/2023.06/software/linux/x86_64/intel/broadwell/software/CUDA/12.1.1/version.json

x86_64/intel/skylake_avx512/software/CUDA/12.1.1:
total 145
dr-xr-xr-x. 3 cvmfs cvmfs 4096 May 4 20:10 bin
dr-xr-xr-x. 5 cvmfs cvmfs 4096 May 4 20:10 compute-sanitizer
lrwxrwxrwx. 1 cvmfs cvmfs 114 May 4 20:10 DOCS -> /cvmfs/pilot.nessi.no/host_injections/2023.06/software/linux/x86_64/intel/skylake_avx512/software/CUDA/12.1.1/DOCS
dr-xr-xr-x. 3 cvmfs cvmfs 4096 May 4 20:11 easybuild
-r--r--r--. 1 cvmfs cvmfs 61498 May 4 20:10 EULA.txt
dr-xr-xr-x. 5 cvmfs cvmfs 4096 May 4 20:10 extras
dr-xr-xr-x. 6 cvmfs cvmfs 4096 May 4 20:09 gds
dr-xr-xr-x. 2 cvmfs cvmfs 4096 May 4 20:10 gds-12.1
lrwxrwxrwx. 1 cvmfs cvmfs 28 May 4 20:10 include -> targets/x86_64-linux/include
lrwxrwxrwx. 1 cvmfs cvmfs 5 May 4 20:10 lib -> lib64
lrwxrwxrwx. 1 cvmfs cvmfs 24 May 4 20:10 lib64 -> targets/x86_64-linux/lib
dr-xr-xr-x. 7 cvmfs cvmfs 4096 May 4 20:10 libnvvp
dr-xr-xr-x. 7 cvmfs cvmfs 4096 May 4 20:10 nsight-compute-2023.1.1
dr-xr-xr-x. 2 cvmfs cvmfs 4096 May 4 20:10 nsightee_plugins
dr-xr-xr-x. 6 cvmfs cvmfs 4096 May 4 20:10 nsight-systems-2023.1.2
dr-xr-xr-x. 3 cvmfs cvmfs 4096 May 4 20:09 nvml
dr-xr-xr-x. 7 cvmfs cvmfs 4096 May 4 20:10 nvvm
dr-xr-xr-x. 2 cvmfs cvmfs 4096 May 4 20:10 pkgconfig
-r--r--r--. 1 cvmfs cvmfs 524 May 4 20:10 README
dr-xr-xr-x. 3 cvmfs cvmfs 4096 May 4 20:09 share
dr-xr-xr-x. 2 cvmfs cvmfs 4096 May 4 20:10 src
dr-xr-xr-x. 3 cvmfs cvmfs 4096 May 4 20:10 stubs
dr-xr-xr-x. 3 cvmfs cvmfs 4096 May 4 20:09 targets
dr-xr-xr-x. 2 cvmfs cvmfs 4096 May 4 20:10 tools
lrwxrwxrwx. 1 cvmfs cvmfs 122 May 4 20:10 version.json -> /cvmfs/pilot.nessi.no/host_injections/2023.06/software/linux/x86_64/intel/skylake_avx512/software/CUDA/12.1.1/version.json

In particular, check the missing runtime library (currently only a symbolic link into host_injections):

cd /cvmfs/pilot.nessi.no/versions/2023.06/software/linux ; \
  ls -l \
        x86_64/{generic,amd/zen2,intel/[bs]*}/software/CUDA/12.1.1/lib/libcudart* \
        aarch64/generic/software/CUDA/12.1.1/lib/libcudart* ; \
cd -

log

lrwxrwxrwx. 1 cvmfs cvmfs 15 May 4 20:07 aarch64/generic/software/CUDA/12.1.1/lib/libcudart.so -> libcudart.so.12
lrwxrwxrwx. 1 cvmfs cvmfs 21 May 4 20:08 aarch64/generic/software/CUDA/12.1.1/lib/libcudart.so.12 -> libcudart.so.12.1.105
lrwxrwxrwx. 1 cvmfs cvmfs 142 May 4 20:09 aarch64/generic/software/CUDA/12.1.1/lib/libcudart.so.12.1.105 -> /cvmfs/pilot.nessi.no/host_injections/2023.06/software/linux/aarch64/generic/software/CUDA/12.1.1/targets/sbsa-linux/lib/libcudart.so.12.1.105
-r--r--r--. 1 cvmfs cvmfs 1068368 May 4 20:07 aarch64/generic/software/CUDA/12.1.1/lib/libcudart_static.a
lrwxrwxrwx. 1 cvmfs cvmfs 15 May 4 20:13 x86_64/amd/zen2/software/CUDA/12.1.1/lib/libcudart.so -> libcudart.so.12
lrwxrwxrwx. 1 cvmfs cvmfs 21 May 4 20:13 x86_64/amd/zen2/software/CUDA/12.1.1/lib/libcudart.so.12 -> libcudart.so.12.1.105
lrwxrwxrwx. 1 cvmfs cvmfs 144 May 4 20:14 x86_64/amd/zen2/software/CUDA/12.1.1/lib/libcudart.so.12.1.105 -> /cvmfs/pilot.nessi.no/host_injections/2023.06/software/linux/x86_64/amd/zen2/software/CUDA/12.1.1/targets/x86_64-linux/lib/libcudart.so.12.1.105
-r--r--r--. 1 cvmfs cvmfs 1176672 May 4 20:13 x86_64/amd/zen2/software/CUDA/12.1.1/lib/libcudart_static.a
lrwxrwxrwx. 1 cvmfs cvmfs 15 May 4 20:12 x86_64/generic/software/CUDA/12.1.1/lib/libcudart.so -> libcudart.so.12
lrwxrwxrwx. 1 cvmfs cvmfs 21 May 4 20:13 x86_64/generic/software/CUDA/12.1.1/lib/libcudart.so.12 -> libcudart.so.12.1.105
lrwxrwxrwx. 1 cvmfs cvmfs 143 May 4 20:14 x86_64/generic/software/CUDA/12.1.1/lib/libcudart.so.12.1.105 -> /cvmfs/pilot.nessi.no/host_injections/2023.06/software/linux/x86_64/generic/software/CUDA/12.1.1/targets/x86_64-linux/lib/libcudart.so.12.1.105
-r--r--r--. 1 cvmfs cvmfs 1176672 May 4 20:12 x86_64/generic/software/CUDA/12.1.1/lib/libcudart_static.a
lrwxrwxrwx. 1 cvmfs cvmfs 15 May 4 20:08 x86_64/intel/broadwell/software/CUDA/12.1.1/lib/libcudart.so -> libcudart.so.12
lrwxrwxrwx. 1 cvmfs cvmfs 21 May 4 20:09 x86_64/intel/broadwell/software/CUDA/12.1.1/lib/libcudart.so.12 -> libcudart.so.12.1.105
lrwxrwxrwx. 1 cvmfs cvmfs 151 May 4 20:11 x86_64/intel/broadwell/software/CUDA/12.1.1/lib/libcudart.so.12.1.105 -> /cvmfs/pilot.nessi.no/host_injections/2023.06/software/linux/x86_64/intel/broadwell/software/CUDA/12.1.1/targets/x86_64-linux/lib/libcudart.so.12.1.105
-r--r--r--. 1 cvmfs cvmfs 1176672 May 4 20:08 x86_64/intel/broadwell/software/CUDA/12.1.1/lib/libcudart_static.a
lrwxrwxrwx. 1 cvmfs cvmfs 15 May 4 20:09 x86_64/intel/skylake_avx512/software/CUDA/12.1.1/lib/libcudart.so -> libcudart.so.12
lrwxrwxrwx. 1 cvmfs cvmfs 21 May 4 20:09 x86_64/intel/skylake_avx512/software/CUDA/12.1.1/lib/libcudart.so.12 -> libcudart.so.12.1.105
lrwxrwxrwx. 1 cvmfs cvmfs 156 May 4 20:10 x86_64/intel/skylake_avx512/software/CUDA/12.1.1/lib/libcudart.so.12.1.105 -> /cvmfs/pilot.nessi.no/host_injections/2023.06/software/linux/x86_64/intel/skylake_avx512/software/CUDA/12.1.1/targets/x86_64-linux/lib/libcudart.so.12.1.105
-r--r--r--. 1 cvmfs cvmfs 1176672 May 4 20:09 x86_64/intel/skylake_avx512/software/CUDA/12.1.1/lib/libcudart_static.a
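
The listing above shows a chain of symlinks that ends in host_injections instead of a real file. A rough way to automate that check is sketched below. The function name `classify_runtime` and its three result strings are hypothetical, and the hop limit of 16 is an arbitrary guard against link loops.

```shell
#!/usr/bin/env bash
# Hypothetical helper: follow a symlink chain one hop at a time and report
# where it ends: "host_injections" (link points into host_injections),
# "regular" (chain ends at a real file) or "missing" (dangling link).
classify_runtime() {
    path="$1"
    hops=0
    while [ -L "$path" ] && [ "$hops" -lt 16 ]; do
        target="$(readlink "$path")"
        case "$target" in
            */host_injections/*) echo "host_injections"; return 0 ;;
            /*) path="$target" ;;                        # absolute target
            *)  path="$(dirname "$path")/$target" ;;     # relative target
        esac
        hops=$((hops + 1))
    done
    if [ -f "$path" ]; then echo "regular"; else echo "missing"; fi
}
```

Run against `x86_64/generic/software/CUDA/12.1.1/lib/libcudart.so` in the state listed above, this would be expected to report `host_injections`. After a successful rebuild and ingest, it should report `regular`.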

Next, we can remove it in a transaction:

cd /cvmfs/pilot.nessi.no/versions/2023.06/software/linux ; \
  rm -rf \
        x86_64/{generic,amd/zen2,intel/[bsc]*}/{software,modules/all}/CUDA/12.1.1* \
        aarch64/generic/{software,modules/all}/CUDA/12.1.1* ; \
cd -
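
The transaction itself is not shown above. A minimal sketch of how the removal might be wrapped is given below, with several assumptions: the repository name `pilot.nessi.no` is inferred from the `/cvmfs` path, the `run` helper and the `DRY_RUN` switch are hypothetical additions so the commands can be inspected before anything is executed, and the script must be run with bash for the brace expansion to work. `cvmfs_server transaction`, `publish` and `abort -f` are the standard CernVM-FS server subcommands.

```shell
#!/usr/bin/env bash
# Sketch only: REPO and DRY_RUN are assumptions, not taken from the PR.
REPO="${REPO:-pilot.nessi.no}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

remove_cuda_in_transaction() {
    base="/cvmfs/${REPO}/versions/2023.06/software/linux"
    # Open a transaction so the repository becomes writable.
    run cvmfs_server transaction "$REPO" || return 1
    if run rm -rf \
            "$base"/x86_64/{generic,amd/zen2,intel/[bsc]*}/{software,modules/all}/CUDA/12.1.1* \
            "$base"/aarch64/generic/{software,modules/all}/CUDA/12.1.1*; then
        # Publish only if the removal succeeded.
        run cvmfs_server publish "$REPO"
    else
        # Otherwise discard the changes (-f skips the confirmation prompt).
        run cvmfs_server abort -f "$REPO"
        return 1
    fi
}
```

Calling `remove_cuda_in_transaction` with the default `DRY_RUN=1` only prints the three commands, which makes it possible to review the expanded paths before running with `DRY_RUN=0` on the Stratum-0.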

@trz42
Collaborator Author

trz42 commented May 7, 2024

Target architectures

  • x86_64-intel-broadwell
  • x86_64-intel-skylake_avx512
  • x86_64-amd-zen2
  • x86_64-generic
  • aarch64-generic

Checklist for deployment/ingestion

  • at least one tarball for each architecture has been uploaded
  • at least one tarball for each architecture has been staged
  • for each staged tarball a PR for approval/rejection was opened
  • for each architecture one and only one tarball has been approved
  • SEPARATE MANUAL STEP TO REMOVE PACKAGES FROM STRATUM-0
  • for each architecture one and only one tarball was ingested
  • all tests in the CI have succeeded (rerun failed tests ~ 10 mins after last ingest)
    • one known-to-fail test fails (ignored for now)
    • tests for missing software succeeded before the ingest (because the software was rebuilt and only removed just before being ingested), so they are repeated
  • has the package become available via CernVM-FS for all architectures ingested (for lmod cache updates pay attention to timestamp of files)
command & log

command (particularly checking for previously missing libcudart*)

cd /cvmfs/pilot.nessi.no/versions/2023.06/software/linux ; \
ls -l \
      x86_64/{generic,amd/zen2,intel/[bs]*}/software/CUDA/12.1.1/lib/libcudart* \
      aarch64/generic/software/CUDA/12.1.1/lib/libcudart* ; \
cd -

log

lrwxrwxrwx. 1 cvmfs cvmfs      15 May  6 23:54 aarch64/generic/software/CUDA/12.1.1/lib/libcudart.so -> libcudart.so.12
lrwxrwxrwx. 1 cvmfs cvmfs      21 May  6 23:55 aarch64/generic/software/CUDA/12.1.1/lib/libcudart.so.12 -> libcudart.so.12.1.105
-r-xr-xr-x. 1 cvmfs cvmfs  662696 May  6 23:55 aarch64/generic/software/CUDA/12.1.1/lib/libcudart.so.12.1.105
-r--r--r--. 1 cvmfs cvmfs 1068368 May  6 23:54 aarch64/generic/software/CUDA/12.1.1/lib/libcudart_static.a
lrwxrwxrwx. 1 cvmfs cvmfs      15 May  7 00:03 x86_64/amd/zen2/software/CUDA/12.1.1/lib/libcudart.so -> libcudart.so.12
lrwxrwxrwx. 1 cvmfs cvmfs      21 May  7 00:03 x86_64/amd/zen2/software/CUDA/12.1.1/lib/libcudart.so.12 -> libcudart.so.12.1.105
-r-xr-xr-x. 1 cvmfs cvmfs  679264 May  7 00:03 x86_64/amd/zen2/software/CUDA/12.1.1/lib/libcudart.so.12.1.105
-r--r--r--. 1 cvmfs cvmfs 1176672 May  7 00:03 x86_64/amd/zen2/software/CUDA/12.1.1/lib/libcudart_static.a
lrwxrwxrwx. 1 cvmfs cvmfs      15 May  7 00:04 x86_64/generic/software/CUDA/12.1.1/lib/libcudart.so -> libcudart.so.12
lrwxrwxrwx. 1 cvmfs cvmfs      21 May  7 00:04 x86_64/generic/software/CUDA/12.1.1/lib/libcudart.so.12 -> libcudart.so.12.1.105
-r-xr-xr-x. 1 cvmfs cvmfs  679264 May  7 00:04 x86_64/generic/software/CUDA/12.1.1/lib/libcudart.so.12.1.105
-r--r--r--. 1 cvmfs cvmfs 1176672 May  7 00:04 x86_64/generic/software/CUDA/12.1.1/lib/libcudart_static.a
lrwxrwxrwx. 1 cvmfs cvmfs      15 May  6 23:54 x86_64/intel/broadwell/software/CUDA/12.1.1/lib/libcudart.so -> libcudart.so.12
lrwxrwxrwx. 1 cvmfs cvmfs      21 May  6 23:54 x86_64/intel/broadwell/software/CUDA/12.1.1/lib/libcudart.so.12 -> libcudart.so.12.1.105
-r-xr-xr-x. 1 cvmfs cvmfs  679264 May  6 23:54 x86_64/intel/broadwell/software/CUDA/12.1.1/lib/libcudart.so.12.1.105
-r--r--r--. 1 cvmfs cvmfs 1176672 May  6 23:54 x86_64/intel/broadwell/software/CUDA/12.1.1/lib/libcudart_static.a
lrwxrwxrwx. 1 cvmfs cvmfs      15 May  6 23:57 x86_64/intel/skylake_avx512/software/CUDA/12.1.1/lib/libcudart.so -> libcudart.so.12
lrwxrwxrwx. 1 cvmfs cvmfs      21 May  6 23:58 x86_64/intel/skylake_avx512/software/CUDA/12.1.1/lib/libcudart.so.12 -> libcudart.so.12.1.105
-r-xr-xr-x. 1 cvmfs cvmfs  679264 May  6 23:58 x86_64/intel/skylake_avx512/software/CUDA/12.1.1/lib/libcudart.so.12.1.105
-r--r--r--. 1 cvmfs cvmfs 1176672 May  6 23:57 x86_64/intel/skylake_avx512/software/CUDA/12.1.1/lib/libcudart_static.a 

@trz42 trz42 requested a review from TopRichard May 7, 2024 05:06
@trz42 trz42 added the ingested label May 7, 2024

@TopRichard TopRichard left a comment

looks good

@TopRichard TopRichard merged commit 88ad8e9 into NorESSI:nessi.no-2023.06 May 7, 2024
25 of 26 checks passed