EESSI bash initialization to module file #667

Collaborator

@TopRichard TopRichard commented Aug 12, 2024

This is a follow-up PR to the issue: https://gitlab.com/eessi/support/-/issues/83

The CI tests are intended to evaluate the following:

  • Check that the archdetect function works when there are no valid paths.
  • Check that the archdetect function works when a valid path is set in the middle of the list, then load the module and check the existence and values of a few expected variables.
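
The second check can be illustrated with a minimal shell sketch of the selection logic the module file is expected to follow: walk a colon-separated list of candidate architectures and take the first one whose software directory exists. The helper name first_valid_arch, the dummy entries, and the temporary directory tree are assumptions made for illustration; they are not part of the PR or its CI workflow.

```shell
# Hypothetical helper: print the first entry of a colon-separated list
# whose software directory exists under the given prefix.
first_valid_arch() {
    local prefix="$1" options="$2" candidate
    local IFS=':'
    for candidate in $options; do
        if [ -d "$prefix/software/linux/$candidate/software" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Throwaway directory tree standing in for the EESSI prefix
# (an assumption for this sketch, not the real /cvmfs path).
demo_prefix=$(mktemp -d)
mkdir -p "$demo_prefix/software/linux/x86_64/intel/haswell/software"

# Only the middle entry has a software directory, so it should be selected.
picked=$(first_valid_arch "$demo_prefix" "dummy/cpu:x86_64/intel/haswell:x86_64/generic")
echo "selected: $picked"

# No valid entries at all: the helper reports failure instead of printing a path.
none_found=no
if ! first_valid_arch "$demo_prefix" "dummy/cpu:another/dummy" >/dev/null; then
    none_found=yes
fi
echo "no valid path detected: $none_found"

rm -rf "$demo_prefix"
```

In the module file itself the "no valid path" case ends in an LmodError rather than a non-zero return code, so a CI job would presumably assert that loading the module fails in that situation.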

eessi-bot bot commented Aug 12, 2024

Instance eessi-bot-mc-aws is configured to build for:

  • architectures: x86_64/generic, x86_64/intel/haswell, x86_64/intel/skylake_avx512, x86_64/amd/zen2, x86_64/amd/zen3, aarch64/generic, aarch64/neoverse_n1, aarch64/neoverse_v1
  • repositories: eessi.io-2023.06-compat, eessi-hpc.org-2023.06-software, eessi-hpc.org-2023.06-compat, eessi.io-2023.06-software

Instance boegel-bot-deucalion is configured to build for:

  • architectures: aarch64/a64fx
  • repositories: eessi.io-2023.06-software

eessi-bot bot commented Aug 12, 2024

Instance eessi-bot-mc-azure is configured to build for:

  • architectures: x86_64/amd/zen4
  • repositories: eessi.io-2023.06-compat, eessi-hpc.org-2023.06-compat, eessi-hpc.org-2023.06-software, eessi.io-2023.06-software

@TopRichard TopRichard added the 2023.06-software.eessi.io 2023.06 version of software.eessi.io label Aug 12, 2024
Member

@ocaisa ocaisa left a comment

Some comments from a visual review, will take it for a test drive.

We should also add CI to test this module file (can be done using the EESSI GitHub Action)

Member

ocaisa commented Aug 12, 2024

I'm not sure if it is really a concern, but we add an Lmod family() related to the EESSI version as well so that we have a way to manage "compatible" modules (for example module files related to dev.eessi.io)

Member

ocaisa commented Aug 12, 2024

This also doesn't cover the cURL issue currently fixed in our initialisation script:

rhel_libcurl_file="/etc/pki/tls/certs/ca-bundle.crt"
if [ -f $rhel_libcurl_file ]; then
  show_msg "Found libcurl CAs file at RHEL location, setting CURL_CA_BUNDLE"
  export CURL_CA_BUNDLE=$rhel_libcurl_file
fi

You can use isFile() to give you the same logic

Member

ocaisa commented Aug 12, 2024

The one other thing missing is the current redirection we do for Zen4:

# use x86_64/amd/zen3 for now when AMD Genoa (Zen4) CPU is detected,
# since optimized software installations for Zen4 are a work-in-progress,
# see https://gitlab.com/eessi/support/-/issues/37
if [[ "${EESSI_SOFTWARE_SUBDIR}" == "x86_64/amd/zen4" ]]; then
  if [ -z $EESSI_SOFTWARE_SUBDIR_OVERRIDE ]; then
    export EESSI_SOFTWARE_SUBDIR="x86_64/amd/zen3"
    echo -e "\e[33mSticking to ${EESSI_SOFTWARE_SUBDIR} for now, since optimized installations for AMD Genoa (Zen4) are a work in progress, see https://gitlab.com/eessi/support/-/issues/37 for more information\e[0m"
  fi
fi

Member

ocaisa commented Aug 12, 2024

A quick test, including my suggested changes, shows that this is missing EESSI_CPU_FAMILY and EPREFIX. These seem to be the only ones of consequence.
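
A comparison of this kind can be sketched in shell: dump the environment once after sourcing the bash initialisation script and once after loading the module, then list the variable names that only appear in the first dump. The helper name missing_vars, the file names, and the sample values below are all made up for illustration; real dumps would come from running env in two fresh shells.

```shell
# Hypothetical helper: list variable names present in the first env dump
# but absent from the second (both files hold NAME=value lines).
missing_vars() {
    local reference="$1" candidate="$2"
    local ref_names cand_names
    ref_names=$(mktemp)
    cand_names=$(mktemp)
    # Sort in the C locale so that sort and comm agree on the ordering.
    cut -d= -f1 "$reference" | LC_ALL=C sort -u > "$ref_names"
    cut -d= -f1 "$candidate" | LC_ALL=C sort -u > "$cand_names"
    LC_ALL=C comm -23 "$ref_names" "$cand_names"
    rm -f "$ref_names" "$cand_names"
}

workdir=$(mktemp -d)

# Made-up sample standing in for the environment after the bash init script.
cat > "$workdir/bash_init.env" <<'EOF'
EESSI_VERSION=2023.06
EESSI_CPU_FAMILY=x86_64
EPREFIX=/example/compat
EESSI_SOFTWARE_SUBDIR=x86_64/generic
EOF

# Made-up sample standing in for the environment after loading the module.
cat > "$workdir/module.env" <<'EOF'
EESSI_VERSION=2023.06
EESSI_SOFTWARE_SUBDIR=x86_64/generic
EOF

missing=$(missing_vars "$workdir/bash_init.env" "$workdir/module.env" | paste -sd' ' -)
echo "missing from module: $missing"

rm -rf "$workdir"
```

The sketch only compares names, not values, and it assumes no variable value contains a newline.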

Member

ocaisa commented Aug 12, 2024

Here's the one that works for me (but still misses EPREFIX and EESSI_CPU_FAMILY and the override for Zen4) and includes the cert bundle check and PS1 change:

help([[
Description
===========
The European Environment for Scientific Software Installations (EESSI, pronounced as easy) is a collaboration between different European partners in HPC community. The goal of this project is to build a common stack of scientific software installations for HPC systems and beyond, including laptops, personal workstations and cloud infrastructure.

More information
================
 - URL: https://www.eessi.io/docs/
]])
whatis("Description: The European Environment for Scientific Software Installations (EESSI, pronounced as easy) is a collaboration between different European partners in HPC community. The goal of this project is to build a common stack of scientific software installations for HPC systems and beyond, including laptops, personal workstations and cloud infrastructure.")
whatis("URL: https://www.eessi.io/docs/")

local eessi_version = myModuleVersion()
local eessi_repo = "/cvmfs/software.eessi.io"
local eessi_prefix = pathJoin(eessi_repo, "versions", eessi_version)
local eessi_os_type = "linux"
pushenv("EESSI_VERSION", eessi_version)
pushenv("EESSI_CVMFS_REPO", eessi_repo)
pushenv("EESSI_OS_TYPE", eessi_os_type)
-- Return the first detected CPU architecture that has a software directory in the EESSI stack
function archdetect_cpu()
    local script = pathJoin(eessi_prefix, 'init', 'lmod_eessi_archdetect_wrapper.sh')
    if not os.getenv("EESSI_ARCHDETECT_OPTIONS") then
        -- source_sh() is only available from Lmod 8.6 onwards
        if convertToCanonical(LmodVersion()) < convertToCanonical("8.6") then
            LmodError("Loading this modulefile requires using Lmod version >= 8.6, but you can export EESSI_ARCHDETECT_OPTIONS to the available cpu architecture in the form of: x86_64/intel/haswell or aarch64/neoverse_v1")
        end
        source_sh("bash", script)
    end
    -- EESSI_ARCHDETECT_OPTIONS is a colon-separated list of candidates, best match first;
    -- fall back to an empty string so that gmatch() never receives nil
    for archdetect_filter_cpu in string.gmatch(os.getenv("EESSI_ARCHDETECT_OPTIONS") or "", "([^:]+)") do
        if isDir(pathJoin(eessi_prefix, "software", eessi_os_type, archdetect_filter_cpu, "software")) then
            return archdetect_filter_cpu
        end
    end
    -- none of the candidates exists in the stack
    LmodError("Software directory check for the detected architecture failed")
end
local archdetect = archdetect_cpu()
local eessi_cpu_family = archdetect:match("([^/]+)")
local eessi_software_subdir = os.getenv("EESSI_SOFTWARE_SUBDIR_OVERRIDE") or archdetect
local eessi_eprefix = pathJoin(eessi_prefix, "compat", eessi_os_type, eessi_cpu_family)
local eessi_software_path = pathJoin(eessi_prefix, "software", eessi_os_type, eessi_software_subdir)
local eessi_module_path = pathJoin(eessi_software_path, "modules", "all")
local eessi_site_module_path = string.gsub(eessi_module_path, "versions", "host_injections")
pushenv("EESSI_SITE_MODULEPATH", eessi_site_module_path)
pushenv("EESSI_SOFTWARE_SUBDIR", eessi_software_subdir)
pushenv("EESSI_PREFIX", eessi_prefix)
pushenv("EESSI_EPREFIX", eessi_eprefix)
prepend_path("PATH", pathJoin(eessi_eprefix, "bin"))
prepend_path("PATH", pathJoin(eessi_eprefix, "usr/bin"))
pushenv("EESSI_SOFTWARE_PATH", eessi_software_path)
pushenv("EESSI_MODULEPATH", eessi_module_path)
prepend_path("MODULEPATH", eessi_module_path)
prepend_path("MODULEPATH", eessi_site_module_path)
pushenv("LMOD_CONFIG_DIR", pathJoin(eessi_software_path, ".lmod"))
pushenv("LMOD_PACKAGE_PATH", pathJoin(eessi_software_path, ".lmod"))
-- update the prompt (unless overridden)
if not os.getenv("EESSI_RETAIN_PROMPT") then
  pushenv("PS1", "{EESSI " .. eessi_version .. "} " .. (os.getenv("PS1") or ""))
end
-- check for RHEL certificate location
local rhel_certificates = "/etc/pki/tls/certs/ca-bundle.crt"
if isFile(rhel_certificates) then
  pushenv("CURL_CA_BUNDLE", rhel_certificates)
end 
if mode() == "load" then
    LmodMessage("EESSI/" .. eessi_version .. " loaded successfully")
end

Contributor

MaKaNu commented Aug 12, 2024

Setting only PS1 would create difficulties for prompts that don't use the PS1 variable (Starship, Oh My Posh, etc.).
An easy solution could be the following:

if not os.getenv("EESSI_RETAIN_PROMPT") then
-  pushenv("PS1", "{EESSI " .. eessi_version .. "} " .. (os.getenv("PS1") or ""))
+ local eessi_prompt = "{EESSI " .. eessi_version .. "}"
+ pushenv("PS1", eessi_prompt .. (os.getenv("PS1") or ""))
+ pushenv("EESSI_PROMPT", eessi_prompt)
end

But os.getenv("PS1") still returns nil, so the original prompt is lost. I have no clue why Lua cannot read the variable, since the bash script has no issues with it.

Added zen4 redirection
Collaborator Author

TopRichard commented Aug 12, 2024

The one other thing missing is the current redirection we do for Zen4:

# use x86_64/amd/zen3 for now when AMD Genoa (Zen4) CPU is detected,
# since optimized software installations for Zen4 are a work-in-progress,
# see https://gitlab.com/eessi/support/-/issues/37
if [[ "${EESSI_SOFTWARE_SUBDIR}" == "x86_64/amd/zen4" ]]; then
  if [ -z $EESSI_SOFTWARE_SUBDIR_OVERRIDE ]; then
    export EESSI_SOFTWARE_SUBDIR="x86_64/amd/zen3"
    echo -e "\e[33mSticking to ${EESSI_SOFTWARE_SUBDIR} for now, since optimized installations for AMD Genoa (Zen4) are a work in progress, see https://gitlab.com/eessi/support/-/issues/37 for more information\e[0m"
  fi
fi

Handled in the commit above

Member

ocaisa commented Aug 12, 2024

Setting only PS1 would create difficulties for prompts that don't use the PS1 variable (Starship, Oh My Posh, etc.). An easy solution could be the following:

if not os.getenv("EESSI_RETAIN_PROMPT") then
-  pushenv("PS1", "{EESSI " .. eessi_version .. "} " .. (os.getenv("PS1") or ""))
+ local eessi_prompt = "{EESSI " .. eessi_version .. "}"
+ pushenv("PS1", eessi_prompt .. (os.getenv("PS1") or ""))
+ pushenv("EESSI_PROMPT", eessi_prompt)
end

But os.getenv("PS1") still returns nil, so the original prompt is lost. I have no clue why Lua cannot read the variable, since the bash script has no issues with it.

We can only set PS1 if it is in the environment already

if os.getenv("PS1") and not os.getenv("EESSI_RETAIN_PROMPT") then

os.getenv() returns nil if a variable is not set

Contributor

MaKaNu commented Aug 12, 2024

We can only set PS1 if it is in the environment already

if os.getenv("PS1") and not os.getenv("EESSI_RETAIN_PROMPT") then

That's the thing I don't get. In my test scenario, PS1 was set and was also readable by the bash script, but os.getenv("PS1") returns nil.

I believe the problem is that the PS1 variable can't be exported, so it's not available in the module. Or at least the basic Ubuntu PS1=${debian_chroot:+($debian_chroot)}\u@\h:\w\$ seems to be an issue. If I provide a custom PS1, the module appends as expected.

EDIT: The issue is that directly exporting doesn't work; if I put export PS1=$PS1 in front of module load, it works as expected.
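
The behaviour described here matches how shell variables work in general: a variable that is set but not exported stays local to the shell, so any child process (which is how the Lua code of a module file is run) cannot see it. A minimal sketch, using a made-up prompt string and printenv as a stand-in for os.getenv():

```shell
# Start from a clean state: unset also drops any export attribute.
unset PS1

# Set PS1 as a plain shell variable, as many default shell setups do.
PS1='demo> '

# printenv runs as a child process and only sees exported variables,
# so it fails here and the fallback text is used instead.
before=$(printenv PS1 || echo "<not in environment>")

# Exporting makes the variable part of the environment of child processes;
# this has the same effect as: export PS1=$PS1
export PS1
after=$(printenv PS1)

echo "before export: $before"
echo "after export:  $after"
```

This would also explain why the guard on os.getenv("PS1") suggested above is needed: without an exported PS1 there is nothing for the module to prepend to.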

Collaborator

bedroge commented Aug 30, 2024

@bedroge Do we need any trick here to ensure that we pick up the Lmod cache created by EESSI (and it doesn't get rebuilt by spider)? I see you have a few clever tricks in #68 (comment) :

if ( mode() ~= "spider" ) then
  prepend_path("MODULEPATH", pathJoin(eessi_software_path, eessi_module_subdir, classes[i]))
end
-- Set path to the Lmod cache of the EESSI stack.
prepend_path("LMOD_RC", pathJoin(eessi_software_path, "/.lmod/lmodrc.lua"))

We indeed set LMOD_RC in our EESSI modulefile, but it looks like setting LMOD_CONFIG_DIR to a directory that contains a file lmodrc.lua should also work, see https://lmod.readthedocs.io/en/latest/145_properties.html#the-properties-file-lmodrc-lua.

Tested it on my laptop:

$ export LMOD_CONFIG_DIR=/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/intel/haswell/.lmod/
$ module --config
Active RC file(s):
------------------
/usr/share/lmod/lmod/libexec/../init/lmodrc.lua
/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/intel/haswell/.lmod/lmodrc.lua


Cache Directory                                                                            Time Stamp File
---------------                                                                            ---------------
/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/intel/haswell/.lmod/cache  /cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/intel/haswell/.lmod/cache/timestamp

Member

ocaisa commented Aug 30, 2024

I'm not sure we should be setting LMOD_CONFIG_DIR, I think that may be too intrusive if people are using their own Lmod

Collaborator

bedroge commented Aug 30, 2024

I'm not sure we should be setting LMOD_CONFIG_DIR, I think that may be too intrusive if people are using their own Lmod

True. I'm not sure if it can contain multiple paths (then we could prepend to it), but that's definitely possible with LMOD_RC.
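
A rough shell sketch of what prepending to LMOD_RC could look like, assuming (as stated above) that LMOD_RC accepts a colon-separated list of files. The helper name prepend_to_list and the site path /opt/site/lmodrc.lua are hypothetical; the EESSI path is the one from the laptop test earlier in this thread.

```shell
# Hypothetical helper: prepend an entry to a colon-separated list
# without leaving a trailing colon when the list is empty.
prepend_to_list() {
    local new_entry="$1" current="$2"
    printf '%s\n' "${new_entry}${current:+:$current}"
}

eessi_rc="/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/intel/haswell/.lmod/lmodrc.lua"

# Case 1: the user has no LMOD_RC of their own.
only_eessi=$(prepend_to_list "$eessi_rc" "")

# Case 2: the user already points LMOD_RC at a site file (made-up path).
with_site=$(prepend_to_list "$eessi_rc" "/opt/site/lmodrc.lua")

echo "$only_eessi"
echo "$with_site"
```

In a shell session the result would then be applied with something like export LMOD_RC="$(prepend_to_list "$eessi_rc" "${LMOD_RC:-}")"; inside the module file, the prepend_path("LMOD_RC", ...) call quoted above plays the same role and is undone on unload.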

Contributor

MaKaNu commented Aug 30, 2024

I'm not sure we should be setting LMOD_CONFIG_DIR, I think that may be too intrusive if people are using their own Lmod

If prepending is an option, is this part of the module or do we need this in the init scripts?

Collaborator Author

TopRichard commented Aug 30, 2024

I'm not sure we should be setting LMOD_CONFIG_DIR, I think that may be too intrusive if people are using their own Lmod

True. I'm not sure if it can contain multiple paths (then we could prepend to it), but that's definitely possible with LMOD_RC.

LMOD_CONFIG_DIR is set when we source the EESSI bash script, no? And are we supposed to compare the environment variables, specifically the EESSI and LMOD variables that exist after sourcing the bash script, with what we get from the EESSI module?

Member

ocaisa commented Aug 31, 2024

The difference now is that we are allowing the case where people are using their own Lmod, so we should only make the minimal changes to the environment that we need to get things working as we expect. That means using our spider caches and setting up our hooks. I think @casparvl has a good overview of this somewhere in a PR (see #491 (comment))

To prevent error bad argument #1 to 'gmatch' when module show is used and EESSI_ARCHDETECT_OPTIONS is not set
Member

@ocaisa ocaisa left a comment

LGTM, thanks for all the effort!

@ocaisa ocaisa dismissed trz42’s stale review September 5, 2024 08:59

We'll deploy this and iterate from there

Member

ocaisa commented Sep 5, 2024

bot: build repo:eessi.io-2023.06-software arch:x86_64/generic

eessi-bot bot commented Sep 5, 2024

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/generic from ocaisa

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/generic
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/generic resulted in:

eessi-bot bot commented Sep 5, 2024

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/generic from ocaisa

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/generic
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/generic resulted in:

    • no jobs were submitted

eessi-bot bot commented Sep 5, 2024

New job on instance eessi-bot-mc-aws for architecture x86_64-generic for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.09/pr_667/17813

date                      job status   comment
Sep 05 09:00:34 UTC 2024  submitted    job id 17813 awaits release by job manager
Sep 05 09:01:28 UTC 2024  released     job awaits launch by Slurm scheduler
Sep 05 09:06:31 UTC 2024  running      job 17813 is running
Sep 05 09:25:59 UTC 2024  finished     😁 SUCCESS
    Details
      ✅ job output file slurm-17813.out
      ✅ no message matching ERROR:
      ✅ no message matching FAILED:
      ✅ no message matching required modules missing:
      ✅ found message(s) matching No missing installations
      ✅ found message matching .tar.gz created!
    Artefacts
      eessi-2023.06-software-linux-x86_64-generic-1725527197.tar.gz
        size: 0 MiB (1522 bytes)
        entries: 1
        modules under 2023.06/software/linux/x86_64/generic/modules/all
          no module files in tarball
        software under 2023.06/software/linux/x86_64/generic/software
          no software packages in tarball
        other under 2023.06/software/linux/x86_64/generic
          2023.06/init/modules/EESSI/2023.06.lua
Sep 05 09:25:59 UTC 2024  test result  😁 SUCCESS
    ReFrame Summary
      [ PASSED ] Ran 18/18 test case(s) from 18 check(s) (0 failure(s), 0 skipped, 0 aborted)
    Details
      ✅ job output file slurm-17813.out
      ✅ no message matching ERROR:
      ✅ no message matching [\s*FAILED\s*].*Ran .* test case
Sep 05 09:49:41 UTC 2024  uploaded     transfer of eessi-2023.06-software-linux-x86_64-generic-1725527197.tar.gz to S3 bucket succeeded

@bedroge bedroge added the bot:deploy Ask bot to deploy missing software installations to EESSI label Sep 5, 2024
Collaborator

bedroge commented Sep 5, 2024

This has been ingested, so I'll merge the PR.
EESSI/filesystem-layer#198 adds a symlink to this new modulefile in a top-level dir init/modules/EESSI.

@bedroge bedroge merged commit b8946ee into EESSI:2023.06-software.eessi.io Sep 5, 2024
34 checks passed
@boegel boegel mentioned this pull request Sep 12, 2024
Labels
2023.06-software.eessi.io 2023.06 version of software.eessi.io bot:deploy Ask bot to deploy missing software installations to EESSI
5 participants