[CSD3] Update configurations and Spack environments #247

Merged — mirenradia merged 22 commits into ukri-excalibur:main from enhancement/update-csd3 on Dec 13, 2023

Conversation

mirenradia (Contributor) commented:

This PR updates the ReFrame configurations for CSD3 and the corresponding Spack environments. It also adds the new Sapphire Rapids partition.

Some key changes are explained below.

Operating System

On CSD3, we have 2 different OSs:

  • CentOS 7 - a rebuild of Red Hat Enterprise Linux (RHEL) 7
  • Rocky Linux 8 - a rebuild of RHEL 8

The Cascade Lake partition runs on CentOS 7, and all other partitions (i.e. the Ice Lakes and Sapphire Rapids) run on Rocky Linux 8. To avoid bugs and library incompatibilities, it is important to use the software stack built for the correct OS. One of the main problems with the current csd3-icelake Spack environment is that it contains lots of external packages built for CentOS 7/RHEL 7; as a result, I experienced segfaults in the example Sombrero benchmark with the default Spack spec. I have therefore recreated the environment completely from scratch, including only packages built for Rocky Linux 8.

To highlight the different OSs to users of this repo on CSD3, I have renamed the existing systems as follows:

  • csd3-cascadelake/compute-node -> csd3-centos7/cascadelake
  • csd3-icelake/compute-node -> csd3-rocky8/icelake

Note that there are two sets of login nodes: login-p-[0-4], which run CentOS 7, and login-q-[0-4], which run Rocky Linux 8.

Sapphire Rapids Partition

This currently uses the same software stack and OS as the Ice Lakes, hence the common compilers.yaml and packages.yaml for the Spack environments. The only difference between the Spack environments is the specified target.
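As a rough sketch (the include paths below are hypothetical; the actual file layout in the repository may differ), the Sapphire Rapids spack.yaml would differ from the Ice Lake one only in the target preference:

```yaml
# Sketch of the sapphirerapids spack.yaml: the shared configuration could
# be pulled in via an include list (paths hypothetical), with only the
# target preference differing from the icelake environment.
spack:
  include:
  - ../compilers.yaml   # shared with the icelake environment
  - ../packages.yaml    # shared with the icelake environment
  packages:
    all:
      target: [sapphirerapids]   # the icelake environment uses [icelake]
```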

The Sapphire Rapids partition has been added as a new partition (in the ReFrame sense) under the csd3-rocky8 system.

Preferred compilers and MPI implementation

I have set the preferred compiler to intel for all Spack environments and the preferred MPI implementation to intel-mpi (for CentOS 7) or intel-oneapi-mpi (for Rocky Linux 8). Both are provided by the default modules loaded on CSD3, and we generally observe better performance with them than with gcc/openmpi.
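In Spack's packages.yaml, preferences like these are expressed roughly as follows (a sketch for the Rocky Linux 8 environments; the CentOS 7 one would list intel-mpi instead):

```yaml
packages:
  all:
    compiler: [intel]           # ordered preference list: prefer the Intel compiler
    providers:
      mpi: [intel-oneapi-mpi]   # preferred MPI provider; intel-mpi on CentOS 7
```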

Reference values

I have updated some of the existing reference values for the benchmarks.

I have tested this PR with the following benchmarks:

  • grid
  • hpl
  • ramses
  • sombrero
  • sphng
  • swift
  • trove-pdsyev
  • trove

There are still a few minor changes I wish to make before asking for a review, which is why this is a draft PR for now.

giordano (Member) left a comment:
I haven't tested them myself, but the changes look good, thanks a lot!

giordano (Member) commented on Dec 4, 2023:
Side note: I wish we could automate the generation of the input file, but at the moment we don't know the memory available on the compute nodes (it could be an extra custom property of partitions, if we wanted), and there's also the problem that "looking up an ever-changing table on the Intel website" isn't exactly my idea of automation.

mirenradia (Contributor, Author) replied:
I agree that it would be nice to do this but don't think there's a good solution...

Comment on lines -189 to +335:

-   - spec: [email protected]%intel+mpi^intel-mpi
+   - spec: [email protected] +mpi^intel-mpi
giordano (Member):

I presume the point of %intel was to prevent using this with other libraries built with gcc; I may have had trouble trying that combination.

mirenradia (Contributor, Author):

Yeah, for some reason Spack refused to use this external package (when using the intel compiler and intel-mpi) until I made this change. I didn't really understand it, since it didn't seem to be a problem for any other external packages with compilers in their specs 🤷
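For context, an external entry like the one being discussed lives in the packages section of the environment; after the change it would look roughly like this (package name, version, prefix, and the buildable flag are all placeholders, since the real entry is redacted above):

```yaml
packages:
  example-pkg:                    # placeholder name; the real entry is redacted above
    externals:
    - spec: example-pkg@1.0 +mpi ^intel-mpi   # note: no %intel compiler in the spec
      prefix: /usr/local/software/example     # placeholder install prefix
    buildable: false              # assumption: always use the external, never rebuild
```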

benchmarks/reframe_config.py (resolved review thread)
Most of the Spack compilers and external packages are built for
RHEL/CentOS/Scientific Linux 7 (for the Cascade Lakes) and have issues on
the Ice Lakes (running Rocky Linux 8), hence this resets the
environment to a clean slate.
Since we have two different OSs on CSD3, CentOS 7 and Rocky Linux 8, it
is important to only use software built for the correct OS in order to
avoid issues.
This adds some external Spack packages provided by the OS and specifies
a preferred compiler and MPI implementation.
Note that there is currently a common software stack for the Ice Lakes
and Sapphire Rapids and both partitions use the same OS (Rocky Linux 8).
Also allow the concretizer to target sapphirerapids when building on the
icelake login nodes.
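In Spack, allowing concretization for a target that the build host cannot itself run is typically done through the concretizer settings; a minimal sketch, assuming the standard concretizer.yaml option names:

```yaml
# concretizer.yaml (or the concretizer section of spack.yaml)
concretizer:
  targets:
    # Allow concretizing for a microarchitecture (e.g. sapphirerapids)
    # that the concretizing host (an icelake login node) does not support.
    host_compatible: false
```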
Set the environment variables explicitly rather than using the
rhel?/default-* modules so as to avoid any environment pollution from
default modules.
Always request all of the memory on a node as `--exclusive` does not
imply this according to the current SLURM documentation
(https://slurm.schedmd.com/sbatch.html#OPT_exclusive). This can make a
difference to benchmark results when not using all of the cores on a
node due to the increased memory bandwidth available.

Also increase the job_submit_timeout as the SLURM controller can be a
bit slow on CSD3.
Note that these values apply when it is built with the Intel Classic
Compiler, which is preferred in the Spack environments.
@mirenradia force-pushed the enhancement/update-csd3 branch from 7c18005 to 28b7173 on December 7, 2023 15:48
@mirenradia marked this pull request as ready for review on December 7, 2023 15:49
mirenradia (Contributor, Author):

I've rebased the changes onto main. This is now ready to be merged.

giordano (Member) left a comment:

Can you please sort the packages alphabetically in the Spack configurations? That makes it easier to see what's in there.

benchmarks/spack/csd3-centos7/cascadelake/spack.yaml (resolved review thread)
Also remove [email protected] from the cascadelake environment as it is
deprecated.
mirenradia (Contributor, Author):

> Can you please sort the packages alphabetically in the Spack configurations? That makes it easier to see what's in there.

It would be nice if `spack external find` did this automatically. In any case, I've now done it (by hand 🙈).

@mirenradia requested a review from giordano on December 12, 2023 11:15
Though this can improve performance, it can also lead to problems when
submitting some jobs, so leave it up to the user to add this if they
want.
giordano (Member) left a comment:

Just a comment about indentation, but the rest looks good, thanks again!

> It would be nice if `spack external find` did this automatically.

Agreed!

benchmarks/examples/sombrero/sombrero.py (resolved review thread)
Co-authored-by: Mosè Giordano <[email protected]>
@giordano merged commit 6496422 into ukri-excalibur:main on Dec 13, 2023
4 checks passed