Different sites may require different `srun` invocation options to correctly spawn tasks through the SLURM scheduler. On Cray systems, for instance, the `gres` option `craynetwork` must be set to `0` (`--gres=craynetwork:0`) when running non-MPI workflows and to `1` when running MPI-based workflows (`--gres=craynetwork:1`). At CSCS, we have patched the GREASY source code to allow these custom configurations.
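As a minimal illustration of the Cray rule above, the following POSIX shell function picks the `craynetwork` value depending on whether a task uses MPI. The function name `craynetwork_gres_flag` and its `mpi` argument are hypothetical and not part of GREASY or SLURM. The function only prints the flag and does not call `srun`.

```shell
# Hypothetical helper (not part of GREASY or SLURM): print the Cray-specific
# gres flag for srun. Pass "mpi" for MPI-based tasks, anything else otherwise.
craynetwork_gres_flag() {
    if [ "$1" = "mpi" ]; then
        printf '%s\n' "--gres=craynetwork:1"
    else
        printf '%s\n' "--gres=craynetwork:0"
    fi
}

craynetwork_gres_flag serial   # prints --gres=craynetwork:0
craynetwork_gres_flag mpi      # prints --gres=craynetwork:1
```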
I wonder what you think about adding environment variables that act as hooks in certain parts of the code to allow these kinds of customizations.
My proposal is to add the following environment variables:

- `GREASY_CUSTOM_INIT_CMD`, default value `""`. This would allow custom init commands, such as NVIDIA MPS initialization. In our case, the variable would be set to `"if [ ! -z ${CRAY_CUDA_MPS+x} ]; ..."`.
- `GREASY_ADDITIONAL_SRUN_FLAGS`, default value `""`. This would allow additional `srun` CLI flags, such as `--gres=craynetwork:0`.
- `GREASY_EMIT_SRUN_MEMORY_VALUE`, default value `"0"`. Possible values are `0` or `1`. This would enable or disable the memory request passed to `srun`.
- `GREASY_EMIT_SRUN_CPUS_PER_TASK`, default value `"1"`. Possible values are `0` or `1`. This would enable or disable the `-c` (`--cpus-per-task`) CLI option for `srun`.
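To make the intended semantics of the last three variables concrete, here is a sketch of how a launcher could assemble an `srun` command line from them, using the defaults listed above. It is written in POSIX shell for readability; GREASY's actual launch code is not shown in this issue, so the function name `greasy_srun_cmd`, its argument order, and the use of `--mem=` as the memory flag are all assumptions. The function prints the command instead of running it.

```shell
# Hypothetical sketch: build (and print) the srun command for one task.
# $1 = memory value (e.g. 2G), $2 = CPUs per task, remaining args = task command.
# The body runs in a subshell "( ... )" so its variables do not leak.
greasy_srun_cmd() (
    mem="$1"
    cpus="$2"
    shift 2
    cmd="srun"
    # Extra site-specific flags, e.g. --gres=craynetwork:0 (default: none).
    if [ -n "${GREASY_ADDITIONAL_SRUN_FLAGS:-}" ]; then
        cmd="$cmd $GREASY_ADDITIONAL_SRUN_FLAGS"
    fi
    # Memory request is emitted only if GREASY_EMIT_SRUN_MEMORY_VALUE=1 (default 0).
    # Assumption: --mem is the srun memory flag a launcher would use here.
    if [ "${GREASY_EMIT_SRUN_MEMORY_VALUE:-0}" = "1" ]; then
        cmd="$cmd --mem=$mem"
    fi
    # -c is emitted unless GREASY_EMIT_SRUN_CPUS_PER_TASK=0 (default 1).
    if [ "${GREASY_EMIT_SRUN_CPUS_PER_TASK:-1}" = "1" ]; then
        cmd="$cmd -c $cpus"
    fi
    printf '%s %s\n' "$cmd" "$*"
)

# Usage with only the additional-flags hook set:
export GREASY_ADDITIONAL_SRUN_FLAGS="--gres=craynetwork:0"
greasy_srun_cmd 2G 4 ./task.sh input1
# prints: srun --gres=craynetwork:0 -c 4 ./task.sh input1
```

With all variables unset, the output reduces to `srun -c <cpus> <task>`, which matches the proposed defaults (no extra flags, no memory request, `-c` on).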
I think all of these variables could be set with basic bash inside the `greasy.in` file, since each site has to customize its own `greasy.in` anyway. Different sites would then only need to maintain this "configuration" patch. For example, LUMI will require a different set of patches than Piz Daint, even though it is also a Cray machine. These entry points would let us maintain a single `greasy.in` file for Piz Daint and another one for LUMI.
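A sketch of what the site-specific section of a Piz Daint `greasy.in` might look like under this proposal, assuming non-MPI workflows as the site default. The variable names come from the proposal; the layout and the choice of defaults are illustrative assumptions. The init hook is left empty because the full `CRAY_CUDA_MPS` command is elided in the proposal.

```shell
# Hypothetical site section of a Piz Daint greasy.in (non-MPI default).
# "${VAR=default}" assigns only if VAR is unset, so a value the user exported
# beforehand (even an empty one) takes precedence over the site default.
: "${GREASY_ADDITIONAL_SRUN_FLAGS=--gres=craynetwork:0}"
: "${GREASY_EMIT_SRUN_MEMORY_VALUE=0}"
: "${GREASY_EMIT_SRUN_CPUS_PER_TASK=1}"
# Site init hook (e.g. the CRAY_CUDA_MPS check); left empty in this sketch.
: "${GREASY_CUSTOM_INIT_CMD=}"
export GREASY_ADDITIONAL_SRUN_FLAGS GREASY_EMIT_SRUN_MEMORY_VALUE \
       GREASY_EMIT_SRUN_CPUS_PER_TASK GREASY_CUSTOM_INIT_CMD
```

For an MPI-based workflow, a user could export `GREASY_ADDITIONAL_SRUN_FLAGS=--gres=craynetwork:1` before running GREASY. Using `${VAR=default}` rather than `${VAR:=default}` means an explicitly empty value is also respected, so a user can switch the extra flags off entirely.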
Please let me know what you think so that I can make the appropriate PRs.
These seem like really interesting changes to make Greasy more flexible across different systems and to gain fine-grained control over the task submission part. I also agree with setting and defining all of those flags in the `greasy.in` file. Thanks for your contribution.