
Specifying specific step-dependant -R and -M #10

Open
benjeffery opened this issue Jun 17, 2024 · 6 comments


@benjeffery

My specific cluster setup requires that I specify:
-R 'select[mem>{mem_mb}] rusage[mem={mem_mb}]' -M {mem_mb}
I couldn't see how to specify this in the config; maybe there is already a way? For now I'm hacking a change into the module to achieve it.
Thanks for the plugin!
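For reference, a minimal sketch of what the resulting submission command looks like with those flags filled in (the helper name and the 4096 MB value are illustrative, not part of the plugin):

```shell
# Illustrative helper: build the bsub command line this cluster requires,
# substituting the same mem_mb value into select[], rusage[], and -M.
build_bsub_cmd() {
  local mem="$1"
  printf "bsub -R 'select[mem>%s] rusage[mem=%s]' -M %s" "$mem" "$mem" "$mem"
}

cmd=$(build_bsub_cmd 4096)
echo "$cmd"   # bsub -R 'select[mem>4096] rusage[mem=4096]' -M 4096
```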

@BEFH
Owner

BEFH commented Jun 17, 2024

Currently, the plugin only requests rusage[mem={mem_mb}], with the rest being implicit. I think we can make things work with your setup, but I would like to autodetect what is needed if possible. Is there any way you could provide your lsf.conf?

@BEFH
Owner

BEFH commented Jun 17, 2024

If you cannot provide the files, please let me know if LSB_RESOURCE_ENFORCE is set, and if so, what the value is. Also, LSB_JOB_MEMLIMIT and LSB_MEMLIMIT_ENFORCE.

@dlaehnemann What do you have for LSB_JOB_MEMLIMIT and LSB_MEMLIMIT_ENFORCE?

@benjeffery
Author

benjeffery commented Jun 17, 2024

> Currently, the plugin only requests rusage[mem={mem_mb}], with the rest being implicit.

Yes, with that specification the job ends up with a default value for the mem limit.

> Is there any way you could provide your lsf.conf?

I can't take files outside the secure environment (we work with human genomes), but here are the values you asked for:
LSB_RESOURCE_ENFORCE="cpu memory GPU"
LSB_JOB_MEMLIMIT=Y
LSB_MEMLIMIT_ENFORCE is not present.
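For anyone else digging out these values, a small sketch (the lsf.conf location varies per site; `$LSF_ENVDIR` is the usual place, but that path is an assumption here):

```shell
# Sketch: grep the memory-enforcement settings out of a given lsf.conf.
mem_settings() {
  grep -E '^(LSB_RESOURCE_ENFORCE|LSB_JOB_MEMLIMIT|LSB_MEMLIMIT_ENFORCE)=' "$1"
}

# Typical usage on a cluster (path is an assumption):
# mem_settings "$LSF_ENVDIR/lsf.conf"
```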

Many thanks for your time on this.

@BEFH
Owner

BEFH commented Jun 17, 2024

Do you specify memory limits per core or per job?

@dlaehnemann
Contributor

I have this set of parameters on my system:

LSB_JOB_MEMLIMIT=N
LSB_MEMLIMIT_ENFORCE=N
LSB_RESOURCE_ENFORCE="cpu memory gpu"

In addition, this setting might be relevant:

LSB_SUB_MEM_SWAP_HOST_LIMIT=Y

Also, @benjeffery: have you tested whether each of these settings is actually necessary, i.e. that omitting any one of them doesn't work? Maybe you could try all of the following combinations with a minimal working example and report the exact outputs you get:

  • the default: -R 'rusage[mem={mem_mb}]'
  • only select[] added: -R 'select[mem>{mem_mb}] rusage[mem={mem_mb}]'
  • only -M added: -R 'rusage[mem={mem_mb}]' -M {mem_mb}
  • everything: -R 'select[mem>{mem_mb}] rusage[mem={mem_mb}]' -M {mem_mb}
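One way to run through those combinations is to generate the candidate flag sets in a loop and submit a trivial job with each (a sketch only; the sleep job and output-file naming are placeholders):

```shell
mem_mb=4096

# The four flag combinations to compare, from the default to everything.
variants=(
  "-R 'rusage[mem=${mem_mb}]'"
  "-R 'select[mem>${mem_mb}] rusage[mem=${mem_mb}]'"
  "-R 'rusage[mem=${mem_mb}]' -M ${mem_mb}"
  "-R 'select[mem>${mem_mb}] rusage[mem=${mem_mb}]' -M ${mem_mb}"
)

for v in "${variants[@]}"; do
  # Print the command only; drop the echo to actually submit.
  echo "bsub $v -o test_%J.out sleep 60"
done
```

You could then compare the effective limits of each job with `bjobs -l <jobid>`.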

The LSF cluster configuration is far too flexible and there don't seem to be clear recommendations on how to set this up. So our cluster admin actually recommended against trying to automatically determine any memory configuration. However, I do agree that we should try to keep this plugin as general as possible, so I think all we can do is try to understand as thoroughly as possible what is going on here...

Also, to further find out what your system is configuring, you could look through your system's configuration by determining what the $LSF_TOP folder is and then referring to the LSF folder structure in the docs. Important files that I have looked at so far have been:

So it might also be that you simply have a very odd cluster configuration, and it might be easier to ask your admins to reconfigure it. For example, the select[mem>{mem_mb}] requirement sounds like something that should be handled implicitly when rusage[mem={mem_mb}] is already set.


And just to document another (currently) non-workable solution: I also thought that adding something like the following to your ~/.config/snakemake/lsf_profile/config.yaml file should work to set the necessary extra command-line arguments:

default-resources:
  lsf_extra: "f\"-R 'select[mem>{resources.mem_mb}]' -M {resources.mem_mb}\""

However, resources.mem_mb is not currently available to callables in the dynamic resources specification, so this would require a change in Snakemake itself, and I'm not sure how easy or feasible that would be.
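To illustrate what that f-string would produce if resources were available in that context, here is a sketch with a mock object standing in for Snakemake's real resources (this is not Snakemake API):

```python
from types import SimpleNamespace

# Mock stand-in for the `resources` object the callable would receive.
resources = SimpleNamespace(mem_mb=4096)

# The same f-string as in the config.yaml above, evaluated directly.
lsf_extra = f"-R 'select[mem>{resources.mem_mb}]' -M {resources.mem_mb}"
print(lsf_extra)  # -R 'select[mem>4096]' -M 4096
```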

@BEFH
Owner

BEFH commented Jun 18, 2024

@dlaehnemann I agree with everything you said.

It should probably be fine to do -R select[mem>{mem_mb}] for everyone, although it may interfere with MPI jobs if the memory is per-job, and it would realistically have to be the total memory for non-MPI jobs if the select matters at all. In most cases, the only reason to set this is if your cluster allows modifying jobs after submission and you want more memory reserved on the execution node than the initial request, to leave room for later increases; that is not possible on my cluster.

I'm somewhat more concerned about setting -M since it sets a hard memory limit and may introduce unexpected behavior on some clusters. I think it's fine on mine, but I don't know about other clusters.

Another weird thing about @benjeffery's configuration is that LSB_RESOURCE_ENFORCE is set for memory, which means memory should be enforced via Linux cgroups, overriding LSB_JOB_MEMLIMIT=Y according to the documentation. I would like to know whether the memory request is per-core or per-job on @benjeffery's cluster.
