
How to add resource manager options inside Cerise #75

Open
felipeZ opened this issue Feb 5, 2018 · 6 comments

felipeZ commented Feb 5, 2018

Let's imagine that I have the following Slurm script:

#! /bin/bash
#SBATCH -t 00:05:00
#SBATCH -N 1
#SBATCH -J test
#SBATCH -C TitanX
#SBATCH --gres=gpu:1

How can I add constraints like -C TitanX from Cerise?

LourensVeen (Member) commented

It's not currently supported, because Xenon doesn't do that yet: xenon-middleware/xenon#582

Then there's the question of where these things should be specified: in the Cerise configuration, or in the CWL file. The CWL file seems logical, but CWL 1.0 doesn't support that, see common-workflow-language/common-workflow-language#587

So we're a bit constrained by the technology we're working with. We'll need some kind of solution, but without Xenon support that's not so easy. Xenon does let you use a custom job script, but that means we'd have to do everything else by hand as well, and for every supported scheduler, which is exactly what we're trying to avoid by using Xenon...

felipeZ commented Feb 5, 2018

For running MD simulations with Gromacs we have cases where GPUs are only available in a specific queue; in other cases we need certain constraints, or a combination of queue names and constraints.
The user should have some control over the resource manager (e.g. Slurm, Torque): some testing can be done in a short queue, while production requires another queue. Right now the queue name is hardcoded in the cerise-config, forcing the user to rebuild the container every time a different queue is required.

LourensVeen (Member) commented

It seems like it would be best to let the user add a hint (which is a CWL feature) to the Workflow, where they can specify the kind of node (or however we call it, needs thought) to run on. These kinds are then defined in the specialisation, because they depend on the machine (not all machines have GPUs, short queues, or whatever). The same hint feature could then be used to specify the number of nodes to request and the runtime, for #44.
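As a rough sketch of what that could look like from the user's side, assuming a hypothetical hint class (nothing like cerise:ResourceHint exists yet, and the field names are made up):

cwlVersion: v1.0
class: Workflow
$namespaces:
  cerise: "http://example.org/cerise#"   # placeholder namespace for this sketch
hints:
  cerise:ResourceHint:        # hypothetical hint class, to be defined by Cerise
    nodeType: gpu             # node kinds defined in the specialisation
    numNodes: 1               # could also cover the request from #44
    timeLimit: 3600           # runtime in seconds, likewise for #44
inputs: []
outputs: []
steps: {}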

LourensVeen (Member) commented

Xenon now has support for setting job constraints, so it's waiting for the upgrade to Xenon 2 now.

LourensVeen (Member) commented

Actually, we already had the idea of specifying different steps for different use cases. For example, we'd have a gromacs_fast.cwl and a gromacs_efficient.cwl, one giving the result ASAP, the other using as few core hours as possible. And we could have gromacs_protein_protein.cwl for larger systems, or something. So the user wouldn't give a hint, they'd call a specific step, and the step would contain a hint.
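For example, gromacs_fast.cwl could carry the hint itself, so the user just picks the step (again, the hint class and its fields are hypothetical, the same sketch as above):

# gromacs_fast.cwl - tuned for minimal wall-clock time
# ($namespaces declaration omitted for brevity)
cwlVersion: v1.0
class: CommandLineTool
baseCommand: [gmx, mdrun]
hints:
  cerise:ResourceHint:        # hypothetical; resolved by Cerise per specialisation
    nodeType: gpu             # the specialisation maps this to e.g. -C TitanX --gres=gpu:1
    numNodes: 1
inputs: []
outputs: []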

Cerise will then, on reading the steps on startup, build a table of requirements per step. When a workflow is submitted, the requirements of all the steps used will be merged, and then the job will be submitted with them. Conflicting requirements should be avoided by the specialist as much as possible, but will result in a PermanentFailure if incompatible steps are used. That should give a good error message in the job log as well.
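A minimal sketch of that merge step in Python, assuming each step's requirements are a flat set of key-value pairs and that two different values for the same key constitute a conflict (none of this is actual Cerise code):

class RequirementConflict(Exception):
    """Raised when two steps ask for incompatible resources."""

def merge_requirements(step_requirements):
    """Merge the requirement dicts of all steps used in a workflow.

    step_requirements: dict mapping step name -> requirements dict.
    Returns a single merged requirements dict, or raises
    RequirementConflict, which the job runner would translate into a
    PermanentFailure with a message in the job log.
    """
    merged = {}
    for step, reqs in step_requirements.items():
        for key, value in reqs.items():
            if key in merged and merged[key] != value:
                raise RequirementConflict(
                    'Step "{}" requires {}={}, which conflicts with an '
                    'earlier step requiring {}={}'.format(
                        step, key, value, key, merged[key]))
            merged[key] = value
    return merged

# Example: gromacs_fast.cwl wants a GPU node, another step is indifferent.
merged = merge_requirements({
    'gromacs_fast': {'nodeType': 'gpu', 'numNodes': 1},
    'analysis': {'numNodes': 1},
})  # -> {'nodeType': 'gpu', 'numNodes': 1}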

LourensVeen added a commit that referenced this issue Nov 4, 2018

LourensVeen commented Nov 4, 2018

Okay, Xenon 2 didn't happen, and we're running out of time a bit, but Cerulean can do this, so I've added the simplest solution I could think of: an extra option next to queue-name in the API configuration where you can specify additional scheduler options. This may mean that we need two separate specialisations, one with and one without GPUs, but I think that's acceptable for now. @felipeZ will that work for you?
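For illustration, in the specialisation's API configuration that might look roughly like this (the surrounding structure is abbreviated and the key name scheduler-options is a guess; see the referenced commit for the actual name):

compute-resource:
  scheduler:
    queue-name: gpu_short                        # as before
    scheduler-options: "-C TitanX --gres=gpu:1"  # hypothetical key for extra options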
