No documentation for using SciLuigi with SLURM #45
Hi,
I am trying to switch to SciLuigi from Luigi because I am interested in having support for SLURM. However, I cannot find any docs showing how to set up SlurmTask tasks. I am reading through the code, but it is still not clear to me what the intended usage is. Do you have any examples? I'd be happy to write up and contribute a documentation page once I get the hang of it; I think it would be a nice addition to the wiki.
Cheers,
Pietro
Comments
Hi @pietromarchesi, and sorry for the lack of documentation on SLURM. Contributions are very welcome, e.g. for the wiki. For now, you could have a look at our use case project for the sciluigi publication: e.g. see these lines on how to send a slurminfo object to the tasks (the components used are available in this accompanying repo). As you can see, the info needed is the typical SLURM job details, plus a runmode, which lets you switch dynamically, in Python code, between running locally and running via SLURM. (In the linked code example, this is set up at the beginning of the script, on these lines.) Hope this helps!
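To make the above concrete, here is a minimal sketch of how the pieces might fit together, based on the description above and my reading of sciluigi/slurm.py. MyTask, MyWorkflow, 'sample_a', 'myproject', 'core' and the other values are placeholders, and the exact SlurmInfo keyword arguments and RUNMODE_* constants are worth double-checking against the linked use case code:

```python
import luigi
import sciluigi as sl

class MyTask(sl.SlurmTask):
    # Subclassing sl.SlurmTask (rather than sl.Task) gives the task a
    # slurminfo parameter and a SLURM-aware self.ex() method.
    sample = luigi.Parameter()

    def out_result(self):
        return sl.TargetInfo(self, self.sample + '.result.txt')

    def run(self):
        # Runs locally or via SLURM, depending on slurminfo.runmode
        self.ex('echo processed {s} > {out}'.format(
            s=self.sample, out=self.out_result().path))

class MyWorkflow(sl.WorkflowTask):
    def workflow(self):
        # The typical SLURM job details, plus the runmode switch
        slurminfo = sl.SlurmInfo(
            runmode=sl.RUNMODE_HPC,   # or sl.RUNMODE_LOCAL to run locally
            project='myproject',      # SLURM project/account (placeholder)
            partition='core',         # partition name (placeholder)
            cores='4',
            time='1:00:00',
            jobname='mytask',
            threads='4')
        mytask = self.new_task('mytask', MyTask,
                               sample='sample_a', slurminfo=slurminfo)
        return mytask

if __name__ == '__main__':
    sl.run_local(main_task_cls=MyWorkflow)
```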
Brilliant. Yeah, I couldn't figure out exactly where things had to be defined; now it's clear. I could write up a draft for a doc page where I extend the example workflow from the examples with the modifications necessary to run on SLURM, if you think that would be useful!
That'd be great!
I wrote a draft of a potential wiki page, which is now at this gist. If you have suggestions on how it can be improved or fixed, I will incorporate them and then add it to the wiki. The example works for me, but it may be a good idea to test it as well. Cheers
I also extended the example from the previous comment into a workflow where we run several instances of the same workflow in parallel, which you can find here. It would be great if you could take a look, because I am noticing that the workflows always get shipped to the same node (instead of being sent as batch jobs to different nodes). They also don't show up in the queue.
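As a rough sketch of what such a parallel setup might look like, reusing the hypothetical MyTask from the sketch further up (if I'm not mistaken, workflow() can return a list of final tasks, which Luigi then treats as independent requirements, but this is worth verifying):

```python
import sciluigi as sl

# MyTask as defined in the earlier sketch

class MyParallelWorkflow(sl.WorkflowTask):
    def workflow(self):
        slurminfo = sl.SlurmInfo(
            runmode=sl.RUNMODE_HPC, project='myproject', partition='core',
            cores='4', time='1:00:00', jobname='mytask', threads='4')
        # One independent task per sample. Since the tasks have no
        # dependencies on each other, Luigi is free to run them in
        # parallel, provided it has more than one worker (see the
        # discussion about workers further down).
        tasks = []
        for sample in ['sample_a', 'sample_b', 'sample_c']:
            tasks.append(self.new_task('mytask_' + sample, MyTask,
                                       sample=sample, slurminfo=slurminfo))
        return tasks

if __name__ == '__main__':
    sl.run_local(main_task_cls=MyParallelWorkflow)
```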
How many cores per node are there? ... since I think SLURM might schedule multiple core-jobs together on the same node, as long as there are free cores on the node. Any difference if you set
This is more strange. It's a bit hard to tell though without testing on a concrete system ... so many things could happen with SLURM etc. E.g. I've been surprised a few times about how salloc and srun work together (sometimes having jobs start just locally on the login node ... sometimes only starting one job per node, despite having many cores per node, etc.)...
Hi Samuel, many thanks for your reply. I can see jobs appearing in the queue now. Interestingly, jobs still appear sequentially in the queue, as if SciLuigi were waiting for one workflow to be completed before creating the allocation for the next. I am on a system with effectively 256 cores (64 cores with four hardware threads each), so I tried requesting 256 cores, but not much changed. In fact, I discovered that I was already getting all 256 cores even when I was asking for only one, so I was getting the whole node in any case.
Hmm, are you specifying the number of workers to Luigi? The default is one worker, which would explain why only one workflow runs at a time.
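In code, and assuming sciluigi's run helpers pass extra arguments through to luigi.run (which is how I read sciluigi/interface.py), raising the worker count would look roughly like this, reusing the hypothetical MyParallelWorkflow from the sketch above:

```python
import sciluigi as sl

# MyParallelWorkflow as defined in the sketch above

if __name__ == '__main__':
    # Give Luigi several workers, so that independent tasks (and hence
    # their SLURM allocations) can run at the same time.
    sl.run_local(main_task_cls=MyParallelWorkflow,
                 cmdline_args=['--workers=4'])
```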
Brilliant, that was it. And I guess that's also the reason why only one job was showing up in the queue at a time?
Absolutely. Many thanks indeed!
Cool, I'm writing it up now. Just to be sure: as far as I understand, the only way to send tasks as batch jobs is to make a call to the task's ex() method, right?
Exactly!
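For context, this is my reading of how ex() and ex_local() behave on a SlurmTask, so worth double-checking against sciluigi/slurm.py; my_tool, data.csv and the output paths below are made-up placeholders:

```python
import sciluigi as sl

class MyStatsTask(sl.SlurmTask):
    def out_stats(self):
        return sl.TargetInfo(self, 'stats.txt')

    def run(self):
        # Submitted as a SLURM job (via salloc/srun) when
        # slurminfo.runmode is sl.RUNMODE_HPC; run locally when it is
        # sl.RUNMODE_LOCAL:
        self.ex('my_tool --in data.csv --out ' + self.out_stats().path)

        # Always runs on the current node, regardless of runmode:
        self.ex_local('echo done >> ' + self.out_stats().path)
```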
I added the two pages to the wiki! The only thing is that in the menu on the right, which is automatically generated by GitHub, the order of the pages got messed up, and the Using page now shows up last. Not sure how to fix that. I again ran into the problem of jobs not showing up in the queue.