Slurm_lapply unable to parse sbatch output when starting jobs on federated slurm cluster when cluster name specified #41

Open
bmilash opened this issue Oct 16, 2023 · 1 comment
bmilash commented Oct 16, 2023

The Slurm_lapply function fails on a federated Slurm cluster, reporting job IDs of NA. The parallel Slurm jobs start successfully, but the parent process fails to parse the output of the sbatch command.
On a federated Slurm cluster, when the cluster name is specified in sbatch_opts (and therefore passed to the sbatch command), the output from sbatch looks like:
Submitted batch job 8653762 on cluster name_of_cluster
The regular expression used to parse this output and capture the job id on lines 142 and 224 of sbatch.R is:
".+ (?=[[:digit:]]+$)"
The "$" in that expression prevents the pattern from matching the sbatch output, because characters follow the job ID. I suspect that just removing the "$" will solve the problem. I tried recoding that line as:
jobid <- as.integer(regmatches(ans,regexpr("[[:digit:]]+",ans)))
and that worked as well.
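
To make the difference concrete, here is a minimal sketch in R that runs both patterns against the two output formats. The helper name parse_sbatch_jobid is hypothetical (it is not part of the package), and the use of gsub(..., perl = TRUE) with the original pattern is an assumption about how sbatch.R applies it, not something confirmed here. Under that same assumption, dropping only the "$" would still leave the trailing " on cluster ..." text in the string, so extracting the digits directly looks like the safer change.

```r
# Two sample sbatch outputs: a plain cluster and a federated cluster.
plain     <- "Submitted batch job 8653762"
federated <- "Submitted batch job 8653762 on cluster name_of_cluster"

# Pattern quoted in this issue. ASSUMPTION: it is applied with
# gsub(perl = TRUE) to strip everything before the trailing digits.
old_pattern <- ".+ (?=[[:digit:]]+$)"
old_plain     <- gsub(old_pattern, "", plain, perl = TRUE)
old_federated <- gsub(old_pattern, "", federated, perl = TRUE)
# old_federated is left unchanged because nothing matches, so
# as.integer(old_federated) would yield NA (with a coercion warning).

# Hypothetical helper: take the first run of digits in each string.
parse_sbatch_jobid <- function(ans) {
  m   <- regexpr("[[:digit:]]+", ans)    # -1 where no digits are found
  ids <- rep(NA_integer_, length(ans))
  ok  <- m != -1
  # regmatches() returns only the successful matches, in order
  ids[ok] <- as.integer(regmatches(ans, m))
  ids
}

jobids <- parse_sbatch_jobid(c(plain, federated))
print(jobids)
```

Taking the first run of digits relies on the job ID being the first number in the sbatch output, which holds for both formats shown above; a stricter variant could anchor on the "Submitted batch job" prefix.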

@gvegayon gvegayon self-assigned this Oct 16, 2023
@gvegayon
Member

Thank you, @bmilash! Would you be willing to submit a PR? Another question: I have a Docker image with Slurm for testing. Do you think we could create a test for this?
