slurm bcbio_submit.sh job pending #159
Comments
Thomas;
that you could try running to see if that unsticks it. Sorry about the issue and hope that does it for you.
Thanks for your feedback. I have been able to reproduce this issue with multiple instance types. Interestingly, the issue might be related to the type of AWS EC2 instance provisioned for the worker nodes.

c3.large worker nodes

First, I created a cluster with two c3.large worker nodes.
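(For reference, the cluster was started through bcbio_vm's elasticluster wrapper, roughly as below; exact subcommands may differ slightly between bcbio_vm versions.)

    # bring up the cluster defined in the elasticluster config (workers: c3.large)
    bcbio_vm.py aws cluster start

    # log in to the head/frontend node once the cluster is up
    bcbio_vm.py aws cluster ssh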
Next, I created and submitted my batch job:
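(The submission itself is just a plain sbatch call from the head node; the squeue call is only there to check the job state.)

    # submit the generated batch script to SLURM
    sbatch bcbio_submit.sh

    # check the queue; the job should go from PD (pending) to R (running)
    squeue -u $(whoami)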
This job is added to the SLURM queue and starts executing. (The downstream job fails, but that appears to be a separate issue.)

c3.xlarge worker nodes

Next, I destroyed the above cluster and changed the worker instance type to c3.xlarge. The new cluster starts without any problems.
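(The only thing I changed between the two runs was the worker flavor in the elasticluster configuration, along these lines; the file path, section, and key names are assumptions based on a typical bcbio_vm AWS setup, not copied from my actual config.)

    # elasticluster config used by bcbio_vm, e.g. ~/.bcbio/elasticluster/config
    [cluster/bcbio]
    ...
    flavor = c3.xlarge    # was c3.large in the previous run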
Once again, I submit my batch job:
But this time the job is added to the queue and remains pending due to insufficient resources.
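(To see why a job is stuck, I find it easiest to look at the scheduler's reason field and at the resources SLURM thinks each node has; the job ID below is a placeholder.)

    # show the pending reason (e.g. Resources, PartitionNodeLimit) for a job
    scontrol show job <jobid>

    # list every node with the CPU count and memory SLURM has configured for it
    sinfo -N -l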
m4.xlarge worker nodes

Finally, I destroyed the cluster and requested two m4.xlarge worker nodes instead.
Once again, I submit my batch job:
and again, the job is placed into the queue but remains pending. It seems that bcbio_vm has correctly identified that the worker nodes have 4 cores and how much memory they provide, yet the job still cannot be scheduled.
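(As an illustration only, not the actual contents of the generated bcbio_submit.sh: a SLURM batch script declares its requests with #SBATCH directives, and a job stays pending whenever one of those requests exceeds what every node in the partition offers.)

    #!/bin/bash
    # Illustrative resource requests: if --cpus-per-task or --mem asks for more
    # than SLURM has configured on any node, the job sits in PD with reason Resources.
    #SBATCH --nodes=1
    #SBATCH --cpus-per-task=4
    #SBATCH --mem=7000
    srun echo "resources granted"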
In summary, I have only managed to launch jobs when the worker nodes were of instance type c3.large. Any ideas?
Thomas;
After realizing that the AWS instances I had specified for the workers were too small in issue #158, I switched to using c3.xlarge instances for the workers.
The cluster is built without problems:
But when I log into the head node and try to submit a job to the queue, the job is never executed and remains pending.
It seems that no suitable resource is available to execute the bcbio_submit.sh script.
I am new to using SLURM, so I don't really know how the resources are defined. Any hints?
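(In case it helps with the diagnosis: as far as I can tell, the per-node resources SLURM schedules against come from the node definitions in slurm.conf on the head node, and they can be inspected with scontrol. The host name and numbers below are placeholders, not values from my cluster.)

    # detailed view of what SLURM believes a node provides (CPUs, RealMemory, ...)
    scontrol show node

    # generic example of a node definition in /etc/slurm/slurm.conf
    # or /etc/slurm-llnl/slurm.conf (placeholder values):
    # NodeName=compute001 CPUs=4 RealMemory=7000 State=UNKNOWN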
For completeness, here is some more information about the scripts and the cluster environment: