Motuz currently does not scale well. Once the number of running jobs equals the number of CPUs on the (single) machine where it is running, it puts new jobs into a queue, and they do not start until a running job completes.
This makes Motuz less useful when it is needed most.
Since we have an HPC cluster here at Fred Hutch, it would make sense to add an option to submit jobs to the cluster, which would remove this limitation.
Since not everyone who may want to run Motuz has an HPC cluster, we should still support the Celery backend.
This issue is just to track progress on this work.
Some thoughts -
rclone periodically outputs the estimated time to job completion (at least when using the --verbose flag). We could use this to determine how long the HPC job needs to run. A cron job could periodically check running Motuz jobs and give them more wall time if necessary.
We could also customize the parallelization of a job by setting the number of cores used by the job equal to the number of parallel transfers. The tricky part is knowing a priori how big the job will be and therefore how many cores to request.
If podman is installed on the cluster, we can run the copy job in a container built from a Docker image (which ordinary users can run without root privileges), making management of rclone and Python dependencies quite simple.
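A rough sketch of the first two thoughts, in case it helps the discussion. It assumes the rclone stats line ends in something like "ETA 27m30s" (the exact format may differ between rclone versions), and it assumes the cluster scheduler is Slurm, which this issue does not actually specify. The function names, the 1.5x safety factor, and the 10-minute floor are all made up for illustration.

```python
import math
import re
import shlex

# Assumed shape of an rclone stats line (may vary by rclone version), e.g.
#   "Transferred: 1.2 GiB / 10 GiB, 12%, 5.5 MiB/s, ETA 27m30s"
# The lookahead rejects unexpected suffixes such as "500ms".
_ETA_RE = re.compile(r"ETA\s+((?:\d+(?:\.\d+)?[dhms])+)(?![a-z])")
_UNIT_RE = re.compile(r"(\d+(?:\.\d+)?)([dhms])")
_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def parse_eta_seconds(line):
    """Return the ETA in seconds, or None if the line has no usable ETA."""
    match = _ETA_RE.search(line)
    if match is None:
        return None  # e.g. "ETA -" while rclone cannot estimate yet
    return sum(float(v) * _SECONDS[u] for v, u in _UNIT_RE.findall(match.group(1)))


def walltime_minutes(eta_seconds, safety_factor=1.5, minimum=10):
    """Pad the ETA so the job is not killed just before it finishes."""
    return max(minimum, math.ceil(eta_seconds * safety_factor / 60))


def build_sbatch_command(source, dest, transfers, minutes):
    """Build a Slurm submission that requests one core per parallel transfer."""
    rclone_cmd = ["rclone", "copy", "--verbose", f"--transfers={transfers}", source, dest]
    return [
        "sbatch",
        f"--cpus-per-task={transfers}",
        f"--time={minutes}",  # a bare number is interpreted by Slurm as minutes
        f"--wrap={shlex.join(rclone_cmd)}",
    ]
```

The cron job mentioned above could call parse_eta_seconds on the latest log line of each running job and compare the padded result with the wall time the job still has left. How to extend the wall time of an already running job depends on the scheduler and on site policy, so that part is left out here.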
Add more thoughts below as needed....