A single 4 TB persistent disk on a 4 core NFS server has a maximum throughput of ~400 MB/s. The instance itself has a network throughput of 8 Gb/s (1000 MB/s). Recall that the NFS server also runs jobs, so for a large batch submission where we are sure we will continuously max out a 16 core instance, its network throughput would be 32 Gb/s (4000 MB/s), since network bandwidth scales with vCPU count (~2 Gb/s per core).
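For reference, a quick back-of-envelope version of that budget in Python; the ~2 Gb/s-per-vCPU scaling is an assumption inferred from the 8 Gb/s (4 core) and 32 Gb/s (16 core) figures above, not a documented guarantee:

```python
# Sanity check on the throughput figures above (assumed scaling, not measured).
GBPS_PER_CORE = 2    # network egress per vCPU (assumption)
DISK_MBPS = 400      # ~max sustained throughput of one 4 TB persistent disk

def network_mbps(cores: int) -> float:
    """Instance network cap in MB/s (1 Gb/s = 125 MB/s)."""
    return cores * GBPS_PER_CORE * 125

for cores in (4, 16):
    net = network_mbps(cores)
    print(f"{cores}-core NFS server: ~{net:.0f} MB/s network, "
          f"~{net / DISK_MBPS:.1f} disks to saturate it")
```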
Thus, for a large number of concurrent workflows, we will need multiple NFS servers and/or multiple disks per NFS server. All disks would be mounted on the controller node and on each worker node, and each task would be randomly assigned to a disk. (Since Slurm allows us to preferentially assign tasks to specific nodes, it would also make sense to put the most I/O intensive jobs on the NFS nodes themselves.)
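A minimal sketch of what random disk assignment could look like, assuming hypothetical mount points like /mnt/nfs0 that are identical on every node (none of these names come from the actual codebase):

```python
import random

# Hypothetical mount points: every NFS disk is mounted at the same path on
# the controller and on every worker node.
NFS_MOUNTS = ["/mnt/nfs0", "/mnt/nfs1", "/mnt/nfs2"]

def pick_staging_dir(task_id: str) -> str:
    """Randomly assign a task's staging directory to one of the NFS disks."""
    return f"{random.choice(NFS_MOUNTS)}/{task_id}"

# I/O-heavy tasks could additionally be pinned to the NFS nodes themselves,
# e.g. via sbatch --nodelist or a node feature with --constraint.
print(pick_staging_dir("sample_42"))
```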
We will have to carefully tune how many jobs a single disk can accommodate. As currently implemented, our pipelines are CPU-bound, not I/O-bound; localization will likely be the most I/O intensive step. I’ve ballparked localization maxing out at 2-4 concurrent samples per 4 TB drive, assuming sustained bucket throughput of 100-200 MB/s per file against the ~400 MB/s disk ceiling. Perhaps we could do something clever with staggering workflow starts so that there aren’t too many concurrent localization steps at once.
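One way to cap concurrent localizations per disk would be a simple per-mount semaphore; this is just a sketch with assumed limits and mount names, not anything implemented today:

```python
import threading

# Assumed ceiling from the estimate above: ~400 MB/s per disk divided by
# 100-200 MB/s sustained per file gives roughly 2-4 concurrent localizations.
MAX_LOCALIZATIONS_PER_DISK = 3

disk_slots = {
    mount: threading.BoundedSemaphore(MAX_LOCALIZATIONS_PER_DISK)
    for mount in ("/mnt/nfs0", "/mnt/nfs1")
}

def localize(mount: str, sample: str) -> None:
    """Block until the disk has a free slot, then run the (stubbed) copy."""
    with disk_slots[mount]:
        print(f"localizing {sample} onto {mount}")
        # the actual gsutil/gcloud storage copy would go here

localize("/mnt/nfs0", "sample_42")
```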
Just to make sure we're on the same page, I think we had discussed two options, right?
wolF Task Spreading
Let wolF be entirely responsible for distributing jobs across the available NFS instances. Either rotate through, queuing one task per NFS at a time, or estimate the throughput needed by each task and use that estimate to decide where each task goes (see the sketch below).
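Roughly, option 1 could look like either of the following (hypothetical helper names; the 100 MB/s default estimate is an assumption):

```python
import itertools

NFS_SERVERS = ["nfs-0", "nfs-1", "nfs-2"]
_rotation = itertools.cycle(NFS_SERVERS)

def assign_round_robin(task: dict) -> str:
    """Rotate through the NFS servers, one task per NFS in turn."""
    return next(_rotation)

def assign_by_throughput(task: dict, load_mbps: dict[str, float]) -> str:
    """Send each task to the NFS currently carrying the least estimated load."""
    target = min(load_mbps, key=load_mbps.get)
    load_mbps[target] += task.get("est_mbps", 100)  # assumed default estimate
    return target
```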
Canine Orchestrator Spreading
The Orchestrator can spread a single task across multiple NFS servers by chunking the input job spec and creating one localizer instance per chunk.
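For option 2, the chunking step might look something like this sketch (chunk_job_spec is a hypothetical helper, not the actual canine API); the orchestrator would then build one localizer per chunk:

```python
def chunk_job_spec(inputs: dict[str, list], n_nfs: int) -> list[dict[str, list]]:
    """Split a columnar job spec {input_name: [values, ...]} into n_nfs chunks."""
    n_jobs = len(next(iter(inputs.values())))
    bounds = [round(i * n_jobs / n_nfs) for i in range(n_nfs + 1)]
    return [
        {name: values[bounds[i]:bounds[i + 1]] for name, values in inputs.items()}
        for i in range(n_nfs)
    ]

chunks = chunk_job_spec({"bam": [f"gs://bucket/s{i}.bam" for i in range(10)]}, 3)
print([len(c["bam"]) for c in chunks])  # -> [3, 4, 3]
```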
One of the things that has been bothering me is how we report/detect multiple NFS mounts so the orchestrator can shard a pipeline. I think the simplest solution is to allow the localization.staging_dir pipeline argument to also be an array. If it's an array, canine should distribute the job load across those NFS mounts (either by spreading jobs evenly or by fully loading one NFS before moving on to the next).
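As a sketch of what that could look like (hypothetical code, not the current canine behavior), covering both distribution policies:

```python
def distribute_jobs(job_ids: list[str], staging_dir, strategy: str = "spread") -> dict:
    """Map each job id to a staging dir; staging_dir may be a string or a list.

    "spread": deal jobs evenly across all mounts.
    "fill":   fully load one mount before moving on to the next.
    """
    dirs = [staging_dir] if isinstance(staging_dir, str) else list(staging_dir)
    if strategy == "spread":
        return {jid: dirs[i % len(dirs)] for i, jid in enumerate(job_ids)}
    per_mount = -(-len(job_ids) // len(dirs))  # ceiling division
    return {jid: dirs[i // per_mount] for i, jid in enumerate(job_ids)}

print(distribute_jobs(["j0", "j1", "j2", "j3"], ["/mnt/nfs0", "/mnt/nfs1"]))
```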
And then the orchestrator can create a separate localizer for each NFS it's using. Additionally, the --array argument to sbatch would let us offset the task ids so the pipeline still spans a contiguous block of job ids.
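A rough sketch of how the per-shard submissions could keep contiguous array indices (the script name and NFS_SHARD export variable are hypothetical):

```python
import subprocess

def submit_shards(script: str, shard_sizes: list[int]) -> None:
    """Submit one sbatch job array per NFS shard with contiguous --array ranges."""
    start = 0
    for shard, size in enumerate(shard_sizes):
        end = start + size - 1
        subprocess.run(
            ["sbatch", f"--array={start}-{end}",
             f"--export=ALL,NFS_SHARD={shard}", script],
            check=True,
        )
        start = end + 1

# e.g. shards of 3, 4, and 3 jobs submit with --array=0-2, 3-6, and 7-9
# submit_shards("pipeline.sh", [3, 4, 3])
```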