Imagine a scenario where you need X child resources in total. You don't care how many parent resources they are spread across, but you need the same number of children allocated per parent.
Two concrete request examples and valid allocations:
A user has an MPI application that uses OpenMP for on-node parallelism and distributes a uniform amount of work to each MPI rank. Thus, they need 100 cores (or GPUs) in total and don't care how many nodes those are spread over, but they need the same number of cores allocated on every node.
Valid allocations: 10 nodes with 10 cores per node, 20 nodes with 5 cores per node, etc.
Invalid allocation: 2 nodes with 48 cores per node plus 1 node with 4 cores (the total is 100, but the per-node counts differ)
A user has an in-memory database that requires a fixed amount of memory, and the database assumes each node has the same amount of memory. Thus, they need 100 TB of memory in total and don't care how many nodes it is spread over, but they need the same amount of memory allocated on every node.
Valid allocations: 10 nodes with 10 TB of memory per node, 20 nodes with 5 TB of memory per node, etc.
Invalid allocation: 2 nodes with 48 TB of memory per node plus 1 node with 4 TB of memory (the total is 100 TB, but the per-node amounts differ)
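To make the "valid" versus "invalid" distinction above concrete, here is a minimal Python sketch. The function names `uniform_splits` and `is_uniform` and the optional `max_per_parent` cap are hypothetical, introduced only for illustration; nothing here is part of jobspec or the scheduler.

```python
def uniform_splits(total, max_per_parent=None):
    """List every (parents, per_parent) pair with parents * per_parent == total.

    max_per_parent is a hypothetical cap, e.g. the number of cores on a node.
    """
    splits = []
    for per_parent in range(1, total + 1):
        if total % per_parent != 0:
            continue  # this share cannot divide the total evenly
        if max_per_parent is not None and per_parent > max_per_parent:
            continue  # no single parent could hold this many children
        splits.append((total // per_parent, per_parent))
    return splits


def is_uniform(allocation, total):
    """allocation is a list with one child count per allocated parent."""
    return sum(allocation) == total and len(set(allocation)) == 1


if __name__ == "__main__":
    print(uniform_splits(100, max_per_parent=48))
    print(is_uniform([10] * 10, 100))     # 10 nodes with 10 cores each
    print(is_uniform([48, 48, 4], 100))   # right total, uneven spread
```

Only the divisors of the total produce a uniform split, which is why 100 cores can land on 10 or 20 nodes but not on 3.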
There are really two problems here:
1. We need some way to specify the total count of a particular child resource, ignoring the multiplicative effects of the `with` key. This is currently only possible with tasks.
2. We need some way to specify that the child resource should be allocated uniformly across the parent resources.
My best attempt to summarize a discussion with @trws:
One thought for solving problem 1 was to add a new label to the resource besides `count`, something like `total-count`.
Building on that, to solve problem 2, we could add another label like `total-count-spread` (or something else), which could have values of `uniform` and `non-uniform`.
This seemed kinda gross and hacky. It also opens up all sorts of questions about sibling resources and how to actually implement this functionality in the scheduler.
The idea that we hated the least was to add an alternative to the `with` key: `across`. You would start by specifying the "child" resource that you want a total count of, then use the `across` key to describe the parent (or subtree) that you want the children uniformly spread across. Strawman example:
I am still trying to figure out whether sibling resources make sense in the subtree under the `across` key. I don't think they do. Maybe it makes sense to invert the entire request and keep using `across` all the way down. Anyway, one nice part about this description is that the implementation knows from the start that this isn't a normal match/traversal: it is attempting to find a total number of child resources and spread them uniformly across the parent resources.
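As a rough illustration of why this is not a normal match/traversal, the search could be sketched as below. This is only a sketch under stated assumptions, not how any real scheduler does it: `match_across` is a hypothetical name, the parents are modeled as a flat dict of free child counts rather than a resource graph, and preferring the fewest parents is an arbitrary policy choice.

```python
def match_across(free_per_parent, total):
    """Pick parents so that `total` children are spread evenly across them.

    free_per_parent maps a parent name to its number of free children
    (a hypothetical flat view of the resource graph).
    Returns {parent_name: share} or None if no uniform spread exists.
    """
    # Try the fewest parents first, i.e. the largest per-parent share.
    for n_parents in range(1, len(free_per_parent) + 1):
        if total % n_parents != 0:
            continue  # total cannot be split evenly over this many parents
        share = total // n_parents
        eligible = [name for name, free in free_per_parent.items()
                    if free >= share]
        if len(eligible) >= n_parents:
            return {name: share for name in sorted(eligible)[:n_parents]}
    return None


if __name__ == "__main__":
    # Ten nodes with 16 free cores each: 100 cores fit as 10 per node.
    cluster = {f"node{i}": 16 for i in range(10)}
    print(match_across(cluster, 100))
    # Two 48-core nodes and one 4-core node: no uniform spread of 100.
    print(match_across({"node0": 48, "node1": 48, "node2": 4}, 100))
```

Unlike a `with`-style request, the per-parent count is not known up front. The matcher has to choose the number of parents and the share together, which is the part that sets `across` apart.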