Scheduler: floating resources no longer experimental (#4159)
Signed-off-by: Robert Smith <[email protected]>
robertdavidsmith authored Jan 27, 2025
1 parent 1d93b7a commit 1ee7196
Showing 5 changed files with 34 additions and 7 deletions.
27 changes: 27 additions & 0 deletions docs/floating_resources.md
@@ -0,0 +1,27 @@
# Floating Resources

Floating resources are designed to constrain the usage of resources that are not tied to nodes. For example, if you have a fileserver outside your Kubernetes clusters, you may want to limit how many connections to the fileserver can exist at once. In that case you would add configuration like the following, which goes under the `scheduling` section of the Armada scheduler config.

```
floatingResources:
  - name: fileserver-connections
    resolution: "1"
    pools:
      - name: cpu
        quantity: 1000
      - name: gpu
        quantity: 500
```
When submitting a job, floating resources are specified in the same way as normal Kubernetes resources such as `cpu`. For example, if a job needs 3 CPU cores and opens 10 connections to the fileserver, the job should specify:
```
resources:
  requests:
    cpu: "3"
    fileserver-connections: "10"
  limits:
    cpu: "3"
    fileserver-connections: "10"
```
The `requests` section is used for scheduling. For floating resources, the `limits` section is not enforced by Armada (this is not possible in the general case). Instead, the workload must be trusted to respect its limit.
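
Because the limit is not enforced, the job itself has to cap its usage. One way a Go workload might do that is with a buffered channel used as a counting semaphore. This is only a sketch: `connLimiter` and its methods are hypothetical names for illustration, the limit of 10 mirrors the example request above, and the actual connection logic is left as a comment.

```go
package main

import (
	"fmt"
	"sync"
)

// connLimiter is a hypothetical helper (not part of Armada) that caps
// how many connections the workload holds open at once.
type connLimiter struct {
	slots chan struct{}
}

func newConnLimiter(limit int) *connLimiter {
	return &connLimiter{slots: make(chan struct{}, limit)}
}

// acquire blocks until a connection slot is free.
func (l *connLimiter) acquire() { l.slots <- struct{}{} }

// release returns a slot so another goroutine can connect.
func (l *connLimiter) release() { <-l.slots }

// inUse reports how many slots are currently held.
func (l *connLimiter) inUse() int { return len(l.slots) }

func main() {
	// Assumed to match the job's fileserver-connections request of "10".
	limiter := newConnLimiter(10)

	var wg sync.WaitGroup
	var mu sync.Mutex
	peak := 0

	for i := 0; i < 50; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			limiter.acquire()
			defer limiter.release()

			// Record the highest number of slots observed in use.
			mu.Lock()
			if n := limiter.inUse(); n > peak {
				peak = n
			}
			mu.Unlock()

			// Open the fileserver connection and do the work here.
		}()
	}
	wg.Wait()
	fmt.Println("peak concurrent connections:", peak)
}
```

The observed peak can never exceed the channel's capacity, so the workload stays within the amount it requested even though nothing outside the process checks it.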

If the jobs submitted to Armada request more of a floating resource than is available, they queue just as if they had exceeded the amount available of a standard Kubernetes resource (e.g. `cpu`). Floating resources generally behave like standard Kubernetes resources. They use the same code for queue ordering, pre-emption, etc.
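
The queuing behaviour described above can be pictured with a small Go sketch. It is not Armada's scheduler code: `floatingPool` and `tryAllocate` are hypothetical names invented for this illustration, and quantities are simplified to plain integers rather than Kubernetes quantities.

```go
package main

import "fmt"

// floatingPool is a hypothetical model of one floating resource in one pool:
// a fixed capacity and the amount already handed out to running jobs.
type floatingPool struct {
	capacity  int64
	allocated int64
}

// tryAllocate reserves request units if they fit within the remaining
// capacity and reports whether the reservation succeeded.
func (p *floatingPool) tryAllocate(request int64) bool {
	if request < 0 || p.allocated+request > p.capacity {
		return false
	}
	p.allocated += request
	return true
}

func main() {
	// A deliberately small pool so that the third job does not fit.
	pool := &floatingPool{capacity: 25}

	for job := 1; job <= 3; job++ {
		if pool.tryAllocate(10) {
			fmt.Printf("job %d scheduled\n", job)
		} else {
			fmt.Printf("job %d queued\n", job)
		}
	}
}
```

In this simplified model a job that does not fit is left waiting, the same outcome it would see if the cluster had run out of `cpu`.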
2 changes: 1 addition & 1 deletion internal/scheduler/configuration/configuration.go
@@ -224,7 +224,7 @@ type SchedulingConfig struct {
// These can be requested like a normal k8s resource. Note there is no mechanism in armada
// to enforce actual usage, it relies on honesty. For example, there is nothing to stop a badly-behaved job
// requesting 2 S3 server connections and then opening 10.
- ExperimentalFloatingResources []FloatingResourceConfig
+ FloatingResources []FloatingResourceConfig
// WellKnownNodeTypes defines a set of well-known node types used to define "home" and "away" nodes for a given priority class.
WellKnownNodeTypes []WellKnownNodeType `validate:"dive"`
// Executor that haven't heartbeated in this time period are considered stale.
4 changes: 2 additions & 2 deletions internal/scheduler/schedulerapp.go
@@ -77,14 +77,14 @@ func Run(config schedulerconfig.Configuration) error {
// ////////////////////////////////////////////////////////////////////////
resourceListFactory, err := internaltypes.NewResourceListFactory(
config.Scheduling.SupportedResourceTypes,
- config.Scheduling.ExperimentalFloatingResources,
+ config.Scheduling.FloatingResources,
)
if err != nil {
return errors.WithMessage(err, "Error with the .scheduling.supportedResourceTypes field in config")
}
ctx.Infof("Supported resource types: %s", resourceListFactory.SummaryString())

- floatingResourceTypes, err := floatingresources.NewFloatingResourceTypes(config.Scheduling.ExperimentalFloatingResources, resourceListFactory)
+ floatingResourceTypes, err := floatingresources.NewFloatingResourceTypes(config.Scheduling.FloatingResources, resourceListFactory)
if err != nil {
return err
}
4 changes: 2 additions & 2 deletions internal/scheduler/scheduling/gang_scheduler_test.go
@@ -100,7 +100,7 @@ func TestGangScheduler(t *testing.T) {
"floating resources": {
SchedulingConfig: func() configuration.SchedulingConfig {
cfg := testfixtures.TestSchedulingConfig()
- cfg.ExperimentalFloatingResources = testfixtures.TestFloatingResourceConfig
+ cfg.FloatingResources = testfixtures.TestFloatingResourceConfig
return cfg
}(),
Nodes: testfixtures.N32CpuNodes(1, testfixtures.TestPriorities),
@@ -634,7 +634,7 @@ func TestGangScheduler(t *testing.T) {
func(qn string) *api.Queue { return &api.Queue{Name: qn} },
),
)
- floatingResourceTypes, err := floatingresources.NewFloatingResourceTypes(tc.SchedulingConfig.ExperimentalFloatingResources, testfixtures.TestResourceListFactory)
+ floatingResourceTypes, err := floatingresources.NewFloatingResourceTypes(tc.SchedulingConfig.FloatingResources, testfixtures.TestResourceListFactory)
require.NoError(t, err)
sch, err := NewGangScheduler(sctx, constraints, floatingResourceTypes, nodeDb, false)
require.NoError(t, err)
4 changes: 2 additions & 2 deletions internal/scheduler/simulator/simulator.go
@@ -126,13 +126,13 @@ func NewSimulator(
) (*Simulator, error) {
resourceListFactory, err := internaltypes.NewResourceListFactory(
schedulingConfig.SupportedResourceTypes,
- schedulingConfig.ExperimentalFloatingResources,
+ schedulingConfig.FloatingResources,
)
if err != nil {
return nil, errors.WithMessage(err, "Error with the .scheduling.supportedResourceTypes field in config")
}

- floatingResourceTypes, err := floatingresources.NewFloatingResourceTypes(schedulingConfig.ExperimentalFloatingResources, resourceListFactory)
+ floatingResourceTypes, err := floatingresources.NewFloatingResourceTypes(schedulingConfig.FloatingResources, resourceListFactory)
if err != nil {
return nil, err
}