Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scheduler: fixed a bug where resource calculation did not account correctly for poststart tasks #24297

Conversation

mvegter
Copy link
Contributor

@mvegter mvegter commented Oct 25, 2024

Fixes a bug in the AllocatedResources.Comparable method, which resulted in
reporting less required resources than actually expected. This could result in
overscheduling of allocations on a single node and overlapping cgroup cpusets.

Split to #24304


job "redis2nd" {
  type = "service"
  group "cache" {
    count = 1

    task "redis-prestart" {
      lifecycle {
        hook    = "prestart"
        sidecar = false
      }
      driver = "docker"
      config {
        image = "hello-world:latest"
      }
      resources {
        cpu = 1000
      }
    }

    task "redis" {
      driver = "docker"
      config {
        image = "redis:3.2"
      }
      resources {
        cpu = 1000
      }
    }

    task "redis-start-side" {
      lifecycle {
        hook    = "poststart"
        sidecar = true
      }
      driver = "docker"
      config {
        image = "redis:3.2"
      }
      resources {
        cpu = 1000
      }
    }

    task "redis-poststop" {
      lifecycle {
        hook    = "poststop"
        sidecar = false
      }
      driver = "docker"
      config {
        image = "hello-world:latest"
      }
      resources {
        cpu = 1000
      }
    }
  }
}

image

Before

[sandbox@nomad-dev nomad]$ curl -s http://localhost:4646/v1/metrics | jq '.Gauges[] | select(.Name | contains("allocated.cpu")) | .Name, .Value'
"nomad.client.allocated.cpu"
1000.0
"nomad.client.unallocated.cpu"
277380.0

After

[sandbox@nomad-dev nomad]$ curl -s http://localhost:4646/v1/metrics | jq '.Gauges[] | select(.Name | contains("allocated.cpu")) | .Name, .Value'
"nomad.client.allocated.cpu"
2000.0
"nomad.client.unallocated.cpu"
276380.0

@mvegter mvegter force-pushed the mvegter-fix-overlapping-cpuset-due-to-posstart-tasks branch from 77b60ba to 7d34250 Compare October 25, 2024 18:09
@mvegter mvegter force-pushed the mvegter-fix-overlapping-cpuset-due-to-posstart-tasks branch from 7d34250 to fa4a1ee Compare October 27, 2024 14:18
@mvegter mvegter changed the title scheduler: take into account posstart task to prevent overlapping cpusets scheduler: fix a bug where we did not account for poststart tasks resources Oct 27, 2024
@mvegter mvegter force-pushed the mvegter-fix-overlapping-cpuset-due-to-posstart-tasks branch from fa4a1ee to 6926586 Compare October 27, 2024 14:50
@jrasell jrasell self-assigned this Nov 4, 2024
@jrasell jrasell self-requested a review November 4, 2024 11:47
@jrasell jrasell added backport/ent/1.7.x+ent Changes are backported to 1.7.x+ent backport/ent/1.8.x+ent Changes are backported to 1.8.x+ent backport/ent/1.9.x+ent Changes are backported to 1.9.x+ent backport/1.9.x backport to 1.9.x release line labels Nov 4, 2024
Copy link
Member

@jrasell jrasell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @mvegter and thanks for raising this PR. I've made a minor suggested change to the changelog wording, but otherwise this LGTM and once the changelog item has been resolved I'll get this merged. Thanks again!

.changelog/24297.txt Outdated Show resolved Hide resolved
…rectly for poststart tasks

Fixes a bug in the AllocatedResources.Comparable method, which resulted in
reporting less required resources than actually expected. This could result in
overscheduling of allocations on a single node  and overlapping cgroup cpusets.
@mvegter mvegter force-pushed the mvegter-fix-overlapping-cpuset-due-to-posstart-tasks branch from 6926586 to 0c58a33 Compare November 5, 2024 08:46
@mvegter mvegter changed the title scheduler: fix a bug where we did not account for poststart tasks resources scheduler: fixed a bug where resource calculation did not account correctly for poststart tasks Nov 5, 2024
@jrasell jrasell merged commit 8545e1c into hashicorp:main Nov 5, 2024
18 checks passed
jrasell pushed a commit that referenced this pull request Nov 5, 2024
…rectly for poststart tasks (#24297)

Fixes a bug in the AllocatedResources.Comparable method, which resulted in
reporting less required resources than actually expected. This could result in
overscheduling of allocations on a single node  and overlapping cgroup cpusets.

Co-authored-by: Martijn Vegter <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/ent/1.7.x+ent Changes are backported to 1.7.x+ent backport/ent/1.8.x+ent Changes are backported to 1.8.x+ent backport/ent/1.9.x+ent Changes are backported to 1.9.x+ent backport/1.9.x backport to 1.9.x release line
Projects
Development

Successfully merging this pull request may close these issues.

2 participants