Enable k8s reserved cpus #3964

Merged

Conversation

@james-masson (Contributor) commented May 16, 2024

Issue number:

As per discussions with @yeazelm

Description of changes:

Adds support for K8s reserved-cpus functionality

https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/#explicitly-reserved-cpu-list

Testing done:

Warning - not functional.

Seems I've added a new package to Cargo for the migrations, but I can't figure out how to update Cargo.lock with only my package's changes.

Local build, unit-test, deployed AMI using new config.

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

@yeazelm (Contributor) left a comment

This is looking like a reasonable addition, thanks for the PR! Note that the migration has to go into the next release, so I've put some comments in the PR to shift you there. This is the first PR with a migration for 1.21.0 (which may become 1.20.1, but the maintainers can sort that out if it changes), so there are no directories showing you the way. You'll just want to shift the path for the actual migration crate to v1.21.0 and rename the files. Otherwise the migration looks right from reading the code. If you can do the rename, I'll kick off some testing to ensure it works.

Review thread on Release.toml (outdated, resolved)
@yeazelm (Contributor) left a comment

I was able to pull this branch and validate that the settings take effect. I couldn't verify that the reserved CPUs were actually reserved. Do you have some commands we can document to validate this change?

Review thread on Release.toml (outdated, resolved)
@yeazelm (Contributor) commented Jun 3, 2024

I've tested that the migration works as well. We are pretty close; as a tidiness ask, can you squash the commits into two?

  • one to cover adding the setting and templates
  • and another to add the migration

If you can provide the commands to validate the actual functionality and clean up the commits, we should be in a good spot to merge! Thanks for the work on this @james-masson

@james-masson (Contributor, Author)

> I was able to pull this branch and validate that the settings take effect. I couldn't verify that the reserved CPUs were actually reserved. Do you have some commands we can document to validate this change?

Given an 8-core system as an example:

  1. Configure the system:
[settings.kubernetes]
cpu-manager-policy = "static"
reserved-cpus = "0-1" # don't use these CPUs for any high-priority k8s workloads
  2. Inject a guaranteed-class pod into the cluster...

... with an integer number of CPUs that matches the number of CPUs not in the "reserved-cpus" mask - in this case 6.

This assumes the node is otherwise empty apart from daemonsets; otherwise you'll have to reduce the number of CPUs the pod requests to get it to fit.

e.g.

---
apiVersion: v1
kind: Pod
metadata:
  name: guaranteed-qos
spec:
  containers:
    - name: qos-demo
      image: nginx
      resources:
        limits:
          memory: "200Mi"
          cpu: "6"
  3. Look at the cpuset that is assigned to the guaranteed-class pod.

Assuming cgroup v2:

kubectl exec -it <guaranteed pod> -- cat /sys/fs/cgroup/cpuset.cpus.effective

The set should not include the CPU numbers set in the "reserved-cpus" mask in step 1.

  4. Look at the cpuset that is assigned to a random daemonset pod:
kubectl -n kube-system exec -it <daemonset pod> -- cat /sys/fs/cgroup/cpuset.cpus.effective

The set should include the CPU numbers set in the "reserved-cpus" mask in step 1, and also should not overlap with the guaranteed-class pod cpuset values.

@james-masson force-pushed the enable_k8s_reserved_cpus branch 2 times, most recently from 501cba8 to da86138 on June 7, 2024 09:42.
@james-masson (Contributor, Author)

@yeazelm - This has now been built and tested, and should have all the changes you asked for.

@@ -295,6 +295,7 @@ struct KubernetesSettings {
     shutdown_grace_period_for_critical_pods: KubernetesDurationValue,
     memory_manager_reserved_memory: HashMap<Identifier, KubernetesMemoryReservation>,
     memory_manager_policy: KubernetesMemoryManagerPolicy,
+    reserved_cpus: SingleLineString,

Contributor:

I read the code for how Kubernetes parses this line, and I think we should implement the same parsing logic in a struct. We already do something similar for other values that Kubernetes expects (see KubernetesDurationValue). This will prevent users from setting a random value, since the API will refuse to apply it, which is a better experience than chasing errors in the kubelet logs. The new type could be KubernetesCpuSet, implementing what the Kubernetes folks did here.

@james-masson (Contributor, Author):

Arguably this is not a Kubernetes-specific string - it's a kernel concept, and "cpuset string" is effectively a standard.

https://docs.kernel.org/admin-guide/cgroup-v2.html#cpuset-interface-files

It is probably a useful concept to have in Bottlerocket, as my future PRs will include a lot more uses of "cpuset string" that are not tied to Kubernetes.

So, if you want this, it's best as KernelCpuSet - not KubernetesCpuSet.

But it's almost academic - I lack the skills to do this in Rust ;-)
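
To illustrate the idea, here is a minimal sketch (not the code from this PR or from commit 0c9d63c; the type name and validation rules are assumptions) of what validating a kernel cpuset string such as "0-1,4,6-7" could look like in Rust:

/// Hypothetical sketch of a cpuset-string wrapper type.
#[derive(Debug, Clone, PartialEq)]
struct KernelCpuSet(String);

impl KernelCpuSet {
    /// Accept a comma-separated list of CPU numbers ("4") and ranges ("0-1").
    fn parse(input: &str) -> Result<Self, String> {
        let trimmed = input.trim();
        if trimmed.is_empty() {
            return Err("cpuset string must not be empty".to_string());
        }
        for part in trimmed.split(',') {
            // Each part is either a single CPU or a "start-end" range.
            let mut bounds = part.splitn(2, '-');
            let start: u32 = bounds
                .next()
                .unwrap_or("")
                .parse()
                .map_err(|_| format!("invalid cpuset element '{}'", part))?;
            if let Some(end) = bounds.next() {
                let end: u32 = end
                    .parse()
                    .map_err(|_| format!("invalid cpuset range '{}'", part))?;
                if end < start {
                    return Err(format!("descending cpuset range '{}'", part));
                }
            }
        }
        Ok(KernelCpuSet(trimmed.to_string()))
    }
}

fn main() {
    assert!(KernelCpuSet::parse("0-1,4,6-7").is_ok());
    assert!(KernelCpuSet::parse("abc").is_err());
}

A real modeled type in Bottlerocket would also need to hook into the settings model's serde-based deserialization, which this sketch leaves out.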

Contributor:

In 0c9d63c I wrote up a quick approach to this to assist with getting you there. It should work for your needs, but I didn't test the integration myself. I'm happy to help get you the rest of the way on this change if needed!

@james-masson (Contributor, Author):

Thanks @yeazelm - I cherry-picked your commit and squashed it in here.
Tested and working.

@james-masson (Contributor, Author):

The negative case correctly gives:

[    4.753115] early-boot-config[1198]: Error PATCHing '/settings?tx=bottlerocket-launch': Status 400 when PATCHing /settings?tx=bottlerocket-launch: Json deserialize error: Unable to deserialize into KernelCpuSetValue: Invalid Kernel CpuSet value 'abc' at line 1 column 1698


/// Add the option to set Kubernetes reserved-cpus
fn run() -> Result<()> {
    migrate(AddSettingsMigration(&["settings.kubernetes.reserved-cpus"]))

Contributor:

Don't we need to limit this to k8s only, and have a conditional here to perform a NoOp migration?

Contributor:

The example I linked to is the inverse of what you need here.

Suggested change:
-    migrate(AddSettingsMigration(&["settings.kubernetes.reserved-cpus"]))
+    if cfg!(variant_runtime = "k8s") {
+        migrate(AddSettingsMigration(&["settings.kubernetes.reserved-cpus"]))
+    } else {
+        migrate(NoOpMigration);
+    }

The example I linked to used ? but I don't think it is required.

@james-masson (Contributor, Author):

I tried a few combinations of this and the other change you linked to.

I'm stuck on...

  7  | /     if cfg!(variant_runtime = "k8s") {
  8  | |         migrate(AddSettingsMigration(&["settings.kubernetes.reserved-cpus"]))
     | |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected `()`, found `Result<(), Error>`
  9  | |     } else {
  10 | |        migrate(NoOpMigration);
  11 | |     }
     | |_____- expected this to be `()`
     |
     = note: expected unit type `()`
                  found enum `Result<(), migration_helpers::error::Error>`
  help: consider using a semicolon here
     |
  8  |         migrate(AddSettingsMigration(&["settings.kubernetes.reserved-cpus"]));
     |                                                                              +
  help: consider using a semicolon here
     |

... any hints?

Contributor:

Oh, sorry, there was a typo in my suggestion: there should not be a ; at the end of the first migration call, since that would turn the expression into a statement.

Suggested change:
-    migrate(AddSettingsMigration(&["settings.kubernetes.reserved-cpus"]))
+    if cfg!(variant_runtime = "k8s") {
+        migrate(AddSettingsMigration(&["settings.kubernetes.reserved-cpus"]))
+    } else {
+        migrate(NoOpMigration)
+    }

The error is complaining about mixing statements and expressions. In this case, if you leave off the ;, the block returns the result of the function call (here, migrate()), so you want both calls to not have a ;, but then you also don't want Ok(()) at the end of your run() function. The code I used to get this to compile looks like this for the entire file:

use migration_helpers::common_migrations::{AddSettingsMigration, NoOpMigration};
use migration_helpers::{migrate, Result};
use std::process;

/// Add the option to set Kubernetes reserved-cpus
fn run() -> Result<()> {
    if cfg!(variant_runtime = "k8s") {
        migrate(AddSettingsMigration(&["settings.kubernetes.reserved-cpus"]))
    } else {
        migrate(NoOpMigration)
    }
}

// Returning a Result from main makes it print a Debug representation of the error, but with Snafu
// we have nice Display representations of the error, so we wrap "main" (run) and print any error.
// https://github.com/shepmaster/snafu/issues/110
fn main() {
    if let Err(e) = run() {
        eprintln!("{}", e);
        process::exit(1);
    }
}

Contributor:

Either this suggestion, or adding ?; and Ok(()) at the end should be enough.
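
For reference, that alternative form could look roughly like this (a sketch using the same imports as the full file above, not code taken verbatim from the PR):

/// Add the option to set Kubernetes reserved-cpus
fn run() -> Result<()> {
    if cfg!(variant_runtime = "k8s") {
        // Propagate any migration error with `?`, then return Ok(()) explicitly.
        migrate(AddSettingsMigration(&["settings.kubernetes.reserved-cpus"]))?;
    } else {
        migrate(NoOpMigration)?;
    }
    Ok(())
}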


/// Add the option to set Kubernetes reserved-cpus
fn run() -> Result<()> {
    migrate(AddSettingsMigration(&["settings.kubernetes.reserved-cpus"]))

Contributor:

This migration looks fine to me - it doesn't matter that aws-ecs-* doesn't have these settings, it'll just be a no-op on downgrade because the paths won't exist.

Contributor:

We only need a conditional migration in the case where settings are added in one version for some variants (aws-k8s-*) and then in a subsequent version for other variants (aws-ecs-*). These settings won't ever be added for aws-ecs-* so it's fine.

@yeazelm (Contributor) left a comment

After working through @bcressey's comment, I agree the migration is fine as is, so I'm approving.

@bcressey (Contributor) left a comment

Looks good, thanks!

@bcressey (Contributor) left a comment

Looks good - the rebase just fixed the merge conflict in the use modeled_types::... lines at the top of sources/models/src/lib.rs.

@bcressey (Contributor) left a comment

cargo fmt appeased!

@yeazelm merged commit 2d4790f into bottlerocket-os:develop on Jun 13, 2024. 31 checks passed.
@vigh-m mentioned this pull request on Aug 1, 2024.