-
Notifications
You must be signed in to change notification settings - Fork 519
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable k8s reserved cpus #3964
Enable k8s reserved cpus #3964
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking like a reasonable addition, thanks for the PR! I'd like to note the migration has to go into the next release so I've put some comments in the PR to shift you there. This is the first PR that has a migration for 1.21.0 (which may become 1.20.1 but the maintainers can sort that out if it changes) so that is why there are no directories showing you the way. You'll just want to shift the path for the actual migration crate to v1.21.0
and rename them. Otherwise the migration looks right from reading the code. If you can do the rename I'll kick off some testing to ensure it works.
sources/api/migration/migrations/v1.20.0/k8s-reserved-cpus-v0-1-0/Cargo.toml
Outdated
Show resolved
Hide resolved
741703c
to
7fd23f9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was able to pull this branch and validate that the settings take. I couldn't verify that the reserved CPUs were actually reserved. Do you have some commands I we can document to use for validation of this change?
I've tested the migration works as well. We are pretty close, as a tidiness ask, can you squash the commits into two commits?
If you can provide the commands to validate the actual functionality and clean up the commits, we should be in a good spot to merge! Thanks for the work on this @james-masson |
Given an 8 CPU core system as an example
... with an integer number of CPUs that matches the number of CPUs that are not in the "reserved-cpus" mask - in this case 6 This assumes the node is otherwise empty, apart from daemonsets - otherwise you'll have to reduce the number of cpus the pod requests to get it to fit. eg.
Assuming cgroup v2
The set should not include the CPU numbers set in the "reserved-cpus" mask in step 1
The set should include the CPU numbers set in the "reserved-cpus" mask in step 1, and also should not overlap with the guaranteed-class pod cpuset values. |
501cba8
to
da86138
Compare
@yeazelm - This has now been built and tested, and should have all the changes you asked for. |
sources/models/src/lib.rs
Outdated
@@ -295,6 +295,7 @@ struct KubernetesSettings { | |||
shutdown_grace_period_for_critical_pods: KubernetesDurationValue, | |||
memory_manager_reserved_memory: HashMap<Identifier, KubernetesMemoryReservation>, | |||
memory_manager_policy: KubernetesMemoryManagerPolicy, | |||
reserved_cpus: SingleLineString, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I read the code of how the Kubernetes parses this line, and I think we should implement the same parsing logic in a struct. We already do something similar for other values that kubernetes expects in other places (see KubernetesDurationValue). This will prevent users from setting a random, since the API will refuse to apply the value, which provides a better experience than chasing errors in the kubelet logs. The new type could be KubernetesCpuSet
, and implement what the kubernetes folks did here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Arguably this is not a kubernetes specific string - it's a kernel concept, "cpuset string" is effectively a standard.
https://docs.kernel.org/admin-guide/cgroup-v2.html#cpuset-interface-files
It is probably a useful concept to have in Bottlerocket, as my future PRs will include a lot more use of "cpuset string" that are not tied to Kubernetes.
So, if you want this, best it's KernelCpuSet
- not KubernetesCpuSet
But it's almost academic - I lack the skills to do this in Rust ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
0c9d63c I wrote up a quick approach to this to assist with getting you there. This should work for your needs but I didn't test the integration myself. I'm happy to help get you the rest of the way if needed on this change!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @yeazelm - I cherry-picked your commit and squashed it in here.
Tested and working.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
negative case correctly gives.
[ 4.753115] early-boot-config[1198]: Error PATCHing '/settings?tx=bottlerocket-launch': Status 400 when PATCHing /settings?tx=bottlerocket-launch: Json deserialize error: Unable to deserialize into KernelCpuSetValue: Invalid Kernel CpuSet value 'abc' at line 1 column 1698
|
||
/// Add the option to set Kubernetes reserved-cpus | ||
fn run() -> Result<()> { | ||
migrate(AddSettingsMigration(&["settings.kubernetes.reserved-cpus"])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we need to limit this for k8s only? And have a conditional here to perform a Noop
migration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't find any example of that for an "Add a new setting" migration.
I found
and
and
all of which do the same as mine...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line 2 in a1bcca5
use migration_helpers::{migrate, Result}; |
migrate(AddSettingsMigration(&["settings.kubernetes.reserved-cpus"])) | |
if cfg!(variant_runtime = "k8s") { | |
migrate(AddSettingsMigration(&["settings.kubernetes.reserved-cpus"])) | |
} else { | |
migrate(NoOpMigration); | |
} |
The example I linked to used ?
but I don't think it is required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried a few conbinations of this and the other change you linked to.
I'm stuck on...
#19 0.466 7 | / if cfg!(variant_runtime = "k8s") {
#19 0.466 8 | | migrate(AddSettingsMigration(&["settings.kubernetes.reserved-cpus"]))
#19 0.466 | | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected `()`, found `Result<(), Error>`
#19 0.466 9 | | } else {
#19 0.466 10 | | migrate(NoOpMigration);
#19 0.466 11 | | }
#19 0.466 | |_____- expected this to be `()`
#19 0.466 |
#19 0.466 = note: expected unit type `()`
#19 0.466 found enum `Result<(), migration_helpers::error::Error>`
#19 0.466 help: consider using a semicolon here
#19 0.466 |
#19 0.466 8 | migrate(AddSettingsMigration(&["settings.kubernetes.reserved-cpus"]));
#19 0.466 | +
#19 0.466 help: consider using a semicolon here
#19 0.466 |
... any hints?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, sorry, there was a typo in my suggestion, there should not be a ; at the end of the first migration call to end the statement.
migrate(AddSettingsMigration(&["settings.kubernetes.reserved-cpus"])) | |
if cfg!(variant_runtime = "k8s") { | |
migrate(AddSettingsMigration(&["settings.kubernetes.reserved-cpus"])) | |
} else { | |
migrate(NoOpMigration) | |
} |
The error is complaining about mixing and matching statements vs an expression. In this case, if you leave off the ;
it will return the results of the function call, in this case migration()
so want both statements to not have ;
but then you also don't want Ok(())
at the end of your run()
function. So the code I used to get this to compile looks like this for the entire file:
use migration_helpers::common_migrations::{AddSettingsMigration, NoOpMigration};
use migration_helpers::{migrate, Result};
use std::process;
/// Add the option to set Kubernetes reserved-cpus
fn run() -> Result<()> {
if cfg!(variant_runtime = "k8s") {
migrate(AddSettingsMigration(&["settings.kubernetes.reserved-cpus"]))
} else {
migrate(NoOpMigration)
}
}
// Returning a Result from main makes it print a Debug representation of the error, but with Snafu
// we have nice Display representations of the error, so we wrap "main" (run) and print any error.
// https://github.com/shepmaster/snafu/issues/110
fn main() {
if let Err(e) = run() {
eprintln!("{}", e);
process::exit(1);
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either this suggestion, or adding ?;
and Ok(())
at the end should be enough.
da86138
to
048bc35
Compare
|
||
/// Add the option to set Kubernetes reserved-cpus | ||
fn run() -> Result<()> { | ||
migrate(AddSettingsMigration(&["settings.kubernetes.reserved-cpus"])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This migration looks fine to me - it doesn't matter that aws-ecs-*
doesn't have these settings, it'll just be a no-op on downgrade because the paths won't exist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only need a conditional migration in the case where settings are added in one version for some variants (aws-k8s-*
) and then in a subsequent version for other variants (aws-ecs-*
). These settings won't ever be added for aws-ecs-*
so it's fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After working through @bcressey's comment I agree the migration is fine as is. So approving.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks!
aea73f2
to
387cbb5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good - rebase just fixed the merge conflict in the use modeled_types::...
lines at the top of sourcse/models/src.lib.rs
.
387cbb5
to
7cb2390
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cargo fmt
appeased!
Issue number:
As per discussions with @yeazelm
Description of changes:
Adds support for K8s reserved-cpus functionality
https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/#explicitly-reserved-cpu-list
Testing done:
Warning - not functional.Seems I've added a new package to Cargo for the migrations, but can't figure out how to only update the Cargo.lock with only my package changes.Local build, unit-test, deployed AMI using new config.
Terms of contribution:
By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.