[RFE] Allow for specifying the Flatcar OS version or to extend validation steps #202
Have you seen the after-reboot checks? Docs here https://github.com/flatcar/flatcar-linux-update-operator/blob/030e43574c229eeb5a8858f03bdcc997f38131d9/doc/before-after-reboot-checks.md and example daemonset here: https://github.com/flatcar/flatcar-linux-update-operator/blob/030e43574c229eeb5a8858f03bdcc997f38131d9/examples/reboot-annotations/after-reboot-daemonset.yaml. You also have the option of defining a health check on the node level as a systemd service and making it a dependency of update_engine (and kubelet) at the systemd level https://www.flatcar.org/docs/latest/setup/debug/manual-rollbacks/#automated-rollbacks. That way the node automatically performs a rollback when you reboot it from a failed update. I'm also interested in finding out more about the issues you faced:
If I recall correctly you had issues with containerd not launching correctly. Were there others? Here's an example of what could have worked in this case (you would need to evaluate the level of dependency required, Requires= or BindsTo=), if you defined containerd as a dependency of kubelet, and both kubelet and containerd as dependencies of update_engine:
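A minimal sketch of that dependency wiring as systemd drop-ins (the drop-in paths and the choice of Requires= over BindsTo= are illustrative assumptions; Flatcar ships update-engine.service, while the kubelet and containerd unit names depend on your deployment):

```ini
# /etc/systemd/system/kubelet.service.d/10-containerd-dep.conf
# Hypothetical drop-in: kubelet only starts once containerd has started.
# BindsTo= would additionally stop kubelet whenever containerd stops.
[Unit]
Requires=containerd.service
After=containerd.service

# /etc/systemd/system/update-engine.service.d/10-node-health-dep.conf
# Hypothetical drop-in: update-engine only runs when kubelet (and,
# transitively, containerd) came up successfully, so a broken container
# runtime blocks the update flow instead of being marked healthy.
[Unit]
Requires=kubelet.service
After=kubelet.service
```

Combined with the automated-rollback setup from the linked docs, a reboot into an image where containerd fails would then leave update-engine unsatisfied and trigger the rollback.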
Current situation
The latest version (3815.2.0) had some significant changes to upstream components that caused outages. The auto-update process validates that the OS update itself completed successfully, but doesn't easily allow for subsequent checks (or at least not through Kubernetes).
Impact
This causes outages, and the only way to recover is to log into each node, manually roll back to the previous version, and then pause updates.
Ideal future situation
Allow for OS version pinning via an annotation.
OR
Allow for a ConfigMap with additional scripts that can be used to verify a successful update
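For the ConfigMap route, a hypothetical shape (the name, namespace, and check script are illustrative assumptions; no such interface exists in the operator today) might be:

```yaml
# Hypothetical ConfigMap carrying extra post-update verification scripts.
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluo-post-update-checks
  namespace: reboot-coordinator
data:
  check-containerd.sh: |
    #!/bin/bash
    # Fail (and so trigger a rollback) if containerd is unhealthy after reboot.
    set -euo pipefail
    systemctl is-active --quiet containerd.service
    ctr --namespace k8s.io version >/dev/null
```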
Implementation options
An annotation with a pinned version that is passed via D-Bus to the update-agent, which calls flatcar-update
OR
Some mechanism (maybe also over D-Bus) to send down a script that extends the after-reboot checks and triggers a rollback on failure.
This is not the same as the post-install update hook (https://www.flatcar.org/docs/latest/setup/releases/update-strategies/#configure-a-post-install-update-hook), as that approach would still leave the node in a bad state (although it is a good final catch).
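For the annotation route, the pin could look something like this on the Node object (the annotation key is a hypothetical one invented here to mimic the operator's existing annotation naming; it is not an existing API):

```yaml
# Hypothetical version-pin annotation on a node. The agent would read it
# via the Kubernetes API and invoke flatcar-update with that version.
apiVersion: v1
kind: Node
metadata:
  name: worker-1
  annotations:
    flatcar-linux-update.v1.flatcar-linux.net/pinned-version: "3760.2.0"
```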
Additional information