Configuring reboot window using flags as documented is not working #193

Closed
steled opened this issue Mar 28, 2023 · 5 comments
Labels
documentation Improvements or additions to documentation

Comments

@steled

steled commented Mar 28, 2023

Description

Hi, I set up a reboot window of 2 hours via environment variables, but the node is not rebooted during this window.

Impact

The node is not rebooted during the reboot window and does not get the new update.

Environment and steps to reproduce

  1. Set-up: I set up FLUO as described in the Usage section
  2. Task: for the reboot window I added the following lines to the update-operator.yaml file:
        env:
        ...
        - name: UPDATE_OPERATOR_REBOOT_WINDOW_START
          value: "09:00"
        - name: UPDATE_OPERATOR_REBOOT_WINDOW_LENGTH
          value: "2h"
  3. Action(s): to trigger an update, I switch to another channel
  4. Error: the node isn't rebooted during the reboot window

Expected behavior

The node with the update should be rebooted during the reboot window

Additional information

I can see that the labels and annotations change:

Labels:             flatcar-linux-update.v1.flatcar-linux.net/group=stable
                    flatcar-linux-update.v1.flatcar-linux.net/id=flatcar
                    flatcar-linux-update.v1.flatcar-linux.net/reboot-needed=true
                    flatcar-linux-update.v1.flatcar-linux.net/version=3374.2.5
                    v1.kubeone.io/operating-system=flatcar
Annotations:        flatcar-linux-update.v1.flatcar-linux.net/last-checked-time: 1679986757
                    flatcar-linux-update.v1.flatcar-linux.net/new-version: 3510.1.0
                    flatcar-linux-update.v1.flatcar-linux.net/reboot-in-progress: false
                    flatcar-linux-update.v1.flatcar-linux.net/reboot-needed: true
                    flatcar-linux-update.v1.flatcar-linux.net/status: UPDATE_STATUS_UPDATED_NEED_REBOOT

I can't see any problems from the logs of the pod:

$ kubectl logs -n reboot-coordinator flatcar-linux-update-operator-85b99fd865-swqgc
I0328 06:58:07.376415       1 main.go:108] /bin/update-operator running
I0328 06:58:07.376546       1 leaderelection.go:248] attempting to acquire leader lease reboot-coordinator/flatcar-linux-update-operator-lock...
I0328 06:58:07.401639       1 leaderelection.go:258] successfully acquired lease reboot-coordinator/flatcar-linux-update-operator-lock
I0328 06:58:08.382212       1 operator.go:593] Found 0 rebooted nodes
<for the sake of brevity>
I0328 07:18:47.188086       1 operator.go:593] Found 0 rebooted nodes

And I can see that the environment variables are set correctly in the pod:

kubectl exec -it -n reboot-coordinator flatcar-linux-update-operator-85b99fd865-swqgc -- sh
/bin $ env
KUBERNETES_SERVICE_PORT=443
KUBERNETES_PORT=tcp://10.96.0.1:443
HOSTNAME=flatcar-linux-update-operator-85b99fd865-swqgc
SHLVL=1
UPDATE_OPERATOR_REBOOT_WINDOW_START=09:00
HOME=/
TERM=xterm
UPDATE_OPERATOR_REBOOT_WINDOW_LENGTH=2h
KUBERNETES_PORT_443_TCP_ADDR=10.96.0.1
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
KUBERNETES_PORT_443_TCP_PORT=443
KUBERNETES_PORT_443_TCP_PROTO=tcp
KUBERNETES_SERVICE_PORT_HTTPS=443
KUBERNETES_PORT_443_TCP=tcp://10.96.0.1:443
POD_NAMESPACE=reboot-coordinator
KUBERNETES_SERVICE_HOST=10.96.0.1
PWD=/bin

Additional question

Is it possible to set the reboot window only for office working hours, for example:

UPDATE_OPERATOR_REBOOT_WINDOW_START: Mon, Tue, Wed, Thu, Fri 09:00
UPDATE_OPERATOR_REBOOT_WINDOW_LENGTH: 8h
invidian changed the title from "Reboot window is not working" to "Configuring reboot window using environment variables as documented is not working" Mar 28, 2023
invidian added the documentation label Mar 28, 2023
@invidian
Member

invidian commented Mar 28, 2023

Good finding. It seems those environment variables were documented in cda5e86, but never implemented. They should be removed from the documentation.

Separately, we can discuss whether it makes sense to actually implement it, as right now I don't see an obvious benefit in doing so. Is there some specific reason you would prefer to use environment variables instead of CLI flags? As far as I know, it should be possible to refer to env variables in CLI args in the pod spec; perhaps you can use this instead?

invidian added a commit that referenced this issue Mar 28, 2023
Their support has never been implemented.

Closes #193

Signed-off-by: Mateusz Gozdek <[email protected]>
@steled
Author

steled commented Mar 28, 2023

Ok, so doing it via env variables directly is not important.

I tried both variants:

        command:
        - "/bin/update-operator"
        args:
          - "--reboot-window-start=10:45"
          - "--reboot-window-length=1h"

and as described here:

        command:
        - "/bin/update-operator"
        args:
          - "--reboot-window-start=$(UPDATE_OPERATOR_REBOOT_WINDOW_START)"
          - "--reboot-window-length=$(UPDATE_OPERATOR_REBOOT_WINDOW_LENGTH)"
        env:
        ...
        - name: UPDATE_OPERATOR_REBOOT_WINDOW_START
          value: "10:30"
        - name: UPDATE_OPERATOR_REBOOT_WINDOW_LENGTH
          value: "1h"

But neither of them works.

From inside the pod I can see:

k exec -it -n reboot-coordinator flatcar-linux-update-operator-7f864bbd94-q4cm9 -- sh
/bin $ ps
PID   USER     TIME  COMMAND
    1 nobody    0:00 /bin/update-operator --reboot-window-start=10:45 --reboot-window-length=1h
   20 nobody    0:00 sh
   27 nobody    0:00 ps

But the node still is not rebooting.

The logs also don't show anything:

$ k logs -n reboot-coordinator flatcar-linux-update-operator-7f864bbd94-q4cm9 -f
I0328 08:40:48.396657       1 main.go:108] /bin/update-operator running
I0328 08:40:48.398287       1 leaderelection.go:248] attempting to acquire leader lease reboot-coordinator/flatcar-linux-update-operator-lock...
I0328 08:40:48.430159       1 leaderelection.go:258] successfully acquired lease reboot-coordinator/flatcar-linux-update-operator-lock
I0328 08:40:49.404412       1 operator.go:593] Found 0 rebooted nodes
<for the sake of brevity>
I0328 08:58:26.923732       1 operator.go:593] Found 0 rebooted nodes

And from the other pods I see the following:

$ k logs -n reboot-coordinator flatcar-linux-update-agent-5swm4
I0328 08:40:48.605642       1 main.go:84] /bin/update-agent running
I0328 08:40:48.605698       1 agent.go:145] Setting info labels
I0328 08:40:48.631998       1 agent.go:151] Checking annotations
I0328 08:40:48.634019       1 agent.go:174] Setting annotations map[string]string{"flatcar-linux-update.v1.flatcar-linux.net/reboot-in-progress":"false", "flatcar-linux-update.v1.flatcar-linux.net/reboot-needed":"false"}
I0328 08:40:48.683963       1 agent.go:212] Waiting for ok-to-reboot from controller...
I0328 08:40:48.684309       1 agent.go:362] Beginning to watch update_engine status
I0328 08:40:48.685236       1 agent.go:306] Updating status
I0328 08:40:48.685250       1 agent.go:319] Indicating a reboot is needed

k logs -n reboot-coordinator flatcar-linux-update-agent-qgcrk
I0328 08:40:48.484998       1 main.go:84] /bin/update-agent running
I0328 08:40:48.486076       1 agent.go:145] Setting info labels
I0328 08:40:48.519265       1 agent.go:151] Checking annotations
I0328 08:40:48.525350       1 agent.go:174] Setting annotations map[string]string{"flatcar-linux-update.v1.flatcar-linux.net/reboot-in-progress":"false", "flatcar-linux-update.v1.flatcar-linux.net/reboot-needed":"false"}
I0328 08:40:48.552000       1 agent.go:212] Waiting for ok-to-reboot from controller...
I0328 08:40:48.552279       1 agent.go:362] Beginning to watch update_engine status
I0328 08:40:48.562810       1 agent.go:306] Updating status

k logs -n reboot-coordinator flatcar-linux-update-agent-v277b
I0328 08:40:48.488548       1 main.go:84] /bin/update-agent running
I0328 08:40:48.492439       1 agent.go:145] Setting info labels
I0328 08:40:48.517376       1 agent.go:151] Checking annotations
I0328 08:40:48.520167       1 agent.go:174] Setting annotations map[string]string{"flatcar-linux-update.v1.flatcar-linux.net/reboot-in-progress":"false", "flatcar-linux-update.v1.flatcar-linux.net/reboot-needed":"false"}
I0328 08:40:48.536459       1 agent.go:212] Waiting for ok-to-reboot from controller...
I0328 08:40:48.536617       1 agent.go:362] Beginning to watch update_engine status
I0328 08:40:48.538043       1 agent.go:306] Updating status
I0328 08:41:28.946436       1 agent.go:306] Updating status
I0328 08:41:29.075044       1 agent.go:306] Updating status
I0328 08:41:36.012535       1 agent.go:306] Updating status
I0328 08:42:06.428615       1 agent.go:306] Updating status
I0328 08:42:11.368669       1 agent.go:306] Updating status
I0328 08:42:11.369343       1 agent.go:319] Indicating a reboot is needed

EDIT:

But if I remove the reboot window, the logs change instantly:

k logs -n reboot-coordinator flatcar-linux-update-operator-c4f798f44-t2v8c -f
I0328 09:17:16.639694       1 main.go:108] /bin/update-operator running
I0328 09:17:16.643182       1 leaderelection.go:248] attempting to acquire leader lease reboot-coordinator/flatcar-linux-update-operator-lock...
I0328 09:17:16.689928       1 leaderelection.go:258] successfully acquired lease reboot-coordinator/flatcar-linux-update-operator-lock
I0328 09:17:17.651123       1 operator.go:593] Found 0 rebooted nodes
I0328 09:17:18.051270       1 operator.go:535] Found 1 nodes that need a reboot
I0328 09:17:48.527126       1 operator.go:593] Found 0 rebooted nodes
I0328 09:17:49.110917       1 operator.go:508] Found node "kkp-test-core-cp-2" still rebooting, waiting
I0328 09:17:49.111901       1 operator.go:511] Found 1 (of max 1) rebooting nodes; waiting for completion
I0328 09:17:49.112073       1 operator.go:535] Found 0 nodes that need a reboot
I0328 09:18:19.166212       1 operator.go:593] Found 0 rebooted nodes
I0328 09:18:19.320955       1 operator.go:508] Found node "kkp-test-core-cp-2" still rebooting, waiting
I0328 09:18:19.320970       1 operator.go:511] Found 1 (of max 1) rebooting nodes; waiting for completion
I0328 09:18:19.321507       1 operator.go:535] Found 0 nodes that need a reboot

steled changed the title from "Configuring reboot window using environment variables as documented is not working" to "Configuring reboot window using flags as documented is not working" Mar 28, 2023
@steled
Author

steled commented Mar 28, 2023

Ok, I think I found the problem...
The time in the pod is GMT but we are at GMT +2

So 2 questions:

Is it possible to update the timezone for the pod?
Is it possible to set the reboot window only for office working hours, for example:

UPDATE_OPERATOR_REBOOT_WINDOW_START: Mon, Tue, Wed, Thu, Fri 09:00
UPDATE_OPERATOR_REBOOT_WINDOW_LENGTH: 8h
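
As an illustration of the offset described above, here is a minimal sketch (not part of FLUO) that converts a local window start into the UTC value the pod would need. The helper name `local_start_to_utc` is hypothetical, and the fixed GMT+2 offset is an assumption taken from this comment; a fixed offset ignores daylight saving changes.

```python
from datetime import datetime, timedelta, timezone


def local_start_to_utc(start_hhmm: str, utc_offset_hours: int) -> str:
    """Convert a local "HH:MM" window start to the equivalent UTC "HH:MM".

    Hypothetical helper for illustration only; it assumes a fixed UTC offset.
    """
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    # The date is arbitrary; only the time of day matters here.
    local = datetime.strptime(start_hhmm, "%H:%M").replace(
        year=2023, month=3, day=28, tzinfo=local_tz
    )
    return local.astimezone(timezone.utc).strftime("%H:%M")


# A window meant to open at 10:45 local time (GMT+2) opens at 08:45 in the pod.
print(local_start_to_utc("10:45", 2))  # 08:45
```

Under that assumption, the value to pass to `--reboot-window-start` would be the converted UTC time rather than the local one.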

@invidian
Member

> Is it possible to update the timezone for the pod?

I don't know how it works in Kubernetes, but normally servers use UTC time for uniformity. I guess it's up to the host OS configuration.

> Is it possible to set the reboot window only for office working hours, for example:

Not at the moment, as this would imply multiple windows and right now only one window is supported.

Probably this whole feature could be given some thought and improved, as the existing implementation is already overly complex.

> The time in the pod is GMT but we are at GMT +2

👍 We already have timestamps in the logs; I think that should help spot this kind of error.
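
To show how the log timestamps can be used for this check, here is a rough sketch that reads the time of day from a klog-style line and tests it against a window given as a start and a length. It is not the operator's actual implementation; `klog_time` and `in_window` are hypothetical names, and the sketch assumes the log timestamps and the window start are in the same timezone.

```python
import re
from datetime import datetime, timedelta


def klog_time(line: str) -> datetime:
    """Extract the time of day from a klog-style line such as 'I0328 08:58:26.923732 ...'."""
    match = re.match(r"^[IWEF]\d{4} (\d{2}:\d{2}:\d{2})", line)
    if match is None:
        raise ValueError(f"not a klog line: {line!r}")
    return datetime.strptime(match.group(1), "%H:%M:%S")


def in_window(now: datetime, start_hhmm: str, length: timedelta) -> bool:
    """Return True if the time of day of `now` falls in [start, start + length)."""
    start = datetime.strptime(start_hhmm, "%H:%M")
    # Compare times of day only; the modulo handles windows that cross midnight.
    elapsed = (now - start) % timedelta(days=1)
    return elapsed < length


line = "I0328 08:58:26.923732       1 operator.go:593] Found 0 rebooted nodes"
print(in_window(klog_time(line), "10:45", timedelta(hours=1)))  # False
print(in_window(klog_time(line), "08:45", timedelta(hours=1)))  # True
```

With the log line from this thread, a window starting at 10:45 is not yet open at 08:58 pod time, while a start of 08:45 would be.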

invidian added a commit that referenced this issue Mar 28, 2023
Their support has never been implemented.

Refs #193

Signed-off-by: Mateusz Gozdek <[email protected]>
@steled
Author

steled commented Mar 28, 2023

Ok, I opened [RFE] Add support for multiple reboot windows

This ticket can then be closed...

Thanks for your support 👍
