Change in KuReD Behaviour between 1.15 and 1.16 #1066
Comments
Hello! Thanks for the kind words.
I will, however, dig deeper into what happened... should my memory allow :)
Just to be sure we are talking about the same thing. Did you mean: Also, did you try 1.16.0/1.16.1, to make bisecting easier?
Hello, with the following configuration:
KuReD 1.16.2 writes to the info log in a 1-minute cycle:
and stops logging anything outside the configured reboot time. KuReD 1.15.1 and 1.16.1 (!) write more logs, and do so all the time, regardless of the configured reboot time:
So I assume the change is in version 1.16.2.
I will dig into this.
I tried to reproduce this in 1.16.1 and 1.16.2, and in both cases I don't get the extra logs. See also: https://gist.github.com/evrardjp/7b503a8a5079ab200de11bd5e79cb466 I can dig deeper if you wish. I would be happy to have users' opinions here:
Closing the issue because I don't think there is anything actionable in the current state, but feel free to comment, and we'll improve in our next releases. If you want to explain the behaviour you're expecting (your use case) in more detail, that would be perfect!
So, I've investigated the behavior of both versions, 1.16.1 and 1.16.2, with the following results. Both versions have two cycles: one for metrics and another for reboots. The metrics cycle has a 1-minute interval and always runs, while the reboot cycle's interval and run times are configurable. The additional logs I mentioned previously are the metrics cycle logs for the case when the needs-restarting command is used. They are visible in version 1.16.1 and not visible in version 1.16.2.
Version 1.16.1 with `--reboot-sentinel-command=sh -c "! /usr/bin/needs-restarting --reboothint"`
------metric logs begin -----
Version 1.16.2 with `--reboot-sentinel-command=sh -c "! /usr/bin/needs-restarting --reboothint"`
------reboot logs begin -----
We built our Elastic dashboard for reboot notifications on the behavior of version 1.16.1, analyzing the needs-restarting logs from the metrics cycle, which are no longer visible since 1.16.2. This looks like a very narrow edge case, and a more generic approach would be to use the metrics to monitor whether a reboot is required. Could you please confirm my assumptions? Are the metrics calculated regardless of the reboot interval config?
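The metrics-based monitoring suggested above could look roughly like the following Python sketch. It scans a Prometheus text-format scrape for a reboot-required gauge and lists the nodes where it is set. The metric name `kured_reboot_required`, the `node` label, and the sample scrape are assumptions for illustration and should be checked against the output of your own KuReD deployment.

```python
import re
from typing import List

# Assumed metric and label names; verify them against your own /metrics output.
METRIC_NAME = "kured_reboot_required"
NODE_LABEL = re.compile(r'node="([^"]*)"')


def nodes_requiring_reboot(metrics_text: str, metric: str = METRIC_NAME) -> List[str]:
    """Return the node labels whose gauge value is 1 in a text-format scrape.

    Assumes sample lines of the form `name{labels} value` without a trailing
    timestamp.
    """
    nodes = []
    for raw in metrics_text.splitlines():
        line = raw.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        series, _, value = line.rpartition(" ")
        name = series.split("{", 1)[0]
        if name != metric:
            continue
        if float(value) == 1.0:
            match = NODE_LABEL.search(series)
            nodes.append(match.group(1) if match else "")
    return nodes


# Hypothetical scrape output, made up for this example.
SAMPLE = """\
# TYPE kured_reboot_required gauge
kured_reboot_required{node="worker-1"} 1
kured_reboot_required{node="worker-2"} 0
some_other_metric 1
"""

if __name__ == "__main__":
    print(nodes_requiring_reboot(SAMPLE))
```

In practice an alerting rule in the monitoring system would usually replace a script like this; the sketch only shows that the information does not have to come from log lines.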
You're correct that there are currently two goroutines: one that keeps the metrics up to date (every minute by default, not configurable), and one that runs the rest of the code. In that refactor, the intent is to test whether a reboot is required quite early, and to expose the metric. The usage and behaviour would then depend on the maintenance window. But that's for the future. I didn't believe it was useful to keep the log saying "reboot not required", BUT I will make sure it's still there in the rewrite, since you have a good use case. Interestingly, I didn't see the metrics logs in my test, BUT I realised I don't have the same vars: I don't have
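To make the two cycles discussed above concrete, here is a minimal Python sketch of the split: a metrics tick that runs regardless of the schedule, and a reboot tick that acts only inside the maintenance window. This is not KuReD's code (KuReD is written in Go); `Window`, `metrics_tick`, `reboot_tick`, and the log messages are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from datetime import time
from typing import List


@dataclass
class Window:
    """Daily maintenance window; start is inclusive, end is exclusive."""
    start: time
    end: time

    def contains(self, now: time) -> bool:
        if self.start <= self.end:
            return self.start <= now < self.end
        # The window wraps around midnight, e.g. 22:00 to 02:00.
        return now >= self.start or now < self.end


def metrics_tick(reboot_required: bool, log: List[str]) -> int:
    """Runs on every tick, regardless of the window; returns the gauge value."""
    log.append(f"metrics: reboot required = {reboot_required}")
    return 1 if reboot_required else 0


def reboot_tick(reboot_required: bool, now: time, window: Window, log: List[str]) -> bool:
    """Triggers a reboot only when one is required and the window is open."""
    if not window.contains(now):
        return False  # outside the window: no action and no log line
    if not reboot_required:
        log.append("reboot: not required")
        return False
    log.append("reboot: required, proceeding")
    return True


if __name__ == "__main__":
    window = Window(start=time(2, 0), end=time(4, 0))
    log: List[str] = []
    # At noon the gauge still reports the pending reboot, but nothing is rebooted.
    print(metrics_tick(True, log), reboot_tick(True, time(12, 0), window, log))
    # At 03:00 the window is open, so the reboot goes ahead.
    print(metrics_tick(True, log), reboot_tick(True, time(3, 0), window, log))
    print(log)
```

The point of the split is that notification (the gauge and its log line) stays available at all times, while the disruptive action remains limited to the configured window.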
Hello,
firstly I would like to thank all the tool maintainers for doing a great job. We are using KuReD to reboot our RHEL nodes and it worked well most of the time.
Recently we tried to upgrade KuReD from 1.15.1 to 1.16.2 and noticed one important behavior change. The old version of KuReD ran reboot checks all the time, regardless of the scheduled reboot time. This was very convenient for production clusters: KuReD notified us that a reboot was required, but did not reboot servers outside of the maintenance window. Since version 1.16 the behavior has changed. Now KuReD only verifies whether a reboot is needed within the defined reboot time interval.
Was this done intentionally? I have failed to find a corresponding issue, so I am missing the discussion about this :-(
It is not a big deal; however, IMHO it is an important behavioral change and should be highlighted somehow.