-
Notifications
You must be signed in to change notification settings - Fork 123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: tolerate failed checkins / ready #392
base: main
Are you sure you want to change the base?
Conversation
Making checkin process optional (default to activated)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Over all this looks very reasonable for static/predictable clusters.
Thanks a lot for you contribution @redref!
Please see the suggestions and regenerate the docs and then we'll be good to merge!
cmd/kg/main.go
Outdated
@@ -134,6 +135,7 @@ var ( | |||
|
|||
func init() { | |||
cmd.Flags().StringVar(&backend, "backend", k8s.Backend, fmt.Sprintf("The backend for the mesh. Possible values: %s", availableBackends)) | |||
cmd.Flags().BoolVar(&checkIn, "check-in", true, "Should kilo regularly check-in in backend") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cmd.Flags().BoolVar(&checkIn, "check-in", true, "Should kilo regularly check-in in backend") | |
cmd.Flags().BoolVar(&checkIn, "check-in", true, "Should Kilo regularly check in with the backend") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
cmd/kgctl/main.go
Outdated
@@ -119,6 +120,7 @@ func main() { | |||
SilenceErrors: true, | |||
} | |||
cmd.PersistentFlags().StringVar(&backend, "backend", k8s.Backend, fmt.Sprintf("The backend for the mesh. Possible values: %s", availableBackends)) | |||
cmd.PersistentFlags().BoolVar(&checkIn, "check-in", true, "Should kilo consider check-in (LastSeen) in backend") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cmd.PersistentFlags().BoolVar(&checkIn, "check-in", true, "Should kilo consider check-in (LastSeen) in backend") | |
cmd.PersistentFlags().BoolVar(&checkIn, "check-in", true, "Should Kilo prune nodes that have not checked in with the backend") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
var checkedIn bool | ||
if (n != nil) && (n.Key != wgtypes.Key{}) && (n.Subnet != nil) && (n.CheckLastSeen) { | ||
checkedIn = time.Now().Unix()-n.LastSeen < int64(checkInPeriod)*2/int64(time.Second) | ||
} else { | ||
checkedIn = true | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to add more conditions to the check-in logic.
Let's keep this scoped to the change at hand for now. If we need to add more safety let's consider that in a different PR / issue.
var checkedIn bool | |
if (n != nil) && (n.Key != wgtypes.Key{}) && (n.Subnet != nil) && (n.CheckLastSeen) { | |
checkedIn = time.Now().Unix()-n.LastSeen < int64(checkInPeriod)*2/int64(time.Second) | |
} else { | |
checkedIn = true | |
} | |
var checkedIn bool | |
if n.CheckLastSeen { | |
checkedIn = time.Now().Unix()-n.LastSeen < int64(checkInPeriod)*2/int64(time.Second) | |
} else { | |
checkedIn = true | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those tests are here to protect against nil values.
They do not add any logic as they are also in the return statement and would return false anyway.
That was my first implem, but it did not pass the tests. I agree it is not pretty now (return one liner), but did not wanted to refactor too much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agh yes, I woke up in the middle of the night today and literally said "I understand now". I have a little refactor in mind that might be more legible
Making checkin process optional (default to activated)
We quite often have apiserver issues which does not allow some pods to update their last-seen annotation, thus some kilo mesh nodes going "unready" and flapping wireguard configuration(s).
With this setup (currently in test our side), we have a more static setup over time. It also reduces calls to apiserver.
I do not see any drawback, but maybe I just lack imagination 😉