-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Configuration via annotations #64
base: master
Are you sure you want to change the base?
Conversation
func (rule *chaos) load() (bool, string, error) { | ||
value, active := os.LookupEnv(envChaosChance) | ||
if !active { | ||
explicit := explicitLoad(ruleChaos) | ||
value, hasDefault := os.LookupEnv(envChaosChance) | ||
if !explicit && !hasDefault { | ||
return false, "", nil | ||
} | ||
chance, err := strconv.ParseFloat(value, 64) | ||
if err != nil { | ||
if !explicit && err != nil { | ||
return false, "", fmt.Errorf("invalid chaos chance %s", err) | ||
} | ||
rule.chance = chance | ||
return true, fmt.Sprintf("chaos chance %s", value), nil | ||
|
||
if rule.chance != 0 { | ||
return true, fmt.Sprintf("chaos chance %s", value), nil | ||
} | ||
return true, fmt.Sprint("chaos (no default)"), nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to make sure I understand the desired behavior here, as there are 4 possible cases based on explicit load and default value.
- NO explicit load and NO default value -- don't load the rule
- NO explicit load but default value -- load the rule (previous behavior) -- this case honors the annotation overrides?
- explicit load but NO default value -- load the rule, but the rule would also say "don't reap" unless the pod had an annotation override and the condition for that override was met?
- explicit load but default value -- loads the rule and and default value, but honors the annotation overrides.
Wondering if we need a way to determine whether or not the reaper should honor the annotations? I guess the concern is making sure that if the reaper isn't intended to be overwritten that it can't be. Maybe I'm missing that piece here (I'm still looking!)
There might also be a case where explicit loading of the rule is set but there's also a value that isn't parsible as a float (i.e. lizard
) here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are correct for all 4 cases. If a rule is loaded (explicitly or with a default value), it will always check for an annotation and use the annotation's value if found. I don't think we need an option to explicitly enable annotations. It should be safe to assume that if a user added a pod-reaper/*
annotation, they intend to opt-in to annotation override. If a pod is not annotated, then pod-reaper will behave the same as it does now.
The annotation is parsed in ShouldReap()
. If a pod is annotated with pod-reaper/chaos-chance: lizard
, a warning will be logged, and the behavior will fall back to default value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The chaos rule might be a weird edge case since all enabled rules must return true. If you were to enable chaos without a default, it would prevent any pod which was not annotated from being reaped. I would prefer that a pod is reaped if any rule returns true, but that would be a potentially breaking change. We could also implement reap/spare logic like you mentioned in #44 (comment), but I wanted to keep things simple if possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, reaping if any rule returns true would be breaking. If it operator by reaping when any rule returned true there wouldn't be a way to get an "or" function between multiple rules. How I've done the "reaping on any rule = true" situation was to make multiple reapers, each with only one rule.
I'm looking at #45 and wondering why I didn't merge that... I think it was from lack of a code review. Every way I look at this I see crazy edge cases, which makes me really apprehensive. But I think I'm going to take a stab at what it might look like off of #45 (I try stuff like this a lot, and most of it I just throw out!) |
pushed up a container image: This was based off #45 and has an environment variable called I have some higher level functional tests that I still need to run, but I might not be able to get to that tonight. |
The |
I think I figured out why I'm having a hard time with this one... Not from a difficulty to implement perspective, but from a product perspective: the complexity of this feature is higher than anything I every thought pod-reaper would do... Each time I try to implement this, I end up with something that works for most cases, but there always seems to be one that doesn't fit.
If pod annotations aren't opt-in, then the manager of the reaper won't know (outside of logs and a lot of manual comparing) what's being overridden. This would be true for the very first use case of pod-reaper. It was originally designed to cleanup CI jobs running on a shared cluster when they were taking way to long or got stuck in an crash loop. In that case, we did not want anyone to every be able to override it. But I think there's a more fundamental disagreement with allowing annotations to override. It just splits who's responsible for what. In some cases, the configuration is annotations, the other it's in the environment of the reaper. So when we're using annotations, there are the following things that have to considered for at least 1 (but probably more) rules:
I guess, in my head, this is the feature that crossed the line of "simple" to "complex" -- which brings me to another... not great point. It very much looks like I'm the only one maintaining this... and I honestly don't use it my day to day anymore. Changes in jobs has me predominantly working with teams that are behind on the kubernetes adoption train. That's leaving me feeling more and more like I don't have "skin in this game" and shouldn't be making the product level decisions anyway :( |
Even though annotations are always checked, I would argue that they are still opt-in.
The case where the pod and pod-reaper owners are different and the pod-reaper owner does not want to allow override is interesting. If you are using pod reaper with multiple rules activated, then an annotation could allow the pod owner to evade getting killed. In my opinion, the pod owner should get the final say, but I guess that's ultimately a decision for the product owner. We are running our own fork of pod-reaper with the annotations already compiled in. If you want to leave this open until someone else adopts it, you're not holding us back. If the project is no longer actively being maintained, we may continue adding features in our fork without pushing them back upstream. |
Resolves #44.
Rule configuration may be overridden by annotations on individual pods. For single-value rules, the configured rule value will be replaced by the annotation value. For multi-value rules, annotations will be added to the configured rule values. See Implemented Rules for available annotations.
For backwards compatibility and minimum possible side effects, I opted to load configured rules as usual. I added a
RULES
environment variable to load rules explicitly without a configuration. See RULES. I'm open to other ideas for how to load rules, but this was the quickest way forward. There was some discussion in #44 (comment) on this.Example:
This configuration will load the Unready rule with 5m duration and the Duration rule with no configuration. Only pods decorated with
pod-reaper/max-duration
will be reaped by the Duration rule.