
chore: Kubernetes Integration RFC #2222

Merged (119 commits) on Apr 30, 2020

Conversation

@binarylogic (Contributor) commented Apr 4, 2020

Rendered
Rendered (master)

As noted in the RFC, this is a retroactive RFC intended to ensure we deliver a high-quality Kubernetes integration. When we originally started on this integration we did not have an RFC process, and I think we missed the opportunity to appropriately address corner cases and establish consensus on our approach. We also have the benefit of hindsight; we've learned a lot as we've made progress on this integration. Therefore, I'd like this RFC to serve not only as an audit of everything we've done but also as an opportunity to reapproach our strategy with everything we know.

Finally, I'd like to point out that the work up to this point has been excellent. I'm very happy with the progress we've made so far. This is more about bringing a new perspective in an effort to QA our approach.

Ref #260 (our previous attempt at an RFC)
Closes #2221

@github-actions bot commented Apr 4, 2020

Great PR! Please pay attention to the following items before merging:

Files matching rfcs/**:

  • Have at least 3 team members approved this RFC?

This is an automatically generated QA checklist based on modified files

@binarylogic (Contributor, Author)

I would like for @MOZGIII to finish this RFC. @LucioFranco and @ktff you are welcome to chime in, but it might be better to wait for more progress.

@Alexx-G (Contributor) left a comment

Sorry for the spam; I hope some of my feedback will be useful.
Also, I believe #2070 should be mentioned here (although it doesn't seem to be a blocker). Without it, the "centralized" topology might become less predictable when both the agent and the central services are deployed in K8s (it might be a common pattern to optimize connections to sinks for huge clusters).

rfcs/2020-04-04-2221-kubernetes-integration.md: 6 review threads (outdated, resolved)
@LucioFranco (Contributor)

@Alexx-G we very much appreciate the feedback!! Thank you!

@MOZGIII MOZGIII force-pushed the k8s-rfc branch 3 times, most recently from 90ba69f to 692bbbb Compare April 6, 2020 21:30
@binarylogic (Contributor, Author) commented Apr 8, 2020

@MOZGIII, here are some interesting issues in other projects. Addressing these should help us avoid bad design decisions:

@MOZGIII (Contributor) commented Apr 8, 2020

Thanks @binarylogic! I'll cross-reference those with our RFC and add the concerns that are missing.

I'm still in the process of writing up the design considerations, but a few new sections are already available for review.

@MOZGIII (Contributor) commented Apr 10, 2020

Converting this to draft for now. Will mark as "Ready for review" again when it's complete.

@MOZGIII MOZGIII force-pushed the k8s-rfc branch 3 times, most recently from 9a1f234 to 3c81a1f Compare April 13, 2020 20:59
@MOZGIII (Contributor) commented Apr 29, 2020

The introduction of the RFC mentions “We had planned to perform a 3rd party audit on the integration”, but there is nowhere else in the RFC mentioning this. It might be good to make this an explicit requirement of this feature (or remove it from the intro).

I'm not sure how to address this. Good catch though! @binarylogic, do you have thoughts on this?
I can request peer reviews for the people I know that might be interested, but probably not in the form of a formal audit.

@binarylogic (Contributor, Author)

@MOZGIII you are the 3rd party audit :). I think we're fine on our approach here. We've been very thorough.

@MOZGIII (Contributor) commented Apr 29, 2020

I wonder if there’s any downside to working in the opposite direction: start with supporting the latest version at first, and then let customer feedback drive extra work to make our set-up work with earlier versions (up to a point, and that point should probably follow the version skew support policy you mentioned).

We pretty much determine ourselves which k8s API version we use. It's unlikely that we'll use any new features, but setting a reasonable non-latest MSKV is a near-zero-effort way to increase the range of supported versions. If it causes issues during the implementation, I'll probably bump the MSKV, but 1.14 seems optimal in terms of effort-to-value ratio.

@MOZGIII (Contributor) commented Apr 29, 2020

Feel free to skip this comment - I'm just providing a write-up for completeness.

You mention Docker JSON file logging driver and CRI. Are we 100% sure the Docker format is still in use in the latest version(s) of Kubernetes? The migration to CRI has been ongoing for quite a while, and it would simplify our set-up if we only had to support CRI, which is the long-term goal of Kubernetes itself as well.

It's in the Kubernetes code on master as of 1.18. I'd even suppose it's likely the most popular format currently.
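For concreteness, the difference between the two on-disk formats can be sketched roughly like this (an illustrative parser with made-up sample lines, not Vector's actual code):

```python
import json

def parse_docker_json(line: str) -> dict:
    """Docker `json-file` driver: one JSON object per line."""
    record = json.loads(line)
    return {
        "message": record["log"].rstrip("\n"),
        "stream": record["stream"],
        "timestamp": record["time"],
    }

def parse_cri(line: str) -> dict:
    """CRI format: `<timestamp> <stream> <P|F> <message>`."""
    timestamp, stream, tag, message = line.split(" ", 3)
    return {
        "message": message,
        "stream": stream,
        "timestamp": timestamp,
        "partial": tag == "P",  # "P" = partial line, "F" = full line
    }

docker_line = '{"log":"hello\\n","stream":"stdout","time":"2020-04-29T10:00:00Z"}'
cri_line = "2020-04-29T10:00:00.000000000Z stdout F hello"

assert parse_docker_json(docker_line)["message"] == "hello"
assert parse_cri(cri_line)["message"] == "hello"
```

Since the runtime determines which format lands on disk, a collector that supports both can stay agnostic to the cluster's CRI migration status.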

Partial event merging

You mention providing support to disable “partial event merging”. Is there a good use-case for supporting this tuneable? Should we postpone this until after v1?

It should be easy to add in the initial implementation.
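A rough sketch of what partial event merging amounts to, assuming CRI-style records with a partial/full flag (the record shape and function name are hypothetical):

```python
def merge_partials(records):
    """Merge consecutive partial records into complete events.

    Each record is a dict with "message" and "partial" keys, e.g. as
    produced by a CRI log parser ("P" tag => partial, "F" => full).
    """
    buffer = []
    for record in records:
        buffer.append(record["message"])
        if not record["partial"]:
            yield "".join(buffer)
            buffer = []
    if buffer:  # flush a trailing unterminated partial
        yield "".join(buffer)

records = [
    {"message": "very long li", "partial": True},
    {"message": "ne was split", "partial": False},
    {"message": "short line", "partial": False},
]
assert list(merge_partials(records)) == ["very long line was split", "short line"]
```

Disabling the tuneable would simply mean skipping this pass and emitting each record as its own event.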

Reading container log files

You mention “Kubernetes log files is a de-facto well-settled interface, that we should be able to use reliably.”. Did we do any verification how other systems handle this (e.g. Fluentd)? If so, it might make sense to amend this section to add more confidence in the statements made.

Well, it's pretty evident from the code of k8s itself (I linked the relevant code); fluentd uses the same approach.
The problem is that there are no official statements on the stability of this interface from the k8s team - so the code is there, but there are no promises or guarantees.

Vector config file reloads

We could also link to the Reloader project to provide another alternative on how people can make sure their Vector config reloads when changed. Another one (which is basically what Reloader does) is to add the hash of the config file as an annotation on the pod, for example by having the config map JSON go through jq before applying to the Kubernetes API.

You're right about Reloader; it's a useful tool. But there's a wide variety of those - kustomize (and kubectl -k now) can do this, as well as Helm. The point of that section, though, is just that we don't want to reload configmap values at runtime (i.e. don't use the -w arg) - other than that, the usual reload concerns apply.
I'd leave this out as a Helm chart implementation detail.
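The config-hash-annotation trick mentioned above can be sketched like so (the `checksum/config` annotation key follows the common Helm-chart convention; the function name is made up):

```python
import hashlib
import json

def config_checksum_annotation(config_map_data: dict) -> dict:
    """Return a pod-template annotation carrying a hash of the config.

    A changed config produces a changed annotation, which changes the
    pod template and therefore triggers a rolling restart.
    """
    digest = hashlib.sha256(
        json.dumps(config_map_data, sort_keys=True).encode()
    ).hexdigest()
    return {"checksum/config": digest}

old = config_checksum_annotation({"vector.toml": "data_dir = '/var/lib/vector'"})
new = config_checksum_annotation({"vector.toml": "data_dir = '/tmp/vector'"})
assert old != new  # changed config => changed annotation => pod rollout
```

This is essentially what Reloader automates, and what a Helm template can do inline with its built-in hashing function.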

Health probes

You mention readinessProbe, but I believe that one to be irrelevant for Vector, as Vector isn’t exposed through a service resource?

This is more important for Vector being ready to run as a k8s-integrated app in any configuration (not just the one we recommend by default) - i.e. not just with the kubernetes source, but with a socket or http source too. Those configs would probably be deployed as a Deployment with a Service instead of a DaemonSet, and the Deployment might be replicated to cope with the load. So it's needed. We might skip it initially, though, and ship another release that includes better integration for this use case.

You also mention livenessProbe, and discuss an HTTP endpoint to check the health of Vector. However, probes also support regular commands to execute in the container to check the health of an application, which might be more appropriate here? (e.g. letting Vector report its health based on a signal, or in some other way).

It's an implementation detail - using signals is also possible. The problem with signals is that they're preemptive, so it's harder to actually ensure that the process doesn't hang. For example, if a process is stuck in a deadlock, its signal handlers might still respond, while an HTTP server exercises more internal components by default - and thus provides higher confidence (again, by default, i.e. with a naive implementation).
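To illustrate what even a naive HTTP liveness endpoint looks like, here is a minimal sketch (the port and `/health` path are hypothetical, not Vector's actual API; a real implementation would tie the response to internal component state):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":
            # Serving this response at all proves the event loop,
            # socket handling, and request dispatch are still alive.
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"OK")
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):  # silence default request logging
        pass

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 9598), HealthHandler).serve_forever()
```

A kubelet `httpGet` livenessProbe pointed at this endpoint would restart the container if the process deadlocks badly enough to stop answering, which an exec- or signal-based check might not detect.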

Kubernetes API server availability

You mention “intelligently handle k8s API server being down” w.r.t. filter metadata used by Vector getting out-of-sync. I wonder if we should start with three failure mode preferences the customer can choose from (and we set a sane default):

drop all events
filter events based on last known filter metadata
stall events until buffer overflow

This is very interesting. I feel like we only need to provide the filter events based on last known filter metadata option. I don't like either drop all events or stall events until buffer overflow, because if the kube-apiserver is gone, then the kubelet doesn't get any updates from it either - it operates with its last known state. Thus, by using the last cached data, we should be not only safe but also roughly correct, since it's the same state the kubelet saw. Of course, races are possible, and the states in the Vector cache and the kubelet may diverge - but the alternatives don't solve this issue either.
@JeanMertz If you still think we should offer the dropping events or stall events until buffer overflow options - let's discuss!
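The "filter events based on last known filter metadata" behavior could be sketched roughly like this (class and field names are made up for illustration):

```python
class MetadataCache:
    """Last-known pod metadata, updated from the k8s API watch stream.

    Enrichment reads only from the local cache, so events keep flowing
    (with possibly slightly stale metadata) while the API server is down.
    """

    def __init__(self):
        self._pods = {}

    def update(self, pod_uid: str, metadata: dict) -> None:
        """Called on each successful watch event from the API server."""
        self._pods[pod_uid] = metadata

    def enrich(self, event: dict, pod_uid: str) -> dict:
        """Annotate an event from the cache; never blocks on the API."""
        event = dict(event)
        event["kubernetes"] = self._pods.get(pod_uid, {})
        return event

cache = MetadataCache()
cache.update("uid-1", {"pod_name": "vector-abc", "namespace": "logging"})
# Even if the API server goes down now, events keep flowing with the
# last known metadata:
enriched = cache.enrich({"message": "hello"}, "uid-1")
assert enriched["kubernetes"]["pod_name"] == "vector-abc"
```

Dropping or stalling would replace the `enrich` fallback with discarding or buffering; the cache-based option keeps the pipeline flowing instead.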

I think this is a great source of data to expose. I do wonder if we have considered splitting this source up into multiple sources, e.g. kubernetes_pods and kubernetes_api or something, as I do think they are distinct from each other.

I was thinking about renaming kubernetes source to kubernetes_logs. A source to collect Kubernetes API events would definitely be a good candidate for implementing separately - because the deployment topology is different (we need to deploy it only once, rather than on every node). So, yeah, the most natural solution I see would be to have kubernetes_logs source and kubernetes_api.
But we can think about it later.

For example, having Vector run as a daemonset to collect pod logs, have another Vector deployed as a single-pod deployment to collect metrics from the Kubernetes API, and potentially have a final Vector instance that receives events from the other Vector instances and applies user-configured transformations and sends the logs to the configured sinks.

Yes, I'm thinking about those cases. They're all outside the scope of the initial integration, but I'm trying to account for them so that we don't end up in a position we can't advance from. So far, I think it's going well. I mentioned some of these aspects, but omitted many others from this RFC so we can focus on a narrower feature set for now.

These should've been in the outstanding questions in the first place

Signed-off-by: MOZGIII <[email protected]>
@binarylogic (Contributor, Author)

Noting my approval. I cannot approve since I opened the PR, but 👍 from me.

@binarylogic (Contributor, Author)

Noting: I spoke with @lukesteensen and @JeanMertz separately and both agree that we should move forward with implementation.

@binarylogic binarylogic merged commit fb489a7 into master Apr 30, 2020
@binarylogic binarylogic deleted the k8s-rfc branch April 30, 2020 17:41
@lukesteensen (Member) left a comment

Retroactive ✔️ for the plan of attack
