From b51b5fb229ad51a7d27c667d2861f69822dc2acf Mon Sep 17 00:00:00 2001
From: binarylogic
Date: Sat, 4 Apr 2020 12:49:42 -0400
Subject: [PATCH 001/118] chore: Kubernetes Integration RFC

Signed-off-by: binarylogic
Signed-off-by: MOZGIII
---
 .../2020-04-04-2221-kubernetes-integration.md | 214 ++++++++++++++++++
 .../vector-daemonset.yaml | 126 +++++++++++
 2 files changed, 340 insertions(+)
 create mode 100644 rfcs/2020-04-04-2221-kubernetes-integration.md
 create mode 100644 rfcs/2020-04-04-2221-kubernetes-integration/vector-daemonset.yaml

diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md
new file mode 100644
index 0000000000000..624f6d14c9330
--- /dev/null
+++ b/rfcs/2020-04-04-2221-kubernetes-integration.md
@@ -0,0 +1,214 @@
+# RFC 2221 - 2020-04-04 - Kubernetes Integration
+
+This RFC outlines how Vector will integrate with Kubernetes (k8s).
+
+**Note: This RFC is retroactive and meant to seve as an audit to complete our
+Kubernetes integration. At the time of writing this RFC, Vector has already made
+considerable progress on it's Kubernetes integration. It has a `kubernetes`
+source, `kubernetes_pod_metadata` transform, an example daemonset file, and the
+ability to automatically reload configuration when it changes. The fundamental
+pieces are mostly in place to complete this integration, but as we approach
+the finish line we're being faced with deeper questions that heavily affect the
+UX. Such as how to properly deploy Vector and exclude it's own logs ([pr#2188]).
+We had planned to perform a 3rd party audit on the integration before
+announcement and we've decided to align this RFC with that process.**
+
+## Motivation
+
+Kubernetes is arguably the most popular container orchestration framework at
+the time of writing this RFC; many large companies, with large production
+deployments, depend heavily on Kubernetes. Kubernetes handles log collection
+but does not facilitate shipping. 
Shipping is meant to be delegated to tools +like Vector. This is precisely the use case that Vector was built for. So, +the motivation is three-fold: + +1. A Kubernetes integration is essential to achieving Vector's vision of being + the dominant, single collector for observability data. +2. This will inherently attract large, valuable users to Vector since Kubernetes + is generally used with large deployments. +3. It is currently the #1 requested feature of Vector. + +## Guide-level Proposal + +**Note: This guide largely follows the format of our existing guides +([example][guide_example]). There are two perspectives to our guides: 1) A new +user coming from Google 2) A user that is familar with Vector. This guide is +from perspective 2.** + +This guide covers integrating Vector with Kubernetes. We'll touch on the basic +concepts of deploying Vector into Kubernetes and walk through our recommended +[strategy](#strategy). By the end of this guide you'll have a single, +lightweight, ultra-fast, and reliable data collector ready to ship your +Kubernetes logs and metrics to any destination you please. + +### Strategy + +#### How This Guide Works + +Our recommended strategy deploys Vector as a Kubernetes [daemonset]. This is +the most efficient means of collecting Kubernetes observability data since +Vector is guaranteed to deploy _once_ on each of your Pods. In addition, +we'll use the [`kubernetes_pod_metadata` transform][kubernetes_pod_metadata_transform] +to enrich your logs with Kubernetes context. This transform interacts with +the Kubernetes watch API to collect cluster metadata and update in real-time +when things change. The following diagram demonstrates how this works: + +TODO: insert diagram + +### What We'll Accomplish + +* Collect data from each of your Kubernetes Pods + * Ability to filter by container name, Pod IDs, and namespaces. + * Automatically merge logs that Kubernetes splits. + * Enrich your logs with useful Kubernetes context. 
+* Send your logs to one or more destinations. + +### Tutorial + +#### Kubectl Interface + +1. Configure Vector: + + Before we can deplo Vector we must configure. This is done by creating + a Kubernetes `ConfigMap`: + + ...insert selector to select any of Vector's sinks... + + ```bash + echo ' + apiVersion: v1 + kind: ConfigMap + metadata: + name: vector-config + namespace: logging + labels: + k8s-app: vector + data: + vector.toml: | + # Docs: https://vector.dev/docs/ + + # Set global options + data_dir = "/var/tmp/vector" + + # Ingest logs from Kubernetes + [sources.kubernetes] + type = "kubernetes" + + # Enrich logs with Pod metadata + [transforms.pod_metadata] + type = "kubernetes_pod_metadata" + inputs = ["kubernetes"] + + # Send data to one or more sinks! + [sinks.aws_s3] + type = "aws_s3" + inputs = ["pod_metadata"] + bucket = "my-bucket" + compression = "gzip" + region = "us-east-1" + key_prefix = "date=%F/" + ' > vector-configmap.toml + ``` + +2. Deploy Vector! + + Now that you have your custom `ConfigMap` ready it's time to deploy + Vector. To ensure Vector is isolated and has the necessary permissions + we must create a `namespace`, `ServiceAccount`, `ClusterRole`, and + `ClusterRoleBinding`: + + ```bash + kubectl create namespace logging + kubectl create -f vector-service-account.yaml + kubectl create -f vector-role.yaml + kubectl create -f vector-role-binding.yaml + kubectl create -f vector-configmap.yaml + kubectl create -f vector-daemonset.yaml + ``` + + * *See [outstanding questions 2, 3, 4, 5, and 6](#outstanding-questions).* + + That's it! + +#### Helm Interface + +TODO: fill in + +## Prior Art + +1. [Filebeat k8s integration] +1. [Fluentbit k8s integration] +2. [Fluentd k8s integration] +3. [LogDNA k8s integration] +4. [Honeycomb integration] +3. [Bonzai logging operator] - This is approach is likely outside of the scope + of Vector's initial Kubernetes integration because it focuses more on + deployment strategies and topologies. 
There are likely some very useful + and interesting tactics in their approach though. +4. [Influx Helm charts] + +## Sales Pitch + +See [motivation](#motivation). + +## Drawbacks + +1. Increases the surface area that our team must manage. + +## Alternatives + +1. Not do this integration and rely solely on external community driven + integrations. + +## Outstanding Questions + +1. What is the best to avoid Vector from ingesting it's own logs? +2. I've seen two different installation strategies. For example, Fluentd offers + a [single daemonset configuration file][fluentd_daemonset] while Fluentbit + offers [four separate configuration files][fluentbit_installation] + (`service-account.yaml`, `role.yaml`, `role-binding.yaml`, `configmap.yaml`). + Which approach is better? Why are they different? +3. Should we prefer `kubectl create ...` or `kubectl apply ...`? The examples + in the [prior art](#prior-art) section use both. +4. From what I understand, Vector requires the Kubernetes `watch` verb in order + to receive updates to k8s cluster changes. This is required for the + `kubernetes_pod_metadata` transform. Yet, Fluentbit [requires the `get`, + `list`, and `watch` verbs][fluentbit_role]. Why don't we require the same? +5. What is `updateStrategy` ... `RollingUpdate`? This is not included in + [our daemonset][vector_daemonset] or in [any of Fluentbit's config + files][fluentbit_installation]. But it is included in both [Fluentd's + daemonset][fluentd_daemonset] and [LogDNA's daemonset][logdna_daemonset]. +6. I've also noticed `resources` declarations in some of these config files. + For example [LogDNA's daemonset][logdna_daemonset]. I assume this is limiting + resources. Do we want to consider this? +7. What the hell is going on with [Honeycomb's integration + strategy][Hoenycomb integration]? :) It seems like the whole "Heapster" + pipeline is specifically for system events, but Heapster is deprecated? + This leads me to my next question... +8. 
How are we collecting Kubernetes system events? Is that outside of the + scope of this RFC? And why does this take an entirely different path? + (ref [issue#1293]) +9. What are some of the details that sets Vector's Kubernetes integration apart? + This is for marketing purposes and also helps us "raise the bar". + +## Plan Of Attack + +- [ ] + +[Bonzai logging operator]: https://github.com/banzaicloud/logging-operator +[daemonset]: https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/ +[Filebeat k8s integration]: https://www.elastic.co/guide/en/beats/filebeat/master/running-on-kubernetes.html +[Fluentbit k8s integration]: https://docs.fluentbit.io/manual/installation/kubernetes +[fluentbit_daemonset]: https://raw.githubusercontent.com/fluent/fluent-bit-kubernetes-logging/master/output/elasticsearch/fluent-bit-ds.yaml +[fluentbit_installation]: https://docs.fluentbit.io/manual/installation/kubernetes#installation +[fluentbit_role]: https://raw.githubusercontent.com/fluent/fluent-bit-kubernetes-logging/master/fluent-bit-role.yaml +[Fluentd k8s integration]: https://docs.fluentd.org/v/0.12/articles/kubernetes-fluentd +[fluentd_daemonset]: https://github.com/fluent/fluentd-kubernetes-daemonset/blob/master/fluentd-daemonset-papertrail.yaml +[guide_example]: https://vector.dev/guides/integrate/sources/syslog/aws_kinesis_firehose/ +[Honeycomb integration]: https://docs.honeycomb.io/getting-data-in/integrations/kubernetes/ +[Influx Helm charts]: https://github.com/influxdata/helm-charts +[issue#1293]: https://github.com/timberio/vector/issues/1293 +[LogDNA k8s integration]: https://docs.logdna.com/docs/kubernetes +[logdna_daemonset]: https://raw.githubusercontent.com/logdna/logdna-agent/master/logdna-agent-ds.yaml +[pr#2188]: https://github.com/timberio/vector/pull/2188 +[vector_daemonset]: 2020-04-04-2221-kubernetes-integration/vector-daemonset.yaml diff --git a/rfcs/2020-04-04-2221-kubernetes-integration/vector-daemonset.yaml 
b/rfcs/2020-04-04-2221-kubernetes-integration/vector-daemonset.yaml new file mode 100644 index 0000000000000..f116d8aa09b29 --- /dev/null +++ b/rfcs/2020-04-04-2221-kubernetes-integration/vector-daemonset.yaml @@ -0,0 +1,126 @@ +# WARNING! +# +# DO NOT USE THIS DAEMONSET. THIS IS AN EXAMPLE DAEMONSET USED DURING +# VECTOR'S KUBERNETES RFC PROCESS. + +# Everything related to vector will live under the `telemetry` namespace. +apiVersion: v1 +kind: Namespace +metadata: + name: telemetry +--- +# Permissions to use Kubernetes API. +# Necessary for kubernetes_pod_metadata transform. +# Requires that RBAC authorization is enabled. +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRoleBinding +metadata: + name: vector-permissions +subjects: +- kind: ServiceAccount + name: default + namespace: telemetry +roleRef: + kind: ClusterRole + name: view + apiGroup: rbac.authorization.k8s.io +--- +# ConfigMap which contains vector.toml configuration for pods. +# +# This can also be removed and loaded from a file via `kubectl`. +apiVersion: v1 +kind: ConfigMap +metadata: + name: vector-config + namespace: telemetry +data: + vector-agent-config: | + # file: vector.toml + # Configuration for vector-agent + # Docs: https://vector.dev/docs/ + + # Set global options + data_dir = "/var/tmp/vector" + + # Ingest logs from Kubernetes + [sources.kubernetes] + type = "kubernetes" + + # Enrich logs with Pod metadata + [transforms.pod_metadata] + type = "kubernetes_pod_metadata" + inputs = ["kubernetes"] + + # Add additional Vector transforms and sinks as desired! + # + # For example: + # + # [sinks.aws_s3] + # type = "aws_s3" + # inputs = ["pod_metadata"] + # bucket = "my-bucket" + # compression = "gzip" + # region = "us-east-1" + # key_prefix = "date=%F/" + + # This line is not in VECTOR.TOML +--- +# Vector agent ran on each node where it collects logs from pods. 
+apiVersion: apps/v1 +kind: DaemonSet +metadata: + name: vector-agent + namespace: telemetry +spec: + minReadySeconds: 1 + selector: + matchLabels: + name: vector-agent + template: + metadata: + labels: + name: vector-agent + # TODO: Modify this pod spec to include any extra configuration needed like + # secrets or dns. + spec: + volumes: + # Directory with logs + - name: var-log + hostPath: + path: /var/log/ + # Docker and containerd log files in Kubernetes are symlinks to this folder. + - name: var-lib + hostPath: + path: /var/lib/ + # Vector will store it's data here. + - name: data-dir + emptyDir: {} + # Mount vector configuration from config map as a file vector.toml + - name: config-dir + configMap: + name: vector-config + items: + - key: vector-agent-config + path: vector.toml + containers: + - name: vector + image: timberio/vector:latest-alpine + imagePullPolicy: Always + args: ["-w"] + volumeMounts: + - name: var-log + mountPath: /var/log/ + readOnly: true + - name: var-lib + mountPath: /var/lib + readOnly: true + - name: data-dir + mountPath: /var/tmp/vector + - name: config-dir + mountPath: /etc/vector + readOnly: true + env: + - name: VECTOR_NODE_NAME + valueFrom: + fieldRef: + fieldPath: spec.nodeName From c08133e2ba3185b2f00c51d7e4635934afc7f0c3 Mon Sep 17 00:00:00 2001 From: binarylogic Date: Sat, 4 Apr 2020 15:03:12 -0400 Subject: [PATCH 002/118] Fill out plan of attack Signed-off-by: binarylogic Signed-off-by: MOZGIII --- .../2020-04-04-2221-kubernetes-integration.md | 57 +++++++++++++++---- 1 file changed, 46 insertions(+), 11 deletions(-) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index 624f6d14c9330..6aa3a193241df 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -126,7 +126,7 @@ TODO: insert diagram kubectl create -f vector-daemonset.yaml ``` - * *See [outstanding questions 2, 3, 4, 5, and 
6](#outstanding-questions).* + * *See [outstanding questions 3, 4, 5, 6, and 7](#outstanding-questions).* That's it! @@ -162,38 +162,60 @@ See [motivation](#motivation). ## Outstanding Questions -1. What is the best to avoid Vector from ingesting it's own logs? -2. I've seen two different installation strategies. For example, Fluentd offers +1. What is the minimal Kubernetes version that we want to support. See + [this comment][kubernetes_version_comment]. +1. What is the best to avoid Vector from ingesting it's own logs? I'm assuming + that my [`kubectl` tutoria](#kubectl-interface) handles this with namespaces? + We'd just need to configure Vector to excluse this namespace? +1. I've seen two different installation strategies. For example, Fluentd offers a [single daemonset configuration file][fluentd_daemonset] while Fluentbit offers [four separate configuration files][fluentbit_installation] (`service-account.yaml`, `role.yaml`, `role-binding.yaml`, `configmap.yaml`). Which approach is better? Why are they different? -3. Should we prefer `kubectl create ...` or `kubectl apply ...`? The examples +1. Should we prefer `kubectl create ...` or `kubectl apply ...`? The examples in the [prior art](#prior-art) section use both. -4. From what I understand, Vector requires the Kubernetes `watch` verb in order +1. From what I understand, Vector requires the Kubernetes `watch` verb in order to receive updates to k8s cluster changes. This is required for the `kubernetes_pod_metadata` transform. Yet, Fluentbit [requires the `get`, `list`, and `watch` verbs][fluentbit_role]. Why don't we require the same? -5. What is `updateStrategy` ... `RollingUpdate`? This is not included in +1. What is `updateStrategy` ... `RollingUpdate`? This is not included in [our daemonset][vector_daemonset] or in [any of Fluentbit's config files][fluentbit_installation]. But it is included in both [Fluentd's daemonset][fluentd_daemonset] and [LogDNA's daemonset][logdna_daemonset]. -6. 
I've also noticed `resources` declarations in some of these config files. +1. I've also noticed `resources` declarations in some of these config files. For example [LogDNA's daemonset][logdna_daemonset]. I assume this is limiting resources. Do we want to consider this? -7. What the hell is going on with [Honeycomb's integration +1. What the hell is going on with [Honeycomb's integration strategy][Hoenycomb integration]? :) It seems like the whole "Heapster" pipeline is specifically for system events, but Heapster is deprecated? This leads me to my next question... -8. How are we collecting Kubernetes system events? Is that outside of the +1. How are we collecting Kubernetes system events? Is that outside of the scope of this RFC? And why does this take an entirely different path? (ref [issue#1293]) -9. What are some of the details that sets Vector's Kubernetes integration apart? +1. What are some of the details that sets Vector's Kubernetes integration apart? This is for marketing purposes and also helps us "raise the bar". ## Plan Of Attack -- [ ] +- [ ] Setup a proper testing suite for k8s. + - [ ] Support for customizable k8s clusters. See [issue#2170]. + - [ ] Stabilize k8s integration tests. See [isue#2193], [issue#2216], + and [issue#1635]. + - [ ] Ensure we are testing all supported minor versions. See + [issue#2223]. +- [ ] Audit and improve the `kubernetes` source. + - [ ] Handle the log recursion problem where Vector ingests it's own logs. + See [issue#2218] and [issue#2171]. + - [ ] Audit the `file` source strategy. See [issue#2199] and [issue#1910]. + - [ ] Merge split logs. See [pr#2134]. +- [ ] Audit and improve the `kubernetes_pod_matadata` transform. + - [ ] Use the `log_schema.kubernetes_key` setting. See [issue#1867]. +- [ ] Ensure our config reload strategy is solid. + - [ ] Don't exit when there are configuration errors. See [issue#1816]. + - [ ] Test this. See [issue#2224]. +- [ ] Add `kubernetes` source reference documentation. 
+- [ ] Add Kubernetes setup/integration guide.
+- [ ] Release `0.10.0` and announce.

[Bonzai logging operator]: https://github.com/banzaicloud/logging-operator
[daemonset]: https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/
@@ -208,7 +230,20 @@ See [motivation](#motivation).
[Honeycomb integration]: https://docs.honeycomb.io/getting-data-in/integrations/kubernetes/
[Influx Helm charts]: https://github.com/influxdata/helm-charts
[issue#1293]: https://github.com/timberio/vector/issues/1293
+[issue#1635]: https://github.com/timberio/vector/issues/1635
+[issue#1816]: https://github.com/timberio/vector/issues/1816
+[issue#1867]: https://github.com/timberio/vector/issues/1867
+[issue#1910]: https://github.com/timberio/vector/issues/1910
+[issue#2170]: https://github.com/timberio/vector/issues/2170
+[issue#2171]: https://github.com/timberio/vector/issues/2171
+[issue#2199]: https://github.com/timberio/vector/issues/2199
+[issue#2216]: https://github.com/timberio/vector/issues/2216
+[issue#2218]: https://github.com/timberio/vector/issues/2218
+[issue#2223]: https://github.com/timberio/vector/issues/2223
+[issue#2224]: https://github.com/timberio/vector/issues/2224
+[kubernetes_version_comment]: https://github.com/timberio/vector/pull/2188#discussion_r403120481
[LogDNA k8s integration]: https://docs.logdna.com/docs/kubernetes
[logdna_daemonset]: https://raw.githubusercontent.com/logdna/logdna-agent/master/logdna-agent-ds.yaml
+[pr#2134]: https://github.com/timberio/vector/pull/2134
[pr#2188]: https://github.com/timberio/vector/pull/2188
[vector_daemonset]: 2020-04-04-2221-kubernetes-integration/vector-daemonset.yaml

From dea8dae035f9d89b516298d30a4a1e5b63c4fb6a Mon Sep 17 00:00:00 2001
From: binarylogic
Date: Sat, 4 Apr 2020 15:06:04 -0400
Subject: [PATCH 003/118] Fix formatting

Signed-off-by: binarylogic
Signed-off-by: MOZGIII
---
 .../2020-04-04-2221-kubernetes-integration.md | 24 +++++++++----------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff 
--git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index 6aa3a193241df..070e994ccb7da 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -198,21 +198,21 @@ See [motivation](#motivation). ## Plan Of Attack - [ ] Setup a proper testing suite for k8s. - - [ ] Support for customizable k8s clusters. See [issue#2170]. - - [ ] Stabilize k8s integration tests. See [isue#2193], [issue#2216], - and [issue#1635]. - - [ ] Ensure we are testing all supported minor versions. See - [issue#2223]. + - [ ] Support for customizable k8s clusters. See [issue#2170]. + - [ ] Stabilize k8s integration tests. See [isue#2193], [issue#2216], + and [issue#1635]. + - [ ] Ensure we are testing all supported minor versions. See + [issue#2223]. - [ ] Audit and improve the `kubernetes` source. - - [ ] Handle the log recursion problem where Vector ingests it's own logs. - See [issue#2218] and [issue#2171]. - - [ ] Audit the `file` source strategy. See [issue#2199] and [issue#1910]. - - [ ] Merge split logs. See [pr#2134]. + - [ ] Handle the log recursion problem where Vector ingests it's own logs. + See [issue#2218] and [issue#2171]. + - [ ] Audit the `file` source strategy. See [issue#2199] and [issue#1910]. + - [ ] Merge split logs. See [pr#2134]. - [ ] Audit and improve the `kubernetes_pod_matadata` transform. - - [ ] Use the `log_schema.kubernetes_key` setting. See [issue#1867]. + - [ ] Use the `log_schema.kubernetes_key` setting. See [issue#1867]. - [ ] Ensure our config reload strategy is solid. - - [ ] Don't exit when there are configuration errors. See [issue#1816]. - - [ ] Test this. See [issue#2224]. + - [ ] Don't exit when there are configuration errors. See [issue#1816]. + - [ ] Test this. See [issue#2224]. - [ ] Add `kubernetes` source reference documentation. - [ ] Add Kubernetes setup/integration guide. - [ ] Release `0.10.0` and announce. 
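One of the outstanding questions above asks why Fluentbit requests the `get`, `list`, and `watch` verbs while Vector's example daemonset binds only the built-in `view` role. As a hedged sketch of the broader grant (the role name and the exact resource list here are illustrative assumptions, not part of the RFC), a dedicated `ClusterRole` could look like:

```yaml
# Hypothetical ClusterRole for the Vector agent. The name and resource list
# are illustrative assumptions; the RFC has not settled on a permission set.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: vector-agent
rules:
  - apiGroups: [""] # "" is the core API group, where Pods and Namespaces live
    resources: ["pods", "namespaces"]
    verbs: ["get", "list", "watch"]
```

A likely answer to the question itself: Kubernetes clients typically implement a watch as an initial `list` (to obtain a `resourceVersion`) followed by a `watch` from that version, so `watch` alone is rarely sufficient in practice.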
From f88f5e88ac1a85ea9dfee10f4859ef3d9a0a1d30 Mon Sep 17 00:00:00 2001 From: binarylogic Date: Sat, 4 Apr 2020 15:06:57 -0400 Subject: [PATCH 004/118] Delete old daemonset.yml Signed-off-by: binarylogic Signed-off-by: MOZGIII --- .../vector-daemonset.yaml | 126 ------------------ 1 file changed, 126 deletions(-) delete mode 100644 rfcs/2020-04-04-2221-kubernetes-integration/vector-daemonset.yaml diff --git a/rfcs/2020-04-04-2221-kubernetes-integration/vector-daemonset.yaml b/rfcs/2020-04-04-2221-kubernetes-integration/vector-daemonset.yaml deleted file mode 100644 index f116d8aa09b29..0000000000000 --- a/rfcs/2020-04-04-2221-kubernetes-integration/vector-daemonset.yaml +++ /dev/null @@ -1,126 +0,0 @@ -# WARNING! -# -# DO NOT USE THIS DAEMONSET. THIS IS AN EXAMPLE DAEMONSET USED DURING -# VECTOR'S KUBERNETES RFC PROCESS. - -# Everything related to vector will live under the `telemetry` namespace. -apiVersion: v1 -kind: Namespace -metadata: - name: telemetry ---- -# Permissions to use Kubernetes API. -# Necessary for kubernetes_pod_metadata transform. -# Requires that RBAC authorization is enabled. -apiVersion: rbac.authorization.k8s.io/v1 -kind: ClusterRoleBinding -metadata: - name: vector-permissions -subjects: -- kind: ServiceAccount - name: default - namespace: telemetry -roleRef: - kind: ClusterRole - name: view - apiGroup: rbac.authorization.k8s.io ---- -# ConfigMap which contains vector.toml configuration for pods. -# -# This can also be removed and loaded from a file via `kubectl`. 
-apiVersion: v1 -kind: ConfigMap -metadata: - name: vector-config - namespace: telemetry -data: - vector-agent-config: | - # file: vector.toml - # Configuration for vector-agent - # Docs: https://vector.dev/docs/ - - # Set global options - data_dir = "/var/tmp/vector" - - # Ingest logs from Kubernetes - [sources.kubernetes] - type = "kubernetes" - - # Enrich logs with Pod metadata - [transforms.pod_metadata] - type = "kubernetes_pod_metadata" - inputs = ["kubernetes"] - - # Add additional Vector transforms and sinks as desired! - # - # For example: - # - # [sinks.aws_s3] - # type = "aws_s3" - # inputs = ["pod_metadata"] - # bucket = "my-bucket" - # compression = "gzip" - # region = "us-east-1" - # key_prefix = "date=%F/" - - # This line is not in VECTOR.TOML ---- -# Vector agent ran on each node where it collects logs from pods. -apiVersion: apps/v1 -kind: DaemonSet -metadata: - name: vector-agent - namespace: telemetry -spec: - minReadySeconds: 1 - selector: - matchLabels: - name: vector-agent - template: - metadata: - labels: - name: vector-agent - # TODO: Modify this pod spec to include any extra configuration needed like - # secrets or dns. - spec: - volumes: - # Directory with logs - - name: var-log - hostPath: - path: /var/log/ - # Docker and containerd log files in Kubernetes are symlinks to this folder. - - name: var-lib - hostPath: - path: /var/lib/ - # Vector will store it's data here. 
- - name: data-dir - emptyDir: {} - # Mount vector configuration from config map as a file vector.toml - - name: config-dir - configMap: - name: vector-config - items: - - key: vector-agent-config - path: vector.toml - containers: - - name: vector - image: timberio/vector:latest-alpine - imagePullPolicy: Always - args: ["-w"] - volumeMounts: - - name: var-log - mountPath: /var/log/ - readOnly: true - - name: var-lib - mountPath: /var/lib - readOnly: true - - name: data-dir - mountPath: /var/tmp/vector - - name: config-dir - mountPath: /etc/vector - readOnly: true - env: - - name: VECTOR_NODE_NAME - valueFrom: - fieldRef: - fieldPath: spec.nodeName From c1eb4dece079bd485ca9a6a51514ce555e48a4f7 Mon Sep 17 00:00:00 2001 From: binarylogic Date: Sat, 4 Apr 2020 15:18:19 -0400 Subject: [PATCH 005/118] Add minimum k8s version to plan of attack Signed-off-by: binarylogic Signed-off-by: MOZGIII --- rfcs/2020-04-04-2221-kubernetes-integration.md | 1 + 1 file changed, 1 insertion(+) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index 070e994ccb7da..9218ce051b743 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -197,6 +197,7 @@ See [motivation](#motivation). ## Plan Of Attack +- [ ] Agree on minimal Kubernetes version. - [ ] Setup a proper testing suite for k8s. - [ ] Support for customizable k8s clusters. See [issue#2170]. - [ ] Stabilize k8s integration tests. 
See [isue#2193], [issue#2216], From 8cb8197ee96ff93ea75b2c7b660cf3d8fd270a2e Mon Sep 17 00:00:00 2001 From: binarylogic Date: Sat, 4 Apr 2020 18:41:42 -0400 Subject: [PATCH 006/118] Add centralize testing issue to tasks Signed-off-by: binarylogic Signed-off-by: MOZGIII --- rfcs/2020-04-04-2221-kubernetes-integration.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index 9218ce051b743..e88f4299e1396 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -200,6 +200,8 @@ See [motivation](#motivation). - [ ] Agree on minimal Kubernetes version. - [ ] Setup a proper testing suite for k8s. - [ ] Support for customizable k8s clusters. See [issue#2170]. + - [ ] Look into [issue#2225] and see if we can include it as part of this + work. - [ ] Stabilize k8s integration tests. See [isue#2193], [issue#2216], and [issue#1635]. - [ ] Ensure we are testing all supported minor versions. See @@ -242,6 +244,7 @@ See [motivation](#motivation). 
[issue#2218]: https://github.com/timberio/vector/issues/2218
[issue#2223]: https://github.com/timberio/vector/issues/2223
[issue#2224]: https://github.com/timberio/vector/issues/2224
+[issue#2225]: https://github.com/timberio/vector/issues/2225
[kubernetes_version_comment]: https://github.com/timberio/vector/pull/2188#discussion_r403120481
[LogDNA k8s integration]: https://docs.logdna.com/docs/kubernetes
[logdna_daemonset]: https://raw.githubusercontent.com/logdna/logdna-agent/master/logdna-agent-ds.yaml

From 4f71def45ad756b86720f54c003153d1babb81e0 Mon Sep 17 00:00:00 2001
From: MOZGIII
Date: Tue, 7 Apr 2020 00:07:15 +0300
Subject: [PATCH 007/118] Fix minor issues, apply autoformat

Signed-off-by: MOZGIII
---
 .../2020-04-04-2221-kubernetes-integration.md | 71 ++++++++++---------
 1 file changed, 36 insertions(+), 35 deletions(-)

diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md
index e88f4299e1396..934dafbcd9ee6 100644
--- a/rfcs/2020-04-04-2221-kubernetes-integration.md
+++ b/rfcs/2020-04-04-2221-kubernetes-integration.md
@@ -2,10 +2,10 @@

This RFC outlines how Vector will integrate with Kubernetes (k8s).

-**Note: This RFC is retroactive and meant to seve as an audit to complete our
+**Note: This RFC is retroactive and meant to serve as an audit to complete our
Kubernetes integration. At the time of writing this RFC, Vector has already made
considerable progress on it's Kubernetes integration. It has a `kubernetes`
-source, `kubernetes_pod_metadata` transform, an example daemonset file, and the
+source, `kubernetes_pod_metadata` transform, an example `DaemonSet` file, and the
ability to automatically reload configuration when it changes. 
The fundamental pieces are mostly in place to complete this integration, but as we approach the finish line we're being faced with deeper questions that heavily affect the @@ -20,7 +20,7 @@ the time of writing this RFC; many large companies, with large production deployments, depend heavily on Kubernetes. Kubernetes handles log collection but does not facilitate shipping. Shipping is meant to be delegated to tools like Vector. This is precisely the use case that Vector was built for. So, -the motivation is three-fold: +motivation is three-fold: 1. A Kubernetes integration is essential to achieving Vector's vision of being the dominant, single collector for observability data. @@ -32,7 +32,7 @@ the motivation is three-fold: **Note: This guide largely follows the format of our existing guides ([example][guide_example]). There are two perspectives to our guides: 1) A new -user coming from Google 2) A user that is familar with Vector. This guide is +user coming from Google 2) A user that is familiar with Vector. This guide is from perspective 2.** This guide covers integrating Vector with Kubernetes. We'll touch on the basic @@ -47,9 +47,9 @@ Kubernetes logs and metrics to any destination you please. Our recommended strategy deploys Vector as a Kubernetes [daemonset]. This is the most efficient means of collecting Kubernetes observability data since -Vector is guaranteed to deploy _once_ on each of your Pods. In addition, +Vector is guaranteed to deploy _once_ on each of your Nodes. In addition, we'll use the [`kubernetes_pod_metadata` transform][kubernetes_pod_metadata_transform] -to enrich your logs with Kubernetes context. This transform interacts with +to enrich your logs with the Kubernetes context. This transform interacts with the Kubernetes watch API to collect cluster metadata and update in real-time when things change. 
The following diagram demonstrates how this works: @@ -57,11 +57,11 @@ TODO: insert diagram ### What We'll Accomplish -* Collect data from each of your Kubernetes Pods - * Ability to filter by container name, Pod IDs, and namespaces. - * Automatically merge logs that Kubernetes splits. - * Enrich your logs with useful Kubernetes context. -* Send your logs to one or more destinations. +- Collect data from each of your Kubernetes Pods + - Ability to filter by container names, Pod IDs, and namespaces. + - Automatically merge logs that Kubernetes splits. + - Enrich your logs with useful Kubernetes context. +- Send your logs to one or more destinations. ### Tutorial @@ -69,13 +69,13 @@ TODO: insert diagram 1. Configure Vector: - Before we can deplo Vector we must configure. This is done by creating + Before we can deploy Vector we must configure. This is done by creating a Kubernetes `ConfigMap`: ...insert selector to select any of Vector's sinks... ```bash - echo ' + cat <<-CONFIG > vector-configmap.yaml apiVersion: v1 kind: ConfigMap metadata: @@ -107,17 +107,17 @@ TODO: insert diagram compression = "gzip" region = "us-east-1" key_prefix = "date=%F/" - ' > vector-configmap.toml + CONFIG ``` 2. Deploy Vector! Now that you have your custom `ConfigMap` ready it's time to deploy Vector. To ensure Vector is isolated and has the necessary permissions - we must create a `namespace`, `ServiceAccount`, `ClusterRole`, and + we must create a `Namespace`, `ServiceAccount`, `ClusterRole`, and `ClusterRoleBinding`: - ```bash + ```shell kubectl create namespace logging kubectl create -f vector-service-account.yaml kubectl create -f vector-role.yaml @@ -126,7 +126,7 @@ TODO: insert diagram kubectl create -f vector-daemonset.yaml ``` - * *See [outstanding questions 3, 4, 5, 6, and 7](#outstanding-questions).* + - _See [outstanding questions 3, 4, 5, 6, and 7](#outstanding-questions)._ That's it! @@ -138,14 +138,14 @@ TODO: fill in 1. [Filebeat k8s integration] 1. 
[Fluentbit k8s integration]
-2. [Fluentd k8s integration]
-3. [LogDNA k8s integration]
-4. [Honeycomb integration]
-3. [Bonzai logging operator] - This is approach is likely outside of the scope
+1. [Fluentd k8s integration]
+1. [LogDNA k8s integration]
+1. [Honeycomb integration]
+1. [Bonzai logging operator] - This approach is likely outside of the scope
    of Vector's initial Kubernetes integration because it focuses more on
    deployment strategies and topologies. There are likely some very useful and
    interesting tactics in their approach though.
-4. [Influx Helm charts]
+1. [Influx Helm charts]

 ## Sales Pitch

@@ -157,7 +157,7 @@ See [motivation](#motivation).

 ## Alternatives

-1. Not do this integration and rely solely on external community driven
+1. Not do this integration and rely solely on external community-driven
    integrations.

 ## Outstanding Questions

@@ -165,15 +165,15 @@ See [motivation](#motivation).

 1. What is the minimal Kubernetes version that we want to support? See
    [this comment][kubernetes_version_comment].
 1. What is the best way to prevent Vector from ingesting its own logs? I'm assuming
-   that my [`kubectl` tutoria](#kubectl-interface) handles this with namespaces?
-   We'd just need to configure Vector to excluse this namespace?
+   that my [`kubectl` tutorial](#kubectl-interface) handles this with namespaces?
+   We'd just need to configure Vector to exclude this namespace?
 1. I've seen two different installation strategies. For example, Fluentd offers
    a [single daemonset configuration file][fluentd_daemonset] while Fluentbit
    offers [four separate configuration files][fluentbit_installation]
    (`service-account.yaml`, `role.yaml`, `role-binding.yaml`, `configmap.yaml`).
    Which approach is better? Why are they different?
 1. Should we prefer `kubectl create ...` or `kubectl apply ...`? The examples
-   in the [prior art](#prior-art) section use both.
+   in the [prior art](#prior-art) section use both.
 1.
From what I understand, Vector requires the Kubernetes `watch` verb in order to receive updates to k8s cluster changes. This is required for the `kubernetes_pod_metadata` transform. Yet, Fluentbit [requires the `get`, @@ -186,7 +186,7 @@ See [motivation](#motivation). For example [LogDNA's daemonset][logdna_daemonset]. I assume this is limiting resources. Do we want to consider this? 1. What the hell is going on with [Honeycomb's integration - strategy][Hoenycomb integration]? :) It seems like the whole "Heapster" + strategy][honeycomb integration]? :) It seems like the whole "Heapster" pipeline is specifically for system events, but Heapster is deprecated? This leads me to my next question... 1. How are we collecting Kubernetes system events? Is that outside of the @@ -202,7 +202,7 @@ See [motivation](#motivation). - [ ] Support for customizable k8s clusters. See [issue#2170]. - [ ] Look into [issue#2225] and see if we can include it as part of this work. - - [ ] Stabilize k8s integration tests. See [isue#2193], [issue#2216], + - [ ] Stabilize k8s integration tests. See [issue#2193], [issue#2216], and [issue#1635]. - [ ] Ensure we are testing all supported minor versions. See [issue#2223]. @@ -220,18 +220,18 @@ See [motivation](#motivation). - [ ] Add Kubernetes setup/integration guide. - [ ] Release `0.10.0` and announce. 
-[Bonzai logging operator]: https://github.com/banzaicloud/logging-operator +[bonzai logging operator]: https://github.com/banzaicloud/logging-operator [daemonset]: https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/ -[Filebeat k8s integration]: https://www.elastic.co/guide/en/beats/filebeat/master/running-on-kubernetes.html -[Fluentbit k8s integration]: https://docs.fluentbit.io/manual/installation/kubernetes +[filebeat k8s integration]: https://www.elastic.co/guide/en/beats/filebeat/master/running-on-kubernetes.html +[fluentbit k8s integration]: https://docs.fluentbit.io/manual/installation/kubernetes [fluentbit_daemonset]: https://raw.githubusercontent.com/fluent/fluent-bit-kubernetes-logging/master/output/elasticsearch/fluent-bit-ds.yaml [fluentbit_installation]: https://docs.fluentbit.io/manual/installation/kubernetes#installation [fluentbit_role]: https://raw.githubusercontent.com/fluent/fluent-bit-kubernetes-logging/master/fluent-bit-role.yaml -[Fluentd k8s integration]: https://docs.fluentd.org/v/0.12/articles/kubernetes-fluentd +[fluentd k8s integration]: https://docs.fluentd.org/v/0.12/articles/kubernetes-fluentd [fluentd_daemonset]: https://github.com/fluent/fluentd-kubernetes-daemonset/blob/master/fluentd-daemonset-papertrail.yaml [guide_example]: https://vector.dev/guides/integrate/sources/syslog/aws_kinesis_firehose/ -[Honeycomb integration]: https://docs.honeycomb.io/getting-data-in/integrations/kubernetes/ -[Influx Helm charts]: https://github.com/influxdata/helm-charts +[honeycomb integration]: https://docs.honeycomb.io/getting-data-in/integrations/kubernetes/ +[influx helm charts]: https://github.com/influxdata/helm-charts [issue#1293]: https://github.com/timberio/vector/issues/1293 [issue#1635]: https://github.com/timberio/vector/issues/1635 [issue#1816]: https://github.com/timberio/vector/issues/1867 @@ -239,6 +239,7 @@ See [motivation](#motivation). 
[issue#1910]: https://github.com/timberio/vector/issues/1910 [issue#2170]: https://github.com/timberio/vector/issues/2170 [issue#2171]: https://github.com/timberio/vector/issues/2171 +[issue#2193]: https://github.com/timberio/vector/issues/2193 [issue#2199]: https://github.com/timberio/vector/issues/2199 [issue#2216]: https://github.com/timberio/vector/issues/2216 [issue#2218]: https://github.com/timberio/vector/issues/2218 @@ -246,7 +247,7 @@ See [motivation](#motivation). [issue#2224]: https://github.com/timberio/vector/issues/2224 [issue#2225]: https://github.com/timberio/vector/issues/2225 [kubernetes_version_comment]: https://github.com/timberio/vector/pull/2188#discussion_r403120481 -[LogDNA k8s integration]: https://docs.logdna.com/docs/kubernetes +[logdna k8s integration]: https://docs.logdna.com/docs/kubernetes [logdna_daemonset]: https://raw.githubusercontent.com/logdna/logdna-agent/master/logdna-agent-ds.yaml [pr#2134]: https://github.com/timberio/vector/pull/2134 [pr#2188]: https://github.com/timberio/vector/pull/2188 From 0803c6eb80b798093fb4ab39f6830d53d5573e38 Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Tue, 7 Apr 2020 13:34:07 +0300 Subject: [PATCH 008/118] Fix more nits Signed-off-by: MOZGIII --- rfcs/2020-04-04-2221-kubernetes-integration.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index 934dafbcd9ee6..4b1a0ab576a17 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -45,7 +45,7 @@ Kubernetes logs and metrics to any destination you please. #### How This Guide Works -Our recommended strategy deploys Vector as a Kubernetes [daemonset]. This is +Our recommended strategy deploys Vector as a Kubernetes [DaemonSet]. This is the most efficient means of collecting Kubernetes observability data since Vector is guaranteed to deploy _once_ on each of your Nodes. 
In addition, we'll use the [`kubernetes_pod_metadata` transform][kubernetes_pod_metadata_transform] @@ -162,6 +162,8 @@ See [motivation](#motivation). ## Outstanding Questions +### From Ben + 1. What is the minimal Kubernetes version that we want to support. See [this comment][kubernetes_version_comment]. 1. What is the best to avoid Vector from ingesting it's own logs? I'm assuming @@ -192,7 +194,7 @@ See [motivation](#motivation). 1. How are we collecting Kubernetes system events? Is that outside of the scope of this RFC? And why does this take an entirely different path? (ref [issue#1293]) -1. What are some of the details that sets Vector's Kubernetes integration apart? +1. What are some of the details that set Vector's Kubernetes integration apart? This is for marketing purposes and also helps us "raise the bar". ## Plan Of Attack From d55ed93961f644165a0c180d0ce75470a56e1557 Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Tue, 7 Apr 2020 13:39:46 +0300 Subject: [PATCH 009/118] Add question regarding kubernetes cluster flavors Signed-off-by: MOZGIII --- rfcs/2020-04-04-2221-kubernetes-integration.md | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index 4b1a0ab576a17..dd8087cff8325 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -197,9 +197,17 @@ See [motivation](#motivation). 1. What are some of the details that set Vector's Kubernetes integration apart? This is for marketing purposes and also helps us "raise the bar". +### From Mike + +1. What significantly different k8s cluster "flavors" are there? Which ones do + we want to test against? Some clusters use `docker`, some use `CRI-O`, + [etc][container_runtimes]. Some even use [gVisor] or [Firecracker]. There + might be differences in how different container runtimes handle logs. 
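The `kubernetes_pod_metadata` transform mentioned at the start of this hunk keeps cluster metadata in sync via the Kubernetes watch API and merges it into events as they pass through. A rough sketch of that bookkeeping, in Python purely for illustration — Vector itself is Rust, and the field names below are assumptions rather than Vector's actual schema:

```python
def apply_watch_event(cache: dict, event: dict) -> dict:
    """Keep a pod-metadata cache in sync with Kubernetes watch notifications.

    `event` mirrors the shape of a k8s watch message:
    {"type": "ADDED" | "MODIFIED" | "DELETED", "object": <Pod object>}.
    """
    pod = event["object"]
    uid = pod["metadata"]["uid"]
    if event["type"] in ("ADDED", "MODIFIED"):
        cache[uid] = {
            "pod_name": pod["metadata"]["name"],
            "pod_namespace": pod["metadata"]["namespace"],
            "pod_labels": pod["metadata"].get("labels", {}),
        }
    elif event["type"] == "DELETED":
        # Drop metadata for pods that no longer exist.
        cache.pop(uid, None)
    return cache


def enrich(log_event: dict, cache: dict) -> dict:
    """Merge cached pod metadata into a log event keyed by pod UID."""
    return {**log_event, **cache.get(log_event.get("pod_uid"), {})}
```

This is also why the `watch` verb question above matters: without `watch`, such a cache could only be kept fresh by polling the API server.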
+ ## Plan Of Attack - [ ] Agree on minimal Kubernetes version. +- [ ] Agree on a list of Kubernetes cluster flavors we want to test against. - [ ] Setup a proper testing suite for k8s. - [ ] Support for customizable k8s clusters. See [issue#2170]. - [ ] Look into [issue#2225] and see if we can include it as part of this @@ -223,8 +231,10 @@ See [motivation](#motivation). - [ ] Release `0.10.0` and announce. [bonzai logging operator]: https://github.com/banzaicloud/logging-operator +[container_runtimes]: https://kubernetes.io/docs/setup/production-environment/container-runtimes/ [daemonset]: https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/ [filebeat k8s integration]: https://www.elastic.co/guide/en/beats/filebeat/master/running-on-kubernetes.html +[firecracker]: https://github.com/firecracker-microvm/firecracker [fluentbit k8s integration]: https://docs.fluentbit.io/manual/installation/kubernetes [fluentbit_daemonset]: https://raw.githubusercontent.com/fluent/fluent-bit-kubernetes-logging/master/output/elasticsearch/fluent-bit-ds.yaml [fluentbit_installation]: https://docs.fluentbit.io/manual/installation/kubernetes#installation @@ -232,6 +242,7 @@ See [motivation](#motivation). 
[fluentd k8s integration]: https://docs.fluentd.org/v/0.12/articles/kubernetes-fluentd [fluentd_daemonset]: https://github.com/fluent/fluentd-kubernetes-daemonset/blob/master/fluentd-daemonset-papertrail.yaml [guide_example]: https://vector.dev/guides/integrate/sources/syslog/aws_kinesis_firehose/ +[gvisor]: https://github.com/google/gvisor [honeycomb integration]: https://docs.honeycomb.io/getting-data-in/integrations/kubernetes/ [influx helm charts]: https://github.com/influxdata/helm-charts [issue#1293]: https://github.com/timberio/vector/issues/1293 From cf46c020113248fced03c10bd526a05a8a4df937 Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Tue, 7 Apr 2020 13:40:14 +0300 Subject: [PATCH 010/118] Add design considerations section Signed-off-by: MOZGIII --- rfcs/2020-04-04-2221-kubernetes-integration.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index dd8087cff8325..d76ec1df0c61a 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -134,6 +134,8 @@ TODO: insert diagram TODO: fill in +## Design considerations + ## Prior Art 1. 
[Filebeat k8s integration] From 2ce1690a4809c953ddeaec6c5a2f818240a7c04b Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Tue, 7 Apr 2020 13:40:43 +0300 Subject: [PATCH 011/118] Add a section on minimal supported kubernetes version MSKV Signed-off-by: MOZGIII --- .../2020-04-04-2221-kubernetes-integration.md | 25 +++++++++++++++++++ 1 file changed, 25 insertions(+) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index d76ec1df0c61a..4b49c5bbc083e 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -136,6 +136,31 @@ TODO: fill in ## Design considerations +### Minimal supported Kubernetes version + +The minimal supported Kubernetes version is the earliest released version of +Kubernetes that we intend to support at full capacity. + +We use minimal supported Kubernetes version (or MSKV for short), in the +following ways: + +- to communicate to our users what versions of Kubernetes Vector will work on; +- to run our Kubernetes test suite against Kubernetes clusters starting from + this version; +- to track what Kubernetes API feature level we can use when developing Vector + code. + +We can change MSKV over time, but we have to notify our users accordingly. + +There has to be one "root" location where current MSKV for the whole Vector +project is specified, and it should be a single source of truth for all the +decisions that involve MSKV, as well as documentation. A good candidate for +such location is a file at `.meta` dir of the Vector repo. `.meta/mskv` for +instance. + +For the moment, the discussion on the initial MSKV is in progress. The proposed +version is Kubernetes `1.14`. + ## Prior Art 1. 
[Filebeat k8s integration]

From 449a41749d6708ace95182b2a85ab51a525d1e36 Mon Sep 17 00:00:00 2001
From: MOZGIII
Date: Tue, 7 Apr 2020 18:14:12 +0300
Subject: [PATCH 012/118] Add a section on helm charts and yaml files

Signed-off-by: MOZGIII

---
 .../2020-04-04-2221-kubernetes-integration.md | 22 +++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md
index 4b49c5bbc083e..96f7f977403da 100644
--- a/rfcs/2020-04-04-2221-kubernetes-integration.md
+++ b/rfcs/2020-04-04-2221-kubernetes-integration.md
@@ -161,6 +161,28 @@ instance.

 For the moment, the discussion on the initial MSKV is in progress. The proposed
 version is Kubernetes `1.14`.

+### Helm vs raw YAML files
+
+We consider both raw YAML files and the Helm Chart to be officially supported
+installation methods.
+
+With Helm, people usually use the Chart we provide, and tweak it to their needs
+via variables we expose as the chart configuration. This means we can offer a
+lot of customization; however, in the end, we're in charge of generating the
+YAML configuration that k8s will run from our templates.
+This means that, while it is very straightforward for users, we have to keep in
+mind the compatibility concerns when we update our Helm Chart.
+We should provide a lot of flexibility in our Helm Charts, but also have sane
+defaults that would work for the majority of users.
+
+With raw YAML files, they have to be usable out of the box, but we shouldn't
+expect users to use them as-is. People would often maintain their own "forks" of
+those, tailored to their use case. We shouldn't overcomplicate our recommended
+configuration, but we shouldn't oversimplify it either. It has to be
+production-ready. But it also has to be portable, in the sense that it should
+work without tweaking on as many cluster setups as possible.
+We should support both `kubectl create` and `kubectl apply` flows.
+
 ## Prior Art

 1.
[Filebeat k8s integration]

From ea251e10c02dbcc93e731518af3cb396678047be Mon Sep 17 00:00:00 2001
From: MOZGIII
Date: Tue, 7 Apr 2020 20:10:10 +0300
Subject: [PATCH 013/118] Add a section on reading container logs

Signed-off-by: MOZGIII

---
 .../2020-04-04-2221-kubernetes-integration.md | 69 +++++++++++++++++++
 1 file changed, 69 insertions(+)

diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md
index 96f7f977403da..22c5a51ee46ac 100644
--- a/rfcs/2020-04-04-2221-kubernetes-integration.md
+++ b/rfcs/2020-04-04-2221-kubernetes-integration.md
@@ -183,6 +183,67 @@ production-ready. But it also has to be portable, in a sense that it should work
 without tweaking with as much cluster setups as possible.
 We should support both `kubectl create` and `kubectl apply` flows.

+### Reading container logs
+
+#### Kubernetes logging architecture
+
+Kubernetes does not directly control the logging, as the actual implementation
+of the logging mechanisms is a domain of the container runtime.
+That said, Kubernetes requires the container runtime to fulfill a certain
+contract, allowing it to enforce the desired behavior.
+
+Kubernetes tries to store logs at consistent filesystem paths for any container
+runtime. In particular, `kubelet` is responsible for configuring the container
+runtime it controls to put the logs in the right place.
+Log file format can vary per container runtime, and we have to support all the
+formats that Kubernetes itself supports.
+
+Generally, most Kubernetes setups will put the logs at the `kubelet`-configured
+locations.
+
+There is [official documentation][k8s_log_path_location_docs] at the Kubernetes
+project regarding logging. I had a misconception that it specifies reading these
+log files as an explicitly supported way of consuming the logs; however, I
+couldn't find a confirmation of that when I checked.
+Nonetheless, Kubernetes log files are a de-facto, well-settled interface that
+we should be able to use reliably.
+
+#### File locations
+
+We can read container logs directly from the host filesystem. Kubernetes stores
+logs such that they're accessible from the following locations:
+
+- [`/var/log/pods`][var_log_pods_src];
+- `/var/log/containers` - legacy location, kept for backward compatibility
+  with pre `1.14` clusters.
+
+To make our lives easier, here's a [link][build_container_logs_directory_src] to
+the part of the k8s source that's responsible for building the path to the log
+file. If we encounter issues, this would be a good starting point to unwrap the
+k8s code.
+
+#### Log file format
+
+As already mentioned above, log formats can vary, but there are certain
+invariants that are imposed on the container runtimes by the implementation of
+Kubernetes itself.
+
+A particularly interesting piece of code is the [`ReadLogs`][k8s_src_read_logs]
+function - it is responsible for reading container logs. We should carefully
+inspect it to gain knowledge of the edge cases. To achieve the best
+compatibility, we can base our log file consumption procedure on the logic
+implemented by that function.
+
+Based on the [`parseFuncs`][k8s_src_parse_funcs] (that
+[`ReadLogs`][k8s_src_read_logs] uses), it's evident that k8s supports the
+following formats:
+
+- Docker [JSON File logging driver] format - which is essentially a simple
+  [`JSONLines`][jsonlines] (aka `ndjson`) format;
+- [CRI format][cri_log_format].
+
+We have to support both formats.

 ## Prior Art

 1. [Filebeat k8s integration]
@@ -314,3 +375,11 @@ See [motivation](#motivation).
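To make the two log file formats above concrete, here is a minimal parsing sketch — Python for brevity only; the real implementation would live in Vector's Rust codebase and follow the `ReadLogs` logic referenced above (including proper partial-line merging, which is simplified here). The sample lines in the comments are fabricated:

```python
import json

def parse_container_log_line(line: str) -> dict:
    """Parse one line of a container log file in either of the two formats
    Kubernetes supports: Docker `json-file` or CRI."""
    line = line.rstrip("\n")
    if line.startswith("{"):
        # Docker `json-file` driver: one JSON object per line, e.g.
        # {"log":"hello\n","stream":"stdout","time":"2020-04-08T12:00:00.0Z"}.
        # A message that does not end with a newline is a partial (split) line.
        record = json.loads(line)
        return {
            "time": record["time"],
            "stream": record["stream"],
            "partial": not record["log"].endswith("\n"),
            "message": record["log"].rstrip("\n"),
        }
    # CRI format: `<timestamp> <stream> <P|F> <message>`, e.g.
    # `2020-04-08T12:00:00.000000000Z stdout F hello`,
    # where `P` marks a partial line and `F` a full one.
    time, stream, tag, message = line.split(" ", 3)
    return {"time": time, "stream": stream, "partial": tag == "P", "message": message}
```

Both on-disk formats reduce to the same normalized record shape (timestamp, stream, partial flag, message), which is what a source would emit downstream.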
[pr#2134]: https://github.com/timberio/vector/pull/2134 [pr#2188]: https://github.com/timberio/vector/pull/2188 [vector_daemonset]: 2020-04-04-2221-kubernetes-integration/vector-daemonset.yaml +[var_log_pods_src]: https://github.com/kubernetes/kubernetes/blob/58596b2bf5eb0d84128fa04d0395ddd148d96e51/pkg/kubelet/kuberuntime/kuberuntime_manager.go#L60 +[build_container_logs_directory_src]: https://github.com/kubernetes/kubernetes/blob/31305966789525fca49ec26c289e565467d1f1c4/pkg/kubelet/kuberuntime/helpers.go#L173 +[k8s_log_path_location_docs]: https://kubernetes.io/docs/concepts/cluster-administration/logging/#logging-at-the-node-level +[jsonlines]: http://jsonlines.org/ +[k8s_src_read_logs]: https://github.com/kubernetes/kubernetes/blob/e74ad388541b15ae7332abf2e586e2637b55d7a7/pkg/kubelet/kuberuntime/logs/logs.go#L277 +[k8s_src_parse_funcs]: https://github.com/kubernetes/kubernetes/blob/e74ad388541b15ae7332abf2e586e2637b55d7a7/pkg/kubelet/kuberuntime/logs/logs.go#L116 +[json file logging driver]: https://docs.docker.com/config/containers/logging/json-file/ +[cri_log_format]: https://github.com/kubernetes/community/blob/ee2abbf9dbfa4523b414f99a04ddc97bd38c74b2/contributors/design-proposals/node/kubelet-cri-logging.md From 8646aa601119c0211e9167c2afa70eb5901aad3a Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Wed, 8 Apr 2020 13:45:37 +0300 Subject: [PATCH 014/118] Adjust the commands for installation from YAML Signed-off-by: MOZGIII --- rfcs/2020-04-04-2221-kubernetes-integration.md | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index 22c5a51ee46ac..a5c95554597e8 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -119,11 +119,8 @@ TODO: insert diagram ```shell kubectl create namespace logging - kubectl create -f vector-service-account.yaml - kubectl create -f vector-role.yaml - kubectl 
create -f vector-role-binding.yaml
-  kubectl create -f vector-configmap.yaml
-  kubectl create -f vector-daemonset.yaml
+  kubectl apply -f vector-configmap.yaml
+  kubectl apply -f https://packages.timber.io/vector/latest/kubernetes/vector.yaml
   ```

From 9359ed471ffeb474124153b383733e81551959aa Mon Sep 17 00:00:00 2001
From: MOZGIII
Date: Wed, 8 Apr 2020 13:45:59 +0300
Subject: [PATCH 015/118] Add a chapter on helm chart registry

Signed-off-by: MOZGIII

---
 rfcs/2020-04-04-2221-kubernetes-integration.md | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md
index a5c95554597e8..6969c14630104 100644
--- a/rfcs/2020-04-04-2221-kubernetes-integration.md
+++ b/rfcs/2020-04-04-2221-kubernetes-integration.md
@@ -241,6 +241,14 @@ following formats:

 We have to support both formats.

+### Helm Chart Repository
+
+We should not just maintain a Helm Chart; we should also offer a Helm repo to
+make installations easily upgradable.
+
+Everything we need to do to achieve this is outlined in
+[The Chart Repository Guide].
+
 ## Prior Art

 1. [Filebeat k8s integration]
@@ -310,6 +318,7 @@ See [motivation](#motivation).
    we want to test against? Some clusters use `docker`, some use `CRI-O`,
    [etc][container_runtimes]. Some even use [gVisor] or [Firecracker]. There
    might be differences in how different container runtimes handle logs.
+1. How do we want to approach Helm Chart Repository management?

 ## Plan Of Attack

@@ -380,3 +389,4 @@
[k8s_src_parse_funcs]: https://github.com/kubernetes/kubernetes/blob/e74ad388541b15ae7332abf2e586e2637b55d7a7/pkg/kubelet/kuberuntime/logs/logs.go#L116 [json file logging driver]: https://docs.docker.com/config/containers/logging/json-file/ [cri_log_format]: https://github.com/kubernetes/community/blob/ee2abbf9dbfa4523b414f99a04ddc97bd38c74b2/contributors/design-proposals/node/kubelet-cri-logging.md +[the chart repository guide]: https://helm.sh/docs/topics/chart_repository/ From 83d4dc4c6a286d6f6d92cf6d0f62f4a054af2062 Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Wed, 8 Apr 2020 14:12:11 +0300 Subject: [PATCH 016/118] Sort refs Signed-off-by: MOZGIII --- rfcs/2020-04-04-2221-kubernetes-integration.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index 6969c14630104..fc10bada2777c 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -347,7 +347,9 @@ See [motivation](#motivation). - [ ] Release `0.10.0` and announce. [bonzai logging operator]: https://github.com/banzaicloud/logging-operator +[build_container_logs_directory_src]: https://github.com/kubernetes/kubernetes/blob/31305966789525fca49ec26c289e565467d1f1c4/pkg/kubelet/kuberuntime/helpers.go#L173 [container_runtimes]: https://kubernetes.io/docs/setup/production-environment/container-runtimes/ +[cri_log_format]: https://github.com/kubernetes/community/blob/ee2abbf9dbfa4523b414f99a04ddc97bd38c74b2/contributors/design-proposals/node/kubelet-cri-logging.md [daemonset]: https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/ [filebeat k8s integration]: https://www.elastic.co/guide/en/beats/filebeat/master/running-on-kubernetes.html [firecracker]: https://github.com/firecracker-microvm/firecracker @@ -375,18 +377,16 @@ See [motivation](#motivation). 
[issue#2223]: https://github.com/timberio/vector/issues/2223 [issue#2224]: https://github.com/timberio/vector/issues/2224 [issue#2225]: https://github.com/timberio/vector/issues/2225 +[json file logging driver]: https://docs.docker.com/config/containers/logging/json-file/ +[jsonlines]: http://jsonlines.org/ +[k8s_log_path_location_docs]: https://kubernetes.io/docs/concepts/cluster-administration/logging/#logging-at-the-node-level +[k8s_src_parse_funcs]: https://github.com/kubernetes/kubernetes/blob/e74ad388541b15ae7332abf2e586e2637b55d7a7/pkg/kubelet/kuberuntime/logs/logs.go#L116 +[k8s_src_read_logs]: https://github.com/kubernetes/kubernetes/blob/e74ad388541b15ae7332abf2e586e2637b55d7a7/pkg/kubelet/kuberuntime/logs/logs.go#L277 [kubernetes_version_comment]: https://github.com/timberio/vector/pull/2188#discussion_r403120481 [logdna k8s integration]: https://docs.logdna.com/docs/kubernetes [logdna_daemonset]: https://raw.githubusercontent.com/logdna/logdna-agent/master/logdna-agent-ds.yaml [pr#2134]: https://github.com/timberio/vector/pull/2134 [pr#2188]: https://github.com/timberio/vector/pull/2188 -[vector_daemonset]: 2020-04-04-2221-kubernetes-integration/vector-daemonset.yaml -[var_log_pods_src]: https://github.com/kubernetes/kubernetes/blob/58596b2bf5eb0d84128fa04d0395ddd148d96e51/pkg/kubelet/kuberuntime/kuberuntime_manager.go#L60 -[build_container_logs_directory_src]: https://github.com/kubernetes/kubernetes/blob/31305966789525fca49ec26c289e565467d1f1c4/pkg/kubelet/kuberuntime/helpers.go#L173 -[k8s_log_path_location_docs]: https://kubernetes.io/docs/concepts/cluster-administration/logging/#logging-at-the-node-level -[jsonlines]: http://jsonlines.org/ -[k8s_src_read_logs]: https://github.com/kubernetes/kubernetes/blob/e74ad388541b15ae7332abf2e586e2637b55d7a7/pkg/kubelet/kuberuntime/logs/logs.go#L277 -[k8s_src_parse_funcs]: https://github.com/kubernetes/kubernetes/blob/e74ad388541b15ae7332abf2e586e2637b55d7a7/pkg/kubelet/kuberuntime/logs/logs.go#L116 -[json file 
logging driver]: https://docs.docker.com/config/containers/logging/json-file/ -[cri_log_format]: https://github.com/kubernetes/community/blob/ee2abbf9dbfa4523b414f99a04ddc97bd38c74b2/contributors/design-proposals/node/kubelet-cri-logging.md [the chart repository guide]: https://helm.sh/docs/topics/chart_repository/ +[var_log_pods_src]: https://github.com/kubernetes/kubernetes/blob/58596b2bf5eb0d84128fa04d0395ddd148d96e51/pkg/kubelet/kuberuntime/kuberuntime_manager.go#L60 +[vector_daemonset]: 2020-04-04-2221-kubernetes-integration/vector-daemonset.yaml From b25d37d5eb01cfa65a9773389797790067fd635f Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Wed, 8 Apr 2020 14:12:36 +0300 Subject: [PATCH 017/118] Rename var_log_pods_src to k8s_src_var_log_pods Signed-off-by: MOZGIII --- rfcs/2020-04-04-2221-kubernetes-integration.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index fc10bada2777c..0d76fb9b6205e 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -210,7 +210,7 @@ should be able to use reliably. We can read container logs directly from the host filesystem. Kubernetes stores logs such that they're accessible from the following locations: -- [`/var/log/pods`][var_log_pods_src]; +- [`/var/log/pods`][k8s_src_var_log_pods]; - `/var/log/containers` - legacy location, kept for backward compatibility with pre `1.14` clusters. @@ -382,11 +382,11 @@ See [motivation](#motivation). 
[k8s_log_path_location_docs]: https://kubernetes.io/docs/concepts/cluster-administration/logging/#logging-at-the-node-level [k8s_src_parse_funcs]: https://github.com/kubernetes/kubernetes/blob/e74ad388541b15ae7332abf2e586e2637b55d7a7/pkg/kubelet/kuberuntime/logs/logs.go#L116 [k8s_src_read_logs]: https://github.com/kubernetes/kubernetes/blob/e74ad388541b15ae7332abf2e586e2637b55d7a7/pkg/kubelet/kuberuntime/logs/logs.go#L277 +[k8s_src_var_log_pods]: https://github.com/kubernetes/kubernetes/blob/58596b2bf5eb0d84128fa04d0395ddd148d96e51/pkg/kubelet/kuberuntime/kuberuntime_manager.go#L60 [kubernetes_version_comment]: https://github.com/timberio/vector/pull/2188#discussion_r403120481 [logdna k8s integration]: https://docs.logdna.com/docs/kubernetes [logdna_daemonset]: https://raw.githubusercontent.com/logdna/logdna-agent/master/logdna-agent-ds.yaml [pr#2134]: https://github.com/timberio/vector/pull/2134 [pr#2188]: https://github.com/timberio/vector/pull/2188 [the chart repository guide]: https://helm.sh/docs/topics/chart_repository/ -[var_log_pods_src]: https://github.com/kubernetes/kubernetes/blob/58596b2bf5eb0d84128fa04d0395ddd148d96e51/pkg/kubelet/kuberuntime/kuberuntime_manager.go#L60 [vector_daemonset]: 2020-04-04-2221-kubernetes-integration/vector-daemonset.yaml From 2592632fda63937987af637e58984e135219812a Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Wed, 8 Apr 2020 14:15:26 +0300 Subject: [PATCH 018/118] Rename build_container_logs_directory_src ref to k8s_src_build_container_logs_directory Signed-off-by: MOZGIII --- rfcs/2020-04-04-2221-kubernetes-integration.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index 0d76fb9b6205e..c4fe656f55309 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -214,10 +214,10 @@ logs such that they're accessible from the following locations: - 
`/var/log/containers` - legacy location, kept for backward compatibility with pre `1.14` clusters. -To make our lives easier, here's a [link][build_container_logs_directory_src] to -the part of the k8s source that's responsible for building the path to the log -file. If we encounter issues, this would be a good starting point to unwrap the -k8s code. +To make our lives easier, here's a [link][k8s_src_build_container_logs_directory] +to the part of the k8s source that's responsible for building the path to the +log file. If we encounter issues, this would be a good starting point to unwrap +the k8s code. #### Log file format @@ -347,7 +347,6 @@ See [motivation](#motivation). - [ ] Release `0.10.0` and announce. [bonzai logging operator]: https://github.com/banzaicloud/logging-operator -[build_container_logs_directory_src]: https://github.com/kubernetes/kubernetes/blob/31305966789525fca49ec26c289e565467d1f1c4/pkg/kubelet/kuberuntime/helpers.go#L173 [container_runtimes]: https://kubernetes.io/docs/setup/production-environment/container-runtimes/ [cri_log_format]: https://github.com/kubernetes/community/blob/ee2abbf9dbfa4523b414f99a04ddc97bd38c74b2/contributors/design-proposals/node/kubelet-cri-logging.md [daemonset]: https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/ @@ -380,6 +379,7 @@ See [motivation](#motivation). 
[json file logging driver]: https://docs.docker.com/config/containers/logging/json-file/ [jsonlines]: http://jsonlines.org/ [k8s_log_path_location_docs]: https://kubernetes.io/docs/concepts/cluster-administration/logging/#logging-at-the-node-level +[k8s_src_build_container_logs_directory]: https://github.com/kubernetes/kubernetes/blob/31305966789525fca49ec26c289e565467d1f1c4/pkg/kubelet/kuberuntime/helpers.go#L173 [k8s_src_parse_funcs]: https://github.com/kubernetes/kubernetes/blob/e74ad388541b15ae7332abf2e586e2637b55d7a7/pkg/kubelet/kuberuntime/logs/logs.go#L116 [k8s_src_read_logs]: https://github.com/kubernetes/kubernetes/blob/e74ad388541b15ae7332abf2e586e2637b55d7a7/pkg/kubelet/kuberuntime/logs/logs.go#L277 [k8s_src_var_log_pods]: https://github.com/kubernetes/kubernetes/blob/58596b2bf5eb0d84128fa04d0395ddd148d96e51/pkg/kubelet/kuberuntime/kuberuntime_manager.go#L60 From 68ea6a2974abc6a11164472b3fbf3636f48a7846 Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Wed, 8 Apr 2020 14:40:31 +0300 Subject: [PATCH 019/118] Correct installation commands to explicitly consume namespace Signed-off-by: MOZGIII --- rfcs/2020-04-04-2221-kubernetes-integration.md | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index c4fe656f55309..e4f24424cb0db 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -80,7 +80,6 @@ TODO: insert diagram kind: ConfigMap metadata: name: vector-config - namespace: logging labels: k8s-app: vector data: @@ -118,9 +117,9 @@ TODO: insert diagram `ClusterRoleBinding`: ```shell - kubectl create namespace logging - kubectl apply -f vector-configmap.yaml - kubectl apply -f https://packages.timber.io/vector/latest/kubernetes/vector.yaml + kubectl create namespace vector + kubectl apply --namespace vector -f vector-configmap.yaml + kubectl apply --namespace vector -f 
https://packages.timber.io/vector/latest/kubernetes/vector.yaml ``` - _See [outstanding questions 3, 4, 5, 6, and 7](#outstanding-questions)._ From b9a1b118ee06106e44c3c9b7ae3371094ae6fe4f Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Wed, 8 Apr 2020 14:42:17 +0300 Subject: [PATCH 020/118] Adjust the installation instructions Signed-off-by: MOZGIII --- rfcs/2020-04-04-2221-kubernetes-integration.md | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index e4f24424cb0db..857aba47f57cf 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -111,10 +111,9 @@ TODO: insert diagram 2. Deploy Vector! - Now that you have your custom `ConfigMap` ready it's time to deploy - Vector. To ensure Vector is isolated and has the necessary permissions - we must create a `Namespace`, `ServiceAccount`, `ClusterRole`, and - `ClusterRoleBinding`: + Now that you have your custom `ConfigMap` ready it's time to deploy Vector. + Create a `Namespace` and apply your `ConfigMap` and our recommended + deployment configuration into it: ```shell kubectl create namespace vector From f0cc72d666fed0777b9a1c33ca07486a5b723ebd Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Wed, 8 Apr 2020 15:16:42 +0300 Subject: [PATCH 021/118] Add a section on deployment variants Signed-off-by: MOZGIII --- .../2020-04-04-2221-kubernetes-integration.md | 25 +++++++++++++++++++ 1 file changed, 25 insertions(+) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index 857aba47f57cf..200a80e4571a9 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -247,6 +247,30 @@ installations easily upgradable. Everything we need to do to achieve this is outlined at the [The Chart Repository Guide]. 
+### Deployment Variants
+
+We have two ways to deploy Vector:
+
+- as a [`DaemonSet`][daemonset];
+- as a [sidecar `Container`][sidecar_container].
+
+Deploying as a [`DaemonSet`][daemonset] is trivial, applies cluster-wide, and
+makes sense as the default scenario for most use cases.
+
+Sidecar container deployments make sense when a cluster-wide deployment is not
+available. This generally occurs when users are not in control of the whole
+cluster (for instance in shared clusters, or in highly isolated clusters).
+We should provide recommendations for this deployment variant; however, since
+people generally know what they're doing in such use cases, and because those
+cases are often very custom, we probably don't have to go deeper than explaining
+the generic concerns. We should provide enough flexibility at the Vector code
+level to make those use cases possible.
+
+It is possible to implement a sidecar deployment by implementing an operator
+that automatically injects a Vector `Container` into `Pod`s (via an admission
+controller), but that doesn't make a lot of sense for us to work on, since
+a [`DaemonSet`][daemonset] already covers most use cases.
+
 ## Prior Art
 
 1. [Filebeat k8s integration]
@@ -386,5 +410,6 @@ See [motivation](#motivation).
 [logdna_daemonset]: https://raw.githubusercontent.com/logdna/logdna-agent/master/logdna-agent-ds.yaml
 [pr#2134]: https://github.com/timberio/vector/pull/2134
 [pr#2188]: https://github.com/timberio/vector/pull/2188
+[sidecar_container]: https://github.com/kubernetes/enhancements/blob/a8262db2ce38b2ec7941bdb6810a8d81c5141447/keps/sig-apps/sidecarcontainers.md
 [the chart repository guide]: https://helm.sh/docs/topics/chart_repository/
 [vector_daemonset]: 2020-04-04-2221-kubernetes-integration/vector-daemonset.yaml

From d4315e91afc9cea2ebf4be49dfe37f3d9aba189d Mon Sep 17 00:00:00 2001
From: MOZGIII
Date: Wed, 8 Apr 2020 15:17:31 +0300
Subject: [PATCH 022/118] Add steps to attack plan

Signed-off-by: MOZGIII
---
 rfcs/2020-04-04-2221-kubernetes-integration.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md
index 200a80e4571a9..4854cbb487fcc 100644
--- a/rfcs/2020-04-04-2221-kubernetes-integration.md
+++ b/rfcs/2020-04-04-2221-kubernetes-integration.md
@@ -365,6 +365,10 @@ See [motivation](#motivation).
 - [ ] Don't exit when there are configuration errors. See [issue#1816].
 - [ ] Test this. See [issue#2224].
 - [ ] Add `kubernetes` source reference documentation.
+- [ ] Prepare YAML deployment config.
+- [ ] Prepare Helm Chart.
+- [ ] Prepare Helm Chart Repository.
+- [ ] Integrate Kubernetes configuration snapshotting into the release process.
 - [ ] Add Kubernetes setup/integration guide.
 - [ ] Release `0.10.0` and announce.
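The chart and chart-repository preparation steps added above can be sketched
with the standard Helm tooling. The chart path, repository URL, and bucket name
below are placeholder assumptions, not decided names:

```shell
# Package the chart into a versioned .tgz archive.
helm package ./helm/vector --destination ./chart-release

# (Re)generate index.yaml so clients can discover available chart versions.
helm repo index ./chart-release --url https://charts.vector.dev

# Publish the archive plus index.yaml to the static host backing the repo,
# e.g. an S3 bucket (bucket name is hypothetical).
aws s3 sync ./chart-release s3://example-vector-charts/
```

Re-running `helm repo index` on each release keeps the repository incrementally
updatable, which is what the release-process integration step would automate.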
From d371aa6fd667fe82556b93f92ff576f6c69f060c Mon Sep 17 00:00:00 2001
From: MOZGIII
Date: Thu, 9 Apr 2020 15:03:08 +0300
Subject: [PATCH 023/118] Specify initial minimal supported Kubernetes version

Signed-off-by: MOZGIII
---
 .../2020-04-04-2221-kubernetes-integration.md | 22 +++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md
index 4854cbb487fcc..f0cbcd237abcc 100644
--- a/rfcs/2020-04-04-2221-kubernetes-integration.md
+++ b/rfcs/2020-04-04-2221-kubernetes-integration.md
@@ -153,8 +153,25 @@ decisions that involve MSKV, as well as documentation. A good candidate for
 such location is a file at `.meta` dir of the Vector repo. `.meta/mskv` for
 instance.
 
-For the moment, the discussion on the initial MSKV is in progress. The proposed
-version is Kubernetes `1.14`.
+#### Initial Minimal Supported Kubernetes Version
+
+Kubernetes 1.14 introduced some significant improvements to how log files are
+organized, putting more useful metadata into the log file path. This allows us
+to implement more efficient and flexible ways to filter which log files we
+consume, which is important for preventing Vector from consuming the logs that
+it itself produces - something that could otherwise result in a
+flood-style DoS.
+
+We can still offer support for Kubernetes 1.13 and earlier, but it would
+significantly limit our efficient filtering capabilities. It would also
+increase the maintenance costs and code complexity.
+
+On the other hand, Kubernetes pre-1.14 versions are quite rare these days.
+At the time of writing, the latest Kubernetes version is 1.18, and, according
+to the [Kubernetes version and version skew support policy], only versions
+1.18, 1.17 and 1.16 are currently maintained.
+
+Considering all of the above, we assign **1.14** as the initial MSKV.
 
 ### Helm vs raw YAML files
 
@@ -409,6 +426,7 @@ See [motivation](#motivation).
[k8s_src_parse_funcs]: https://github.com/kubernetes/kubernetes/blob/e74ad388541b15ae7332abf2e586e2637b55d7a7/pkg/kubelet/kuberuntime/logs/logs.go#L116 [k8s_src_read_logs]: https://github.com/kubernetes/kubernetes/blob/e74ad388541b15ae7332abf2e586e2637b55d7a7/pkg/kubelet/kuberuntime/logs/logs.go#L277 [k8s_src_var_log_pods]: https://github.com/kubernetes/kubernetes/blob/58596b2bf5eb0d84128fa04d0395ddd148d96e51/pkg/kubelet/kuberuntime/kuberuntime_manager.go#L60 +[kubernetes version and version skew support policy]: https://kubernetes.io/docs/setup/release/version-skew-policy/ [kubernetes_version_comment]: https://github.com/timberio/vector/pull/2188#discussion_r403120481 [logdna k8s integration]: https://docs.logdna.com/docs/kubernetes [logdna_daemonset]: https://raw.githubusercontent.com/logdna/logdna-agent/master/logdna-agent-ds.yaml From 20aa12f8897ee78579543c85f4541cc8a2d68787 Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Fri, 10 Apr 2020 13:22:31 +0300 Subject: [PATCH 024/118] Add some questions Signed-off-by: MOZGIII --- rfcs/2020-04-04-2221-kubernetes-integration.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index f0cbcd237abcc..f7f191f5f7345 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -358,6 +358,9 @@ See [motivation](#motivation). [etc][container_runtimes]. Some even use [gVisor] or [Firecracker]. There might be differences in how different container runtimes handle logs. 1. How do we want to approach Helm Chart Repository management. +1. How do we implement liveness, readiness and startup probes? +1. Can we populate file at `terminationMessagePath` with some meaningful + information when we exit or crash? 
## Plan Of Attack From 51c456523d5011ba3335c279608d08a7dfa074fe Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Fri, 10 Apr 2020 14:48:52 +0300 Subject: [PATCH 025/118] Add a deployment configuration section Signed-off-by: MOZGIII --- .../2020-04-04-2221-kubernetes-integration.md | 61 +++++++++++++++++++ 1 file changed, 61 insertions(+) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index f7f191f5f7345..e4ceb2690e6c4 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -288,6 +288,65 @@ to automatically inject Vector `Container` into `Pod`s (via admission controller), but that doesn't make a lot of sense for us to work on, since [`DaemonSet`][daemonset] works for most of use cases already. +### Deployment configuration + +It is important that provide a well-thought for deployment configuration for +Vector as part of our Kubernetes integration. We want to ensure good user +experience, and it includes installation, configuration and upgrading. + +We have to make sure that Vector, being itself an app, runs well in Kubernetes, +and sanely makes use of all the control and monitoring interfaces that +Kubernetes exposes to manage Vector itself. + +We will provide YAML and Helm as deployment options. While Helm configuration is +templated and more generic, and YAML is intended for manual configuration, a lot +of design considerations apply to both of them. + +#### Managing Object + +For the reasons discussed above, we'll be using [`DaemonSet`][daemonset]. + +#### Data directory + +Vector needs a location to keep the disk buffers and other data it requires for +operation at runtime. This directory has to persist across restarts, since it's +essential for some features to function (i.e. not losing buffered data if/while +the sink is gone). 
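+A minimal sketch of how such a persistent data directory could be wired into
+the `DaemonSet` `Pod` template (the volume name, mount path, and host path
+below are illustrative placeholders, not final choices):
+
+```yaml
+# Fragment of the DaemonSet Pod template; not a complete manifest.
+spec:
+  containers:
+    - name: vector
+      volumeMounts:
+        - name: var-lib-vector
+          mountPath: /vector-data-dir
+  volumes:
+    - name: var-lib-vector
+      hostPath:
+        path: /var/lib/vector
+        # DirectoryOrCreate makes the kubelet create the directory on the
+        # node if it does not exist yet.
+        type: DirectoryOrCreate
+```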
+
+We'll be using [`DaemonSet`][daemonset], so, naturally, we can leverage
+[`hostPath`][k8s_api_host_path_volume_source] volumes.
+
+We'll use `hostPath` volumes in our YAML config; the Helm Chart will use them
+by default as well, but will also allow configuring this to provide the
+flexibility users will expect.
+
+An alternative to `hostPath` volumes would be a user-provided
+[persistent volume][k8s_doc_persistent_volumes] of some kind. The only
+requirement is that it has to have a `ReadWriteMany` access mode.
+
+#### Vector config files
+
+Vector configuration in the Kubernetes environment can generally be split into
+two logical parts: a common Kubernetes-related configuration, and a custom
+user-supplied configuration.
+
+A common Kubernetes-related configuration is the part that is generally
+expected to be the same (or very similar) across all Kubernetes environments.
+Things like the `kubernetes` source and the `kubernetes_pod_metadata` transform
+belong there.
+
+A custom user-supplied configuration is the part that contains parameters like
+what sink to use or what additional filtering or transformation to apply. This
+part is expected to be unique and custom for every user.
+
+Vector supports multiple configuration files, and we can rely on that to ship
+a config file with the common configuration part in our YAML / Helm suite,
+and let users keep their custom config part in a separate file.
+
+We will then mount two `ConfigMap`s into a container, and start Vector in
+multiple configuration files mode.
+
 ## Prior Art
 
 1. [Filebeat k8s integration]
@@ -424,6 +483,8 @@ See [motivation](#motivation).
[issue#2225]: https://github.com/timberio/vector/issues/2225 [json file logging driver]: https://docs.docker.com/config/containers/logging/json-file/ [jsonlines]: http://jsonlines.org/ +[k8s_api_host_path_volume_source]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#hostpathvolumesource-v1-core +[k8s_doc_persistent_volumes]: https://kubernetes.io/docs/concepts/storage/persistent-volumes [k8s_log_path_location_docs]: https://kubernetes.io/docs/concepts/cluster-administration/logging/#logging-at-the-node-level [k8s_src_build_container_logs_directory]: https://github.com/kubernetes/kubernetes/blob/31305966789525fca49ec26c289e565467d1f1c4/pkg/kubelet/kuberuntime/helpers.go#L173 [k8s_src_parse_funcs]: https://github.com/kubernetes/kubernetes/blob/e74ad388541b15ae7332abf2e586e2637b55d7a7/pkg/kubelet/kuberuntime/logs/logs.go#L116 From d53dde5a4944ef82e1a255c77e763f35582c3985 Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Fri, 10 Apr 2020 15:10:18 +0300 Subject: [PATCH 026/118] Update guide YAML for multiple config files Signed-off-by: MOZGIII --- rfcs/2020-04-04-2221-kubernetes-integration.md | 15 ++------------- 1 file changed, 2 insertions(+), 13 deletions(-) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index e4ceb2690e6c4..77df1df1b0d16 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -85,23 +85,12 @@ TODO: insert diagram data: vector.toml: | # Docs: https://vector.dev/docs/ - - # Set global options - data_dir = "/var/tmp/vector" - - # Ingest logs from Kubernetes - [sources.kubernetes] - type = "kubernetes" - - # Enrich logs with Pod metadata - [transforms.pod_metadata] - type = "kubernetes_pod_metadata" - inputs = ["kubernetes"] + # Container logs are available from "kubernetes" input. # Send data to one or more sinks! 
      [sinks.aws_s3]
       type = "aws_s3"
-      inputs = ["pod_metadata"]
+      inputs = ["kubernetes"]
       bucket = "my-bucket"
       compression = "gzip"
       region = "us-east-1"

From 7d9ff4e8418e5e2152abf04155a47933e3988efd Mon Sep 17 00:00:00 2001
From: MOZGIII
Date: Mon, 13 Apr 2020 19:44:24 +0300
Subject: [PATCH 027/118] Add sections on metadata and filtering

Signed-off-by: MOZGIII
---
 .../2020-04-04-2221-kubernetes-integration.md | 111 ++++++++++++++++++
 1 file changed, 111 insertions(+)

diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md
index 77df1df1b0d16..b1ed6078f2d2e 100644
--- a/rfcs/2020-04-04-2221-kubernetes-integration.md
+++ b/rfcs/2020-04-04-2221-kubernetes-integration.md
@@ -336,6 +336,116 @@ and let users keep their custom config part in a separate file.
 We will then mount two `ConfigMap`s into a container, and start Vector in
 multiple configuration files mode.
 
+### Annotating events with metadata from Kubernetes
+
+Kubernetes has a lot of metadata that can be associated with the logs, and most
+users expect us to add some parts of that metadata as fields to the event.
+
+We already have an implementation that does this in the form of the
+`kubernetes_pod_metadata` transform.
+
+It works great; however, as can be seen from the next section, we might need
+to implement very similar functionality at the `kubernetes` source as well to
+perform log filtering. So, if we're going to obtain pod metadata at the
+`kubernetes` source anyway, we might as well enrich the event right there. This
+would render `kubernetes_pod_metadata` useless, as there would be no use case
+for it that wouldn't be covered by the `kubernetes` source.
+
+What parts of the metadata we inject into events should be configurable, but we
+can and want to offer a sane default here.
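+
+As a purely illustrative sketch, such a configurable sane default could look
+roughly like this (the `fields` option and its values are hypothetical, not an
+implemented Vector option):
+
+```toml
+[sources.kubernetes]
+  type = "kubernetes"
+  # Hypothetical knob: which Kubernetes metadata to attach to each event;
+  # namespace, pod name and pod labels seem like a reasonable default.
+  fields = ["pod_namespace", "pod_name", "pod_labels"]
+```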
+
+### Origin filtering
+
+We can do highly efficient filtering based on the log file path, and more
+comprehensive filtering via metadata from the k8s API, which, unfortunately,
+has a bit more overhead.
+
+The best user experience is via the k8s API, because then we can support
+filtering by labels/annotations, which is the standard way of doing things with k8s.
+
+#### Filtering based on path
+
+We already do that in our current implementation.
+
+The idea is that we can derive some useful parameters from the log file paths.
+For more info on the log file paths, see the
+[File locations][anchor_file_locations] section of this RFC.
+
+So, Kubernetes 1.14+ [exposes][k8s_src_build_container_logs_directory] the
+following information via the file path:
+
+- `pod namespace`
+- `pod name`
+- `pod uuid`
+
+This is enough information for the basic filtering, and the best part is it's
+available to us without any extra work - we're reading the files anyway.
+
+#### Filtering based on Kubernetes API metadata
+
+Filtering by Kubernetes metadata is way more advanced and flexible from the
+user perspective.
+
+The idea of such filtering is that when Vector picks up a new log file to
+process at the `kubernetes` source, it has to be able to somehow decide
+whether to consume the logs from that file, or to ignore it, based on the state
+of the k8s API and the Vector configuration.
+
+This means that there has to be a way of making the data from the k8s API
+related to the log file available to Vector.
+
+Based on the k8s API structure, it looks like we should aim for obtaining the
+`Pod` object, since it contains the essential information about the containers
+that produced the log file. Also, it is the `Pod` objects that `kubelet` relies
+on to manage the workloads on the node, so this makes `Pod` objects the best
+option for our case, i.e. better than fetching `Deployment` objects.
+
+There are a number of approaches to get the required `Pod` objects:
+
+1. 
Per-file requests.
+
+   The file paths provide enough data for us to make a query to the k8s API. In
+   fact, we only need a `pod namespace` and a `pod uuid` to successfully obtain
+   the `Pod` object.
+
+2. Per-node requests.
+
+   This approach is to list all the pods that are running on the same node as
+   Vector. This effectively lists all the `Pod` objects we could possibly
+   care about.
+
+One important thing to note is that metadata for a given pod can change over
+time, and the implementation has to take that into account and update the
+filtering state accordingly.
+
+We also can't overload the k8s API with requests. The general rule of thumb is
+that we shouldn't make requests much more often than k8s itself generates
+events.
+
+Each approach has very different properties. It is hard to estimate which ones
+are a better fit.
+
+A single watch call for a list of pods running per node (2) should generate
+less overhead and would probably be easier to implement.
+
+Issuing a watch per individual pod (1) is more straightforward, but will
+definitely use more sockets. We could speculate that we'd get lower latency
+than with per-node filtering; however, it's very unclear if that's the case.
+
+Either way, we probably want to keep some form of cache plus a circuit breaker
+to avoid hitting the k8s API too often.
+
+#### Filtering based on event fields after annotation
+
+This is an alternative approach to the previous implementation.
+
+The current implementation allows doing this, but it has certain downsides -
+the main problem is we're paying the price of reading the log files that are
+filtered out completely.
+
+In most scenarios it'd be a significant overhead, and can lead to cycles.
+
 ## Prior Art
 
 1. [Filebeat k8s integration]
@@ -440,6 +550,7 @@ See [motivation](#motivation).
 - [ ] Add Kubernetes setup/integration guide.
 - [ ] Release `0.10.0` and announce.
+[anchor_file_locations]: #file-locations [bonzai logging operator]: https://github.com/banzaicloud/logging-operator [container_runtimes]: https://kubernetes.io/docs/setup/production-environment/container-runtimes/ [cri_log_format]: https://github.com/kubernetes/community/blob/ee2abbf9dbfa4523b414f99a04ddc97bd38c74b2/contributors/design-proposals/node/kubelet-cri-logging.md From 9b9c9ca52cd2be1510a4fb341497f3efda109e08 Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Mon, 13 Apr 2020 19:45:24 +0300 Subject: [PATCH 028/118] Add new section drafts Signed-off-by: MOZGIII --- rfcs/2020-04-04-2221-kubernetes-integration.md | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index b1ed6078f2d2e..346fc68a42787 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -446,6 +446,16 @@ filtered out completely. In most scenarios it'd be a significant overhead, and can lead to cycles. +### Configuring Vector via Kubernetes API + +#### Annotations and labels on vector pod via downward API + +TODO + +#### Custom CRDs + +TODO + ## Prior Art 1. [Filebeat k8s integration] From 721cf060a3ecf90f2b43bb1c0d2f0cff98210812 Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Mon, 13 Apr 2020 21:45:41 +0300 Subject: [PATCH 029/118] Fill in Helm section Signed-off-by: MOZGIII --- .../2020-04-04-2221-kubernetes-integration.md | 31 ++++++++++++++++++- 1 file changed, 30 insertions(+), 1 deletion(-) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index 346fc68a42787..6bbe02a29050f 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -116,7 +116,35 @@ TODO: insert diagram #### Helm Interface -TODO: fill in +1. Install [`helm`][helm_install]. + +2. Add our Helm Chart repo. 
+
+   ```shell
+   helm repo add vector https://charts.vector.dev
+   helm repo update
+   ```
+
+3. Configure Vector.
+
+   TODO: address this when we decide on the helm chart internals.
+
+4. Deploy Vector!
+
+   ```shell
+   kubectl create namespace vector
+
+   # Helm v3
+   helm install \
+     vector vector/vector \
+     --namespace vector
+
+   # Helm v2
+   helm install \
+     --name vector \
+     --namespace vector \
+     vector/vector
+   ```
 
 ## Design considerations
 
@@ -575,6 +603,7 @@ See [motivation](#motivation).
 [fluentd_daemonset]: https://github.com/fluent/fluentd-kubernetes-daemonset/blob/master/fluentd-daemonset-papertrail.yaml
 [guide_example]: https://vector.dev/guides/integrate/sources/syslog/aws_kinesis_firehose/
 [gvisor]: https://github.com/google/gvisor
+[helm_install]: https://helm.sh/docs/intro/install/
 [honeycomb integration]: https://docs.honeycomb.io/getting-data-in/integrations/kubernetes/
 [influx helm charts]: https://github.com/influxdata/helm-charts
 [issue#1293]: https://github.com/timberio/vector/issues/1293

From caa3df8771f9b80bb109ef3b49ee7d16baf5193b Mon Sep 17 00:00:00 2001
From: MOZGIII
Date: Mon, 13 Apr 2020 22:19:18 +0300
Subject: [PATCH 030/118] Add more on helm charts repo

Signed-off-by: MOZGIII
---
 rfcs/2020-04-04-2221-kubernetes-integration.md | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md
index 6bbe02a29050f..a7747fff0023f 100644
--- a/rfcs/2020-04-04-2221-kubernetes-integration.md
+++ b/rfcs/2020-04-04-2221-kubernetes-integration.md
@@ -281,6 +281,15 @@ installations easily upgradable.
 
 Everything we need to do to achieve this is outlined at the
 [The Chart Repository Guide].
 
+We can use a tool like [ChartMuseum] to manage our repo. Alternatively, we can
+use a bare HTTP server, like AWS S3 or GitHub Pages. A tool like
+[ChartMuseum] has the benefit of doing some things for us.
It can use S3 +for storage, and offers a convenient [helm plugin][helm_push] to release charts, +so the release process should be very simple. + +From the user experience perspective, it would be cool if we expose our chart +repo at `https://charts.vector.dev` - short and easy to remember or even guess. + ### Deployment Variants We have two ways to deploy vector: @@ -590,6 +599,7 @@ See [motivation](#motivation). [anchor_file_locations]: #file-locations [bonzai logging operator]: https://github.com/banzaicloud/logging-operator +[chartmuseum]: https://chartmuseum.com/ [container_runtimes]: https://kubernetes.io/docs/setup/production-environment/container-runtimes/ [cri_log_format]: https://github.com/kubernetes/community/blob/ee2abbf9dbfa4523b414f99a04ddc97bd38c74b2/contributors/design-proposals/node/kubelet-cri-logging.md [daemonset]: https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/ @@ -604,6 +614,7 @@ See [motivation](#motivation). [guide_example]: https://vector.dev/guides/integrate/sources/syslog/aws_kinesis_firehose/ [gvisor]: https://github.com/google/gvisor [helm_install]: https://cert-manager.io/docs/installation/kubernetes/ +[helm_push]: https://github.com/chartmuseum/helm-push [honeycomb integration]: https://docs.honeycomb.io/getting-data-in/integrations/kubernetes/ [influx helm charts]: https://github.com/influxdata/helm-charts [issue#1293]: https://github.com/timberio/vector/issues/1293 From 2f727a8f37141596ac1407d9253dfab64d7cda3e Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Mon, 13 Apr 2020 22:33:25 +0300 Subject: [PATCH 031/118] Add dummy section on Changes to Vector release process Signed-off-by: MOZGIII --- rfcs/2020-04-04-2221-kubernetes-integration.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index a7747fff0023f..432478b79f075 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ 
b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -493,6 +493,10 @@ TODO TODO +### Changes to Vector release process + +TODO + ## Prior Art 1. [Filebeat k8s integration] From 76f05192fdb58d960abebf5b08ebd3ab0e43b710 Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Mon, 13 Apr 2020 23:05:53 +0300 Subject: [PATCH 032/118] Mark some questions as solved Signed-off-by: MOZGIII --- rfcs/2020-04-04-2221-kubernetes-integration.md | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index 432478b79f075..5c8da0130ace0 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -211,6 +211,7 @@ configuration, but we shouldn't oversimplify it either. It has to be production-ready. But it also has to be portable, in a sense that it should work without tweaking with as much cluster setups as possible. We should support both `kubectl create` and `kubectl apply` flows. +`kubectl apply` is generally more limiting than `kubectl create`. ### Reading container logs @@ -527,8 +528,10 @@ See [motivation](#motivation). ### From Ben -1. What is the minimal Kubernetes version that we want to support. See - [this comment][kubernetes_version_comment]. +1. ~~What is the minimal Kubernetes version that we want to support. See + [this comment][kubernetes_version_comment].~~ + See the [Minimal supported Kubernetes version][anchor_minimal_supported_kubernetes_version] + section. 1. What is the best to avoid Vector from ingesting it's own logs? I'm assuming that my [`kubectl` tutorial](#kubectl-interface) handles this with namespaces? We'd just need to configure Vector to exclude this namespace? @@ -537,8 +540,9 @@ See [motivation](#motivation). offers [four separate configuration files][fluentbit_installation] (`service-account.yaml`, `role.yaml`, `role-binding.yaml`, `configmap.yaml`). Which approach is better? 
Why are they different?
-1. Should we prefer `kubectl create ...` or `kubectl apply ...`? The examples
-   in the [prior art](#prior-art) section use both.
+1. ~~Should we prefer `kubectl create ...` or `kubectl apply ...`? The examples
+   in the [prior art](#prior-art) section use both.~~
+   See the [Helm vs raw YAML files][anchor_helm_vs_raw_yaml_files] section.
 1. From what I understand, Vector requires the Kubernetes `watch` verb in order
    to receive updates to k8s cluster changes. This is required for the
    `kubernetes_pod_metadata` transform. Yet, Fluentbit [requires the `get`,
@@ -602,6 +606,8 @@ See [motivation](#motivation).
 - [ ] Release `0.10.0` and announce.
 
 [anchor_file_locations]: #file-locations
+[anchor_helm_vs_raw_yaml_files]: #helm-vs-raw-yaml-files
+[anchor_minimal_supported_kubernetes_version]: #minimal-supported-kubernetes-version
 [bonzai logging operator]: https://github.com/banzaicloud/logging-operator
 [chartmuseum]: https://chartmuseum.com/
 [container_runtimes]: https://kubernetes.io/docs/setup/production-environment/container-runtimes/

From a2914d03a39ecea464d094e4dcec3ffea86c914f Mon Sep 17 00:00:00 2001
From: MOZGIII
Date: Mon, 13 Apr 2020 23:06:34 +0300
Subject: [PATCH 033/118] Corrected the Deployment configuration section

Signed-off-by: MOZGIII
---
 rfcs/2020-04-04-2221-kubernetes-integration.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md
index 5c8da0130ace0..2a9f7095ee000 100644
--- a/rfcs/2020-04-04-2221-kubernetes-integration.md
+++ b/rfcs/2020-04-04-2221-kubernetes-integration.md
@@ -317,7 +317,7 @@
 
 ### Deployment configuration
 
-It is important that provide a well-thought for deployment configuration for
+It is important that we provide a well-thought-out deployment configuration for
 Vector as part of our Kubernetes integration. We want to ensure good user
 experience, and it includes installation, configuration and upgrading.
We want to ensure good user experience, and it includes installation, configuration and upgrading. @@ -372,7 +372,8 @@ a config file with the common configuration part in of our YAML / Helm suite, and let users keep their custom config part in a separate file. We will then mount two `ConfigMap`s into a container, and start Vector in -multiple configuration files mode. +multiple configuration files mode +(`vector --config .../common.toml --config .../custom.toml`). ### Annotating events with metadata from Kubernetes From af409e5104e2696621574f58764d55c89f2a2535 Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Mon, 13 Apr 2020 23:32:25 +0300 Subject: [PATCH 034/118] Add a "Strategy on YAML file grouping" section Signed-off-by: MOZGIII --- .../2020-04-04-2221-kubernetes-integration.md | 50 ++++++++++++++++++- 1 file changed, 49 insertions(+), 1 deletion(-) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index 2a9f7095ee000..4ab3ad2810902 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -107,7 +107,8 @@ TODO: insert diagram ```shell kubectl create namespace vector kubectl apply --namespace vector -f vector-configmap.yaml - kubectl apply --namespace vector -f https://packages.timber.io/vector/latest/kubernetes/vector.yaml + kubectl apply -f https://packages.timber.io/vector/latest/kubernetes/vector-global.yaml + kubectl apply --namespace vector -f https://packages.timber.io/vector/latest/kubernetes/vector-namespaced.yaml ``` - _See [outstanding questions 3, 4, 5, 6, and 7](#outstanding-questions)._ @@ -353,6 +354,8 @@ requirement is that it has to have a `ReadWriteMany` access mode. #### Vector config files +> This section is about Vector `.toml` config files. + Vector configuration in the Kubernetes environment can generally be split into two logical parts: a common Kubernetes-related configuration, and a custom user-supplied configuration. 
@@ -375,6 +378,51 @@ We will then mount two `ConfigMap`s into a container, and start Vector in multiple configuration files mode (`vector --config .../common.toml --config .../custom.toml`). +#### Strategy on YAML file grouping + +> This section is about Kubernetes `.yaml` files. + +YAML files storing Kubernetes API objects configuration can be grouped +differently. + +The layout proposed in [guide above](#kubectl-interface) is what we're planing +to use. It is in line with the sections above on Vector configuration splitting +into the common and custom parts. + +The idea is to have a single file with a namespaced configuration (`DaemonSet`, +`ServiceAccount`, `ClusterRoleBinding`, common `ConfigMap`, etc), a single file +with a global (non-namespaced) configuration (mainly just `ClusterRole`) and a +user-supplied file containing just a `ConfigMap` with the custom part of the +Vector configuration. Three `.yaml` files in total, two of which are supplied by +us, and one is created by the user. + +Ideally we'd want to make the presence of the user-supplied optional, but it +just doesn't make sense, because sink has to be configured somewhere. + +We can offer some simple "typical custom configurations" at our documentation as +an example: + +- with a sink to push data to our Alloy; +- with a cluster-agnosic `elasticsearch` sink; +- for AWS clusters, with a `cloudwatch` sink; +- etc... + +We must be careful with our `.yaml` files to make them play well with not just +`kubectl create -f`, but also with `kubectl apply -f`. There are often issues +with impotency when labels and selectors aren't configured properly and we +should be wary of that. + +##### Considered Alternatives + +We can use a separate `.yaml` file per object. +That's more inconvenient since we'll need users to execute more commands, yet it +doesn't seems like it provides any benefit. 
+
+We expect users to "fork" and adjust our config files as they see fit, so
+they'll be able to split the files if required. They then maintain their
+configuration on their own, and we assume they're capable and know what they're
+doing.
+
### Annotating events with metadata from Kubernetes

Kubernetes has a lot of metadata that can be associated with the logs, and most

From 6ec01f0c69a9724d4becfe79590d6fb16be78e40 Mon Sep 17 00:00:00 2001
From: MOZGIII
Date: Mon, 13 Apr 2020 23:56:29 +0300
Subject: [PATCH 035/118] Add a section on vector config file reloads

Signed-off-by: MOZGIII

---
 rfcs/2020-04-04-2221-kubernetes-integration.md | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md
index 4ab3ad2810902..1d46298f2f87e 100644
--- a/rfcs/2020-04-04-2221-kubernetes-integration.md
+++ b/rfcs/2020-04-04-2221-kubernetes-integration.md
@@ -378,6 +378,16 @@ We will then mount two `ConfigMap`s into a container, and start Vector in
multiple configuration files mode
(`vector --config .../common.toml --config .../custom.toml`).

+#### Vector config file reloads
+
+It is best to explicitly disable reloads in our default deployment
+configuration, because this provides more reliability than [eventually consistent
+`ConfigMap` updates][configmap_updates].
+
+Users can recreate the `Pod`s (thus restarting Vector, and making it aware of
+the new config) via
+[`kubectl rollout restart -n vector daemonset/vector`][kubectl_rollout_restart].
+
#### Strategy on YAML file grouping

> This section is about Kubernetes `.yaml` files.
@@ -659,6 +669,7 @@ See [motivation](#motivation).
[anchor_minimal_supported_kubernetes_version]: #minimal-supported-kubernetes-version [bonzai logging operator]: https://github.com/banzaicloud/logging-operator [chartmuseum]: https://chartmuseum.com/ +[configmap_updates]: https://kubernetes.io/docs/tasks/configure-pod-container/configure-pod-configmap/#mounted-configmaps-are-updated-automatically [container_runtimes]: https://kubernetes.io/docs/setup/production-environment/container-runtimes/ [cri_log_format]: https://github.com/kubernetes/community/blob/ee2abbf9dbfa4523b414f99a04ddc97bd38c74b2/contributors/design-proposals/node/kubelet-cri-logging.md [daemonset]: https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/ @@ -699,6 +710,7 @@ See [motivation](#motivation). [k8s_src_parse_funcs]: https://github.com/kubernetes/kubernetes/blob/e74ad388541b15ae7332abf2e586e2637b55d7a7/pkg/kubelet/kuberuntime/logs/logs.go#L116 [k8s_src_read_logs]: https://github.com/kubernetes/kubernetes/blob/e74ad388541b15ae7332abf2e586e2637b55d7a7/pkg/kubelet/kuberuntime/logs/logs.go#L277 [k8s_src_var_log_pods]: https://github.com/kubernetes/kubernetes/blob/58596b2bf5eb0d84128fa04d0395ddd148d96e51/pkg/kubelet/kuberuntime/kuberuntime_manager.go#L60 +[kubectl_rollout_restart]: https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#-em-restart-em- [kubernetes version and version skew support policy]: https://kubernetes.io/docs/setup/release/version-skew-policy/ [kubernetes_version_comment]: https://github.com/timberio/vector/pull/2188#discussion_r403120481 [logdna k8s integration]: https://docs.logdna.com/docs/kubernetes From 57771c9dc6be44a9070467c76721157581380de2 Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Wed, 15 Apr 2020 15:34:59 +0300 Subject: [PATCH 036/118] Minor grammar and styling corrections Signed-off-by: MOZGIII --- .../2020-04-04-2221-kubernetes-integration.md | 38 +++++++++---------- 1 file changed, 19 insertions(+), 19 deletions(-) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md 
b/rfcs/2020-04-04-2221-kubernetes-integration.md
index 1d46298f2f87e..49ac95f2c7af1 100644
--- a/rfcs/2020-04-04-2221-kubernetes-integration.md
+++ b/rfcs/2020-04-04-2221-kubernetes-integration.md
@@ -182,7 +182,7 @@ flood-kind DoS.

We can still offer support for Kubernetes 1.13 and earlier, but it will be
limiting our highly efficient filtering capabilities significantly. It will
-also increase the maintenance costs and code complexity.
+also increase maintenance costs and code complexity.

On the other hand, Kubernetes pre-1.14 versions are quite rare these days.
At the time of writing, the latest Kubernetes version is 1.18, and, according
@@ -209,8 +209,8 @@ With raw YAML files, they have to be usable out of the box, but we shouldn't
expect users to use them as-is. People would often maintain their own "forks"
of those, tailored to their use case. We shouldn't overcomplicate our
recommended configuration, but we shouldn't oversimplify it either. It has to be
-production-ready. But it also has to be portable, in a sense that it should work
-without tweaking with as much cluster setups as possible.
+production-ready. But it also has to be portable, in the sense that it should
+work without tweaking with as many cluster setups as possible.

We should support both `kubectl create` and `kubectl apply` flows.
`kubectl apply` is generally more limiting than `kubectl create`.
@@ -285,7 +285,7 @@ Everything we need to do to achieve this is outlined at the

We can use a tool like [ChartMuseum] to manage our repo. Alternatively, we can
use a bare HTTP server, like AWS S3 or Github Pages. A tool like
-[ChartMuseum] has a benefit of doing some things for us. It can use S3
+[ChartMuseum] has the benefit of doing some things for us. It can use S3
for storage, and offers a convenient [helm plugin][helm_push] to release
charts, so the release process should be very simple.
@@ -314,13 +314,13 @@
It is possible to implement a sidecar deployment via implementing an operator
to automatically inject Vector `Container` into `Pod`s (via admission
controller), but that doesn't make a lot of sense for us to work on, since
-[`DaemonSet`][daemonset] works for most of use cases already.
+[`DaemonSet`][daemonset] works for most of the use cases already.

### Deployment configuration

-It is important that provide a well-thought deployment configuration for
+It is important that we provide a well-thought deployment configuration for the
Vector as part of our Kubernetes integration. We want to ensure good user
-experience, and it includes installation, configuration and upgrading.
+experience, and it includes installation, configuration, and upgrading.

We have to make sure that Vector, being itself an app, runs well in
Kubernetes, and sanely makes use of all the control and monitoring interfaces
that
@@ -450,18 +450,18 @@ render `kubernetes_pod_metadata` useless, as there would be no use case for
it that wouldn't be covered by `kubernetes` source.

What parts of metadata we inject into events should be configurable, but we can
-and want to offer a sane default here.
+and want to offer sane defaults here.

### Origin filtering

-We can do a highly efficient filtering based on the log file path, and a more
+We can do highly efficient filtering based on the log file path, and a more
comprehensive filtering via metadata from the k8s API, which, unfortunately,
has a bit more overhead.

The best user experience is via k8s API, because then we can support filtering
by labels/annotations, which is a standard way of doing things with k8s.

-#### Filtering based on path
+#### Filtering based on the log file path

We already do that in our current implementation.

@@ -485,15 +485,15 @@ Filtering by Kubernetes metadata is way more advanced and flexible from the
user perspective.
The idea of doing filtering like that is when Vector picks up a new log file to
-process at `kubernetes` source, it has to be able to somehow make a decision on
-whether to consume the logs from that file, or to ignore it, based on the state
-at the k8s API and Vector configuration.
+process at `kubernetes` source, it has to be able to somehow decide on whether
+to consume the logs from that file, or to ignore it, based on the state at the
+k8s API and the Vector configuration.

-This means that there has to be way of making the data from the k8s API related
-to the log file available for Vector.
+This means that there has to be a way to make the data from the k8s API related
+to the log file available to Vector.

Based on the k8s API structure, it looks like we should aim for obtaining the
-`Pod` object, since it contains the essential information about the containers
+`Pod` object, since it contains essential information about the containers
that produced the log file. Also, it is the `Pod` objects that `kubelet` relies
on to manage the workloads on the node, so this makes `Pod` objects the best
option for our case, i.e. better than fetching `Deployment` objects.
@@ -516,8 +516,8 @@ One important thing to note is metadata for the given pod can change over time,
and the implementation has to take that into account, and update the filtering
state accordingly.

-We also can't overload the k8s API with requests. General rule of thumb is we
-shouldn't do requests much more often that k8s itself generates events.
+We also can't overload the k8s API with requests. The general rule of thumb is
+we shouldn't do requests much more often than k8s itself generates events.

Each approach has very different properties. It is hard to estimate which ones
are a better fit.
@@ -537,7 +537,7 @@ avoid hitting the k8s API too often.

This is an alternative approach to the previous implementation.
-Current implementation allows doing this, but is has a certain downsides -
+The current implementation allows doing this, but it has certain downsides -
the main problem is we're paying the price of reading the log files that are
filtered out completely.

From 7f49a00febabc4f80ae8a52465eec467ffa4c5f4 Mon Sep 17 00:00:00 2001
From: MOZGIII
Date: Wed, 15 Apr 2020 16:00:02 +0300
Subject: [PATCH 037/118] Add a note on k8s API server availability and `Pod`
 objects cache

Signed-off-by: MOZGIII

---
 .../2020-04-04-2221-kubernetes-integration.md | 27 +++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md
index 49ac95f2c7af1..a7c37b3646db9 100644
--- a/rfcs/2020-04-04-2221-kubernetes-integration.md
+++ b/rfcs/2020-04-04-2221-kubernetes-integration.md
@@ -533,6 +533,32 @@ case.

Either way, we probably want to keep some form of cache + a circuit breaker to
avoid hitting the k8s API too often.

+##### A note on k8s API server availability and `Pod` objects cache
+
+One downside is we'll probably have to stall the events originating from a
+particular log file until we obtain the data from the k8s API and decide whether
+to allow that file or filter it. During disasters, if the API server becomes
+unavailable, we'll end up stalling the events for which we don't have `Pod`
+object data cached. It is a good idea to handle this elegantly; for instance,
+if we detect that the k8s API is gone, we should pause cache-busting until it
+comes up again - because no changes can ever arrive while the k8s API server is
+down, and it makes sense to keep the cache while that's the case.
+
+We're in a good position here, because we have a good understanding of the
+system properties, and can intelligently handle the k8s API server being down.
+
+Since we'll be stalling the events while we don't have the `Pod` object, there's
+an edge case where we won't be able to ship the events for a prolonged time.
+This scenario occurs when a new pod is added to the node and then the Kubernetes
+API server goes down. If `kubelet` picks up the update and starts the containers,
+and they start producing logs, but Vector on the same node doesn't get the
+update - we're going to stall the logs indefinitely. Ideally, we'd want to talk
+to the `kubelet` instead of the API server to get the `Pod` object data - since
+it's local (hence has a much higher chance to be present) and has even more
+authoritative information, in a sense, than the API server on what pods are
+actually running on the node. However, there's currently no interface to the
+`kubelet` we could utilize for that.
+
#### Filtering based on event fields after annotation

This is an alternative approach to the previous implementation.

From 51d7f21a1627108817f153de03c1141b20c6ece9 Mon Sep 17 00:00:00 2001
From: MOZGIII
Date: Wed, 15 Apr 2020 16:01:04 +0300
Subject: [PATCH 038/118] Add a description of the technical approach for
 metadata annotation

Signed-off-by: MOZGIII

---
 rfcs/2020-04-04-2221-kubernetes-integration.md | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md
index a7c37b3646db9..c36c6b7efe31f 100644
--- a/rfcs/2020-04-04-2221-kubernetes-integration.md
+++ b/rfcs/2020-04-04-2221-kubernetes-integration.md
@@ -452,6 +452,16 @@ it that wouldn't be covered by `kubernetes` source.

What parts of metadata we inject into events should be configurable, but we can
and want to offer sane defaults here.

+Technically, the approach implemented at `kubernetes_pod_metadata` already is
+pretty good.
+
+One small detail is that we probably want to allow adding arbitrary fields from
+the `Pod` object record to the event, instead of a predefined set of fields.
+The rationale is we can never imagine all the use cases people could have
+in the k8s environment, so we probably should be as flexible as possible.
+There don't seem to be any technical barriers preventing us from offering
+this.
+
### Origin filtering

We can do highly efficient filtering based on the log file path, and a more

From 2e01a7d279dec8ec959419bb8abaf96f21b1a687 Mon Sep 17 00:00:00 2001
From: MOZGIII
Date: Wed, 15 Apr 2020 16:41:37 +0300
Subject: [PATCH 039/118] Fill in the annotations and labels on vector pod via
 downward API

Signed-off-by: MOZGIII

---
 .../2020-04-04-2221-kubernetes-integration.md | 54 ++++++++++++++++++-
 1 file changed, 53 insertions(+), 1 deletion(-)

diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md
index c36c6b7efe31f..097febe0f5f14 100644
--- a/rfcs/2020-04-04-2221-kubernetes-integration.md
+++ b/rfcs/2020-04-04-2221-kubernetes-integration.md
@@ -583,7 +583,58 @@

#### Annotations and labels on vector pod via downward API

-TODO
+We might want to implement support for configuring Vector via annotations
+and/or labels in addition to the configuration files in the `ConfigMap`s.
+
+This actually should be a pretty easy thing to do with the [downward API]. It
+exposes pod data as files, so all we need is a slightly altered configuration
+loading procedure.
+
+This is how it would look (very simplified):
+
+```yaml
+apiVersion: v1
+kind: Pod
+metadata:
+  name: kubernetes-downwardapi-volume-example
+  annotations:
+    vector.dev/config: |
+      [sinks.aws_s3]
+      type = "aws_s3"
+      inputs = ["kubernetes"]
+      bucket = "my-bucket"
+      compression = "gzip"
+      region = "us-east-1"
+      key_prefix = "date=%F/"
+spec:
+  containers:
+    - name: vector
+      image: vector-image
+      command:
+        ["vector", "--k8s-downward-api-config", "/etc/podinfo/annotations"]
+      volumeMounts:
+        - name: podinfo
+          mountPath: /etc/podinfo
+  volumes:
+    - name: podinfo
+      downwardAPI:
+        items:
+          - path: "annotations"
+            fieldRef:
+              fieldPath: metadata.annotations
+```
+
+The `/etc/podinfo/annotations` file will look something like this:
+
+```
+kubernetes.io/config.seen="2020-04-15T13:35:27.290739039Z"
+kubernetes.io/config.source="api"
+vector.dev/config="[sinks.aws_s3]\ntype = \"aws_s3\"\ninputs = [\"kubernetes\"]\nbucket = \"my-bucket\"\ncompression = \"gzip\"\nregion = \"us-east-1\"\nkey_prefix = \"date=%F/\"\n"
+```
+
+It's quite trivial to extract the configration.
+
+While possible, this is outside of the scope of the initial integration.
[container_runtimes]: https://kubernetes.io/docs/setup/production-environment/container-runtimes/
[cri_log_format]: https://github.com/kubernetes/community/blob/ee2abbf9dbfa4523b414f99a04ddc97bd38c74b2/contributors/design-proposals/node/kubelet-cri-logging.md
[daemonset]: https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/
+[downward api]: https://kubernetes.io/docs/tasks/inject-data-application/downward-api-volume-expose-pod-information/#store-pod-fields
[filebeat k8s integration]: https://www.elastic.co/guide/en/beats/filebeat/master/running-on-kubernetes.html
[firecracker]: https://github.com/firecracker-microvm/firecracker
[fluentbit k8s integration]: https://docs.fluentbit.io/manual/installation/kubernetes

From d93be2337bb6695d9dcf6ea9b9b1ec608afae08f Mon Sep 17 00:00:00 2001
From: MOZGIII
Date: Wed, 15 Apr 2020 16:49:02 +0300
Subject: [PATCH 040/118] Fill in the section on Custom CRDs

Signed-off-by: MOZGIII

---
 rfcs/2020-04-04-2221-kubernetes-integration.md | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md
index 097febe0f5f14..9847265a8688f 100644
--- a/rfcs/2020-04-04-2221-kubernetes-integration.md
+++ b/rfcs/2020-04-04-2221-kubernetes-integration.md
@@ -638,7 +638,19 @@

#### Custom CRDs

-TODO
+A much more involved feature than the one above would be making `Vector`
+configurable via [`Custom Resource Definition`][k8s_docs_crds].
+
+This feature is not considered for the initial integration with Kubernetes, and
+is not even explored, since it is a way more advanced level of integration than
+we can achieve in the short term.
+
+This section is here for completeness, and we would probably like to explore
+this in the future.
+ +This includes both adding the support for CRDs to Vector itself, and +implementing an orchestrating component (such things are usually called +[operators][k8s_docs_operator] in the k8s context, i.e. `vector-operator`). ### Changes to Vector release process @@ -794,6 +806,8 @@ See [motivation](#motivation). [jsonlines]: http://jsonlines.org/ [k8s_api_host_path_volume_source]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#hostpathvolumesource-v1-core [k8s_doc_persistent_volumes]: https://kubernetes.io/docs/concepts/storage/persistent-volumes +[k8s_docs_crds]: https://kubernetes.io/docs/tasks/access-kubernetes-api/custom-resources/custom-resource-definitions/ +[k8s_docs_operator]: https://kubernetes.io/docs/concepts/extend-kubernetes/operator/ [k8s_log_path_location_docs]: https://kubernetes.io/docs/concepts/cluster-administration/logging/#logging-at-the-node-level [k8s_src_build_container_logs_directory]: https://github.com/kubernetes/kubernetes/blob/31305966789525fca49ec26c289e565467d1f1c4/pkg/kubelet/kuberuntime/helpers.go#L173 [k8s_src_parse_funcs]: https://github.com/kubernetes/kubernetes/blob/e74ad388541b15ae7332abf2e586e2637b55d7a7/pkg/kubelet/kuberuntime/logs/logs.go#L116 From a6cdcbd53fec1cf636bcdaf6059125d7cb761e8c Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Wed, 15 Apr 2020 17:04:06 +0300 Subject: [PATCH 041/118] Add Awesome Operators List Signed-off-by: MOZGIII --- rfcs/2020-04-04-2221-kubernetes-integration.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index 9847265a8688f..482b0fee6250b 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -668,6 +668,7 @@ TODO deployment strategies and topologies. There are likely some very useful and interesting tactics in their approach though. 1. [Influx Helm charts] +1. 
[Awesome Operators List] - an "awesome list" of operators. ## Sales Pitch @@ -767,6 +768,7 @@ See [motivation](#motivation). [anchor_file_locations]: #file-locations [anchor_helm_vs_raw_yaml_files]: #helm-vs-raw-yaml-files [anchor_minimal_supported_kubernetes_version]: #minimal-supported-kubernetes-version +[awesome operators list]: https://github.com/operator-framework/awesome-operators [bonzai logging operator]: https://github.com/banzaicloud/logging-operator [chartmuseum]: https://chartmuseum.com/ [configmap_updates]: https://kubernetes.io/docs/tasks/configure-pod-container/configure-pod-configmap/#mounted-configmaps-are-updated-automatically From de0e8309bcb576179a46485be7d77b7b8b21d000 Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Wed, 15 Apr 2020 23:44:34 +0300 Subject: [PATCH 042/118] Add a practical example on event filtering via metadata Signed-off-by: MOZGIII --- .../2020-04-04-2221-kubernetes-integration.md | 41 +++++++++++++++++++ 1 file changed, 41 insertions(+) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index 482b0fee6250b..5f4bf1b1cf1c1 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -569,6 +569,47 @@ authoritative information, in a sense, than the API server on what pods are actually running on the node. However there's currently no interface to the `kubelet` we could utilize for that. +##### Practical example of filtering by annotation + +Here's an example of an `nginx` deployment. 
+
+```yaml
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: nginx-deployment
+  labels:
+    app: nginx
+spec:
+  replicas: 3
+  selector:
+    matchLabels:
+      app: nginx
+  template:
+    metadata:
+      labels:
+        app: nginx
+      annotations:
+        vector.dev/exclude: "true"
+    spec:
+      containers:
+        - name: nginx
+          image: nginx:1.14.2
+          ports:
+            - containerPort: 80
+```
+
+The `vector.dev/exclude: "true"`
+annotation at the `PodTemplateSpec` is intended to let Vector know that it
+shouldn't collect logs from the relevant `Pod`s.
+
+Upon picking up a new log file for processing, Vector is intended to read the
+`Pod` object, see the `vector.dev/exclude: "true"` annotation and ignore the
+log file altogether. This should take much fewer resources compared to
+reading log files into events and then filtering them out.
+
+This is also a perfectly valid way of filtering out logs of Vector itself.
+
#### Filtering based on event fields after annotation

This is an alternative approach to the previous implementation.

From 8452ccf5ec8a7a8a34995b071e22dc892362b87b Mon Sep 17 00:00:00 2001
From: MOZGIII
Date: Wed, 15 Apr 2020 23:46:53 +0300
Subject: [PATCH 043/118] Fix typo at downward API section

Signed-off-by: MOZGIII

---
 rfcs/2020-04-04-2221-kubernetes-integration.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md
index 5f4bf1b1cf1c1..5939436566011 100644
--- a/rfcs/2020-04-04-2221-kubernetes-integration.md
+++ b/rfcs/2020-04-04-2221-kubernetes-integration.md
@@ -673,7 +673,7 @@ kubernetes.io/config.source="api"
vector.dev/config="[sinks.aws_s3]\ntype = \"aws_s3\"\ninputs = [\"kubernetes\"]\nbucket = \"my-bucket\"\ncompression = \"gzip\"\nregion = \"us-east-1\"\nkey_prefix = \"date=%F/\"\n"

-It's quite trivial to extract the configration.
+It's quite trivial to extract the configuration.
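The practical annotation example above also covers excluding Vector's own logs: the same `vector.dev/exclude: "true"` annotation could be set on Vector's own `DaemonSet` pod template. This is only an illustrative sketch; the labels and the image tag here are hypothetical, not the actual deployment config.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: vector
  namespace: vector
spec:
  selector:
    matchLabels:
      name: vector
  template:
    metadata:
      labels:
        name: vector
      annotations:
        # Vector would skip log files originating from its own pods.
        vector.dev/exclude: "true"
    spec:
      containers:
        - name: vector
          image: timberio/vector:latest # illustrative tag
```

With this in place, Vector would read its own `Pod` object on startup, see the exclusion annotation, and never ingest its own log files.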
From 52140942be585eb4f2e8b4d13595a30c1e094ea9 Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Wed, 15 Apr 2020 23:48:54 +0300 Subject: [PATCH 044/118] Correct markdown lint Signed-off-by: MOZGIII --- rfcs/2020-04-04-2221-kubernetes-integration.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index 5939436566011..b413c2ac9d9fb 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -667,7 +667,7 @@ spec: The `/etc/podinfo/annotations` file will look something like this: -``` +```text kubernetes.io/config.seen="2020-04-15T13:35:27.290739039Z" kubernetes.io/config.source="api" vector.dev/config="[sinks.aws_s3]\ntype = \"aws_s3\"\ninputs = [\"kubernetes\"]\nbucket = \"my-bucket\"\ncompression = \"gzip\"\nregion = \"us-east-1\"\nkey_prefix = \"date=%F/\"\n" From 370c42f39a89316930c5004e29f758452915eaa2 Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Thu, 16 Apr 2020 00:36:49 +0300 Subject: [PATCH 045/118] Add a TODO section on testing with some notes Signed-off-by: MOZGIII --- rfcs/2020-04-04-2221-kubernetes-integration.md | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index b413c2ac9d9fb..720be782626b6 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -697,6 +697,15 @@ implementing an orchestrating component (such things are usually called TODO +### Testing + +TODO + +- integration tests are cluster-agnostic +- at CI we test against `minikube` and against all versions from MSKV till the latest k8s +- at test harness we run non-ephemeral "real" clusters and test against them (i.e. 
GCP GKE, AWS EKS, Azure K8s, DO K8s, RedHat OpenShift, Rancher, CoreOS Tekton, etc)
- we integrate our unit tests into the test harness in such a way that we can run them as correctness tests

## Prior Art

1. [Filebeat k8s integration]

From 332cd3a4d5d48b24e62e9493ac82484ffc619d73 Mon Sep 17 00:00:00 2001
From: MOZGIII
Date: Thu, 16 Apr 2020 00:41:01 +0300
Subject: [PATCH 046/118] Add support for optional config files to the plan of
 attack

Signed-off-by: MOZGIII

---
 rfcs/2020-04-04-2221-kubernetes-integration.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md
index 720be782626b6..a916082d16853 100644
--- a/rfcs/2020-04-04-2221-kubernetes-integration.md
+++ b/rfcs/2020-04-04-2221-kubernetes-integration.md
@@ -804,6 +804,9 @@ See [motivation](#motivation).
- [ ] Merge split logs. See [pr#2134].
- [ ] Audit and improve the `kubernetes_pod_metadata` transform.
- [ ] Use the `log_schema.kubernetes_key` setting. See [issue#1867].
+- [ ] Add a way to load optional config files (i.e. load config file if it
+  exists, and ignore it if it doesn't). Required to elegantly load multiple
+  files so that we can split the configuration.

From 5c3e45b56763e54370cda33f518d4e3e901533f8 Mon Sep 17 00:00:00 2001
From: MOZGIII
Date: Thu, 16 Apr 2020 00:42:56 +0300
Subject: [PATCH 047/118] Remove config reload from the plan of attack

The config reload is something we explicitly want to disable rather than
rely on here, so we can skip it and address separately. See relevant chapter.
Signed-off-by: MOZGIII

---
 rfcs/2020-04-04-2221-kubernetes-integration.md | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md
index a916082d16853..31eb0d215f4b8 100644
--- a/rfcs/2020-04-04-2221-kubernetes-integration.md
+++ b/rfcs/2020-04-04-2221-kubernetes-integration.md
@@ -807,9 +807,6 @@ See [motivation](#motivation).
- [ ] Add a way to load optional config files (i.e. load config file if it
  exists, and ignore it if it doesn't). Required to elegantly load multiple
  files so that we can split the configuration.
-- [ ] Ensure our config reload strategy is solid.
-  - [ ] Don't exit when there are configuration errors. See [issue#1816].
-  - [ ] Test this. See [issue#2224].
- [ ] Add `kubernetes` source reference documentation.
- [ ] Prepare YAML deployment config.
- [ ] Prepare Helm Chart.

From 229a223dd592b4cadbe113560b3bc825e61d367a Mon Sep 17 00:00:00 2001
From: MOZGIII
Date: Thu, 16 Apr 2020 00:49:01 +0300
Subject: [PATCH 048/118] Add a link to ConfigMapVolumeSource

Signed-off-by: MOZGIII

---
 rfcs/2020-04-04-2221-kubernetes-integration.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md
index 31eb0d215f4b8..1e765e8274b22 100644
--- a/rfcs/2020-04-04-2221-kubernetes-integration.md
+++ b/rfcs/2020-04-04-2221-kubernetes-integration.md
@@ -374,8 +374,8 @@ Vector supports multiple configuration files, and we can rely on that to ship
a config file with the common configuration part in our YAML / Helm suite,
and let users keep their custom config part in a separate file.
-We will then mount two `ConfigMap`s into a container, and start Vector in -multiple configuration files mode +We will then [mount][k8s_api_config_map_volume_source] two `ConfigMap`s into a +container, and start Vector in multiple configuration files mode (`vector --config .../common.toml --config .../custom.toml`). #### Vector config file reloads @@ -856,6 +856,7 @@ See [motivation](#motivation). [issue#2225]: https://github.com/timberio/vector/issues/2225 [json file logging driver]: https://docs.docker.com/config/containers/logging/json-file/ [jsonlines]: http://jsonlines.org/ +[k8s_api_config_map_volume_source]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#configmapvolumesource-v1-core [k8s_api_host_path_volume_source]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#hostpathvolumesource-v1-core [k8s_doc_persistent_volumes]: https://kubernetes.io/docs/concepts/storage/persistent-volumes [k8s_docs_crds]: https://kubernetes.io/docs/tasks/access-kubernetes-api/custom-resources/custom-resource-definitions/ From 39bba3bdcfc391a9cae90af8a86fa1646aa080aa Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Thu, 16 Apr 2020 00:49:49 +0300 Subject: [PATCH 049/118] Unify link notation around k8s_doc Signed-off-by: MOZGIII --- rfcs/2020-04-04-2221-kubernetes-integration.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index 1e765e8274b22..4e911fb31dbd5 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -349,7 +349,7 @@ we'll be using this by default, but we'll also allow configuring this to provide the flexibility users will expect. An alternative to `hostPath` volumes would be a user-provided -[persistent volume][k8s_doc_persistent_volumes] of some kind. The only +[persistent volume][k8s_docs_persistent_volumes] of some kind. 
The only requirement is that it has to have a `ReadWriteMany` access mode. #### Vector config files @@ -858,9 +858,9 @@ See [motivation](#motivation). [jsonlines]: http://jsonlines.org/ [k8s_api_config_map_volume_source]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#configmapvolumesource-v1-core [k8s_api_host_path_volume_source]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#hostpathvolumesource-v1-core -[k8s_doc_persistent_volumes]: https://kubernetes.io/docs/concepts/storage/persistent-volumes [k8s_docs_crds]: https://kubernetes.io/docs/tasks/access-kubernetes-api/custom-resources/custom-resource-definitions/ [k8s_docs_operator]: https://kubernetes.io/docs/concepts/extend-kubernetes/operator/ +[k8s_docs_persistent_volumes]: https://kubernetes.io/docs/concepts/storage/persistent-volumes [k8s_log_path_location_docs]: https://kubernetes.io/docs/concepts/cluster-administration/logging/#logging-at-the-node-level [k8s_src_build_container_logs_directory]: https://github.com/kubernetes/kubernetes/blob/31305966789525fca49ec26c289e565467d1f1c4/pkg/kubelet/kuberuntime/helpers.go#L173 [k8s_src_parse_funcs]: https://github.com/kubernetes/kubernetes/blob/e74ad388541b15ae7332abf2e586e2637b55d7a7/pkg/kubelet/kuberuntime/logs/logs.go#L116 From 89218d8fd3c901eb96579eb33f5a882ee720dc62 Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Thu, 16 Apr 2020 19:48:50 +0300 Subject: [PATCH 050/118] Add kustomize guide Signed-off-by: MOZGIII --- .../2020-04-04-2221-kubernetes-integration.md | 25 +++++++++++++++++++ 1 file changed, 25 insertions(+) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index 4e911fb31dbd5..077e0fc3b92e6 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -147,6 +147,30 @@ TODO: insert diagram vector/vector ``` +#### Install using Kustomize + +1. Install [`kustomize`][kustomize]. + +1. 
Prepare `kustomization.yaml`. + + Use the same config as in [Kubectl Interface]. + + ```yaml + # kustomization.yaml + namespace: vector + + resources: + - https://packages.timber.io/vector/latest/kubernetes/vector-global.yaml + - https://packages.timber.io/vector/latest/kubernetes/vector-namespaced.yaml + - vector-configmap.yaml + ``` + +1. Deploy Vector! + + ```shell + kustomize build . | kubectl apply -f - + ``` + ## Design considerations ### Minimal supported Kubernetes version @@ -869,6 +893,7 @@ See [motivation](#motivation). [kubectl_rollout_restart]: https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#-em-restart-em- [kubernetes version and version skew support policy]: https://kubernetes.io/docs/setup/release/version-skew-policy/ [kubernetes_version_comment]: https://github.com/timberio/vector/pull/2188#discussion_r403120481 +[kustomize]: https://github.com/kubernetes-sigs/kustomize [logdna k8s integration]: https://docs.logdna.com/docs/kubernetes [logdna_daemonset]: https://raw.githubusercontent.com/logdna/logdna-agent/master/logdna-agent-ds.yaml [pr#2134]: https://github.com/timberio/vector/pull/2134 From abbc752747217ba5614969fb2e9f3b2e44ca38cf Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Fri, 17 Apr 2020 04:57:24 +0300 Subject: [PATCH 051/118] Add a section on release process Signed-off-by: MOZGIII --- .../2020-04-04-2221-kubernetes-integration.md | 41 ++++++++++++++++++- 1 file changed, 40 insertions(+), 1 deletion(-) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index 077e0fc3b92e6..5b5336f133607 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -719,7 +719,45 @@ implementing an orchestrating component (such things are usually called ### Changes to Vector release process -TODO +We need to ship a particular Vector version along with a particular set of k8s +configuration YAML files and a Helm chart. 
This is so that we can be sure
+all our configurations are actually tested and known to work for a particular
+Vector release. This is very important for maintaining legacy releases, and
+for people to be able to downgrade if needed - an essential property for a
+system-level component like Vector.
+
+This means we need to orchestrate the releases of the YAML configs and Helm
+Charts together with the Vector releases.
+
+Naturally, it's easiest to implement if we keep the code for both the YAML
+configs and the Helm Chart in our Vector repo.
+
+The alternative - keeping the Helm Chart, alone or together with the YAML
+files, in a separate repo - has the benefit of a smaller footprint to grasp:
+a dedicated repo with just the k8s deployment config would obviously have
+less code and a shorter history. However, it would make it significantly
+harder to correlate histories with the Vector mainline, which is a major
+downside. For this reason, keeping everything in the Vector repo is
+preferable.
+
+During the release process, together with shipping the Vector version, we'd
+also have to bump the Vector version referenced in the YAML and Helm Chart
+configs, and bump the version of the Helm Chart itself. We then copy the YAML
+configs to the same location where we keep release artifacts (i.e. `.deb`s,
+`.rpm`s, etc) for that particular Vector version. We also publish a new Helm
+Chart release into our Helm Chart repo.
+
+While bumping the versions is human work and hard to automate, copying the
+YAML files and publishing a Helm Chart release is easy, and we should
+automate it. We can also add CI lints to ensure that the Vector version in
+the YAML files and the Helm Chart matches the one baked into the Rust code at
+all times. Ideally, they should be bumped together atomically and never
+diverge.
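Such a CI lint can be a small script. As a sketch (the file paths and formats below are illustrative stand-ins created inside a temp directory, not the actual repo layout), it boils down to extracting both versions and comparing them:

```shell
#!/usr/bin/env bash
# Sketch of a version-consistency CI lint. The paths and file contents are
# illustrative stand-ins for the real Cargo.toml and Helm Chart.yaml.
set -euo pipefail

workdir="$(mktemp -d)"
printf 'version = "0.9.0"\n' > "$workdir/Cargo.toml"
printf 'appVersion: 0.9.0\n' > "$workdir/Chart.yaml"

# Extract the version declared in each file.
cargo_version="$(sed -n 's/^version = "\(.*\)"$/\1/p' "$workdir/Cargo.toml" | head -n 1)"
chart_version="$(sed -n 's/^appVersion: \(.*\)$/\1/p' "$workdir/Chart.yaml")"

# Fail the CI job on divergence.
if [ "$cargo_version" != "$chart_version" ]; then
  echo "version mismatch: Cargo.toml=$cargo_version Chart.yaml=$chart_version" >&2
  exit 1
fi
echo "versions match: $cargo_version"
```

A real lint would point at the actual files in the repo instead of the generated samples, but the comparison logic stays the same.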
+
+If we need to ship an update to just the YAML configs or a new Helm Chart
+without changes to the Vector code, as our default strategy we can consider
+cutting a patch release of Vector - simply as a way to go through the whole
+process. This would bump the Vector version as well, even though there's no
+practical reason for that since the code didn't change. This strategy will not
+only simplify the process on our end, but will also be very simple for our
+users to understand.

 ### Testing

@@ -729,6 +767,7 @@ TODO

 - at CI we test against `minikube` and against all versions from MSKV till the latest k8s
 - at test harness we run non-ephemeral "real" clusters and test against them (i.e. GCP GKE, AWS EKS, Azure K8s, DO K8s, RedHat OpenShift, Rancher, CoreOS Tekton, etc)
 - we integrate our unit tests into test harness in such a way that we can run them as correctness tests
+- we want to test our deployment configurations - Helm charts, YAML files, etc. - in addition to unit tests

 ## Prior Art

From 1bbedd504c48070ad50a2cd86d05286836f867d9 Mon Sep 17 00:00:00 2001
From: MOZGIII
Date: Fri, 17 Apr 2020 22:46:23 +0300
Subject: [PATCH 052/118] Clarify the question on metadata fields

Signed-off-by: MOZGIII
---
 rfcs/2020-04-04-2221-kubernetes-integration.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md
index 5b5336f133607..076ae450a0874 100644
--- a/rfcs/2020-04-04-2221-kubernetes-integration.md
+++ b/rfcs/2020-04-04-2221-kubernetes-integration.md
@@ -847,6 +847,7 @@ See [motivation](#motivation).
 1. Can we populate file at `terminationMessagePath` with some meaningful
    information when we exit or crash?
 1. Can we allow passing arbitrary fields from the `Pod` object to the event?
+   Currently we only pass `pod_id`, pod `annotations` and pod `labels`.
## Plan Of Attack From 4559adda3a9ec4f6237e3f76dca467aa1338c5af Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Fri, 17 Apr 2020 23:18:38 +0300 Subject: [PATCH 053/118] Provided some responses to the outstanding questions Signed-off-by: MOZGIII --- .../2020-04-04-2221-kubernetes-integration.md | 33 ++++++++++++++----- 1 file changed, 25 insertions(+), 8 deletions(-) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index 076ae450a0874..2aebe7fa9bd27 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -804,25 +804,38 @@ See [motivation](#motivation). [this comment][kubernetes_version_comment].~~ See the [Minimal supported Kubernetes version][anchor_minimal_supported_kubernetes_version] section. -1. What is the best to avoid Vector from ingesting it's own logs? I'm assuming +1. ~~What is the best to avoid Vector from ingesting it's own logs? I'm assuming that my [`kubectl` tutorial](#kubectl-interface) handles this with namespaces? - We'd just need to configure Vector to exclude this namespace? -1. I've seen two different installation strategies. For example, Fluentd offers + We'd just need to configure Vector to exclude this namespace?~~ + See the [Origin filtering][anchor_origin_filtering] section. +1. ~~I've seen two different installation strategies. For example, Fluentd offers a [single daemonset configuration file][fluentd_daemonset] while Fluentbit offers [four separate configuration files][fluentbit_installation] (`service-account.yaml`, `role.yaml`, `role-binding.yaml`, `configmap.yaml`). - Which approach is better? Why are they different? + Which approach is better? Why are they different?~~ + See the + [Strategy on YAML file grouping][anchor_strategy_on_yaml_file_grouping] + section. 1. ~~Should we prefer `kubectl create ...` or `kubectl apply ...`? 
The examples in the [prior art](#prior-art) section use both.~~ See [Helm vs raw YAML files][anchor_helm_vs_raw_yaml_files] section. -1. From what I understand, Vector requires the Kubernetes `watch` verb in order +1. ~~From what I understand, Vector requires the Kubernetes `watch` verb in order to receive updates to k8s cluster changes. This is required for the `kubernetes_pod_metadata` transform. Yet, Fluentbit [requires the `get`, - `list`, and `watch` verbs][fluentbit_role]. Why don't we require the same? -1. What is `updateStrategy` ... `RollingUpdate`? This is not included in + `list`, and `watch` verbs][fluentbit_role]. Why don't we require the same?~~ + Right, this is a requirement since we're using k8s API. The exact set of + permissions is to be determined at YAML files design stage - after we + complete the implementation. It's really trivial to determine from a set of + API calls used. +1. ~~What is `updateStrategy` ... `RollingUpdate`? This is not included in [our daemonset][vector_daemonset] or in [any of Fluentbit's config files][fluentbit_installation]. But it is included in both [Fluentd's - daemonset][fluentd_daemonset] and [LogDNA's daemonset][logdna_daemonset]. + daemonset][fluentd_daemonset] and [LogDNA's daemonset][logdna_daemonset].~~ + `RollingUpdate` is the default value for + [`updateStrategy`][k8s_api_daemon_set_update_strategy] of the + [`DaemonSet`][daemonset]. Alternative is `OnDelete`. `RollingUpdate` makes + more sense for us to use as the default, more info on this is available at + the [docs][k8s_docs_rolling_update]. 1. I've also noticed `resources` declarations in some of these config files. For example [LogDNA's daemonset][logdna_daemonset]. I assume this is limiting resources. Do we want to consider this? @@ -882,6 +895,8 @@ See [motivation](#motivation). 
[anchor_file_locations]: #file-locations [anchor_helm_vs_raw_yaml_files]: #helm-vs-raw-yaml-files [anchor_minimal_supported_kubernetes_version]: #minimal-supported-kubernetes-version +[anchor_origin_filtering]: #origin-filtering +[anchor_strategy_on_yaml_file_grouping]: #strategy-on-yaml-file-grouping [awesome operators list]: https://github.com/operator-framework/awesome-operators [bonzai logging operator]: https://github.com/banzaicloud/logging-operator [chartmuseum]: https://chartmuseum.com/ @@ -921,10 +936,12 @@ See [motivation](#motivation). [json file logging driver]: https://docs.docker.com/config/containers/logging/json-file/ [jsonlines]: http://jsonlines.org/ [k8s_api_config_map_volume_source]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#configmapvolumesource-v1-core +[k8s_api_daemon_set_update_strategy]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#daemonsetupdatestrategy-v1-apps [k8s_api_host_path_volume_source]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#hostpathvolumesource-v1-core [k8s_docs_crds]: https://kubernetes.io/docs/tasks/access-kubernetes-api/custom-resources/custom-resource-definitions/ [k8s_docs_operator]: https://kubernetes.io/docs/concepts/extend-kubernetes/operator/ [k8s_docs_persistent_volumes]: https://kubernetes.io/docs/concepts/storage/persistent-volumes +[k8s_docs_rolling_update]: https://kubernetes.io/docs/tasks/manage-daemon/update-daemon-set/ [k8s_log_path_location_docs]: https://kubernetes.io/docs/concepts/cluster-administration/logging/#logging-at-the-node-level [k8s_src_build_container_logs_directory]: https://github.com/kubernetes/kubernetes/blob/31305966789525fca49ec26c289e565467d1f1c4/pkg/kubelet/kuberuntime/helpers.go#L173 [k8s_src_parse_funcs]: https://github.com/kubernetes/kubernetes/blob/e74ad388541b15ae7332abf2e586e2637b55d7a7/pkg/kubelet/kuberuntime/logs/logs.go#L116 From 50d8154247d2413b2289f35a7f512b182401027b Mon Sep 17 00:00:00 2001 From: 
MOZGIII Date: Fri, 17 Apr 2020 23:25:04 +0300 Subject: [PATCH 054/118] Add a dummy section on handling non-log k8s events Signed-off-by: MOZGIII --- rfcs/2020-04-04-2221-kubernetes-integration.md | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index 2aebe7fa9bd27..7ccf8be727bb3 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -769,6 +769,16 @@ TODO - we integrate our unit tests into test harness in such a way that we can run them as correctness tests - we want to test our deployment configurations - Helm charts, YAML files and etc, in addition to unit tests +### Other data gathering + +> This section is on gathering data other than container logs. + +TODO + +- we can expose watch events that we get from the k8s API as Vector events +- we can grab and process prometheus metrics from the pods that expose them +- we can gather node-level logs, useful for cluster operators + ## Prior Art 1. [Filebeat k8s integration] From 28af990fbf330d1a1e08742e0558f43c542bad16 Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Sat, 18 Apr 2020 01:50:34 +0300 Subject: [PATCH 055/118] Partially fill the testing section Signed-off-by: MOZGIII --- .../2020-04-04-2221-kubernetes-integration.md | 76 +++++++++++++++++++ 1 file changed, 76 insertions(+) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index 7ccf8be727bb3..cb942192fc341 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -761,6 +761,81 @@ process on our end, but will also be very simple to understand for our users. ### Testing +We want to implement a comprehensive test system to maintain our k8s +integration. + +As usual, we need a way to do unit tests to validate isolated individual +components during development. 
We also need integration tests, whose purpose is +to validate that, as a whole, Vector properly functions when deployed into a +real Kubernetes cluster. + +#### Unit tests + +To be able to utilize unit tests, we have to build the code from the modular, +composable, and loosely-coupled components. These requirements often allow unit +testing individual components easily, thus significantly improving the +confidence in the overall implementation. + +If we have to, we can rely on mocks to test all the edge cases of the individual +components. + +#### Integration tests + +Integration tests are performed against the real k8s clusters. + +We have a matrix of concerns, we'd like to ensure Vectors works properly with. + +- Kubernetes Versions + - Minimal Supported Kubernetes Version + - Latest version + - All versions in between the latest and MSKV +- Managed Kubernetes offers + - [Amazon Elastic Kubernetes Service](https://aws.amazon.com/ru/eks/) + - [Google Kubernetes Engine](https://cloud.google.com/kubernetes-engine/) + - [Azure Kubernetes Service](https://azure.microsoft.com/en-us/services/kubernetes-service/) + - [DigitalOcean Kubernetes](https://www.digitalocean.com/products/kubernetes/) + - [Platform9 Managed Kubernetes](https://platform9.com/managed-kubernetes/) + - [Red Hat OpenShift Container Platform](https://www.openshift.com/products/container-platform) + - [IBM Cloud Kubernetes Service](https://www.ibm.com/cloud/container-service/) + - [Alibaba Cloud Container Service for Kubernetes](https://www.alibabacloud.com/product/kubernetes) + - [Oracle Container Engine for Kubernetes](https://www.oracle.com/cloud/compute/container-engine-kubernetes.html) + - [OVH Managed Kubernetes Service](https://www.ovhcloud.com/en-gb/public-cloud/kubernetes/) + - [Rackspace Kubernetes-as-a-Service](https://www.rackspace.com/managed-kubernetes) + - [Linode Kubernetes Engine](https://www.linode.com/products/kubernetes/) + - [Yandex Managed Service for 
Kubernetes](https://cloud.yandex.com/services/managed-kubernetes) + - [Tencent Kubernetes Engine](https://intl.cloud.tencent.com/product/tke) +- Kubernetes Distributions (for on-premise deployment) + - Production-grade + - bare `kubeadm` + - [OKD](https://www.okd.io/) (deploys OpenShift Origin) + - [Rancher Kubernetes Engine](https://rancher.com/products/rke/) + - [Metal3](http://metal3.io/) + - [Project Atomic Kubernetes](https://www.projectatomic.io/docs/kubernetes/) + - [Canonical Charmed Kubernetes](https://ubuntu.com/kubernetes/install#multi-node) + - [Kubernetes on DC/OS](https://github.com/mesosphere/dcos-kubernetes-quickstart) + - For small/dev deployments + - [Minikube](https://kubernetes.io/ru/docs/setup/learning-environment/minikube/) + - [MicroK8s](https://microk8s.io/) + - [Docker Desktop Kubernetes](https://www.docker.com/products/docker-desktop) + - [kind](https://kubernetes.io/docs/setup/learning-environment/kind/) + - [minishift](https://www.okd.io/minishift/) +- [Container Runtimes (CRI impls)](https://kubernetes.io/docs/setup/production-environment/container-runtimes/) + - [Docker](https://www.docker.com/) (Kubernetes still has some "special" + integration with Docker; these days, "using Docker" technically means using + `runc` via `containerd` via `docker-engine`) + - OCI (via [CRI-O](https://cri-o.io/) or [containerd](https://containerd.io/)) + - [runc](https://github.com/opencontainers/runc) + - [runhcs](https://github.com/Microsoft/hcsshim/tree/master/cmd/runhcs) - + see more [here][windows_in_kubernetes] + - [Kata Containers](https://github.com/kata-containers/runtime) + - [gVisor](https://github.com/google/gvisor) + - [Firecracker](https://github.com/firecracker-microvm/firecracker-containerd) + +We can't possibly expand this matrix densely due to the enormous amount of +effort required to maintain the infrastructure and the costs. 
It may also be +inefficient to test everything everywhere, because a lot of configurations +don't have any significant or meaningful differences among each other. + TODO - integration tests are cluster-agnostic @@ -968,3 +1043,4 @@ See [motivation](#motivation). [sidecar_container]: https://github.com/kubernetes/enhancements/blob/a8262db2ce38b2ec7941bdb6810a8d81c5141447/keps/sig-apps/sidecarcontainers.md [the chart repository guide]: https://helm.sh/docs/topics/chart_repository/ [vector_daemonset]: 2020-04-04-2221-kubernetes-integration/vector-daemonset.yaml +[windows_in_kubernetes]: https://kubernetes.io/docs/setup/production-environment/windows/intro-windows-in-kubernetes/ From 1b8d27549e9cae7dd742ab8e7b12fd53ecdf4fdc Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Sat, 18 Apr 2020 02:28:32 +0300 Subject: [PATCH 056/118] Add a section on resource limits Signed-off-by: MOZGIII --- .../2020-04-04-2221-kubernetes-integration.md | 38 +++++++++++++++++++ 1 file changed, 38 insertions(+) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index cb942192fc341..6cdda4fa7f025 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -457,6 +457,42 @@ they'll be able to split the files if required. They then maintain their configuration on their own, and we assume they're capable and know what they're doing. +#### Resource Limits + +> This section is on [`Container`][k8s_api_container] [`resources`][k8s_api_resource_requirements] clause. + +Setting resource requirements for Vector container is very important to enable +Kubernetes to properly manage node resources. + +Optimal configuration is very case-specific, and while we have some +understanding of Vector performance characteristics, we can't account for the +environment Vector will run at. 
This means it's nearly impossible for us to come
+up with sane defaults, and we have to rely on users properly configuring the
+resources for their use case.
+
+However, it doesn't mean we should ignore this concern. Instead, we must share
+our understanding of Vector runtime properties and data, and provide as much
+assistance to the users trying to determine the resource requirements as
+possible.
+
+We should provide documentation explaining the inner architecture of Vector
+and our considerations on how to estimate memory / CPU usage.
+
+As to our configuration, we'll omit the `resources` from the YAML files, and
+make them configurable via the Helm Chart.
+
+##### Vector Runtime Properties Bulletin
+
+It would be great to publish a regularly updated bulletin on Vector runtime
+properties (i.e. how much memory and CPU Vector can utilize and under what
+conditions). That would be a real killer feature for everyone who wants to
+deploy Vector under load, not just in the context of the Kubernetes
+integration. Though it's a lot of hard work to determine these properties,
+people with large deployments tend to do this anyway to gain confidence in
+their setup. We could exchange this data with our partners and derive an even
+more realistic profile for Vector's runtime properties, based on real data
+from multiple data sets. This is worth a separate, dedicated RFC though.
+
 ### Annotating events with metadata from Kubernetes

 Kubernetes has a lot of metadata that can be associated with the logs, and most
@@ -1021,8 +1057,10 @@ See [motivation](#motivation).
[json file logging driver]: https://docs.docker.com/config/containers/logging/json-file/ [jsonlines]: http://jsonlines.org/ [k8s_api_config_map_volume_source]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#configmapvolumesource-v1-core +[k8s_api_container]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#container-v1-core [k8s_api_daemon_set_update_strategy]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#daemonsetupdatestrategy-v1-apps [k8s_api_host_path_volume_source]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#hostpathvolumesource-v1-core +[k8s_api_resource_requirements]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#resourcerequirements-v1-core [k8s_docs_crds]: https://kubernetes.io/docs/tasks/access-kubernetes-api/custom-resources/custom-resource-definitions/ [k8s_docs_operator]: https://kubernetes.io/docs/concepts/extend-kubernetes/operator/ [k8s_docs_persistent_volumes]: https://kubernetes.io/docs/concepts/storage/persistent-volumes From ab7be282b7136bf5bdc772117d929b9ef3f3f7d3 Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Sat, 18 Apr 2020 02:42:20 +0300 Subject: [PATCH 057/118] Provide a response to resource limits question Signed-off-by: MOZGIII --- rfcs/2020-04-04-2221-kubernetes-integration.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index 6cdda4fa7f025..4990de65eea03 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -957,9 +957,10 @@ See [motivation](#motivation). [`DaemonSet`][daemonset]. Alternative is `OnDelete`. `RollingUpdate` makes more sense for us to use as the default, more info on this is available at the [docs][k8s_docs_rolling_update]. -1. I've also noticed `resources` declarations in some of these config files. +1. 
~~I've also noticed `resources` declarations in some of these config files. For example [LogDNA's daemonset][logdna_daemonset]. I assume this is limiting - resources. Do we want to consider this? + resources. Do we want to consider this?~~ + See the [Resource Limits][anchor_resource_limits] section of this RFC. 1. What the hell is going on with [Honeycomb's integration strategy][honeycomb integration]? :) It seems like the whole "Heapster" pipeline is specifically for system events, but Heapster is deprecated? @@ -1017,6 +1018,7 @@ See [motivation](#motivation). [anchor_helm_vs_raw_yaml_files]: #helm-vs-raw-yaml-files [anchor_minimal_supported_kubernetes_version]: #minimal-supported-kubernetes-version [anchor_origin_filtering]: #origin-filtering +[anchor_resource_limits]: #resource-limits [anchor_strategy_on_yaml_file_grouping]: #strategy-on-yaml-file-grouping [awesome operators list]: https://github.com/operator-framework/awesome-operators [bonzai logging operator]: https://github.com/banzaicloud/logging-operator From e49eda7dad77b0631ca7cda3947a21080bb689fa Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Sat, 18 Apr 2020 02:50:34 +0300 Subject: [PATCH 058/118] A response on Honeycomb and a note on Heapster Signed-off-by: MOZGIII --- rfcs/2020-04-04-2221-kubernetes-integration.md | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index 4990de65eea03..d22f7868dc43b 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -961,10 +961,14 @@ See [motivation](#motivation). For example [LogDNA's daemonset][logdna_daemonset]. I assume this is limiting resources. Do we want to consider this?~~ See the [Resource Limits][anchor_resource_limits] section of this RFC. -1. What the hell is going on with [Honeycomb's integration +1. 
~~What the hell is going on with [Honeycomb's integration strategy][honeycomb integration]? :) It seems like the whole "Heapster" pipeline is specifically for system events, but Heapster is deprecated? - This leads me to my next question... + This leads me to my next question...~~ + Heapster is indeed outdated, as well as Honeycomb integration guide. + Kubernetes now solves it's internal autoscaling pipelines needs with + [`metrics-server`][metrics-server] - a similar idea yet much more lightweight + implementation. 1. How are we collecting Kubernetes system events? Is that outside of the scope of this RFC? And why does this take an entirely different path? (ref [issue#1293]) @@ -1078,6 +1082,7 @@ See [motivation](#motivation). [kustomize]: https://github.com/kubernetes-sigs/kustomize [logdna k8s integration]: https://docs.logdna.com/docs/kubernetes [logdna_daemonset]: https://raw.githubusercontent.com/logdna/logdna-agent/master/logdna-agent-ds.yaml +[metrics-server]: https://github.com/kubernetes-sigs/metrics-server [pr#2134]: https://github.com/timberio/vector/pull/2134 [pr#2188]: https://github.com/timberio/vector/pull/2188 [sidecar_container]: https://github.com/kubernetes/enhancements/blob/a8262db2ce38b2ec7941bdb6810a8d81c5141447/keps/sig-apps/sidecarcontainers.md From 7a59cb5dbe0cd668de963194a50d4533a4cf9623 Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Sat, 18 Apr 2020 07:12:24 +0300 Subject: [PATCH 059/118] Correct helm instructions Signed-off-by: MOZGIII --- rfcs/2020-04-04-2221-kubernetes-integration.md | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index d22f7868dc43b..8d2804b068919 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -136,14 +136,19 @@ TODO: insert diagram kubectl create namespace vector # Helm v3 - helm install \ - cert-manager vector/vector \ - 
--namespace vector
+   helm upgrade \
+     --install \
+     --namespace vector \
+     --values vector-values.yaml \
+     vector \
+     vector/vector

    # Helm v2
-   helm install \
-     --name vector \
+   helm upgrade \
+     --install \
      --namespace vector \
+     --values vector-values.yaml \
+     --name vector \
      vector/vector
    ```

From c357e7a36eaaa46d8b8badbf87eac6c66b0ceb34 Mon Sep 17 00:00:00 2001
From: MOZGIII
Date: Sat, 18 Apr 2020 20:58:42 +0300
Subject: [PATCH 060/118] Add CNCF link

Signed-off-by: MOZGIII
---
 rfcs/2020-04-04-2221-kubernetes-integration.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md
index 8d2804b068919..7ade8e6b189df 100644
--- a/rfcs/2020-04-04-2221-kubernetes-integration.md
+++ b/rfcs/2020-04-04-2221-kubernetes-integration.md
@@ -830,7 +830,7 @@ We have a matrix of concerns, we'd like to ensure Vectors works properly with.
  - Minimal Supported Kubernetes Version
  - Latest version
  - All versions in between the latest and MSKV
-- Managed Kubernetes offers
+- Managed Kubernetes offers (see also [CNCF Certified Kubernetes][cncf_software_conformance])
  - [Amazon Elastic Kubernetes Service](https://aws.amazon.com/ru/eks/)
  - [Google Kubernetes Engine](https://cloud.google.com/kubernetes-engine/)
  - [Azure Kubernetes Service](https://azure.microsoft.com/en-us/services/kubernetes-service/)
@@ -1032,6 +1032,7 @@ See [motivation](#motivation).
[awesome operators list]: https://github.com/operator-framework/awesome-operators
[bonzai logging operator]: https://github.com/banzaicloud/logging-operator
[chartmuseum]: https://chartmuseum.com/
+[cncf_software_conformance]: https://www.cncf.io/certification/software-conformance/
[configmap_updates]: https://kubernetes.io/docs/tasks/configure-pod-container/configure-pod-configmap/#mounted-configmaps-are-updated-automatically
[container_runtimes]: https://kubernetes.io/docs/setup/production-environment/container-runtimes/
[cri_log_format]: https://github.com/kubernetes/community/blob/ee2abbf9dbfa4523b414f99a04ddc97bd38c74b2/contributors/design-proposals/node/kubelet-cri-logging.md

From d2233bfc2bc63d6aea0ae0cf34085b15b38d88b0 Mon Sep 17 00:00:00 2001
From: MOZGIII
Date: Sat, 18 Apr 2020 21:18:57 +0300
Subject: [PATCH 061/118] Write more of the integration testing section

Signed-off-by: MOZGIII
---
 rfcs/2020-04-04-2221-kubernetes-integration.md | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md
index 7ade8e6b189df..13ac185afeaa6 100644
--- a/rfcs/2020-04-04-2221-kubernetes-integration.md
+++ b/rfcs/2020-04-04-2221-kubernetes-integration.md
@@ -877,6 +877,24 @@ effort required to maintain the infrastructure and the costs. It may also be
 inefficient to test everything everywhere, because a lot of configurations
 don't have any significant or meaningful differences among each other.

+Testing various managed offerings and distributions is not as important as
+testing different Kubernetes versions and container runtimes.
+
+It's probably a good idea to also test against the most popular managed
+Kubernetes providers - AWS, GCP and Azure - simply because our users are most
+likely to be on one of those.
+ +So, the goal for integration tests is to somehow test Vector with Kubernetes +versions from MSKV to latest, all the container runtimes listed above and, +additionally, on AWS, GCP and Azure. + +We can combine our requirements with offers from cloud providers. For instance, +`runhcs` (and Windows containers in general) are supported at Azure. Although, +whether we want to address Windows containers support is a different topic, we +still should plan ahead. + +We'll need to come up with an optimal configuration. + TODO - integration tests are cluster-agnostic From b5195d670b4275e5c3ad37a7e8a52bf51edaeb31 Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Tue, 21 Apr 2020 04:04:21 +0300 Subject: [PATCH 062/118] Add new sections on tests Signed-off-by: MOZGIII --- .../2020-04-04-2221-kubernetes-integration.md | 152 ++++++++++++++++++ 1 file changed, 152 insertions(+) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index 13ac185afeaa6..bd5d9da3b16cc 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -824,6 +824,8 @@ components. Integration tests are performed against the real k8s clusters. +##### Test targets + We have a matrix of concerns, we'd like to ensure Vectors works properly with. - Kubernetes Versions @@ -895,6 +897,154 @@ still should plan ahead. We'll need to come up with an optimal configuration. +##### Where to keep and how to manage integration infrastructure config + +This is a very controversial question. + +Currently we have: + +- the Vector repo (with the Github Actions based CI flow) +- the test harness (also integrated with CI, but this is it's own thing) + +We don't necessarily have to choose one of those places: we can add a new +location if it's justified enough. + +Let's outline the requirements on the properties of the solution: + +- We want to have the ability to run the checks from the Vector repo CI, i.e. 
+  per commit, per PR, per tag etc. This might not be immediately utilized, but
+  we just want to have that option.
+
+- We want to consolidate the management of the _cloud_ resources we allocate
+  and pay for Kubernetes test infrastructure in a single place. This is to
+  avoid spreading the responsibility and duplicating the logic, to reuse the
+  allocated resources for all our testing needs, to simplify accounting, and
+  to make the configuration management more flexible.
+  We can, for example, have a shared dependency for the Vector CI flow, Test
+  Harness invocations, locally run tests - and whatever else we have - to
+  rely on.
+
+- We want our test infrastructure easily available for the trusted developers
+  (Vector core team) to run experiments and tests against locally. This doesn't
+  mean we want to automate this and include running tests locally against our
+  whole k8s test infrastructure - but the ability to do it with little effort is
+  very important: even if we employ super-reliable CI automation, the turnaround
+  time of going through it is way higher than conducting an experiment locally.
+  Locally means using the local code tree and binaries - the infrastructure
+  itself is still in the cloud.
+
+- Ideally, we want the test system to be available not just to the Vector core
+  team, but to the whole open-source community. Of course, we don't want to give
+  unrestricted access to _our_ cloud testing infrastructure - but the solution
+  we employ should allow third-parties to bring their own resources. Things that
+  are local in essence (like `minikube`) should just work. There shouldn't be a
+  situation where one can't run tests in `minikube` because cloud parts aren't
+  available. We already have similar constraints at the Vector Test Harness.
+
+- We need the effort required to manage the solution to be low, and the
+  price to be relatively small. This means that the solution has to be simple.
+
+- We want to expose the same kind of interface to each of the clusters, so that
+  the cluster we run the tests against is easily interchangeable.
+  A kubectl config file is a good option, since it encapsulates all the
+  necessary information to connect to a cluster.
+
+Based on all of the above, it makes sense to split the infrastructure into two
+parts.
+
+- Cloud infrastructure that we manage and pay for.
+
+  We will create a dedicated public repo with [Terraform] configs to set up a
+  long-running Kubernetes test infrastructure.
+  The goal here is to make the real, live cloud environments available for
+  people and automation to work with.
+
+- Self-hosted infrastructure that we maintain configs for.
+
+  This is what we keep so that it's easy to run a self-hosted cluster - most
+  likely locally, for things like `minikube`, but not limited to that. The
+  focus here is to lock particular _versions_ and _configurations_ of the
+  tooling, so it's easy to run tests against them. Potentially even having
+  multiple versions of the same tool, for instance, when you need to compare
+  `minikube` `1.9.2` and `1.8.2`.
+  The goal here is to address the problem of configuring the self-hosted
+  cluster management tools once and for all, and to share those configurations.
+  For people it has the benefit of enabling them to spend time on solving the
+  problem (or doing whatever they need to do with k8s) rather than spending
+  time on configuration. For automation flows, it'll make it really simple to
+  reference a particular self-hosted configuration - and offload the complexity
+  of preparing it.
+
+  This one we'll have to figure out, but most likely we'll create a dedicated
+  repo per tool, each with different rules - but with a single interface.
+
+The interface (and the goal) of those repos is to provide kubectl-compatible
+config files, enabling access to clusters where we can deploy Vector and
+conduct some tests (and, in general, _other arbitrary activity_).
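
Since each of those repos is meant to hand back a kubectl-compatible config
file, the shared interface can be pictured as a plain kubeconfig. The sketch
below is illustrative only - the cluster name, server address, and credential
paths are hypothetical examples, not part of this RFC:

```yaml
# Illustrative kubeconfig shape; all names and paths are hypothetical.
apiVersion: v1
kind: Config
clusters:
  - name: test-minikube-1-9-2
    cluster:
      server: https://192.168.99.100:8443
      certificate-authority: /home/dev/.minikube/ca.crt
contexts:
  - name: test-minikube-1-9-2
    context:
      cluster: test-minikube-1-9-2
      user: test-minikube-1-9-2
current-context: test-minikube-1-9-2
users:
  - name: test-minikube-1-9-2
    user:
      client-certificate: /home/dev/.minikube/client.crt
      client-key: /home/dev/.minikube/client.key
```

Any consumer - CI, the Test Harness, or a developer's shell - would then point
`KUBECONFIG` (or `kubectl --kubeconfig`) at such a file and treat the cluster
behind it as interchangeable.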
+
+##### What to assert/verify in integration tests
+
+We can recognize three typical categories of integration tests that are relevant
+to the Kubernetes integration: correctness, performance and reliability.
+In fact, this is how we already split things at the Vector Test Harness.
+
+It is important to note that with Kubernetes we don't only have to test that
+Vector itself performs correctly, but also that our YAML configs and Helm Chart
+templates are sane and work properly. So in a sense, we still have the same
+test categories, but the scope is broader than just testing the Vector binary.
+We want to test the whole integration.
+
+Ideally we want to test everything: correctness, performance and reliability.
+Correctness tests are relatively easy; however, it's not yet clear how to
+orchestrate the performance and reliability tests. Measuring performance in
+clusters is quite difficult and requires careful thought to get right.
+For example, we have to consider and control a lot more variables of the
+environment - like the CNI driver, the underlying network topology and so on -
+to understand the conditions we're testing. Reliability tests also require a
+more carefully designed test environment.
+For this reason, the initial Kubernetes integration only focuses on correctness
+tests. Once we get some experience with correctness tests we can expand our
+test suite with tests from the other categories.
+
+It is important that we actually test correctness on all the configurations -
+see this [comment][why_so_much_configurations] as an example. Kubernetes has
+a lot of LOC, is very complex, and properly supporting it is quite a challenge.
+
+The exact design of the tests is an implementation detail, so it's not specified
+in this RFC, but the suggested approach, as a starting point, could be to deploy
+Vector using our documented installation methods, then run some log-generating
+workload, and then run assertions on the collected logs.
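
As a rough sketch of that last step, the assertion half of such a test could
look like the snippet below. The marker scheme and the way the output is
obtained are illustrative only - a real test would read the collected logs from
the deployed Vector instance (e.g. via `kubectl logs`), not from a local
variable:

```shell
# Hypothetical correctness assertion: emit a unique marker through the test
# workload, then check that it shows up in the output Vector collected.
marker="vector-correctness-test-$$"

# Stand-in for the output Vector collected from the log-generating workload.
collected_output="$(printf '{"message":"%s","kubernetes":{"pod_name":"test-pod"}}' "$marker")"

# The actual assertion: the marker line must be present in the collected logs.
if printf '%s\n' "$collected_output" | grep -q "$marker"; then
  echo "PASS: marker log line was collected"
else
  echo "FAIL: marker log line is missing"
  exit 1
fi
```

The same assertion works unchanged against any cluster, which is exactly the
cluster-agnostic property described below.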
+
+The things we'd generally want to ensure work properly include (but are not
+limited to):
+
+- basic log collection and parsing
+- log message filtering (both by file paths and by metadata)
+- log event enhancement with metadata
+- partial log event merging
+
+We want the assertions and tests to be cluster-agnostic, so that they work with
+any supplied kubectl config.
+
+#### Existing k8s tests
+
+We already have k8s integration tests implemented in Rust in the Vector repo.
+Currently, they're run as part of `cd tests; make tests`. They assert that the
+Vector code works properly by deploying Vector plus some test log producers and
+asserting that Vector produced the expected output. This is a very elegant
+solution.
+However, these tests are really more like unit tests - in the sense that they
+completely ignore the YAMLs and Helm Charts and use their own test configs.
+While they do a good job at what they're built for, we probably shouldn't
+consider them integration tests in a broad sense.
+
+It was discussed that we'd want to reuse them as our integration tests;
+however, for the reasons above I don't think it's a good idea - at least not as
+they are now. We can decouple the deployment of Vector from the deployment of
+the test containers and assertions - then we could use just the second half
+with Vector deployed via YAMLs and/or Helm Charts. For now, we should probably
+leave them as is and maintain them, but hold off on adopting them as
+integration tests.

TODO

- integration tests are cluster-agnostic

@@ -1110,6 +1260,8 @@ See [motivation](#motivation).
[pr#2134]: https://github.com/timberio/vector/pull/2134 [pr#2188]: https://github.com/timberio/vector/pull/2188 [sidecar_container]: https://github.com/kubernetes/enhancements/blob/a8262db2ce38b2ec7941bdb6810a8d81c5141447/keps/sig-apps/sidecarcontainers.md +[terraform]: https://www.terraform.io/ [the chart repository guide]: https://helm.sh/docs/topics/chart_repository/ [vector_daemonset]: 2020-04-04-2221-kubernetes-integration/vector-daemonset.yaml +[why_so_much_configurations]: https://github.com/timberio/vector/pull/2134/files#r401634895 [windows_in_kubernetes]: https://kubernetes.io/docs/setup/production-environment/windows/intro-windows-in-kubernetes/ From cb5a105c8fa70c9ff4505c0f24ffd0c88c56fe01 Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Tue, 21 Apr 2020 04:05:00 +0300 Subject: [PATCH 063/118] Remove TODO from the tests section, as it's now fully resolved Signed-off-by: MOZGIII --- rfcs/2020-04-04-2221-kubernetes-integration.md | 8 -------- 1 file changed, 8 deletions(-) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index bd5d9da3b16cc..a413b4223e599 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -1045,14 +1045,6 @@ and assertions - then we use just the second half with Vector deployed via YAMLs and/or Helm Charts. For now, we should probably leave them as is, maintain them, but hold the adoption as integration tests. -TODO - -- integration tests are cluster-agnostic -- at CI we test against `minikube` and against all versions from MSKV till the latest k8s -- at test harness we run non-ephemeral "real" clusters and test against them (i.e. 
GCP GKE, AWS EKS, Azure K8s, DO K8s, RedHat OpenShift, Rancher, CoreOS Tekton, etc) -- we integrate our unit tests into test harness in such a way that we can run them as correctness tests -- we want to test our deployment configurations - Helm charts, YAML files and etc, in addition to unit tests - ### Other data gathering > This section is on gathering data other than container logs. From f8237299649e044709ba070974517bb6db8e6447 Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Tue, 21 Apr 2020 04:20:27 +0300 Subject: [PATCH 064/118] Adjust the guide annotation Signed-off-by: MOZGIII --- rfcs/2020-04-04-2221-kubernetes-integration.md | 17 ++++++++--------- 1 file changed, 8 insertions(+), 9 deletions(-) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index a413b4223e599..c264e02e59f51 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -45,15 +45,14 @@ Kubernetes logs and metrics to any destination you please. #### How This Guide Works -Our recommended strategy deploys Vector as a Kubernetes [DaemonSet]. This is -the most efficient means of collecting Kubernetes observability data since -Vector is guaranteed to deploy _once_ on each of your Nodes. In addition, -we'll use the [`kubernetes_pod_metadata` transform][kubernetes_pod_metadata_transform] -to enrich your logs with the Kubernetes context. This transform interacts with -the Kubernetes watch API to collect cluster metadata and update in real-time -when things change. The following diagram demonstrates how this works: - -TODO: insert diagram +Our recommended strategy deploys Vector as a Kubernetes +[`DaemonSet`][daemonset]. Vector is reading the logs files directly from the +file system, so to collect the logs from all the `Pod`s it has to be deployed +on every `Node` in your cluster. + +The following diagram demonstrates how this works: + +TODO: add deployment topology diagram here. 
### What We'll Accomplish From 30321b1f050975f08ea245d124a8fd56d92c0311 Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Tue, 21 Apr 2020 04:28:18 +0300 Subject: [PATCH 065/118] Reordered section for easier readability Signed-off-by: MOZGIII --- .../2020-04-04-2221-kubernetes-integration.md | 46 +++++++++---------- 1 file changed, 23 insertions(+), 23 deletions(-) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index c264e02e59f51..51aaf4b1fe545 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -219,29 +219,6 @@ to the [Kubernetes version and version skew support policy], only versions Considering all of the above, we assign **1.14** as the initial MSKV. -### Helm vs raw YAML files - -We consider both raw YAML files and Helm Chart officially supported installation -methods. - -With Helm, people usually use the Chart we provide, and tweak it to their needs -via variables we expose as the chart configuration. This means we can offer a -lot of customization, however, in the end, we're in charge of generating the -YAML configuration that will k8s will run from our templates. -This means that, while it is very straightforward for users, we have to keep in -mind the compatibility concerns when we update our Helm Chart. -We should provide a lot of flexibility in our Helm Charts, but also have sane -defaults that would be work for the majority of users. - -With raw YAML files, they have to be usable out of the box, but we shouldn't -expect users to use them as-is. People would often maintain their own "forks" of -those, tailored to their use case. We shouldn't overcomplicate our recommended -configuration, but we shouldn't oversimplify it either. It has to be -production-ready. But it also has to be portable, in the sense that it should -work without tweaking with as much cluster setups as possible. 
-We should support both `kubectl create` and `kubectl apply` flows.
-`kubectl apply` is generally more limiting than `kubectl create`.
-
 ### Reading container logs

 #### Kubernetes logging architecture
@@ -303,6 +280,29 @@ following formats:

 We have to support both formats.

+### Helm vs raw YAML files
+
+We consider both raw YAML files and the Helm Chart to be officially supported
+installation methods.
+
+With Helm, people usually use the Chart we provide, and tweak it to their needs
+via variables we expose as the chart configuration. This means we can offer a
+lot of customization, however, in the end, we're in charge of generating the
+YAML configuration that k8s will run from our templates.
+This means that, while it is very straightforward for users, we have to keep in
+mind the compatibility concerns when we update our Helm Chart.
+We should provide a lot of flexibility in our Helm Charts, but also have sane
+defaults that would work for the majority of users.
+
+Raw YAML files, on the other hand, have to be usable out of the box, but we
+shouldn't expect users to use them as-is. People would often maintain their own
+"forks" of those, tailored to their use case. We shouldn't overcomplicate our
+recommended configuration, but we shouldn't oversimplify it either. It has to
+be production-ready. But it also has to be portable, in the sense that it
+should work without tweaking on as many cluster setups as possible.
+We should support both `kubectl create` and `kubectl apply` flows.
+`kubectl apply` is generally more limiting than `kubectl create`.
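
For illustration, the two flows could look roughly like this. The chart name,
release name, and file name below are placeholders - none of them are settled
by this RFC; only the `https://charts.vector.dev` address comes from the Helm
Chart Repository discussion:

```shell
# Raw YAML flow: users would typically fork and tweak the file first.
kubectl create -f vector-daemonset.yaml

# Helm flow: customization happens through chart values instead.
helm repo add vector https://charts.vector.dev
helm install vector vector/vector --set someOption=someValue
```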
+ ### Helm Chart Repository We should not just maintain a Helm Chart, we also should offer Helm repo to make From 3d007bd06836bb56ecfeb1bc404d063ae27319ee Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Tue, 21 Apr 2020 04:34:58 +0300 Subject: [PATCH 066/118] Add a note on sidecar deploymens Signed-off-by: MOZGIII --- rfcs/2020-04-04-2221-kubernetes-integration.md | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index 51aaf4b1fe545..525a613424bc1 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -344,6 +344,11 @@ to automatically inject Vector `Container` into `Pod`s (via admission controller), but that doesn't make a lot of sense for us to work on, since [`DaemonSet`][daemonset] works for most of the use cases already. +Note that [`DaemonSet`][daemonset] deployment does require special support at +Vector code (a dedicated `kubernetes` source), while a perfectly valid sidecar +configuration can be implemented with just a simple `file` source. +This is another reason why we don't pay as much attention to sidecar model. + ### Deployment configuration It is important that provide a well-thought deployment configuration for the From 704c5f776da4b5a666ded05f7b1373e7afc0b100 Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Tue, 21 Apr 2020 04:37:09 +0300 Subject: [PATCH 067/118] Fix a typo Signed-off-by: MOZGIII --- rfcs/2020-04-04-2221-kubernetes-integration.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index 525a613424bc1..5f98e421e9793 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -560,7 +560,7 @@ available to us without and extra work - we're reading the files anyways. 
#### Filtering based on Kubernetes API metadata -Filtering bu Kubernetes metadata is way more advanced and flexible from the user +Filtering by Kubernetes metadata is way more advanced and flexible from the user perspective. The idea of doing filtering like that is when Vector picks up a new log file to From b4025ade5fbbfe1504bdc2853b4d21d17cc6a788 Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Tue, 21 Apr 2020 11:04:07 +0300 Subject: [PATCH 068/118] Add a section on Windows support Signed-off-by: MOZGIII --- rfcs/2020-04-04-2221-kubernetes-integration.md | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index 5f98e421e9793..1f650e3033e74 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -1059,6 +1059,20 @@ TODO - we can grab and process prometheus metrics from the pods that expose them - we can gather node-level logs, useful for cluster operators +### Windows support + +We don't aim to support Windows Kubernetes clusters initially. The reason for +that is Windows support in general (i.e. outside of Kubernetes context) is a bit +lacking - we don't measure performance on Windows, don't run unit tests on +Windows, don't build Windows docker images, etc. +This is a blocker for a proper integration with Kubernetes clusters running on +Windows. + +To sum up: if it works - it works, if it doesn't - we'll take care of it later. + +> If you're reading this and want to use Vector with Windows - please let us +> know. + ## Prior Art 1. 
[Filebeat k8s integration]

From 7ec40d6da3c36af364e2ede1d37200f10a7a00c2 Mon Sep 17 00:00:00 2001
From: MOZGIII
Date: Tue, 21 Apr 2020 12:59:37 +0300
Subject: [PATCH 069/118] Fill in the section on other data gathering

Signed-off-by: MOZGIII
---
 .../2020-04-04-2221-kubernetes-integration.md | 97 ++++++++++++++++++-
 1 file changed, 93 insertions(+), 4 deletions(-)

diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md
index 1f650e3033e74..f3db69da75b1f 100644
--- a/rfcs/2020-04-04-2221-kubernetes-integration.md
+++ b/rfcs/2020-04-04-2221-kubernetes-integration.md
@@ -1053,11 +1053,94 @@ but hold the adoption as integration tests.

 > This section is on gathering data other than container logs.

-TODO
+While our main focus for the integration is collecting log data from the `Pod`s,
+there are other possibilities to gain observability in the Kubernetes
+environment.

-- we can expose watch events that we get from the k8s API as Vector events
-- we can grab and process prometheus metrics from the pods that expose them
-- we can gather node-level logs, useful for cluster operators
+#### Exposing Kubernetes [`Event`s][k8s_api_event] as Vector events
+
+It is possible to subscribe to Kubernetes [`Event`s][k8s_api_event], similarly
+to how this command works:
+
+```shell
+kubectl get events --all-namespaces --watch
+```
+
+Implementing this in Vector would allow capturing the Kubernetes
+[`Event`s][k8s_api_event] and processing them as Vector events.
+
+This feature might be very useful for anyone that wants to see what's going on
+in their cluster.
+
+Note that this feature would require deploying Vector differently: instead of
+running Vector on every node, here we need only one Vector instance running per
+cluster. If run on every node, it'd be unnecessarily capturing each event
+multiple times.
+
+So, to implement this, we'd need to add a special source that captures events
+from the Kubernetes API, and provide a new workload configuration based on
+[`Deployment`][k8s_api_deployment].
+
+### Discover and gather Prometheus metrics for Kubernetes API resources
+
+Prometheus already has built-in
+[Kubernetes Service Discovery][prometheus_kubernetes_sd_config] support, so one
+could just deploy a Prometheus server, make it discover and gather the metrics,
+and then configure Vector to read metrics from it.
+
+However, to pursue our goal of making Vector the only agent one would need to
+deploy, we can consider reimplementing what Prometheus
+[does][prometheus_kubernetes_sd_config] in Vector code, eliminating the need
+for the intermediary.
+
+We don't aim to implement this in the initial Kubernetes integration.
+
+### Gather data from the host OS
+
+This is very useful for Kubernetes Cluster Operators willing to deploy Vector
+for the purpose of gaining observability into what's going on with their
+cluster nodes.
+
+Example use cases are:
+
+- reading `kubelet`/`docker` logs from `journald`;
+- capturing `kubelet`/`docker` Prometheus metrics;
+- gathering system metrics from the node, things like `iostat -x`, `df -h`,
+  `uptime`, `free`, etc;
+- gathering system logs, like `sshd`, `dmesg`, etc.
+
+There are countless use cases here, and the good news is that Vector is already
+well fit to perform those kinds of tasks! Even without any Kubernetes
+integration whatsoever, it's possible to just deploy Vector as a
+[`DaemonSet`][k8s_api_daemon_set], expose the system data to it via
+[`hostPath` volume][k8s_api_host_path_volume_source] mounts and/or enabling
+`hostNetwork` at the [`PodSpec`][k8s_api_pod_spec].
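
As a sketch of the deployment side of such a manual configuration, an
illustrative (not recommended-as-is) fragment of a `DaemonSet` `Pod` template
that exposes the node's `/var/log` to Vector via a read-only `hostPath` mount
could look like this - the image tag and mount paths are hypothetical:

```yaml
# Fragment of a DaemonSet Pod template; mounts the node's /var/log into the
# Vector container read-only, making host logs visible to a `file` source.
spec:
  template:
    spec:
      containers:
        - name: vector
          image: timberio/vector:latest # illustrative tag
          volumeMounts:
            - name: host-var-log
              mountPath: /host/var/log
              readOnly: true
      volumes:
        - name: host-var-log
          hostPath:
            path: /var/log
```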
+
+#### Automatic discovery of things to monitor on the host OS
+
+While nothing prevents users from manually configuring Vector for gathering
+data from the host OS, it's very hard for us to offer sane defaults that would
+work out of the box for all clusters, since there's a myriad of configurations.
+
+We can consider offering some kind of user-selectable presets for well-known
+popular setups - like AWS and GCP.
+
+We can also solve this as a general problem of automatic discovery of what we
+can monitor on a given system - something similar to what [`netdata`][netdata]
+has.
+
+In the context of the current integration efforts, it doesn't make a lot of
+sense to try to address this issue in Vector code or deployment configs:
+
+- gathering data from the host OS works with manual configuration;
+- cluster operators mostly know what they're doing, and are capable of
+  configuring Vector as they require;
+- there's a myriad of configurations we'd have to support, and it'd be very
+  hard (if even possible) to come up with sane defaults;
+- related to the point above, even with sane defaults, in 95% of cases, cluster
+  operators would want to tailor the configuration to their use case.
+
+What we can do, though, is provide guides, blog posts and explainers with
+concrete examples of Vector usage for Kubernetes Cluster Operators.

 ### Windows support

@@ -1248,8 +1331,12 @@ See [motivation](#motivation).
[jsonlines]: http://jsonlines.org/ [k8s_api_config_map_volume_source]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#configmapvolumesource-v1-core [k8s_api_container]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#container-v1-core +[k8s_api_daemon_set]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#daemonset-v1-apps [k8s_api_daemon_set_update_strategy]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#daemonsetupdatestrategy-v1-apps +[k8s_api_deployment]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#deployment-v1-apps +[k8s_api_event]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#event-v1-core [k8s_api_host_path_volume_source]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#hostpathvolumesource-v1-core +[k8s_api_pod_spec]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#podspec-v1-core [k8s_api_resource_requirements]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#resourcerequirements-v1-core [k8s_docs_crds]: https://kubernetes.io/docs/tasks/access-kubernetes-api/custom-resources/custom-resource-definitions/ [k8s_docs_operator]: https://kubernetes.io/docs/concepts/extend-kubernetes/operator/ @@ -1267,8 +1354,10 @@ See [motivation](#motivation). 
[logdna k8s integration]: https://docs.logdna.com/docs/kubernetes [logdna_daemonset]: https://raw.githubusercontent.com/logdna/logdna-agent/master/logdna-agent-ds.yaml [metrics-server]: https://github.com/kubernetes-sigs/metrics-server +[netdata]: https://github.com/netdata/netdata [pr#2134]: https://github.com/timberio/vector/pull/2134 [pr#2188]: https://github.com/timberio/vector/pull/2188 +[prometheus_kubernetes_sd_config]: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config [sidecar_container]: https://github.com/kubernetes/enhancements/blob/a8262db2ce38b2ec7941bdb6810a8d81c5141447/keps/sig-apps/sidecarcontainers.md [terraform]: https://www.terraform.io/ [the chart repository guide]: https://helm.sh/docs/topics/chart_repository/ From 0ed069783c52b73d75636993cc078161f97f690d Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Tue, 21 Apr 2020 13:00:02 +0300 Subject: [PATCH 070/118] Add a dummy section on security Signed-off-by: MOZGIII --- rfcs/2020-04-04-2221-kubernetes-integration.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index f3db69da75b1f..cf82630098db5 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -1156,6 +1156,10 @@ To sum up: if it works - it works, if it doesn't - we'll take care of it later. > If you're reading this and want to use Vector with Windows - please let us > know. +### Security + +TODO + ## Prior Art 1. 
[Filebeat k8s integration] From 8a875f518091cab71c77e00fae48ce2dfd57c517 Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Tue, 21 Apr 2020 13:02:33 +0300 Subject: [PATCH 071/118] Correct the controller mention and added more links Signed-off-by: MOZGIII --- rfcs/2020-04-04-2221-kubernetes-integration.md | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index cf82630098db5..d68d284fedec1 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -340,9 +340,10 @@ the generic concerns. We should provide enough flexibility at the Vector code level for those use cases to be possible. It is possible to implement a sidecar deployment via implementing an operator -to automatically inject Vector `Container` into `Pod`s (via admission -controller), but that doesn't make a lot of sense for us to work on, since -[`DaemonSet`][daemonset] works for most of the use cases already. +to automatically inject Vector [`Container`][k8s_api_container] into +[`Pod`s][k8s_api_pod] (via a special [controller][k8s_docs_controller]), +but that doesn't make a lot of sense for us to work on, since +[`DaemonSet`][k8s_api_daemon_set] works for most of the use cases already. Note that [`DaemonSet`][daemonset] deployment does require special support at Vector code (a dedicated `kubernetes` source), while a perfectly valid sidecar @@ -1341,7 +1342,9 @@ See [motivation](#motivation). 
[k8s_api_event]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#event-v1-core [k8s_api_host_path_volume_source]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#hostpathvolumesource-v1-core [k8s_api_pod_spec]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#podspec-v1-core +[k8s_api_pod]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#pod-v1-core [k8s_api_resource_requirements]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#resourcerequirements-v1-core +[k8s_docs_controller]: https://kubernetes.io/docs/concepts/architecture/controller/ [k8s_docs_crds]: https://kubernetes.io/docs/tasks/access-kubernetes-api/custom-resources/custom-resource-definitions/ [k8s_docs_operator]: https://kubernetes.io/docs/concepts/extend-kubernetes/operator/ [k8s_docs_persistent_volumes]: https://kubernetes.io/docs/concepts/storage/persistent-volumes From 402085992658ac8c6dd00b699b70d14acd5c9b81 Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Tue, 21 Apr 2020 13:03:07 +0300 Subject: [PATCH 072/118] Add more links at the guide annotation Signed-off-by: MOZGIII --- rfcs/2020-04-04-2221-kubernetes-integration.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index d68d284fedec1..7f32d38b922d1 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -47,8 +47,8 @@ Kubernetes logs and metrics to any destination you please. Our recommended strategy deploys Vector as a Kubernetes [`DaemonSet`][daemonset]. Vector is reading the logs files directly from the -file system, so to collect the logs from all the `Pod`s it has to be deployed -on every `Node` in your cluster. +file system, so to collect the logs from all the [`Pod`s][k8s_docs_pod] it has +to be deployed on every [`Node`][k8s_docs_node] in your cluster. 
The following diagram demonstrates how this works: @@ -1346,8 +1346,10 @@ See [motivation](#motivation). [k8s_api_resource_requirements]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#resourcerequirements-v1-core [k8s_docs_controller]: https://kubernetes.io/docs/concepts/architecture/controller/ [k8s_docs_crds]: https://kubernetes.io/docs/tasks/access-kubernetes-api/custom-resources/custom-resource-definitions/ +[k8s_docs_node]: https://kubernetes.io/docs/concepts/architecture/nodes/ [k8s_docs_operator]: https://kubernetes.io/docs/concepts/extend-kubernetes/operator/ [k8s_docs_persistent_volumes]: https://kubernetes.io/docs/concepts/storage/persistent-volumes +[k8s_docs_pod]: https://kubernetes.io/docs/concepts/workloads/pods/pod/ [k8s_docs_rolling_update]: https://kubernetes.io/docs/tasks/manage-daemon/update-daemon-set/ [k8s_log_path_location_docs]: https://kubernetes.io/docs/concepts/cluster-administration/logging/#logging-at-the-node-level [k8s_src_build_container_logs_directory]: https://github.com/kubernetes/kubernetes/blob/31305966789525fca49ec26c289e565467d1f1c4/pkg/kubelet/kuberuntime/helpers.go#L173 From 424a9e9cfb2a550e068284dd61ce61c55f34c932 Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Tue, 21 Apr 2020 13:09:15 +0300 Subject: [PATCH 073/118] Correct DaemonSet links and specify links to the API and docs based on the context Signed-off-by: MOZGIII --- .../2020-04-04-2221-kubernetes-integration.md | 35 ++++++++++--------- 1 file changed, 18 insertions(+), 17 deletions(-) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index 7f32d38b922d1..959aecf1f7fe4 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -46,9 +46,9 @@ Kubernetes logs and metrics to any destination you please. #### How This Guide Works Our recommended strategy deploys Vector as a Kubernetes -[`DaemonSet`][daemonset]. 
Vector is reading the logs files directly from the
-file system, so to collect the logs from all the [`Pod`s][k8s_docs_pod] it has
-to be deployed on every [`Node`][k8s_docs_node] in your cluster.
+[`DaemonSet`][k8s_docs_daemon_set]. Vector reads the log files directly
+from the file system, so to collect the logs from all the [`Pod`s][k8s_docs_pod]
+it has to be deployed on every [`Node`][k8s_docs_node] in your cluster.

 The following diagram demonstrates how this works:

@@ -324,11 +324,11 @@ repo at `https://charts.vector.dev` - short and easy to remember or even guess.

 We have two ways to deploy vector:

-- as a [`DaemonSet`][daemonset];
+- as a [`DaemonSet`][k8s_docs_daemon_set];
 - as a [sidecar `Container`][sidecar_container].

-Deploying as a [`DaemonSet`][daemonset] is trivial, applies cluster-wide and
-makes sense to as default scenario for the most use cases.
+Deployment as a [`DaemonSet`][k8s_docs_daemon_set] is trivial, applies
+cluster-wide and makes sense as the default scenario for most use cases.

 Sidecar container deployments make sense when cluster-wide deployment is not
 available. This can generally occur when users are not in control of the whole
@@ -345,9 +345,9 @@ to automatically inject Vector [`Container`][k8s_api_container] into
 but that doesn't make a lot of sense for us to work on, since
 [`DaemonSet`][k8s_api_daemon_set] works for most of the use cases already.

-Note that [`DaemonSet`][daemonset] deployment does require special support at
-Vector code (a dedicated `kubernetes` source), while a perfectly valid sidecar
-configuration can be implemented with just a simple `file` source.
+Note that [`DaemonSet`][k8s_docs_daemon_set] deployment does require special
+support in Vector code (a dedicated `kubernetes` source), while a perfectly
+valid sidecar configuration can be implemented with just a simple `file` source.
 This is another reason why we don't pay as much attention to sidecar model.
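
To illustrate that last point, a hypothetical sidecar setup needs nothing
Kubernetes-specific in the Vector configuration - a plain `file` source pointed
at the application's log directory is enough. The path and the sink below are
placeholders, not part of this RFC:

```toml
# Illustrative sidecar configuration; the log path would typically be a
# shared `emptyDir` volume mounted into both containers of the Pod.
[sources.app_logs]
type = "file"
include = ["/var/log/my-app/*.log"]

[sinks.out]
type = "console"
inputs = ["app_logs"]
encoding = "json"
```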
### Deployment configuration @@ -366,7 +366,8 @@ of design considerations apply to both of them. #### Managing Object -For the reasons discussed above, we'll be using [`DaemonSet`][daemonset]. +For the reasons discussed above, we'll be using +[`DaemonSet`][k8s_api_daemon_set]. #### Data directory @@ -375,8 +376,8 @@ operation at runtime. This directory has to persist across restarts, since it's essential for some features to function (i.e. not losing buffered data if/while the sink is gone). -We'll be using [`DaemonSet`][daemonset], so, naturally, we can leverage -[`hostPath`][k8s_api_host_path_volume_source] volumes. +We'll be using [`DaemonSet`][k8s_api_daemon_set], so, naturally, we can +leverage [`hostPath`][k8s_api_host_path_volume_source] volumes. We'll be using `hostPath` volumes at our YAML config, and at the Helm Chart we'll be using this by default, but we'll also allow configuring this to provide @@ -1225,9 +1226,9 @@ See [motivation](#motivation). daemonset][fluentd_daemonset] and [LogDNA's daemonset][logdna_daemonset].~~ `RollingUpdate` is the default value for [`updateStrategy`][k8s_api_daemon_set_update_strategy] of the - [`DaemonSet`][daemonset]. Alternative is `OnDelete`. `RollingUpdate` makes - more sense for us to use as the default, more info on this is available at - the [docs][k8s_docs_rolling_update]. + [`DaemonSet`][k8s_api_daemon_set]. The only alternative is `OnDelete`. + `RollingUpdate` makes more sense for us to use as the default, more info on + this is available at the [docs][k8s_docs_rolling_update]. 1. ~~I've also noticed `resources` declarations in some of these config files. For example [LogDNA's daemonset][logdna_daemonset]. I assume this is limiting resources. Do we want to consider this?~~ @@ -1302,7 +1303,6 @@ See [motivation](#motivation). 
[configmap_updates]: https://kubernetes.io/docs/tasks/configure-pod-container/configure-pod-configmap/#mounted-configmaps-are-updated-automatically [container_runtimes]: https://kubernetes.io/docs/setup/production-environment/container-runtimes/ [cri_log_format]: https://github.com/kubernetes/community/blob/ee2abbf9dbfa4523b414f99a04ddc97bd38c74b2/contributors/design-proposals/node/kubelet-cri-logging.md -[daemonset]: https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/ [downward api]: https://kubernetes.io/docs/tasks/inject-data-application/downward-api-volume-expose-pod-information/#store-pod-fields [filebeat k8s integration]: https://www.elastic.co/guide/en/beats/filebeat/master/running-on-kubernetes.html [firecracker]: https://github.com/firecracker-microvm/firecracker @@ -1336,8 +1336,8 @@ See [motivation](#motivation). [jsonlines]: http://jsonlines.org/ [k8s_api_config_map_volume_source]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#configmapvolumesource-v1-core [k8s_api_container]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#container-v1-core -[k8s_api_daemon_set]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#daemonset-v1-apps [k8s_api_daemon_set_update_strategy]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#daemonsetupdatestrategy-v1-apps +[k8s_api_daemon_set]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#daemonset-v1-apps [k8s_api_deployment]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#deployment-v1-apps [k8s_api_event]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#event-v1-core [k8s_api_host_path_volume_source]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#hostpathvolumesource-v1-core @@ -1346,6 +1346,7 @@ See [motivation](#motivation). 
[k8s_api_resource_requirements]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#resourcerequirements-v1-core [k8s_docs_controller]: https://kubernetes.io/docs/concepts/architecture/controller/ [k8s_docs_crds]: https://kubernetes.io/docs/tasks/access-kubernetes-api/custom-resources/custom-resource-definitions/ +[k8s_docs_daemon_set]: https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/ [k8s_docs_node]: https://kubernetes.io/docs/concepts/architecture/nodes/ [k8s_docs_operator]: https://kubernetes.io/docs/concepts/extend-kubernetes/operator/ [k8s_docs_persistent_volumes]: https://kubernetes.io/docs/concepts/storage/persistent-volumes From 75d389f386f86adda7c7c3a2cd7972ee24807c54 Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Tue, 21 Apr 2020 13:13:35 +0300 Subject: [PATCH 074/118] I hate grammarly sometimes Signed-off-by: MOZGIII --- rfcs/2020-04-04-2221-kubernetes-integration.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index 959aecf1f7fe4..c360e81990de3 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -454,7 +454,7 @@ an example: We must be careful with our `.yaml` files to make them play well with not just `kubectl create -f`, but also with `kubectl apply -f`. There are often issues -with impotency when labels and selectors aren't configured properly and we +with idempotency when labels and selectors aren't configured properly and we should be wary of that. 
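To make the labels-and-selectors concern concrete: a `DaemonSet`'s `spec.selector` is immutable after creation and must always match the Pod template labels. A sketch of a safe pairing (names are illustrative):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: vector
spec:
  selector:
    matchLabels:
      name: vector  # immutable after creation
  template:
    metadata:
      labels:
        name: vector  # must keep matching the selector above
```

If the template labels drift away from the selector, the API server rejects the object, and attempting to change the selector itself makes `kubectl apply -f` fail with an immutable-field error - two common ways idempotency breaks in practice.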
##### Considered Alternatives From 4ee37c58d36f506a5db926ce2e04dfc5b2af54f7 Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Tue, 21 Apr 2020 13:16:36 +0300 Subject: [PATCH 075/118] Container name is actually exposed too Signed-off-by: MOZGIII --- rfcs/2020-04-04-2221-kubernetes-integration.md | 1 + 1 file changed, 1 insertion(+) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index c360e81990de3..a2cbc4dcfe3d1 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -556,6 +556,7 @@ following information via the file path: - `pod namespace` - `pod name` - `pod uuid` +- `container name` This is enough information for the basic filtering, and the best part is it's available to us without and extra work - we're reading the files anyways. From 69101d099ee5a0fb65706eb76c782ebd166287e0 Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Tue, 21 Apr 2020 13:26:44 +0300 Subject: [PATCH 076/118] More clarity on why Pod is king Signed-off-by: MOZGIII --- rfcs/2020-04-04-2221-kubernetes-integration.md | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index a2cbc4dcfe3d1..6962823f9fc06 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -575,10 +575,15 @@ This means that there has to be a way to make the data from the k8s API related to the log file available to Vector. Based on the k8s API structure, it looks like we should aim for obtaining the -`Pod` object, since it contains essential information about the containers -that produced the log file. Also, is is the `Pod` objects that `kubelet` relies -on to manage the workloads on the node, so this makes `Pod` objects the best -option for our case, i.e. better than fetching `Deployment` objects. 
+[`Pod`][k8s_api_pod] object, since it contains essential information about the
+containers that produced the log file. Also, it is the [`Pod`][k8s_api_pod]
+objects that control the desired workload state that `kubelet` strives to
+achieve on the node, which makes [`Pod`][k8s_api_pod] objects the best
+option for our case. In particular - better than
+[`Deployment`][k8s_api_deployment] objects. Technically, everything that needs
+to run containers will produce a [`Pod`][k8s_api_pod] object, and live
+[`Container`s][k8s_api_container] can only exist inside of the
+[`Pod`][k8s_api_pod].

 There in a number of approaches to get the required `Pod` objects:

From ae3cc32710f80660d46c53e803f55a43455424ed Mon Sep 17 00:00:00 2001
From: MOZGIII
Date: Tue, 21 Apr 2020 13:37:06 +0300
Subject: [PATCH 077/118] More links and corrections at filtering by metadata

Signed-off-by: MOZGIII
---
 .../2020-04-04-2221-kubernetes-integration.md | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md
index 6962823f9fc06..cc35aa2691884 100644
--- a/rfcs/2020-04-04-2221-kubernetes-integration.md
+++ b/rfcs/2020-04-04-2221-kubernetes-integration.md
@@ -585,19 +585,20 @@ to run containers will produce a [`Pod`][k8s_api_pod] object, and live
 [`Container`s][k8s_api_container] can only exist inside of the
 [`Pod`][k8s_api_pod].

-There in a number of approaches to get the required `Pod` objects:
+There are a number of approaches to get the required [`Pod`][k8s_api_pod]
+objects:

 1. Per-file requests.

    The file paths provide enough data for us to make a query to the k8s API. In
    fact, we only need a `pod namespace` and a `pod uuid` to successfully
    [obtain][k8s_api_pod_read] the [`Pod`][k8s_api_pod] object.

 2. Per-node requests.
- This approach is to [list][k8s_api_pod_list_all_namespaces] all the pods that
  are running at the same node as Vector runs. This effectively lists all the
  [`Pod`][k8s_api_pod] objects we could possibly care about.

 One important thing to note is metadata for the given pod can change over time,
 and the implementation has to take that into account, and update the filtering
@@ -606,8 +607,8 @@ state accordingly.
 We also can't overload the k8s API with requests. The general rule of thumb is
 we shouldn't do requests much more often than k8s itself generates events.

-Each approach has very different properties. It is hard to estimate which ones
-are a better fit.
+Each approach has very different properties. It is hard to estimate which set
+is a better fit.

 A single watch call for a list of pods running per node (2) should generate
 less overhead and would probably be easier to implement.
@@ -1347,6 +1348,8 @@ See [motivation](#motivation).
[k8s_api_deployment]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#deployment-v1-apps [k8s_api_event]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#event-v1-core [k8s_api_host_path_volume_source]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#hostpathvolumesource-v1-core +[k8s_api_pod_list_all_namespaces]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#list-all-namespaces-pod-v1-core +[k8s_api_pod_read]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#read-pod-v1-core [k8s_api_pod_spec]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#podspec-v1-core [k8s_api_pod]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#pod-v1-core [k8s_api_resource_requirements]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#resourcerequirements-v1-core From 3175823c5769a933f361a7e70ddbd76ed0efc2de Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Wed, 22 Apr 2020 07:05:56 +0300 Subject: [PATCH 078/118] Correct typing error Signed-off-by: MOZGIII --- rfcs/2020-04-04-2221-kubernetes-integration.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index cc35aa2691884..1652a1b7c0574 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -1063,7 +1063,7 @@ but hold the adoption as integration tests. > This section is on gathering data other than container logs. While our main focus for the integration is collecting log data from the `Pod`s, -there are other possibilities to gain observaravibility in the Kubernetes +there are other possibilities to gain observability in the Kubernetes environment. 
#### Exposing Kubernetes [`Event`s][k8s_api_event] as Vector events

From 08cef0767053fc9262676799d398db38c61a8f28 Mon Sep 17 00:00:00 2001
From: MOZGIII
Date: Wed, 22 Apr 2020 07:08:10 +0300
Subject: [PATCH 079/118] Correct a typo

Signed-off-by: MOZGIII
---
 rfcs/2020-04-04-2221-kubernetes-integration.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md
index 1652a1b7c0574..e544d20f4882a 100644
--- a/rfcs/2020-04-04-2221-kubernetes-integration.md
+++ b/rfcs/2020-04-04-2221-kubernetes-integration.md
@@ -1129,7 +1129,7 @@ whatsoever, it's possible to just deploy Vector as a

 While nothing prevents users from manually configuring Vector for gathering
 data from the host OS, it's very hard for us to offer sane defaults that would work
-out-of-the box for all clusters, since there's a miriad of configurations.
+out-of-the-box for all clusters, since there's a myriad of configurations.

 We can consider offering some kind of user-selectable presets for well known
 popular setups - like AWS and GCP.

From 9a95210218c84c3c448cc5f7919e8b239ebd20f0 Mon Sep 17 00:00:00 2001
From: MOZGIII
Date: Wed, 22 Apr 2020 09:18:15 +0300
Subject: [PATCH 080/118] Improve operators and admission controllers info at
 the deployment variants section

Signed-off-by: MOZGIII
---
 rfcs/2020-04-04-2221-kubernetes-integration.md | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md
index e544d20f4882a..62835c1900bda 100644
--- a/rfcs/2020-04-04-2221-kubernetes-integration.md
+++ b/rfcs/2020-04-04-2221-kubernetes-integration.md
@@ -339,11 +339,12 @@ cases are often very custom, we probably don't have to go deeper than explaining
 the generic concerns. We should provide enough flexibility at the Vector code
 level for those use cases to be possible.
-It is possible to implement a sidecar deployment via implementing an operator -to automatically inject Vector [`Container`][k8s_api_container] into -[`Pod`s][k8s_api_pod] (via a special [controller][k8s_docs_controller]), -but that doesn't make a lot of sense for us to work on, since -[`DaemonSet`][k8s_api_daemon_set] works for most of the use cases already. +It is possible to implement a sidecar deployment via implementing an +[operator][k8s_docs_operator] to automatically inject Vector +[`Container`][k8s_api_container] into [`Pod`s][k8s_api_pod], via a custom +[admission controller][k8s_docs_admission_controllers], but that doesn't make +a lot of sense for us to work on, since [`DaemonSet`][k8s_api_daemon_set] +works for most of the use cases already. Note that [`DaemonSet`][k8s_docs_daemon_set] deployment does require special support at Vector code (a dedicated `kubernetes` source), while a perfectly @@ -1353,7 +1354,7 @@ See [motivation](#motivation). [k8s_api_pod_spec]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#podspec-v1-core [k8s_api_pod]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#pod-v1-core [k8s_api_resource_requirements]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#resourcerequirements-v1-core -[k8s_docs_controller]: https://kubernetes.io/docs/concepts/architecture/controller/ +[k8s_docs_admission_controllers]: https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers [k8s_docs_crds]: https://kubernetes.io/docs/tasks/access-kubernetes-api/custom-resources/custom-resource-definitions/ [k8s_docs_daemon_set]: https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/ [k8s_docs_node]: https://kubernetes.io/docs/concepts/architecture/nodes/ From bc5525c757d84eda46bc8eddfed6625c4077e893 Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Wed, 22 Apr 2020 09:19:35 +0300 Subject: [PATCH 081/118] Fill in the security section Signed-off-by: MOZGIII --- 
.../2020-04-04-2221-kubernetes-integration.md | 136 +++++++++++++++++-
 1 file changed, 135 insertions(+), 1 deletion(-)

diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md
index 62835c1900bda..55fdbc906e0fd 100644
--- a/rfcs/2020-04-04-2221-kubernetes-integration.md
+++ b/rfcs/2020-04-04-2221-kubernetes-integration.md
@@ -1168,7 +1168,117 @@ To sum up: if it works - it works, if it doesn't - we'll take care of it later.

 ### Security

-TODO
+There are different aspects of security. In this RFC we're going to focus on
+Kubernetes-specific aspects.
+
+Security in the Kubernetes environment plays a major role, and the more we do
+to ensure our code and deployment recommendations are safe - the better. Big
+deployments often have dedicated security teams that will be doing what we do
+on their own - just to double-check - but the majority of our users don't have
+enough resources to dedicate adequate attention to the security aspects. This
+is why implementing security measures in our integration is important.
+
+#### Vector Code Audit
+
+There has to be an automated security audit of the Vector codebase to ensure
+we don't have easily detectable issues. Things like automated CVE checks and
+static analyzers fall into this category.
+We're already doing a good job in this aspect.
+
+#### Vector Docker Images Audit
+
+There has to be an automated security audit of the Vector docker images that we
+ship.
+
+We should consider using tools like these:
+
+- [trivy](https://github.com/aquasecurity/trivy)
+- [clair](https://github.com/quay/clair)
+- [anchore-engine](https://github.com/anchore/anchore-engine)
+
+... and similar.
+
+#### Deployment Hardening
+
+We should harden the Vector deployment by default. This means that our suggested
+YAML files should be hardened, and the Helm Chart should be configurable, but
+also hardened by default.
+
+- We should properly configure
+  [PodSecurityContext][k8s_api_pod_security_context]
+  ([docs][k8s_docs_security_context]):
+
+  - properly configure [`sysctls`][k8s_api_sysctl];
+  - `fsGroup` - should be unset.
+
+- We should properly configure [SecurityContext][k8s_api_security_context]
+  ([docs][k8s_docs_security_context]):
+
+  - enable `readOnlyRootFilesystem` since we don't need to write to files at
+    rootfs;
+  - enable `runAsNonRoot` if possible - we shouldn't need root access to conduct
+    most of our operations, but this has to be validated in practice; the aim
+    is to enable it if possible;
+  - disable `allowPrivilegeEscalation` since we shouldn't need any special
+    privileges in the first place, and we definitely don't need escalation;
+  - properly configure [`seLinuxOptions`][k8s_api_se_linux_options];
+  - properly configure [`capabilities`][k8s_api_capabilities] - see
+    [`man 7 capabilities`][man_7_capabilities] for more info;
+  - disable `privileged` - we don't need privileged access, and it would be
+    a major security issue if we did.
+
+- We should properly use [`ServiceAccount`][k8s_api_service_account],
+  [`Role`][k8s_api_role], [`RoleBinding`][k8s_api_role_binding],
+  [`ClusterRole`][k8s_api_cluster_role] and
+  [`ClusterRoleBinding`][k8s_api_cluster_role_binding] ([docs][k8s_docs_rbac]).
+
+  The service accounts at Kubernetes by default have no permissions, except for
+  the service accounts at the `kube-system` namespace. We'll be using a
+  dedicated `vector` namespace, so it's our responsibility to request the
+  required permissions.
+
+  The exact set of permissions to request at default deployment configuration
+  depends on the implementation we'll land and the Vector settings of the
+  default deployment configuration.
+  The goal is to eliminate any non-required permissions - we don't have to keep
+  anything extra there for demonstration purposes.
+
+  We also have to document all possible required permissions, so that users are
+  aware of the possible configuration options. In the Helm Chart we should allow
+  configuring arbitrary permissions via values (while providing sane defaults).
+
+#### Securing secrets
+
+Vector sometimes needs access to secrets, like AWS API access tokens and so on.
+That data has to be adequately protected.
+
+We should recommend that users use [`Secret`][k8s_api_secret]
+([docs][k8s_docs_secret]) instead of [`ConfigMap`][k8s_api_config_map] if they
+have secret data embedded in their Vector `.toml` config files.
+
+We should also consider integrating with tools like [Vault] and [redoctober].
+
+#### Recommend users additional steps to secure the cluster
+
+- Suggest using [Falco].
+- Suggest setting up proper RBAC rules for cluster operators and users;
+  [`audit2rbac`](https://github.com/liggitt/audit2rbac) is a useful tool to
+  help with this.
+- Suggest using [Pod Security Policies][k8s_docs_pod_security_policiy]
+  ([API][k8s_api_pod_security_policy]).
+- Suggest using [NetworkPolicy][k8s_api_network_policy].
+- Suggest running [kube-bench].
+- Suggest reading the
+  [Kubernetes security documentation][k8s_docs_securing_a_cluster].
+
+#### Automatic container rebuilds
+
+The ability to quickly and automatically rebuild containers with a CVE fix is a
+very important part of a successful vulnerability mitigation strategy.
+We should prepare in advance and roll out the infrastructure and automation to
+make it possible to rebuild the containers for _all_ (not just the latest or
+nightly!) supported Vector versions.

## Prior Art

@@ -1312,6 +1422,7 @@ See [motivation](#motivation).
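A rough sketch of the RBAC objects described in the hardening notes above; the names, the namespace, and the exact permission set are assumptions that would have to be aligned with the final implementation:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: vector
  namespace: vector
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: vector
rules:
  - apiGroups: [""]
    resources: ["pods"]
    # assumption: pod metadata lookups only need read access
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: vector
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: vector
subjects:
  - kind: ServiceAccount
    name: vector
    namespace: vector
```

On the `SecurityContext` side, the corresponding container-level fragment would set `readOnlyRootFilesystem: true`, `allowPrivilegeEscalation: false`, and `privileged: false`, with `runAsNonRoot` enabled once validated in practice.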
[container_runtimes]: https://kubernetes.io/docs/setup/production-environment/container-runtimes/ [cri_log_format]: https://github.com/kubernetes/community/blob/ee2abbf9dbfa4523b414f99a04ddc97bd38c74b2/contributors/design-proposals/node/kubelet-cri-logging.md [downward api]: https://kubernetes.io/docs/tasks/inject-data-application/downward-api-volume-expose-pod-information/#store-pod-fields +[falco]: https://github.com/falcosecurity/falco [filebeat k8s integration]: https://www.elastic.co/guide/en/beats/filebeat/master/running-on-kubernetes.html [firecracker]: https://github.com/firecracker-microvm/firecracker [fluentbit k8s integration]: https://docs.fluentbit.io/manual/installation/kubernetes @@ -1342,45 +1453,68 @@ See [motivation](#motivation). [issue#2225]: https://github.com/timberio/vector/issues/2225 [json file logging driver]: https://docs.docker.com/config/containers/logging/json-file/ [jsonlines]: http://jsonlines.org/ +[k8s_api_capabilities]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#capabilities-v1-core +[k8s_api_cluster_role_binding]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#clusterrolebinding-v1-rbac-authorization-k8s-io +[k8s_api_cluster_role]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#clusterrole-v1-rbac-authorization-k8s-io [k8s_api_config_map_volume_source]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#configmapvolumesource-v1-core +[k8s_api_config_map]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#configmap-v1-core [k8s_api_container]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#container-v1-core [k8s_api_daemon_set_update_strategy]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#daemonsetupdatestrategy-v1-apps [k8s_api_daemon_set]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#daemonset-v1-apps [k8s_api_deployment]: 
https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#deployment-v1-apps [k8s_api_event]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#event-v1-core [k8s_api_host_path_volume_source]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#hostpathvolumesource-v1-core +[k8s_api_network_policy]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#networkpolicy-v1-networking-k8s-io [k8s_api_pod_list_all_namespaces]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#list-all-namespaces-pod-v1-core [k8s_api_pod_read]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#read-pod-v1-core +[k8s_api_pod_security_context]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#podsecuritycontext-v1-core +[k8s_api_pod_security_policy]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#podsecuritypolicy-v1beta1-policy [k8s_api_pod_spec]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#podspec-v1-core [k8s_api_pod]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#pod-v1-core [k8s_api_resource_requirements]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#resourcerequirements-v1-core +[k8s_api_role_binding]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#rolebinding-v1-rbac-authorization-k8s-io +[k8s_api_role]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#role-v1-rbac-authorization-k8s-io +[k8s_api_se_linux_options]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#selinuxoptions-v1-core +[k8s_api_secret]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#secret-v1-core +[k8s_api_security_context]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#securitycontext-v1-core +[k8s_api_service_account]: 
https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#serviceaccount-v1-core +[k8s_api_sysctl]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#sysctl-v1-core [k8s_docs_admission_controllers]: https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers [k8s_docs_crds]: https://kubernetes.io/docs/tasks/access-kubernetes-api/custom-resources/custom-resource-definitions/ [k8s_docs_daemon_set]: https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/ [k8s_docs_node]: https://kubernetes.io/docs/concepts/architecture/nodes/ [k8s_docs_operator]: https://kubernetes.io/docs/concepts/extend-kubernetes/operator/ [k8s_docs_persistent_volumes]: https://kubernetes.io/docs/concepts/storage/persistent-volumes +[k8s_docs_pod_security_policiy]: https://kubernetes.io/docs/concepts/policy/pod-security-policy/ [k8s_docs_pod]: https://kubernetes.io/docs/concepts/workloads/pods/pod/ +[k8s_docs_rbac]: https://kubernetes.io/docs/reference/access-authn-authz/rbac/ [k8s_docs_rolling_update]: https://kubernetes.io/docs/tasks/manage-daemon/update-daemon-set/ +[k8s_docs_secret]: https://kubernetes.io/docs/concepts/configuration/secret/ +[k8s_docs_securing_a_cluster]: https://kubernetes.io/docs/tasks/administer-cluster/securing-a-cluster/ +[k8s_docs_security_context]: https://kubernetes.io/docs/tasks/configure-pod-container/security-context [k8s_log_path_location_docs]: https://kubernetes.io/docs/concepts/cluster-administration/logging/#logging-at-the-node-level [k8s_src_build_container_logs_directory]: https://github.com/kubernetes/kubernetes/blob/31305966789525fca49ec26c289e565467d1f1c4/pkg/kubelet/kuberuntime/helpers.go#L173 [k8s_src_parse_funcs]: https://github.com/kubernetes/kubernetes/blob/e74ad388541b15ae7332abf2e586e2637b55d7a7/pkg/kubelet/kuberuntime/logs/logs.go#L116 [k8s_src_read_logs]: https://github.com/kubernetes/kubernetes/blob/e74ad388541b15ae7332abf2e586e2637b55d7a7/pkg/kubelet/kuberuntime/logs/logs.go#L277 
[k8s_src_var_log_pods]: https://github.com/kubernetes/kubernetes/blob/58596b2bf5eb0d84128fa04d0395ddd148d96e51/pkg/kubelet/kuberuntime/kuberuntime_manager.go#L60 +[kube-bench]: https://github.com/aquasecurity/kube-bench [kubectl_rollout_restart]: https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#-em-restart-em- [kubernetes version and version skew support policy]: https://kubernetes.io/docs/setup/release/version-skew-policy/ [kubernetes_version_comment]: https://github.com/timberio/vector/pull/2188#discussion_r403120481 [kustomize]: https://github.com/kubernetes-sigs/kustomize [logdna k8s integration]: https://docs.logdna.com/docs/kubernetes [logdna_daemonset]: https://raw.githubusercontent.com/logdna/logdna-agent/master/logdna-agent-ds.yaml +[man_7_capabilities]: http://man7.org/linux/man-pages/man7/capabilities.7.html [metrics-server]: https://github.com/kubernetes-sigs/metrics-server [netdata]: https://github.com/netdata/netdata [pr#2134]: https://github.com/timberio/vector/pull/2134 [pr#2188]: https://github.com/timberio/vector/pull/2188 [prometheus_kubernetes_sd_config]: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config +[redoctober]: https://github.com/cloudflare/redoctober [sidecar_container]: https://github.com/kubernetes/enhancements/blob/a8262db2ce38b2ec7941bdb6810a8d81c5141447/keps/sig-apps/sidecarcontainers.md [terraform]: https://www.terraform.io/ [the chart repository guide]: https://helm.sh/docs/topics/chart_repository/ +[vault]: https://www.vaultproject.io/ [vector_daemonset]: 2020-04-04-2221-kubernetes-integration/vector-daemonset.yaml [why_so_much_configurations]: https://github.com/timberio/vector/pull/2134/files#r401634895 [windows_in_kubernetes]: https://kubernetes.io/docs/setup/production-environment/windows/intro-windows-in-kubernetes/ From ef9bc96a3bbb3eb6f448463090d1ee8f2d35cdb0 Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Wed, 22 Apr 2020 09:20:24 +0300 Subject: [PATCH 
082/118] Add a reference to the security section at deployment configuration section Signed-off-by: MOZGIII --- rfcs/2020-04-04-2221-kubernetes-integration.md | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index 55fdbc906e0fd..dd63947616ac5 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -505,6 +505,11 @@ exchange this data with our partners and derive an even more realistic profile for Vector's runtime properties, based on real data from the multiple data sets. This worth a separate dedicated RFC though. +#### Security considerations on deployment configuration + +Security considerations on deployment configuration are grouped together with +other security related measures. See [here][#deployment-hardening]. + ### Annotating events with metadata from Kubernetes Kubernetes has a lot of metadata that can be associated with the logs, and most From 89b80c8887d789c197222a3173a577da3013d261 Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Wed, 22 Apr 2020 09:20:56 +0300 Subject: [PATCH 083/118] Add a dummy section on Kubernetes audit logs at other data gathering section Signed-off-by: MOZGIII --- rfcs/2020-04-04-2221-kubernetes-integration.md | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index dd63947616ac5..a95ce4d4c20e5 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -1157,6 +1157,12 @@ sense to try to address this issue in Vector code or deployment configs: What we can do, though, is provide guides, blog posts and explainers with concrete examples for Vector usage for Kubernetes Cluster Operators. +#### Kubernetes audit logs + +We can also collect [Kubernetes audit logs][k8s_docs_audit]. + +TODO: elaborate more. 
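As background for elaborating on this later: Kubernetes audit logging is driven by an audit policy file passed to the API server via `--audit-policy-file`. A minimal policy that records request metadata for everything looks like:

```yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  # Log metadata for all requests; other levels are None, Request,
  # and RequestResponse.
  - level: Metadata
```

The resulting audit log (e.g. written via `--audit-log-path`) is a file of JSON lines, so in principle it could be collected with the existing `file` source.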
+ ### Windows support We don't aim to support Windows Kubernetes clusters initially. The reason for @@ -1485,6 +1491,7 @@ See [motivation](#motivation). [k8s_api_service_account]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#serviceaccount-v1-core [k8s_api_sysctl]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#sysctl-v1-core [k8s_docs_admission_controllers]: https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers +[k8s_docs_audit]: https://kubernetes.io/docs/tasks/debug-application-cluster/audit/ [k8s_docs_crds]: https://kubernetes.io/docs/tasks/access-kubernetes-api/custom-resources/custom-resource-definitions/ [k8s_docs_daemon_set]: https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/ [k8s_docs_node]: https://kubernetes.io/docs/concepts/architecture/nodes/ From fe2e1e14d8fe41e67a18eb2faf97bee781f90e75 Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Wed, 22 Apr 2020 09:21:22 +0300 Subject: [PATCH 084/118] Add a TODO section at the deployment configuration Signed-off-by: MOZGIII --- rfcs/2020-04-04-2221-kubernetes-integration.md | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index a95ce4d4c20e5..df4986cccf132 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -510,6 +510,23 @@ This worth a separate dedicated RFC though. Security considerations on deployment configuration are grouped together with other security related measures. See [here][#deployment-hardening]. 
+#### TODO + +- https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#podspec-v1-core + + - terminationGracePeriodSeconds + - hostNetwork + - preemptionPolicy + - priorityClassName + - readinessGates + - runtimeClassName + +Add to resources sections: + +- We can pass thread limit to vector via env var, see + https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#envvarsource-v1-core + via https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#resourcefieldselector-v1-core + ### Annotating events with metadata from Kubernetes Kubernetes has a lot of metadata that can be associated with the logs, and most From 3197ae182ec0ef88683f6da75c60be914adf8650 Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Wed, 22 Apr 2020 18:41:09 +0300 Subject: [PATCH 085/118] Add deployment topology diagram Signed-off-by: MOZGIII --- .../2020-04-04-2221-kubernetes-integration.md | 2 +- .../deployment-topology.pu | 41 + .../deployment-topology.svg | 2646 +++++++++++++++++ .../vector.svg | 13 + 4 files changed, 2701 insertions(+), 1 deletion(-) create mode 100644 rfcs/2020-04-04-2221-kubernetes-integration/deployment-topology.pu create mode 100644 rfcs/2020-04-04-2221-kubernetes-integration/deployment-topology.svg create mode 100644 rfcs/2020-04-04-2221-kubernetes-integration/vector.svg diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index df4986cccf132..e64ba2e6c7a7c 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -52,7 +52,7 @@ it has to be deployed on every [`Node`][k8s_docs_node] in your cluster. The following diagram demonstrates how this works: -TODO: add deployment topology diagram here. 
+ ### What We'll Accomplish diff --git a/rfcs/2020-04-04-2221-kubernetes-integration/deployment-topology.pu b/rfcs/2020-04-04-2221-kubernetes-integration/deployment-topology.pu new file mode 100644 index 0000000000000..d51ef254188c5 --- /dev/null +++ b/rfcs/2020-04-04-2221-kubernetes-integration/deployment-topology.pu @@ -0,0 +1,41 @@ +@startuml +!include + +sprite $vector ./vector.svg + +skinparam ComponentFontColor MediumBlue + +cloud "Kubernetens\nCluster" as cluster { + node "Worker\nNode" as node1 { + component "<$vector>" as node1_vector + + component "<$pod>" as node1_pod1 + component "<$pod>" as node1_pod2 + component "<$pod>" as node1_pod3 + + node1_vector <-up- node1_pod1 + node1_vector <-up- node1_pod2 + node1_vector <-up- node1_pod3 + } + + node "Worker\nNode" as node2 { + component "<$vector>" as node2_vector + + component "<$pod>" as node2_pod1 + component "<$pod>" as node2_pod2 + component "<$pod>" as node2_pod3 + + node2_vector <-up- node2_pod1 + node2_vector <-up- node2_pod2 + node2_vector <-up- node2_pod3 + } +} + +cloud { + component "Sink" as sink + + node1_vector -down-> sink + node2_vector -down-> sink +} + +@enduml diff --git a/rfcs/2020-04-04-2221-kubernetes-integration/deployment-topology.svg b/rfcs/2020-04-04-2221-kubernetes-integration/deployment-topology.svg new file mode 100644 index 0000000000000..0e90bcb9b925c --- /dev/null +++ b/rfcs/2020-04-04-2221-kubernetes-integration/deployment-topology.svg @@ -0,0 +1,2646 @@ +KubernetensClusterWorkerNodeWorkerNode Custom Preset Created with Sketch. Custom Preset Created with Sketch. Sink \ No newline at end of file diff --git a/rfcs/2020-04-04-2221-kubernetes-integration/vector.svg b/rfcs/2020-04-04-2221-kubernetes-integration/vector.svg new file mode 100644 index 0000000000000..15876de70b22c --- /dev/null +++ b/rfcs/2020-04-04-2221-kubernetes-integration/vector.svg @@ -0,0 +1,13 @@ + + + + Custom Preset + Created with Sketch. 
+ + + + + \ No newline at end of file From b220af175d313497d285c42230bdb4ee121d9298 Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Wed, 22 Apr 2020 18:45:52 +0300 Subject: [PATCH 086/118] Correct the security considerations on deployment configuration Signed-off-by: MOZGIII --- rfcs/2020-04-04-2221-kubernetes-integration.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index e64ba2e6c7a7c..b85ca4f0ae11b 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -508,7 +508,7 @@ This worth a separate dedicated RFC though. #### Security considerations on deployment configuration Security considerations on deployment configuration are grouped together with -other security related measures. See [here][#deployment-hardening]. +other security-related measures. See [here](#deployment-hardening). #### TODO From cdc313433bd77030b49ec6938d131be05320e758 Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Wed, 22 Apr 2020 18:46:17 +0300 Subject: [PATCH 087/118] Correct the link to kubectl interface Signed-off-by: MOZGIII --- rfcs/2020-04-04-2221-kubernetes-integration.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index b85ca4f0ae11b..433467fd5cc3e 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -157,7 +157,7 @@ The following diagram demonstrates how this works: 1. Prepare `kustomization.yaml`. - Use the same config as in [Kubectl Interface]. + Use the same config as in [Kubectl Interface](#kubectl-interface). 
```yaml # kustomization.yaml From 95693a079533ce15615bdade3265ef601d076e81 Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Wed, 22 Apr 2020 18:50:18 +0300 Subject: [PATCH 088/118] Rename guide sections Signed-off-by: MOZGIII --- rfcs/2020-04-04-2221-kubernetes-integration.md | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index 433467fd5cc3e..1ad8d03b1b498 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -64,7 +64,7 @@ The following diagram demonstrates how this works: ### Tutorial -#### Kubectl Interface +#### Deploy using `kubectl` 1. Configure Vector: @@ -114,7 +114,7 @@ The following diagram demonstrates how this works: That's it! -#### Helm Interface +#### Deploy using Helm 1. Install [`helm`][helm_install]. @@ -151,13 +151,13 @@ The following diagram demonstrates how this works: vector/vector ``` -#### Install using Kustomize +#### Deploy using Kustomize 1. Install [`kustomize`][kustomize]. 1. Prepare `kustomization.yaml`. - Use the same config as in [Kubectl Interface](#kubectl-interface). + Use the same config as in [`kubectl` guide][anchor_tutorial_kubectl]. ```yaml # kustomization.yaml @@ -431,9 +431,9 @@ the new config) via YAML files storing Kubernetes API objects configuration can be grouped differently. -The layout proposed in [guide above](#kubectl-interface) is what we're planing -to use. It is in line with the sections above on Vector configuration splitting -into the common and custom parts. +The layout proposed in [guide above][anchor_tutorial_kubectl] is what we're +planing to use. It is in line with the sections above on Vector configuration +splitting into the common and custom parts. 
The idea is to have a single file with a namespaced configuration (`DaemonSet`, `ServiceAccount`, `ClusterRoleBinding`, common `ConfigMap`, etc), a single file @@ -1344,7 +1344,8 @@ See [motivation](#motivation). See the [Minimal supported Kubernetes version][anchor_minimal_supported_kubernetes_version] section. 1. ~~What is the best to avoid Vector from ingesting it's own logs? I'm assuming - that my [`kubectl` tutorial](#kubectl-interface) handles this with namespaces? + that my [`kubectl` tutorial][anchor_tutorial_kubectl] handles this with + namespaces? We'd just need to configure Vector to exclude this namespace?~~ See the [Origin filtering][anchor_origin_filtering] section. 1. ~~I've seen two different installation strategies. For example, Fluentd offers @@ -1442,6 +1443,7 @@ See [motivation](#motivation). [anchor_origin_filtering]: #origin-filtering [anchor_resource_limits]: #resource-limits [anchor_strategy_on_yaml_file_grouping]: #strategy-on-yaml-file-grouping +[anchor_tutorial_kubectl]: #deploy-using-kubectl [awesome operators list]: https://github.com/operator-framework/awesome-operators [bonzai logging operator]: https://github.com/banzaicloud/logging-operator [chartmuseum]: https://chartmuseum.com/ From 214a80ce4b3a91cc31b4b265b412b3cfc7c6a486 Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Wed, 22 Apr 2020 19:33:04 +0300 Subject: [PATCH 089/118] Add a section on container probes Signed-off-by: MOZGIII --- .../2020-04-04-2221-kubernetes-integration.md | 41 +++++++++++++++++++ 1 file changed, 41 insertions(+) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index 1ad8d03b1b498..b3ddcedc3e296 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -351,6 +351,44 @@ support at Vector code (a dedicated `kubernetes` source), while a perfectly valid sidecar configuration can be implemented with just a simple `file` source. 
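The sidecar model mentioned above can be sketched as follows. This is an illustration only - the application name, paths, and image tags are all hypothetical, and the Vector configuration itself is omitted (it would be a plain `file` source pointed at the shared log directory):

```yaml
# Hypothetical sidecar layout: the app writes logs to a shared volume,
# and a Vector sidecar tails them with a plain `file` source.
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
    - name: app
      image: my-app:latest  # hypothetical application image
      volumeMounts:
        - name: app-logs
          mountPath: /var/log/my-app
    - name: vector
      image: timberio/vector:latest  # tag is illustrative
      volumeMounts:
        - name: app-logs
          mountPath: /var/log/my-app
          readOnly: true
  volumes:
    - name: app-logs
      emptyDir: {}
```

The shared `emptyDir` volume is the whole trick: no Kubernetes API access is required, which is why this deployment mode needs no dedicated source support.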
This is another reason why we don't pay as much attention to sidecar model. +#### Container probes + +Kubernetes allows configuring a number of [`Probe`s][k8s_api_probe] on +[`Container`][k8s_api_container], and taking action based on those probes. +See the [documentation](k8s_docs_pod_lifecycle_container_probes) to learn more. + +- `readinessProbe` + + Periodic probe of container service readiness. Container will be removed from + service endpoints if the probe fails. + +- `livenessProbe` + + Periodic probe of container liveness. Container will be restarted if the probe + fails. + +- `startupProbe` + + Startup probe indicates that the container has successfully initialized. If + specified, no other probes are executed until this completes successfully. If + this probe fails, the container will be restarted, just as if the + `livenessProbe` failed. + +Vector should implement proper support for all of those one way or another at +the code level. + +- `startupProbe` can be tight to the initial topology healthcheck - i.e. we + consider it failed until the initial topology health check is complete, and + consider it ok at any moment after that; + +- `livenessProbe` should probably be tied to the async executor threadpool + responsiveness - i.e. if we can handle an HTTP request in a special liveness + server we expose in Vector - consider the probe ok, else something's very + wrong, and we should consider the probe failed; + +- `readinessProbe` is the most tricky one; it is unclear what the semantics + makes sense there. + ### Deployment configuration It is important that provide a well-thought deployment configuration for the @@ -1402,6 +1440,7 @@ See [motivation](#motivation). might be differences in how different container runtimes handle logs. 1. How do we want to approach Helm Chart Repository management. 1. How do we implement liveness, readiness and startup probes? + Readiness probe is a tricky one. See [Container probes](#container-probes). 1. 
Can we populate file at `terminationMessagePath` with some meaningful information when we exit or crash? 1. Can we allow passing arbitrary fields from the `Pod` object to the event? @@ -1501,6 +1540,7 @@ See [motivation](#motivation). [k8s_api_pod_security_policy]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#podsecuritypolicy-v1beta1-policy [k8s_api_pod_spec]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#podspec-v1-core [k8s_api_pod]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#pod-v1-core +[k8s_api_probe]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#probe-v1-core [k8s_api_resource_requirements]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#resourcerequirements-v1-core [k8s_api_role_binding]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#rolebinding-v1-rbac-authorization-k8s-io [k8s_api_role]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#role-v1-rbac-authorization-k8s-io @@ -1516,6 +1556,7 @@ See [motivation](#motivation). 
[k8s_docs_node]: https://kubernetes.io/docs/concepts/architecture/nodes/ [k8s_docs_operator]: https://kubernetes.io/docs/concepts/extend-kubernetes/operator/ [k8s_docs_persistent_volumes]: https://kubernetes.io/docs/concepts/storage/persistent-volumes +[k8s_docs_pod_lifecycle_container_probes]: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes [k8s_docs_pod_security_policiy]: https://kubernetes.io/docs/concepts/policy/pod-security-policy/ [k8s_docs_pod]: https://kubernetes.io/docs/concepts/workloads/pods/pod/ [k8s_docs_rbac]: https://kubernetes.io/docs/reference/access-authn-authz/rbac/ From d71e4be3a10a847136232c94957a938e051c6fe2 Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Wed, 22 Apr 2020 19:35:53 +0300 Subject: [PATCH 090/118] Add a section on other notable PodSpec properties Signed-off-by: MOZGIII --- .../2020-04-04-2221-kubernetes-integration.md | 29 +++++++++++++------ 1 file changed, 20 insertions(+), 9 deletions(-) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index b3ddcedc3e296..9c1dcdc700673 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -548,16 +548,25 @@ This worth a separate dedicated RFC though. Security considerations on deployment configuration are grouped together with other security-related measures. See [here](#deployment-hardening). 
-#### TODO - -- https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#podspec-v1-core +#### Other notable [`PodSpec`][k8s_api_pod_spec] properties + +- `terminationGracePeriodSeconds` - we should set this to a value slightly + bigger than Vector topology grace termination period; +- `hostNetwork` - we shouldn't use host network since we need access to + `kube-apiserver`, and the easiest way to get that is to use cluster network; +- `preemptionPolicy` - our default deployment mode - aggregating logs from + pods - is not considered critical for cluster itself, so we should _not_ + disable preemption; +- `priorityClassName` - see [`PriorityClass` docs][k8s_docs_priority_class]; we + could ship a [`PriorityClass`][k8s_api_priority_class] and set this value, but + the priority value is not normalized, so it's probably not a good idea to + provide a default our of the box, and leave it for cluster operator to + configure; +- `runtimeClassName` - we'll be using this value in tests to validate that + Vector works with non-standard runtime; we shouldn't set it in our default + YAMLs, nor set it at Helm by default; - - terminationGracePeriodSeconds - - hostNetwork - - preemptionPolicy - - priorityClassName - - readinessGates - - runtimeClassName +#### TODO Add to resources sections: @@ -1540,6 +1549,7 @@ See [motivation](#motivation). 
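To make the property list above more tangible, here is a rough sketch of how these fields could land in the `DaemonSet` pod template. The concrete values are assumptions for illustration - in particular, the grace period would have to be derived from Vector's own topology grace termination period:

```yaml
# Sketch only, not the shipped YAML. 60 is an assumed value that should
# sit slightly above Vector's internal grace termination period.
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 60
      hostNetwork: false  # stay on the cluster network to reach kube-apiserver
      # preemptionPolicy is left at its default (preemption stays enabled);
      # priorityClassName and runtimeClassName are intentionally not set -
      # they are for the cluster operator (and our test harness) to configure.
      containers:
        - name: vector
          image: timberio/vector:latest  # tag is illustrative
```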
[k8s_api_pod_security_policy]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#podsecuritypolicy-v1beta1-policy [k8s_api_pod_spec]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#podspec-v1-core [k8s_api_pod]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#pod-v1-core +[k8s_api_priority_class]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#priorityclass-v1-scheduling-k8s-io [k8s_api_probe]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#probe-v1-core [k8s_api_resource_requirements]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#resourcerequirements-v1-core [k8s_api_role_binding]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#rolebinding-v1-rbac-authorization-k8s-io @@ -1559,6 +1569,7 @@ See [motivation](#motivation). [k8s_docs_pod_lifecycle_container_probes]: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes [k8s_docs_pod_security_policiy]: https://kubernetes.io/docs/concepts/policy/pod-security-policy/ [k8s_docs_pod]: https://kubernetes.io/docs/concepts/workloads/pods/pod/ +[k8s_docs_priority_class]: https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#priorityclass [k8s_docs_rbac]: https://kubernetes.io/docs/reference/access-authn-authz/rbac/ [k8s_docs_rolling_update]: https://kubernetes.io/docs/tasks/manage-daemon/update-daemon-set/ [k8s_docs_secret]: https://kubernetes.io/docs/concepts/configuration/secret/ From 14f8216801dcdea2bde86c8b8a7aaf355a10f77b Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Wed, 22 Apr 2020 19:39:17 +0300 Subject: [PATCH 091/118] Removed a TODO on setting thread count via env var This is not required, since our core count logic will play nicely with cgroups and do the right thing automatically. 
Signed-off-by: MOZGIII --- rfcs/2020-04-04-2221-kubernetes-integration.md | 8 -------- 1 file changed, 8 deletions(-) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index 9c1dcdc700673..6e6eb3b3ded64 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -566,14 +566,6 @@ other security-related measures. See [here](#deployment-hardening). Vector works with non-standard runtime; we shouldn't set it in our default YAMLs, nor set it at Helm by default; -#### TODO - -Add to resources sections: - -- We can pass thread limit to vector via env var, see - https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#envvarsource-v1-core - via https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#resourcefieldselector-v1-core - ### Annotating events with metadata from Kubernetes Kubernetes has a lot of metadata that can be associated with the logs, and most From 56e519295def3cdf280ef486f767916d768db4dc Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Wed, 22 Apr 2020 19:58:48 +0300 Subject: [PATCH 092/118] Fill the section on k8s audit logs collection Signed-off-by: MOZGIII --- .../2020-04-04-2221-kubernetes-integration.md | 19 ++++++++++++++++++- 1 file changed, 18 insertions(+), 1 deletion(-) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index 6e6eb3b3ded64..8bd34f09a0ef3 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -1152,6 +1152,9 @@ So, to implement this, we'd need to add a special source that captures events from Kubernetes API, and provide a new workload configuration based on [`Deployment`][k8s_api_deployment]. +See also a section on collecting +[Kubernetes audit logs][anchor_kubernetes_audit_logs]. 
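To illustrate the `Deployment`-based workload configuration mentioned above: since events come from the cluster-wide `kube-apiserver` rather than from individual nodes, a single replica is enough. Everything here is a hypothetical sketch - including the events-capturing source, which does not exist at the time of this RFC:

```yaml
# Hypothetical sketch of an events-collecting Vector deployment.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vector-events
  namespace: vector
spec:
  replicas: 1  # events are read from kube-apiserver, not per-node
  selector:
    matchLabels:
      name: vector-events
  template:
    metadata:
      labels:
        name: vector-events
    spec:
      serviceAccountName: vector  # needs RBAC access to watch events
      containers:
        - name: vector
          image: timberio/vector:latest  # tag is illustrative
```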
+ ### Discover and gather Prometheus metrics for Kubernetes API resources Prometheus already has a built-in @@ -1217,7 +1220,18 @@ concrete examples for Vector usage for Kubernetes Cluster Operators. We can also collect [Kubernetes audit logs][k8s_docs_audit]. -TODO: elaborate more. +This is very similar to +[collecting Kubernetes Events][anchor_collecting_kubernetes_events], but +provides a more fine-grained control over what events are audited. + +It's important to understand that events, unfiltered, should be considered very +sensitive and privileged data. + +Kubernetes audit [`Policy`][k8s_api_policy] allows cluster operator to configure +`kubelet`s to manage the audit data with a high degree of flexibility. + +The best part is this is something that should already work great with Vector - +we can already support operation via both log and webhook backends. ### Windows support @@ -1477,8 +1491,10 @@ See [motivation](#motivation). - [ ] Add Kubernetes setup/integration guide. - [ ] Release `0.10.0` and announce. +[anchor_collecting_kubernetes_events]: #exposing-kubernetes-event-s-k8s-api-event-as-vector-events [anchor_file_locations]: #file-locations [anchor_helm_vs_raw_yaml_files]: #helm-vs-raw-yaml-files +[anchor_kubernetes_audit_logs]: #kubernetes-audit-logs [anchor_minimal_supported_kubernetes_version]: #minimal-supported-kubernetes-version [anchor_origin_filtering]: #origin-filtering [anchor_resource_limits]: #resource-limits @@ -1541,6 +1557,7 @@ See [motivation](#motivation). 
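As a concrete illustration of the fine-grained control the audit logs section describes, a minimal Kubernetes audit `Policy` (a standard Kubernetes object, nothing Vector-specific) could record `Pod` mutations at the `Metadata` level and drop everything else; with the log backend enabled, the resulting audit log file is then consumable with a plain `file` source:

```yaml
# Example audit Policy: log Pod mutations at Metadata level, drop the rest.
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  - level: Metadata
    verbs: ["create", "update", "patch", "delete"]
    resources:
      - group: ""  # core API group
        resources: ["pods"]
  - level: None  # drop everything not matched above
```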
[k8s_api_pod_security_policy]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#podsecuritypolicy-v1beta1-policy [k8s_api_pod_spec]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#podspec-v1-core [k8s_api_pod]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#pod-v1-core +[k8s_api_policy]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#policy-v1alpha1-auditregistration-k8s-io [k8s_api_priority_class]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#priorityclass-v1-scheduling-k8s-io [k8s_api_probe]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#probe-v1-core [k8s_api_resource_requirements]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#resourcerequirements-v1-core From 677a6ee30cbc49487a012fce47ad3e951c69f0d0 Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Thu, 23 Apr 2020 13:57:59 +0300 Subject: [PATCH 093/118] Add link the other data gathering at the questions section Signed-off-by: MOZGIII --- rfcs/2020-04-04-2221-kubernetes-integration.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index 8bd34f09a0ef3..6704f384f1943 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -1441,9 +1441,10 @@ See [motivation](#motivation). Kubernetes now solves it's internal autoscaling pipelines needs with [`metrics-server`][metrics-server] - a similar idea yet much more lightweight implementation. -1. How are we collecting Kubernetes system events? Is that outside of the +1. ~~How are we collecting Kubernetes system events? Is that outside of the scope of this RFC? And why does this take an entirely different path? - (ref [issue#1293]) + (ref [issue#1293])~~ + See the [Other data gathering][anchor_other_data_gathering] section. 1. 
What are some of the details that set Vector's Kubernetes integration apart? This is for marketing purposes and also helps us "raise the bar". @@ -1497,6 +1498,7 @@ See [motivation](#motivation). [anchor_kubernetes_audit_logs]: #kubernetes-audit-logs [anchor_minimal_supported_kubernetes_version]: #minimal-supported-kubernetes-version [anchor_origin_filtering]: #origin-filtering +[anchor_other_data_gathering]: #anchor-other-data-gathering [anchor_resource_limits]: #resource-limits [anchor_strategy_on_yaml_file_grouping]: #strategy-on-yaml-file-grouping [anchor_tutorial_kubectl]: #deploy-using-kubectl From 36b9a2314807dc9b20455a2b62a7c9509e86b8d5 Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Thu, 23 Apr 2020 14:06:06 +0300 Subject: [PATCH 094/118] Correct header levels Signed-off-by: MOZGIII --- rfcs/2020-04-04-2221-kubernetes-integration.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index 6704f384f1943..da5fc12b3f8e0 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -1155,7 +1155,7 @@ from Kubernetes API, and provide a new workload configuration based on See also a section on collecting [Kubernetes audit logs][anchor_kubernetes_audit_logs]. -### Discover and gather Prometheus metrics for Kubernetes API resources +#### Discover and gather Prometheus metrics for Kubernetes API resources Prometheus already has a built-in [Kubernetes Service Discovery][prometheus_kubernetes_sd_config] support, so one @@ -1169,7 +1169,7 @@ the intermediary. We don't aim to implement this in the initial Kubernetes integration. 
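For reference, the Prometheus-side setup the section above refers to is plain `kubernetes_sd_configs` in `prometheus.yml`; being one of the consumers for this data would mean Vector scraping the same per-`Pod` endpoints. The fragment below is a typical example of that built-in service discovery, not something from this RFC:

```yaml
# Typical prometheus.yml fragment with built-in Kubernetes service discovery.
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod  # discover every Pod via the Kubernetes API
    relabel_configs:
      # keep only Pods that opt in via the conventional annotation
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
```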
-### Gather data from the host OS +#### Gather data from the host OS This is very useful for Kubernetes Cluster Operators willing to deploy Vector for the purposes of gaining observability on what's going on with their cluster @@ -1190,7 +1190,7 @@ whatsoever, it's possible to just deploy Vector as a [`hostPath` volume][k8s_api_host_path_volume_source] mounts and/or enabling `hostNetwork` at the [`PodSpec`][k8s_api_pod_spec]. -#### Automatic discovery of things to monitor on the host OS +##### Automatic discovery of things to monitor on the host OS While nothing prevents users from manually configuring Vector for gathering data from the host OS, it's very hard for us to offer sane defaults that would work From f3c64219008e8eac6c10bb246bfe7b4dfd9ef0d0 Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Thu, 23 Apr 2020 14:10:29 +0300 Subject: [PATCH 095/118] Move container probes under the deploynment configuration Signed-off-by: MOZGIII --- .../2020-04-04-2221-kubernetes-integration.md | 76 +++++++++---------- 1 file changed, 38 insertions(+), 38 deletions(-) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index da5fc12b3f8e0..be125632ada81 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -351,44 +351,6 @@ support at Vector code (a dedicated `kubernetes` source), while a perfectly valid sidecar configuration can be implemented with just a simple `file` source. This is another reason why we don't pay as much attention to sidecar model. -#### Container probes - -Kubernetes allows configuring a number of [`Probe`s][k8s_api_probe] on -[`Container`][k8s_api_container], and taking action based on those probes. -See the [documentation](k8s_docs_pod_lifecycle_container_probes) to learn more. - -- `readinessProbe` - - Periodic probe of container service readiness. Container will be removed from - service endpoints if the probe fails. 
- -- `livenessProbe` - - Periodic probe of container liveness. Container will be restarted if the probe - fails. - -- `startupProbe` - - Startup probe indicates that the container has successfully initialized. If - specified, no other probes are executed until this completes successfully. If - this probe fails, the container will be restarted, just as if the - `livenessProbe` failed. - -Vector should implement proper support for all of those one way or another at -the code level. - -- `startupProbe` can be tight to the initial topology healthcheck - i.e. we - consider it failed until the initial topology health check is complete, and - consider it ok at any moment after that; - -- `livenessProbe` should probably be tied to the async executor threadpool - responsiveness - i.e. if we can handle an HTTP request in a special liveness - server we expose in Vector - consider the probe ok, else something's very - wrong, and we should consider the probe failed; - -- `readinessProbe` is the most tricky one; it is unclear what the semantics - makes sense there. - ### Deployment configuration It is important that provide a well-thought deployment configuration for the @@ -566,6 +528,44 @@ other security-related measures. See [here](#deployment-hardening). Vector works with non-standard runtime; we shouldn't set it in our default YAMLs, nor set it at Helm by default; +#### Container probes + +Kubernetes allows configuring a number of [`Probe`s][k8s_api_probe] on +[`Container`][k8s_api_container], and taking action based on those probes. +See the [documentation](k8s_docs_pod_lifecycle_container_probes) to learn more. + +- `readinessProbe` + + Periodic probe of container service readiness. Container will be removed from + service endpoints if the probe fails. + +- `livenessProbe` + + Periodic probe of container liveness. Container will be restarted if the probe + fails. + +- `startupProbe` + + Startup probe indicates that the container has successfully initialized. 
If + specified, no other probes are executed until this completes successfully. If + this probe fails, the container will be restarted, just as if the + `livenessProbe` failed. + +Vector should implement proper support for all of those one way or another at +the code level. + +- `startupProbe` can be tied to the initial topology healthcheck - i.e. we + consider it failed until the initial topology health check is complete, and + consider it ok at any moment after that; + +- `livenessProbe` should probably be tied to the async executor threadpool + responsiveness - i.e. if we can handle an HTTP request in a special liveness + server we expose in Vector - consider the probe ok, else something's very + wrong, and we should consider the probe failed; + +- `readinessProbe` is the most tricky one; it is unclear what + semantics make sense there. + ### Annotating events with metadata from Kubernetes Kubernetes has a lot of metadata that can be associated with the logs, and most From 5cb20c489971538a99f07f8ff9bca32290323ffe Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Thu, 23 Apr 2020 14:26:26 +0300 Subject: [PATCH 096/118] Add a section on automatic partial event merging Signed-off-by: MOZGIII --- rfcs/2020-04-04-2221-kubernetes-integration.md | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index be125632ada81..a9639040df88b 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -280,6 +280,17 @@ following formats: We have to support both formats. +#### Automatic partial events merging + +Kubernetes uses two log file formats, and both split log messages that are too +long into multiple log records. + +It makes sense to automatically merge the log records that were split back +together, similarly to how we do in the `docker` source.
+ +We will implement automatic partial event merging and enable it by default, +while allowing users to opt-out of it if they need to. + ### Helm vs raw YAML files We consider both raw YAML files and Helm Chart officially supported installation From 7139fe0e48c2ff15249425661fd44c86d6ab2d26 Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Fri, 24 Apr 2020 14:20:41 +0300 Subject: [PATCH 097/118] Add a link to Deployment Hardening section at the answers Signed-off-by: MOZGIII --- rfcs/2020-04-04-2221-kubernetes-integration.md | 1 + 1 file changed, 1 insertion(+) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index a9639040df88b..bbe690b6f13d0 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -1431,6 +1431,7 @@ See [motivation](#motivation). permissions is to be determined at YAML files design stage - after we complete the implementation. It's really trivial to determine from a set of API calls used. + See the [Deployment Hardening](#deployment-hardening) section. 1. ~~What is `updateStrategy` ... `RollingUpdate`? This is not included in [our daemonset][vector_daemonset] or in [any of Fluentbit's config files][fluentbit_installation]. 
But it is included in both [Fluentd's From 3689585acbf5e731c091ee78b8af4bbacd73dfed Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Fri, 24 Apr 2020 14:23:48 +0300 Subject: [PATCH 098/118] Remove the ref to the outstanding questions Signed-off-by: MOZGIII --- rfcs/2020-04-04-2221-kubernetes-integration.md | 2 -- 1 file changed, 2 deletions(-) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index bbe690b6f13d0..d08124bdb4a02 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -110,8 +110,6 @@ The following diagram demonstrates how this works: kubectl apply --namespace vector -f https://packages.timber.io/vector/latest/kubernetes/vector-namespaced.yaml ``` - - _See [outstanding questions 3, 4, 5, 6, and 7](#outstanding-questions)._ - That's it! #### Deploy using Helm From 3592f141faffb757fbea3b3e6f3bef000fe86fde Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Fri, 24 Apr 2020 19:21:00 +0300 Subject: [PATCH 099/118] Add a section on filtering by namespace Signed-off-by: MOZGIII --- .../2020-04-04-2221-kubernetes-integration.md | 20 +++++++++++++++++++ 1 file changed, 20 insertions(+) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index d08124bdb4a02..ba19a23253387 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -759,6 +759,23 @@ reading logs files into events and then filtering them out. This is also a perfectly valid way of filtering out logs of Vector itself. +##### Filtering by namespaces annotations + +There is a [demand](https://github.com/fluent/fluent-bit/issues/1140) for +filtering by namespace via namespace annotations. This is an additional concern +to filtering by just the `Pod` object data that was already described above. 
The idea is that all `Pod`s belong to [`Namespace`s][k8s_api_namespace] ([docs][k8s_docs_namespaces]), and users want to be able +to annotate the `Namespace` itself for exclusion, effectively excluding all the +`Pod`s belonging to it from collection. + +To support this, we'll have to maintain the list of excluded `Namespace`s, and +filter `Pod`s against that list. + +Listing the `Namespace`s can be done via the +[corresponding API][k8s_api_namespace_list] in a similar manner to how we do it +for `Pod`s. Same concerns regarding caching and load limiting apply. + #### Filtering based on event fields after annotation This is an alternative approach to the previous implementation. @@ -1562,6 +1579,8 @@ See [motivation](#motivation). [k8s_api_deployment]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#deployment-v1-apps [k8s_api_event]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#event-v1-core [k8s_api_host_path_volume_source]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#hostpathvolumesource-v1-core +[k8s_api_namespace_list]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#list-namespace-v1-core +[k8s_api_namespace]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#namespace-v1-core [k8s_api_network_policy]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#networkpolicy-v1-networking-k8s-io [k8s_api_pod_list_all_namespaces]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#list-all-namespaces-pod-v1-core [k8s_api_pod_read]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#read-pod-v1-core
[k8s_docs_audit]: https://kubernetes.io/docs/tasks/debug-application-cluster/audit/ [k8s_docs_crds]: https://kubernetes.io/docs/tasks/access-kubernetes-api/custom-resources/custom-resource-definitions/ [k8s_docs_daemon_set]: https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/ +[k8s_docs_namespaces]: https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/ [k8s_docs_node]: https://kubernetes.io/docs/concepts/architecture/nodes/ [k8s_docs_operator]: https://kubernetes.io/docs/concepts/extend-kubernetes/operator/ [k8s_docs_persistent_volumes]: https://kubernetes.io/docs/concepts/storage/persistent-volumes From f7f45c896b9f9fc5914528f987bae149886e66e5 Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Fri, 24 Apr 2020 19:40:37 +0300 Subject: [PATCH 100/118] Remove resolved questions Signed-off-by: MOZGIII --- .../2020-04-04-2221-kubernetes-integration.md | 40 ------------------- 1 file changed, 40 deletions(-) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index ba19a23253387..4bacd4793823a 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -1418,26 +1418,11 @@ See [motivation](#motivation). ### From Ben -1. ~~What is the minimal Kubernetes version that we want to support. See - [this comment][kubernetes_version_comment].~~ - See the [Minimal supported Kubernetes version][anchor_minimal_supported_kubernetes_version] - section. 1. ~~What is the best to avoid Vector from ingesting it's own logs? I'm assuming that my [`kubectl` tutorial][anchor_tutorial_kubectl] handles this with namespaces? We'd just need to configure Vector to exclude this namespace?~~ See the [Origin filtering][anchor_origin_filtering] section. -1. ~~I've seen two different installation strategies. 
For example, Fluentd offers - a [single daemonset configuration file][fluentd_daemonset] while Fluentbit - offers [four separate configuration files][fluentbit_installation] - (`service-account.yaml`, `role.yaml`, `role-binding.yaml`, `configmap.yaml`). - Which approach is better? Why are they different?~~ - See the - [Strategy on YAML file grouping][anchor_strategy_on_yaml_file_grouping] - section. -1. ~~Should we prefer `kubectl create ...` or `kubectl apply ...`? The examples - in the [prior art](#prior-art) section use both.~~ - See [Helm vs raw YAML files][anchor_helm_vs_raw_yaml_files] section. 1. ~~From what I understand, Vector requires the Kubernetes `watch` verb in order to receive updates to k8s cluster changes. This is required for the `kubernetes_pod_metadata` transform. Yet, Fluentbit [requires the `get`, @@ -1447,31 +1432,6 @@ See [motivation](#motivation). complete the implementation. It's really trivial to determine from a set of API calls used. See the [Deployment Hardening](#deployment-hardening) section. -1. ~~What is `updateStrategy` ... `RollingUpdate`? This is not included in - [our daemonset][vector_daemonset] or in [any of Fluentbit's config - files][fluentbit_installation]. But it is included in both [Fluentd's - daemonset][fluentd_daemonset] and [LogDNA's daemonset][logdna_daemonset].~~ - `RollingUpdate` is the default value for - [`updateStrategy`][k8s_api_daemon_set_update_strategy] of the - [`DaemonSet`][k8s_api_daemon_set]. The only alternative is `OnDelete`. - `RollingUpdate` makes more sense for us to use as the default, more info on - this is available at the [docs][k8s_docs_rolling_update]. -1. ~~I've also noticed `resources` declarations in some of these config files. - For example [LogDNA's daemonset][logdna_daemonset]. I assume this is limiting - resources. Do we want to consider this?~~ - See the [Resource Limits][anchor_resource_limits] section of this RFC. -1. 
~~What the hell is going on with [Honeycomb's integration
-   strategy][honeycomb integration]? :) It seems like the whole "Heapster"
-   pipeline is specifically for system events, but Heapster is deprecated?
-   This leads me to my next question...~~
-   Heapster is indeed outdated, as well as Honeycomb integration guide.
-   Kubernetes now solves it's internal autoscaling pipelines needs with
-   [`metrics-server`][metrics-server] - a similar idea yet much more lightweight
-   implementation.
-1. ~~How are we collecting Kubernetes system events? Is that outside of the
-   scope of this RFC? And why does this take an entirely different path?
-   (ref [issue#1293])~~
-   See the [Other data gathering][anchor_other_data_gathering] section.
 1. What are some of the details that set Vector's Kubernetes integration apart?
    This is for marketing purposes and also helps us "raise the bar".

From f017f34d1163c3410c711812847f2040acf1594c Mon Sep 17 00:00:00 2001
From: MOZGIII
Date: Fri, 24 Apr 2020 20:08:01 +0300
Subject: [PATCH 101/118] Add a section on Potential Windows-specific Issues

Signed-off-by: MOZGIII
---
 .../2020-04-04-2221-kubernetes-integration.md | 23 +++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md
index 4bacd4793823a..d1e18eec7ce6d 100644
--- a/rfcs/2020-04-04-2221-kubernetes-integration.md
+++ b/rfcs/2020-04-04-2221-kubernetes-integration.md
@@ -1273,6 +1273,29 @@ To sum up: if it works - it works, if it doesn't - we'll take care of it later.
 > If you're reading this and want to use Vector with Windows - please let us
 > know.

+#### Potential Windows-specific Issues
+
+Windows has its own specifics. We can learn from the past experience of other
+implementations to avoid the problems they encountered.
+
+- https://github.com/fluent/fluent-bit/issues/2027
+
+  This issue describes what seems to be a resource management problem with
+  files on Windows - their implementation doesn't let go of the log file in
+  time when the container (along with its log files) is about to be removed.
+  This is a non-issue in a typical Linux deployment because it's not the path
+  on the filesystem but the inode that an FD binds to. On Windows it's the
+  other way around.
+
+  There's actually a workaround for that: it's possible to request Windows to
+  allow deletion of the opened file - by specifying the `FILE_SHARE_DELETE`
+  flag at the
+  [`CreateFileA`](https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-createfilea)
+  call.
+
+  See more details:
+
+  - https://stackoverflow.com/questions/3202329/will-we-ever-be-able-to-delete-an-open-file-in-windows
+  - https://boostgsoc13.github.io/boost.afio/doc/html/afio/FAQ/deleting_open_files.html

 ### Security

 There are different aspects of security. In this RFC we're going to focus on

From 35e93601a2edc0313357b808dee05b82a79c008b Mon Sep 17 00:00:00 2001
From: MOZGIII
Date: Fri, 24 Apr 2020 20:30:53 +0300
Subject: [PATCH 102/118] Improve the ChartMuseum mentions

Signed-off-by: MOZGIII
---
 rfcs/2020-04-04-2221-kubernetes-integration.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md
index d1e18eec7ce6d..31b50cef4ad99 100644
--- a/rfcs/2020-04-04-2221-kubernetes-integration.md
+++ b/rfcs/2020-04-04-2221-kubernetes-integration.md
@@ -321,10 +321,10 @@ Everything we need to do to achieve this is outlined at the
 [The Chart Repository Guide].

 We can use a tool like [ChartMuseum] to manage our repo. Alternatively, we can
-use a bare HTTP server, like AWS S3 or Github Pages. A tool like like
-[ChartMuseum] has the benefit of doing some things for us.
It can use S3 -for storage, and offers a convenient [helm plugin][helm_push] to release charts, -so the release process should be very simple. +use a bare HTTP server, like AWS S3 or Github Pages. +[ChartMuseum] has the benefit of doing some things for us. It can use S3 for +storage, and offers a convenient [helm plugin][helm_push] to release charts, so +the release process should be very simple. From the user experience perspective, it would be cool if we expose our chart repo at `https://charts.vector.dev` - short and easy to remember or even guess. From ae5030d0e149e1126bbba7290920388fb8a0f90f Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Fri, 24 Apr 2020 21:10:04 +0300 Subject: [PATCH 103/118] Add a note on non-RBAC clusters Signed-off-by: MOZGIII --- rfcs/2020-04-04-2221-kubernetes-integration.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index 31b50cef4ad99..8fd8712b69691 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -1378,6 +1378,12 @@ hardened by default. aware of the possible configuration options. At Helm Charts we should allow configuring arbitrary permissions via values (while providing sane defaults). + We can optionally support non-[RBAC][k8s_docs_rbac] clusters in the Helm + Chart. + In the real world, the non-RBAC clusters should be very rare, since RBAC has + been recommended for a very long time, and it's the default for the fresh + `kubeadm` installations. It's probably not a major concern. + #### Securing secrets Vector sometimes needs access to secrets, like AWS API access tokens and so on. 
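A common way to handle that in Kubernetes - sketched here with placeholder names, not as final recommended manifests - is to keep such credentials in a `Secret` and map it into the Vector container:

```yaml
# Hypothetical example: the secret name, namespace, and keys are placeholders.
apiVersion: v1
kind: Secret
metadata:
  name: vector-aws-credentials
  namespace: vector
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: "..."
  AWS_SECRET_ACCESS_KEY: "..."
```

The `DaemonSet` container spec would then reference it, for instance via `envFrom` with a `secretRef` pointing at `vector-aws-credentials`, so the keys surface as environment variables without being baked into the Vector config file.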
From 30d2b543b9583d74bba80447982c5c6043bf9673 Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Fri, 24 Apr 2020 21:16:41 +0300 Subject: [PATCH 104/118] Update the plan of attack on kubernetes source Signed-off-by: MOZGIII --- rfcs/2020-04-04-2221-kubernetes-integration.md | 13 +++++-------- 1 file changed, 5 insertions(+), 8 deletions(-) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index 8fd8712b69691..92a2fe2d4604d 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -1490,12 +1490,11 @@ See [motivation](#motivation). and [issue#1635]. - [ ] Ensure we are testing all supported minor versions. See [issue#2223]. -- [ ] Audit and improve the `kubernetes` source. - - [ ] Handle the log recursion problem where Vector ingests it's own logs. - See [issue#2218] and [issue#2171]. - - [ ] Audit the `file` source strategy. See [issue#2199] and [issue#1910]. - - [ ] Merge split logs. See [pr#2134]. -- [ ] Audit and improve the `kubernetes_pod_matadata` transform. +- [ ] Rework the `kubernetes` source. + - [ ] Merge the `kubernetes` source with the `kubernetes_pod_matadata` + transform. + - [ ] Implement origin filtering. + - [ ] Merge split logs [pr#2134]. - [ ] Use the `log_schema.kubernetes_key` setting. See [issue#1867]. - [ ] Add a way to load optional config files (i.e. load config file if it exists, and ignore it if it doesn't). Required to elegantly load multiple @@ -1545,11 +1544,9 @@ See [motivation](#motivation). 
[issue#1635]: https://github.com/timberio/vector/issues/1635
[issue#1816]: https://github.com/timberio/vector/issues/1867
[issue#1867]: https://github.com/timberio/vector/issues/1867
-[issue#1910]: https://github.com/timberio/vector/issues/1910
[issue#2170]: https://github.com/timberio/vector/issues/2170
[issue#2171]: https://github.com/timberio/vector/issues/2171
[issue#2193]: https://github.com/timberio/vector/issues/2193
-[issue#2199]: https://github.com/timberio/vector/issues/2199
[issue#2216]: https://github.com/timberio/vector/issues/2216
[issue#2218]: https://github.com/timberio/vector/issues/2218
[issue#2223]: https://github.com/timberio/vector/issues/2223

From f59e6da41d4576049e649f628e873e2bc637ab02 Mon Sep 17 00:00:00 2001
From: MOZGIII
Date: Fri, 24 Apr 2020 21:27:38 +0300
Subject: [PATCH 105/118] Add a consideration on deriving YAMLs from Helm Charts

Signed-off-by: MOZGIII
---
 rfcs/2020-04-04-2221-kubernetes-integration.md | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md
index 92a2fe2d4604d..06670556a7613 100644
--- a/rfcs/2020-04-04-2221-kubernetes-integration.md
+++ b/rfcs/2020-04-04-2221-kubernetes-integration.md
@@ -312,6 +312,16 @@ work without tweaking with as much cluster setups as possible.
 We should support both `kubectl create` and `kubectl apply` flows. `kubectl
 apply` is generally more limiting than `kubectl create`.

+We can derive our YAML files from the Helm Charts to fold them into a single
+source of truth for the configuration. To do that we'd need a `values.yaml`,
+suitable for rendering the Helm Chart template into a set of YAML files, and a
+script to combine/regroup/reformat the rendered templates for better usability.
+
+Alternatively, we can hand-write the YAML files. This has the benefit of making
+them more user-friendly.
It's unclear whether this provides real value compared to deriving them from
Helm Charts - since the ultimate user-friendly way is to use Helm Charts.

### Helm Chart Repository

We should not just maintain a Helm Chart, we also should offer Helm repo to make

From 5740f89a79acc13e89e9aaecfaa3cb754d232c02 Mon Sep 17 00:00:00 2001
From: MOZGIII
Date: Fri, 24 Apr 2020 21:30:20 +0300
Subject: [PATCH 106/118] Order preparing YAML deployment config after Helm
 Charts at plan of attack

Signed-off-by: MOZGIII
---
 rfcs/2020-04-04-2221-kubernetes-integration.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md
index 06670556a7613..42a9bb88a39e6 100644
--- a/rfcs/2020-04-04-2221-kubernetes-integration.md
+++ b/rfcs/2020-04-04-2221-kubernetes-integration.md
@@ -1510,8 +1510,8 @@ See [motivation](#motivation).
   exists, and ignore it if it doesn't). Required to elegantly load multiple
   files so that we can split the configuration.
 - [ ] Add `kubernetes` source reference documentation.
-- [ ] Prepare YAML deployment config.
 - [ ] Prepare Heml Chart.
+- [ ] Prepare YAML deployment config.
 - [ ] Prepare Heml Chart Repository.
 - [ ] Integrate kubernetes configuration snapshotting into the release process.
 - [ ] Add Kubernetes setup/integration guide.
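To make the "derive the YAML files from the Helm Charts" idea above more concrete, here is a sketch of the regroup step such a script could perform on `helm template` output (the file naming and grouping rules are assumptions, not a settled design):

```python
import re
from collections import defaultdict

def regroup_rendered(rendered: str) -> dict:
    """Group the documents of a multi-document YAML stream by their `kind`.

    A real script would use a YAML parser; a regex is enough for a sketch,
    since `helm template` emits `kind:` at the top level of each document.
    """
    groups = defaultdict(list)
    for doc in rendered.split("\n---\n"):
        match = re.search(r"^kind:\s*(\S+)", doc, flags=re.MULTILINE)
        if match and doc.strip():
            groups[match.group(1)].append(doc.strip())
    # One output file per kind, e.g. `daemonset.yaml`, `configmap.yaml`.
    return {kind.lower() + ".yaml": "\n---\n".join(docs) for kind, docs in groups.items()}

rendered = """\
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: vector
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: vector-config
"""
print(sorted(regroup_rendered(rendered)))  # ['configmap.yaml', 'daemonset.yaml']
```

A real script would also reformat and annotate the output for readability; this only shows the split-by-kind idea.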
From 26dbe9867865a8cfb2b2726f52c3730a3f621208 Mon Sep 17 00:00:00 2001
From: MOZGIII
Date: Fri, 24 Apr 2020 21:32:11 +0300
Subject: [PATCH 107/118] Fix some errors

Signed-off-by: MOZGIII
---
 rfcs/2020-04-04-2221-kubernetes-integration.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md
index 42a9bb88a39e6..d9fe54c90fe17 100644
--- a/rfcs/2020-04-04-2221-kubernetes-integration.md
+++ b/rfcs/2020-04-04-2221-kubernetes-integration.md
@@ -1049,7 +1049,7 @@ Let's outline the requirements on the properties of the solution:
   we employ should allow third-parties to bring their own resources. Things
   that are local in essence (like `minikube`) should just work. There
   shouldn't be a situation where one can't run tests in `minikube` because
   cloud parts aren't
-  available. We already have a similar constraints at the Vector Test Harness.
+  available. We already have similar constraints at the Vector Test Harness.
- We need the required effort to manage the solution to be low, and the
  price to be relatively small. This means that the solution has to be simple.
@@ -1314,7 +1314,7 @@ Kubernetes specific aspects.

 Securing in Kubernetes environment plays a major role, and the more we do to
 ensure our code and deployment recommendations are safe - the better. Big
 deployments often have dedicated security teams that will be doing what we do
-on their own - just to double check, but the majority of our people out there
+on their own - just to double-check, but the majority of our people out there
 don't have enough resources to dedicate enough attention to the security
 aspects. This is why implementing security measures in our integration is
 important.
From f34c2b568555679cf7947fdb658fd577badcbbbf Mon Sep 17 00:00:00 2001
From: MOZGIII
Date: Fri, 24 Apr 2020 21:52:36 +0300
Subject: [PATCH 108/118] Fix a typo

Signed-off-by: MOZGIII
---
 rfcs/2020-04-04-2221-kubernetes-integration.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md
index d9fe54c90fe17..badf66d32526d 100644
--- a/rfcs/2020-04-04-2221-kubernetes-integration.md
+++ b/rfcs/2020-04-04-2221-kubernetes-integration.md
@@ -762,7 +762,7 @@ The `vector.dev/exclude: "true"` `annotation` at the `PodTemplateSpec` is
 intended to let Vector know that it shouldn't collect logs from the relevant
 `Pod`s.

-Upon picking us a new log file for processing, Vector is intended to read the
+Upon picking up a new log file for processing, Vector is intended to read the
 `Pod` object, see the `vector.dev/exclude: "true"` annotation and ignore the
 log file altogether. This should take far fewer resources compared to
 reading log files into events and then filtering them out.

From 832ce06c31e255b2215a1bfc257342cd2e022164 Mon Sep 17 00:00:00 2001
From: MOZGIII
Date: Fri, 24 Apr 2020 21:58:28 +0300
Subject: [PATCH 109/118] Generate TOC

Signed-off-by: MOZGIII
---
 .../2020-04-04-2221-kubernetes-integration.md | 83 +++++++++++++++++++
 1 file changed, 83 insertions(+)

diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md
index badf66d32526d..7166879e0e4f5 100644
--- a/rfcs/2020-04-04-2221-kubernetes-integration.md
+++ b/rfcs/2020-04-04-2221-kubernetes-integration.md
@@ -13,6 +13,89 @@ UX. Such as how to properly deploy Vector and exclude it's own logs ([pr#2188]).
We had planned to perform a 3rd party audit on the integration before announcement and we've decided to align this RFC with that process.** +## Table of contents + + + +- [RFC 2221 - 2020-04-04 - Kubernetes Integration](#rfc-2221---2020-04-04---kubernetes-integration) + - [Table of contents](#table-of-contents) + - [Motivation](#motivation) + - [Guide-level Proposal](#guide-level-proposal) + - [Strategy](#strategy) + - [How This Guide Works](#how-this-guide-works) + - [What We'll Accomplish](#what-well-accomplish) + - [Tutorial](#tutorial) + - [Deploy using `kubectl`](#deploy-using-kubectl) + - [Deploy using Helm](#deploy-using-helm) + - [Deploy using Kustomize](#deploy-using-kustomize) + - [Design considerations](#design-considerations) + - [Minimal supported Kubernetes version](#minimal-supported-kubernetes-version) + - [Initial Minimal Supported Kubernetes Version](#initial-minimal-supported-kubernetes-version) + - [Reading container logs](#reading-container-logs) + - [Kubernetes logging architecture](#kubernetes-logging-architecture) + - [File locations](#file-locations) + - [Log file format](#log-file-format) + - [Automatic partial events merging](#automatic-partial-events-merging) + - [Helm vs raw YAML files](#helm-vs-raw-yaml-files) + - [Helm Chart Repository](#helm-chart-repository) + - [Deployment Variants](#deployment-variants) + - [Deployment configuration](#deployment-configuration) + - [Managing Object](#managing-object) + - [Data directory](#data-directory) + - [Vector config files](#vector-config-files) + - [Vector config file reloads](#vector-config-file-reloads) + - [Strategy on YAML file grouping](#strategy-on-yaml-file-grouping) + - [Considered Alternatives](#considered-alternatives) + - [Resource Limits](#resource-limits) + - [Vector Runtime Properties Bulletin](#vector-runtime-properties-bulletin) + - [Security considerations on deployment configuration](#security-considerations-on-deployment-configuration) + - [Other notable 
[`PodSpec`][k8sapipodspec] properties](#other-notable-podspeck8sapipodspec-properties) + - [Container probes](#container-probes) + - [Annotating events with metadata from Kubernetes](#annotating-events-with-metadata-from-kubernetes) + - [Origin filtering](#origin-filtering) + - [Filtering based on the log file path](#filtering-based-on-the-log-file-path) + - [Filtering based on Kubernetes API metadata](#filtering-based-on-kubernetes-api-metadata) + - [A note on k8s API server availability and `Pod` objects cache](#a-note-on-k8s-api-server-availability-and-pod-objects-cache) + - [Practical example of filtering by annotation](#practical-example-of-filtering-by-annotation) + - [Filtering by namespaces annotations](#filtering-by-namespaces-annotations) + - [Filtering based on event fields after annotation](#filtering-based-on-event-fields-after-annotation) + - [Configuring Vector via Kubernetes API](#configuring-vector-via-kubernetes-api) + - [Annotations and labels on vector pod via downward API](#annotations-and-labels-on-vector-pod-via-downward-api) + - [Custom CRDs](#custom-crds) + - [Changes to Vector release process](#changes-to-vector-release-process) + - [Testing](#testing) + - [Unit tests](#unit-tests) + - [Integration tests](#integration-tests) + - [Test targets](#test-targets) + - [Where to keep and how to manage integration infrastructure config](#where-to-keep-and-how-to-manage-integration-infrastructure-config) + - [What to assert/verify in integration tests](#what-to-assertverify-in-integration-tests) + - [Existing k8s tests](#existing-k8s-tests) + - [Other data gathering](#other-data-gathering) + - [Exposing Kubernetes [`Event`s][k8sapievent] as Vector events](#exposing-kubernetes-eventsk8sapievent-as-vector-events) + - [Discover and gather Prometheus metrics for Kubernetes API resources](#discover-and-gather-prometheus-metrics-for-kubernetes-api-resources) + - [Gather data from the host OS](#gather-data-from-the-host-os) + - [Automatic discovery of 
things to monitor on the host OS](#automatic-discovery-of-things-to-monitor-on-the-host-os) + - [Kubernetes audit logs](#kubernetes-audit-logs) + - [Windows support](#windows-support) + - [Potential Windows-specific Issues](#potential-windows-specific-issues) + - [Security](#security) + - [Vector Code Audit](#vector-code-audit) + - [Vector Docker Images Audit](#vector-docker-images-audit) + - [Deployment Hardening](#deployment-hardening) + - [Securing secrets](#securing-secrets) + - [Recommend users additional steps to secure the cluster](#recommend-users-additional-steps-to-secure-the-cluster) + - [Automatic container rebuilds](#automatic-container-rebuilds) + - [Prior Art](#prior-art) + - [Sales Pitch](#sales-pitch) + - [Drawbacks](#drawbacks) + - [Alternatives](#alternatives) + - [Outstanding Questions](#outstanding-questions) + - [From Ben](#from-ben) + - [From Mike](#from-mike) + - [Plan Of Attack](#plan-of-attack) + + + ## Motivation Kubernetes is arguably the most popular container orchestration framework at From c5d1b276fb6c73bc24d9de163588e2622cf6a901 Mon Sep 17 00:00:00 2001 From: binarylogic Date: Sat, 25 Apr 2020 18:50:59 -0400 Subject: [PATCH 110/118] Update requirements Signed-off-by: binarylogic --- .../2020-04-04-2221-kubernetes-integration.md | 29 ++++++++++--------- 1 file changed, 16 insertions(+), 13 deletions(-) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index 7166879e0e4f5..15c8a97bff172 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -1573,22 +1573,25 @@ See [motivation](#motivation). ## Plan Of Attack -- [ ] Agree on minimal Kubernetes version. -- [ ] Agree on a list of Kubernetes cluster flavors we want to test against. +- [x] Agree on minimal Kubernetes version. (1.14) +- [x] Agree on a list of Kubernetes cluster flavors we want to test against. - [ ] Setup a proper testing suite for k8s. 
- - [ ] Support for customizable k8s clusters. See [issue#2170]. - - [ ] Look into [issue#2225] and see if we can include it as part of this - work. - - [ ] Stabilize k8s integration tests. See [issue#2193], [issue#2216], - and [issue#1635]. - - [ ] Ensure we are testing all supported minor versions. See - [issue#2223]. -- [ ] Rework the `kubernetes` source. - - [ ] Merge the `kubernetes` source with the `kubernetes_pod_matadata` - transform. + - [ ] Local testing via `make test-integration-kubernetes`. + - [ ] Ability to "bring your own cluster". See [issue#2170]. + - [ ] Add `make test-integration-kubernetes` to the `ci.yaml` workflow. + - [ ] Ensure these tests are stable. See [issue#2193], [issue#2216], + and [issue#1635]. + - [ ] Ensure we are testing all supported minor versions. See + [issue#2223]. + - [ ] Run `make test-integration-kubernetes` against AWS' EKS platform in + Vector's Github actions. +- [ ] Finalize the `kubernetes` source. + - [ ] Audit the code and ensure the base is high-quality and correct. + - [ ] Merge in the `kubernetes_pod_matadata` transform. - [ ] Implement origin filtering. - [ ] Merge split logs [pr#2134]. - - [ ] Use the `log_schema.kubernetes_key` setting. See [issue#1867]. + - [ ] Use the `log_schema.kubernetes_key` setting for context fields. + See [issue#1867]. - [ ] Add a way to load optional config files (i.e. load config file if it exists, and ignore it if it doesn't). Required to elegantly load multiple files so that we can split the configuration. 
From 4461755573c71748e9d3522c7f91ee4b128479ca Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Mon, 27 Apr 2020 16:46:13 +0300 Subject: [PATCH 111/118] Fix a typo Signed-off-by: MOZGIII --- rfcs/2020-04-04-2221-kubernetes-integration.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index 15c8a97bff172..c4b81ac01907e 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -551,7 +551,7 @@ We can offer some simple "typical custom configurations" at our documentation as an example: - with a sink to push data to our Alloy; -- with a cluster-agnosic `elasticsearch` sink; +- with a cluster-agnostic `elasticsearch` sink; - for AWS clusters, with a `cloudwatch` sink; - etc... From 300106dce9260383b29b7e456724f0ad0800e605 Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Mon, 27 Apr 2020 20:47:36 +0300 Subject: [PATCH 112/118] Fix the unfinished phrase Signed-off-by: MOZGIII --- rfcs/2020-04-04-2221-kubernetes-integration.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index c4b81ac01907e..215d7715ab924 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -316,7 +316,7 @@ Log file format can vary per container runtime, and we have to support all the formats that Kubernetes itself supports. Generally, most Kubernetes setups will put the logs at the `kubelet`-configured -locations in a . +locations in a `/var/log` directory on the host. There is [official documentation][k8s_log_path_location_docs] at Kubernetes project regarding logging. 
I had a misconception that it specifies reading these

From e9b024eea1d0ad9304f9ad5edb0620243672c80a Mon Sep 17 00:00:00 2001
From: MOZGIII
Date: Wed, 29 Apr 2020 19:40:25 +0300
Subject: [PATCH 113/118] Add a remark on kubernetes_pod_metadata being useful
 for sidecars

Signed-off-by: MOZGIII
---
 rfcs/2020-04-04-2221-kubernetes-integration.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md
index 215d7715ab924..994078329f22f 100644
--- a/rfcs/2020-04-04-2221-kubernetes-integration.md
+++ b/rfcs/2020-04-04-2221-kubernetes-integration.md
@@ -683,6 +683,9 @@ perform log filtering.
 So, if we'll be obtaining pod metadata at the `kubernetes` source, we might as
 well enhance the event right there. This would render `kubernetes_pod_metadata`
 useless, as there would be no use case for it that wouldn't be covered by
 `kubernetes` source.
+Of course, `kubernetes_pod_metadata` would still make sense when used without
+the `kubernetes` source - which is the case, for instance, in a sidecar
+deployment, where the `file` source is used directly with the in-pod log files.

 What parts of metadata we inject into events should be configurable, but we can
 and want to offer sane defaults here.
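A sidecar setup like the one mentioned above could look roughly like this (a sketch in the config format the rest of this RFC uses; the log path and the sink are illustrative placeholders, not a recommended configuration):

```toml
# Hypothetical sidecar config: the in-pod log path is a placeholder.
[sources.app_logs]
type = "file"
include = ["/var/log/my-app/*.log"]

# Enrich events with Kubernetes context, even though the `kubernetes`
# source is not involved.
[transforms.pod_metadata]
type = "kubernetes_pod_metadata"
inputs = ["app_logs"]

[sinks.out]
type = "console"
inputs = ["pod_metadata"]
encoding = "json"
```

Here the transform is the only Kubernetes-aware part of the topology, which is exactly the use case that keeps it from being folded into the `kubernetes` source entirely.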
From a6f76222766f893f23028e05f4d9049be6dcad7c Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Wed, 29 Apr 2020 19:56:26 +0300 Subject: [PATCH 114/118] Switch vector config at the guide to Secret and manage it via kubectl Signed-off-by: MOZGIII --- .../2020-04-04-2221-kubernetes-integration.md | 35 ++++++++----------- 1 file changed, 14 insertions(+), 21 deletions(-) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index 994078329f22f..611b75af9ab17 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -156,28 +156,21 @@ The following diagram demonstrates how this works: ...insert selector to select any of Vector's sinks... - ```bash - cat <<-CONFIG > vector-configmap.yaml - apiVersion: v1 - kind: ConfigMap - metadata: - name: vector-config - labels: - k8s-app: vector - data: - vector.toml: | - # Docs: https://vector.dev/docs/ - # Container logs are available from "kubernetes" input. - - # Send data to one or more sinks! - [sinks.aws_s3] - type = "aws_s3" - inputs = ["kubernetes"] - bucket = "my-bucket" - compression = "gzip" - region = "us-east-1" - key_prefix = "date=%F/" + ```shell + cat <<-CONFIG > vector.toml + # Docs: https://vector.dev/docs/ + # Container logs are available from "kubernetes" input. + + # Send data to one or more sinks! + [sinks.aws_s3] + type = "aws_s3" + inputs = ["kubernetes"] + bucket = "my-bucket" + compression = "gzip" + region = "us-east-1" + key_prefix = "date=%F/" CONFIG + kubectl create secret generic vector-config --from-file=vector.toml=vector.toml ``` 2. Deploy Vector! From 172105c5100cfb40a7bdbaacf73a1f7ffe6c4b71 Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Wed, 29 Apr 2020 21:20:28 +0300 Subject: [PATCH 115/118] Remove a step to add optional configs loading since we have globs I.e. 
--config *.toml Signed-off-by: MOZGIII --- rfcs/2020-04-04-2221-kubernetes-integration.md | 3 --- 1 file changed, 3 deletions(-) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index 611b75af9ab17..61f2285a13450 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -1588,9 +1588,6 @@ See [motivation](#motivation). - [ ] Merge split logs [pr#2134]. - [ ] Use the `log_schema.kubernetes_key` setting for context fields. See [issue#1867]. -- [ ] Add a way to load optional config files (i.e. load config file if it - exists, and ignore it if it doesn't). Required to elegantly load multiple - files so that we can split the configuration. - [ ] Add `kubernetes` source reference documentation. - [ ] Prepare Heml Chart. - [ ] Prepare YAML deployment config. From 82378ac1c2811762ba67f03cf4fed4dddf200171 Mon Sep 17 00:00:00 2001 From: MOZGIII Date: Wed, 29 Apr 2020 21:21:51 +0300 Subject: [PATCH 116/118] Remove checkmarks from the plan of attack as it's not a good fit for an RFC Signed-off-by: MOZGIII --- .../2020-04-04-2221-kubernetes-integration.md | 52 +++++++++---------- 1 file changed, 26 insertions(+), 26 deletions(-) diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md index 61f2285a13450..6bca879f77add 100644 --- a/rfcs/2020-04-04-2221-kubernetes-integration.md +++ b/rfcs/2020-04-04-2221-kubernetes-integration.md @@ -1569,32 +1569,32 @@ See [motivation](#motivation). ## Plan Of Attack -- [x] Agree on minimal Kubernetes version. (1.14) -- [x] Agree on a list of Kubernetes cluster flavors we want to test against. -- [ ] Setup a proper testing suite for k8s. - - [ ] Local testing via `make test-integration-kubernetes`. - - [ ] Ability to "bring your own cluster". See [issue#2170]. - - [ ] Add `make test-integration-kubernetes` to the `ci.yaml` workflow. - - [ ] Ensure these tests are stable. 
-    See [issue#2193], [issue#2216],
-    and [issue#1635].
-  - [ ] Ensure we are testing all supported minor versions. See
-    [issue#2223].
-  - [ ] Run `make test-integration-kubernetes` against AWS' EKS platform in
-    Vector's GitHub actions.
-- [ ] Finalize the `kubernetes` source.
-  - [ ] Audit the code and ensure the base is high-quality and correct.
-  - [ ] Merge in the `kubernetes_pod_metadata` transform.
-  - [ ] Implement origin filtering.
-  - [ ] Merge split logs [pr#2134].
-  - [ ] Use the `log_schema.kubernetes_key` setting for context fields.
-    See [issue#1867].
-- [ ] Add `kubernetes` source reference documentation.
-- [ ] Prepare Helm Chart.
-- [ ] Prepare YAML deployment config.
-- [ ] Prepare Helm Chart Repository.
-- [ ] Integrate kubernetes configuration snapshotting into the release process.
-- [ ] Add Kubernetes setup/integration guide.
-- [ ] Release `0.10.0` and announce.
+- Agree on minimal Kubernetes version. (1.14)
+- Agree on a list of Kubernetes cluster flavors we want to test against.
+- Setup a proper testing suite for k8s.
+  - Local testing via `make test-integration-kubernetes`.
+  - Ability to "bring your own cluster". See [issue#2170].
+  - Add `make test-integration-kubernetes` to the `ci.yaml` workflow.
+  - Ensure these tests are stable. See [issue#2193], [issue#2216],
+    and [issue#1635].
+  - Ensure we are testing all supported minor versions. See
+    [issue#2223].
+  - Run `make test-integration-kubernetes` against AWS' EKS platform in
+    Vector's GitHub actions.
+- Finalize the `kubernetes` source.
+  - Audit the code and ensure the base is high-quality and correct.
+  - Merge in the `kubernetes_pod_metadata` transform.
+  - Implement origin filtering.
+  - Merge split logs [pr#2134].
+  - Use the `log_schema.kubernetes_key` setting for context fields.
+    See [issue#1867].
+- Add `kubernetes` source reference documentation.
+- Prepare Helm Chart.
+- Prepare YAML deployment config.
+- Prepare Helm Chart Repository.
+- Integrate kubernetes configuration snapshotting into the release process.
+- Add Kubernetes setup/integration guide.
+- Release `0.10.0` and announce.
 
 [anchor_collecting_kubernetes_events]: #exposing-kubernetes-event-s-k8s-api-event-as-vector-events
 [anchor_file_locations]: #file-locations

From 0f0786ee1dd3a37a6bff03b14ede84b3a998f010 Mon Sep 17 00:00:00 2001
From: MOZGIII
Date: Wed, 29 Apr 2020 21:23:26 +0300
Subject: [PATCH 117/118] Remove recursive steps from the plan of attack

They should've been in the outstanding questions in the first place.

Signed-off-by: MOZGIII
---
 rfcs/2020-04-04-2221-kubernetes-integration.md | 2 --
 1 file changed, 2 deletions(-)

diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md
index 6bca879f77add..59d622dbbbc65 100644
--- a/rfcs/2020-04-04-2221-kubernetes-integration.md
+++ b/rfcs/2020-04-04-2221-kubernetes-integration.md
@@ -1569,8 +1569,6 @@ See [motivation](#motivation).
 
 ## Plan Of Attack
 
-- Agree on minimal Kubernetes version. (1.14)
-- Agree on a list of Kubernetes cluster flavors we want to test against.
 - Setup a proper testing suite for k8s.
   - Local testing via `make test-integration-kubernetes`.
   - Ability to "bring your own cluster". See [issue#2170].

From 2a7978068fa215e55861e60d47e6c3c498d05a7e Mon Sep 17 00:00:00 2001
From: MOZGIII
Date: Wed, 29 Apr 2020 21:37:05 +0300
Subject: [PATCH 118/118] Add more steps to the attack plan

Signed-off-by: MOZGIII
---
 rfcs/2020-04-04-2221-kubernetes-integration.md | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/rfcs/2020-04-04-2221-kubernetes-integration.md b/rfcs/2020-04-04-2221-kubernetes-integration.md
index 59d622dbbbc65..ce7c844dbeb63 100644
--- a/rfcs/2020-04-04-2221-kubernetes-integration.md
+++ b/rfcs/2020-04-04-2221-kubernetes-integration.md
@@ -1593,6 +1593,18 @@ See [motivation](#motivation).
 - Integrate kubernetes configuration snapshotting into the release process.
 - Add Kubernetes setup/integration guide.
 - Release `0.10.0` and announce.
+- Prepare additional guides and blog posts.
+  - Vector deployment for Kubernetes Cluster Operators.
+  - Vector deployment as a sidecar.
+- Revisit this RFC to see what we can focus on next.
+- Start the RFC for the Vector performance properties bulletin.
+  To include things like:
+  - Establish continuous data gathering of performance characteristics of
+    the bare Vector event pipeline (i.e. raw speed) and the impact of adding
+    each of its components (sources, transforms, sinks) and their
+    combinations.
+  - Prepare the format of (and, if possible, automate the release of) the
+    Vector performance bulletin.
 
 [anchor_collecting_kubernetes_events]: #exposing-kubernetes-event-s-k8s-api-event-as-vector-events
 [anchor_file_locations]: #file-locations
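Taken together, PATCH 114 and PATCH 115 describe one configuration flow: split the Vector TOML config into separate files, package them into a single `vector-config` Secret via `kubectl`, and load them all at once with a shell glob instead of an "optional config file" flag. A minimal local sketch of that flow is below; the `conf.d/` directory and the per-file names are illustrative assumptions, not part of the RFC.

```shell
# Sketch only: the conf.d/ directory and file names are assumptions.
mkdir -p conf.d

# One file per concern; the glob loading shown in PATCH 115 makes a
# "load this file only if it exists" flag unnecessary.
cat > conf.d/source.toml <<'EOF'
# Container logs are available from the "kubernetes" input.
[sources.kubernetes]
type = "kubernetes"
EOF

cat > conf.d/sink.toml <<'EOF'
[sinks.aws_s3]
type = "aws_s3"
inputs = ["kubernetes"]
bucket = "my-bucket"
compression = "gzip"
region = "us-east-1"
key_prefix = "date=%F/"
EOF

# Cluster-side steps (commented out here since they need a cluster):
# package the split config as one Secret (as in PATCH 114), mount it in
# the daemonset, and start Vector with a glob (as in PATCH 115).
#
#   kubectl create secret generic vector-config --from-file=conf.d
#   vector --config /etc/vector/*.toml

ls conf.d/*.toml
```

The mount path `/etc/vector` in the commented commands is likewise an assumption; the point is only that the glob expands to every TOML file the Secret provides.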