
fix(kubernetes source)!: Use Pod's UID to exclude Vector's logs #2188

Closed
wants to merge 2 commits
Changes from 1 commit
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -61,7 +61,7 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
kube: [1.14.10, 1.17.2]
kube: [1.13.12, 1.14.10, 1.17.2]
steps:
- name: Load vector
uses: actions/download-artifact@v1
5 changes: 5 additions & 0 deletions config/kubernetes/vector-daemonset.yaml
@@ -84,3 +84,8 @@ spec:
- name: config-dir
mountPath: /etc/vector
readOnly: true
env:
- name: VECTOR_POD_UID
valueFrom:
fieldRef:
fieldPath: metadata.uid
2 changes: 1 addition & 1 deletion docker-compose.yml
@@ -459,7 +459,7 @@ services:
environment:
CARGO_TERM_COLOR: always
RUST_BACKTRACE: full
# TEST_LOG: debug
TEST_LOG: debug
volumes:
- $PWD:$PWD
- ./target/x86_64-unknown-linux-musl/cargo/registry:/opt/rust/cargo/registry
11 changes: 6 additions & 5 deletions src/sources/kubernetes/file_source_builder.rs
@@ -28,14 +28,15 @@ impl<'a> FileSourceBuilder<'a> {
kube_name: &str,
globals: &GlobalOptions,
shutdown: ShutdownSignal,
vector_pod_uid: &str,
) -> crate::Result<(mpsc::Receiver<Event>, Source)> {
self.file_config.include.extend(
Self::file_source_include(self.config)?
.into_iter()
.map(Into::into),
);
self.file_config.exclude.extend(
Self::file_source_exclude(self.config)
Self::file_source_exclude(self.config, vector_pod_uid)
.into_iter()
.map(Into::into),
);
@@ -251,7 +252,7 @@ impl<'a> FileSourceBuilder<'a> {
/// with include, so exclude isn't necessary.
/// b) if user has included "kube-system" or "vector*", then that is a sign that user wants
/// to log it so excluding it is not valid.
fn file_source_exclude(config: &KubernetesConfig) -> Vec<String> {
fn file_source_exclude(config: &KubernetesConfig, vector_pod_uid: &str) -> Vec<String> {
// True if there are no includes
let no_include = config.include_container_names.is_empty()
&& config.include_namespaces.is_empty()
@@ -266,11 +267,11 @@
// This is correct, but it filters out logs from Kubernetes system components only on a best-effort basis.
// More specifically, it will work for all Kubernetes 1.14 and higher, and for some versions below that.
exclude.push((LOG_DIRECTORY.to_owned() + r"kube-system_*").into());

// NOTE: for now, exclude images with the name vector; it's a rough solution, but necessary for now
exclude.push((LOG_DIRECTORY.to_owned() + r"*/vector*").into());
}

// Always exclude vector
exclude.push((LOG_DIRECTORY.to_owned() + &format!("*{}/*", vector_pod_uid)).into());

exclude
}
}
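The exclusion logic in this file boils down to glob patterns rooted at the CRI log directory. A minimal sketch of the patterns `file_source_exclude` ends up emitting — the `LOG_DIRECTORY` constant and the glob shapes mirror the diff above, while the simplified function and `no_include` flag are illustrative stand-ins:

```rust
// Sketch of the exclude globs built above; a simplified stand-in for
// file_source_exclude, not the PR's exact code.
const LOG_DIRECTORY: &str = "/var/log/pods/";

fn exclude_globs(no_include: bool, vector_pod_uid: &str) -> Vec<String> {
    let mut exclude = Vec::new();
    if no_include {
        // Best-effort: drop kube-system logs and anything from images named vector*.
        exclude.push(format!("{}kube-system_*", LOG_DIRECTORY));
        exclude.push(format!("{}*/vector*", LOG_DIRECTORY));
    }
    // Always drop this Vector's own logs, keyed by the Pod UID that the
    // kubelet embeds in the directory name: <namespace>_<pod-name>_<uid>.
    exclude.push(format!("{}*{}/*", LOG_DIRECTORY, vector_pod_uid));
    exclude
}

fn main() {
    for glob in exclude_globs(true, "1234-5678") {
        println!("{}", glob);
    }
}
```

With includes configured (`no_include == false`) only the UID-keyed pattern remains, which is what makes the exclusion unconditional.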
22 changes: 20 additions & 2 deletions src/sources/kubernetes/mod.rs
@@ -20,6 +20,7 @@ use chrono::{DateTime, Utc};
use futures01::{sync::mpsc, Future, Sink, Stream};
use serde::{Deserialize, Serialize};
use snafu::Snafu;
use std::env::{self, VarError};

// ?NOTE
// Original proposal: https://github.com/kubernetes/kubernetes/blob/release-1.5/docs/proposals/kubelet-cri-logging.md#proposed-solution
@@ -31,12 +32,21 @@ use snafu::Snafu;
/// Location in which by Kubernetes CRI, container runtimes are to store logs.
const LOG_DIRECTORY: &str = r"/var/log/pods/";

/// Environment variable through which we receive the UID of this Vector's Pod.
const VECTOR_POD_UID_ENV: &str = "VECTOR_POD_UID";

#[derive(Debug, Snafu)]
enum BuildError {
#[snafu(display("Too large UID: {:?}", uid))]
UidToLarge { uid: String },
#[snafu(display("UID contains illegal characters: {:?}", uid))]
IllegalCharacterInUid { uid: String },
#[snafu(display(
"Environment variable {}, which must be set to this Vector's Pod's UID, is {:?}",
env,
error
))]
PodUid { env: &'static str, error: VarError },
}

#[derive(Deserialize, Serialize, Debug, Clone, Default)]
@@ -65,8 +75,16 @@ impl SourceConfig for KubernetesConfig {

let now = TimeFilter::new();

let (file_recv, file_source) =
file_source_builder::FileSourceBuilder::new(self).build(name, globals, shutdown)?;
let vector_pod_uid = env::var(VECTOR_POD_UID_ENV).map_err(|error| BuildError::PodUid {
Contributor
I know I asked this before, but is this env var always available? If it is, let's make sure we document this very clearly; if not, we need to find a way to work around it, and that may be just ignoring this and letting Vector's logs come through.

Contributor
Also curious.

Contributor Author
@ktff ktff Apr 3, 2020
It is present if the .yaml is configured with

       env:
        - name: VECTOR_POD_UID
          valueFrom:
            fieldRef:
              fieldPath: metadata.uid

, and if it isn't configured like that, I wouldn't consider it a proper configuration. In the same way, it isn't a proper configuration to run Vector without mapping the folders necessary for its operation.

Passing the logs through would make it easy for us to brick (EDIT: too strong a word; the GitHub runners became unresponsive, so they were stopped automatically after some time) the node, as happened with GitHub's nodes, if we accidentally added, or forgot to remove, an info log somewhere that depends on the number of logs passing. It would be even worse if the size of that info depended on the size of the log.

As for other ways: we used to filter out logs for containers with names starting with vector, which could be surprising for users that name their containers that way. We could name the Vector container in some unique way, but then we are back at the beginning, where we need the .yaml to be configured in a proper way.

There is also another way, which just popped into my head: we could intentionally log a message with a unique string of characters, and once we detect that string as a message in the log, we would know which UID is ours and could filter those logs out, although the file source would still unnecessarily pick them up from the log file.

What do you think?
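The marker idea floated here could be sketched roughly as follows. All names are hypothetical and this is not what the PR implements: Vector would log a random token at startup, and the first log file the token shows up in must belong to Vector's own Pod, whose UID can be parsed from the CRI directory name:

```rust
// Hypothetical sketch of the "unique marker" self-identification idea.
// CRI log layout: /var/log/pods/<namespace>_<pod-name>_<uid>/<container>/N.log
fn uid_from_log_path(path: &str) -> Option<&str> {
    let dir = path.strip_prefix("/var/log/pods/")?.split('/').next()?;
    // The UID is the last '_'-separated component of the directory name.
    dir.rsplit('_').next()
}

fn main() {
    let marker = "vector-self-id-3f9c"; // would be randomly generated at startup
    let line = format!("starting up, marker={}", marker);
    let path = "/var/log/pods/default_vector-abcde_1234-uid/vector/0.log";
    // If a tailed line carries our marker, that file's Pod UID is ours.
    if line.contains(marker) {
        println!("own pod uid: {:?}", uid_from_log_path(path));
    }
}
```

As the comment notes, the downside is that the file source would still tail its own log file until the marker is observed.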

Contributor
I am not really happy with this required env var, personally. I think it's a pretty bad UX hazard.

Do we know what other solutions do here? What does fluentd do?

Contributor
Yes! I am very much in favor of anything that improves reliability and confidence for v1 of our Kubernetes integration. I do not want to compromise that by supporting old versions right now. If requiring >= 1.14 achieves that, then let's do it.

AWS EKS still supports v1.13.

That's ok. EKS also offers 1.14 and 1.15.

Member
I agree, simplifying the source would be great. It seems like 1.14 is over a year old and technically not supported anymore (see here and here).

Contributor
I'll add my 2 cents: we've been using fluentd in production for quite some time, there was an attempt to use fluent-bit, and now we're replacing fluent-bit with Vector. In all our configurations we've been deploying the entire log-related stack in its own namespace, mostly because with an appropriate Role and PSP it can be easily restricted.
Even with the current (0.8.2) kubernetes source it works perfectly fine (because you can just whitelist any namespace except Vector's).
Just saying that this can be a suggested pattern of deployment, and in that case the solution can be less invasive. Otherwise, filtering by the Pod's UID can be opt-in.

Contributor
@Alexx-G we really appreciate your feedback! So if I understand correctly, for you all it would make sense to have some way in the Vector config to set the namespace you deploy your log stack in, and from that we would ignore all logs from that namespace? This to me seems like the ideal solution.

Contributor
@MOZGIII MOZGIII Apr 4, 2020
Using a separate namespace is a particular deployment pattern, and not everyone can use that. So we can't solely rely on that - it would be a very significant limiting factor.

As an alternative to env vars, we could use downward API mounts: https://kubernetes.io/docs/tasks/inject-data-application/downward-api-volume-expose-pod-information/

We can also communicate with the cluster API to determine the current container info: https://kubernetes.io/docs/tasks/administer-cluster/access-cluster-api/#accessing-the-api-from-within-a-pod

Also, the hostname in the container environment is set to the pod name by default.

That said, I like the explicit configuration this PR proposes. Being explicit, it is much easier for infra teams to understand and maintain: when they operate on a configuration for Vector, it becomes very simple to reason about which logs are ignored and which are picked up. And this is, arguably, a better tradeoff when we're talking about k8s.
Also, since the industry standard for k8s-compatible apps is to ship the deployment configuration as part of the release, I don't see this configuration detail as particularly problematic. We should provide a Helm chart for our stuff anyway, as well as applicable daemonset/sidecar deployment files. This would be a proper implementation of our "good defaults" paradigm in the k8s sense.
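The downward API volume alternative mentioned here could look roughly like this. The `/etc/podinfo/uid` path and the fallback order are assumptions for illustration, not anything the PR ships; the PR itself only reads the `VECTOR_POD_UID` env var:

```rust
use std::fs;

// Hypothetical fallback chain: prefer the env var value (what the PR reads),
// else a downward-API volume file assumed to be mounted at /etc/podinfo/uid.
fn pod_uid(env_value: Option<String>) -> Option<String> {
    env_value.or_else(|| {
        fs::read_to_string("/etc/podinfo/uid")
            .ok()
            .map(|s| s.trim().to_owned())
    })
}

fn main() {
    match pod_uid(std::env::var("VECTOR_POD_UID").ok()) {
        Some(uid) => println!("pod uid: {}", uid),
        None => eprintln!("pod uid unavailable; cannot exclude Vector's own logs"),
    }
}
```

A file mount avoids the "forgot the env stanza" failure mode only partially, since the volume must still be declared in the manifest; either way the UID has to come from the deployment configuration.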

env: VECTOR_POD_UID_ENV,
error,
})?;
let (file_recv, file_source) = file_source_builder::FileSourceBuilder::new(self).build(
name,
globals,
shutdown,
&vector_pod_uid,
)?;

let mut transform_file = transform_file()?;
let mut transform_pod_uid = transform_pod_uid()?;
7 changes: 6 additions & 1 deletion src/sources/kubernetes/test.rs
@@ -134,6 +134,11 @@ spec:
readOnly: true
- name: tmp
mountPath: /tmp/vector/
env:
- name: VECTOR_POD_UID
valueFrom:
fieldRef:
fieldPath: metadata.uid
"#;

static ECHO_YAML: &'static str = r#"
@@ -527,7 +532,7 @@ fn kube_multi_log() {

#[test]
fn kube_object_uid() {
let namespace = "kube-object-uid".to_owned(); //format!("object-uid-{}", Uuid::new_v4());
let namespace = format!("object-uid-{}", Uuid::new_v4());
let message = random_string(300);
let user_namespace = user_namespace(&namespace);
