Add otel-cloud-stack chart #1058

Closed · 22 commits

Conversation

@jaronoff97 (Contributor) commented on Mar 1, 2024

Why make this?

This PR serves to finally close #562. It creates a Helm chart that takes some inspiration from the kube-prometheus-stack chart in the Prometheus community repo to provide a fully-fledged collector starter for Kubernetes. Prometheus services and CRDs are optionally included to give users a near-drop-in replacement for the Prometheus experience. For non-Prometheus users, this provides an easy way to get started with OpenTelemetry's Kubernetes offerings. This is accomplished with some clever templating of the collector configuration, which I found easier to reason about than embedding the config in a helper tpl.
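As a rough illustration of that templating approach (a minimal sketch; the value names and structure here are assumptions, not the chart's actual template), each entry under `collectors` can be deep-merged over a shared default before being rendered into an OpenTelemetryCollector resource:

```yaml
# Hypothetical templates/collector.yaml fragment
{{- range $name, $collector := .Values.collectors }}
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: {{ $name }}
spec:
  # Overlay the per-collector values onto shared defaults; mergeOverwrite
  # gives the per-collector keys precedence over defaultCollectorConfig.
  {{- $merged := mergeOverwrite (deepCopy $.Values.defaultCollectorConfig) $collector }}
  mode: {{ $merged.mode | default "deployment" }}
  config: |
    {{- $merged.config | toYaml | nindent 4 }}
---
{{- end }}
```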

Features

  • Centralized tracing collector
  • Autoinstrumentation
  • Infrastructure metrics and logs collector
  • Standalone Kubernetes cluster stats collector
  • Allows for drop-in replacement for kube-prometheus-stack with a sharded statefulset collector
  • Allows for custom collectors to be set and validated against an exhaustive schema

Next steps

  • End-to-end testing to verify telemetry appears as expected between versions
  • Documentation for what to expect out of the box
  • Documentation for example configs (I include two samples below that are specific to my company)
For a config that gets you tracing, infra logs and metrics, and kube cluster stats:

```yaml
clusterName: jacob-kind
opentelemetry-operator:
  enabled: true
collectors:
  tracing:
    env:
      - name: LS_TOKEN
        valueFrom:
          secretKeyRef:
            key: LS_TOKEN
            name: otel-collector-secret
    config:
      exporters:
        otlp:
          endpoint: ingest.lightstep.com:443
          headers:
            "lightstep-access-token": "${LS_TOKEN}"
      service:
        pipelines:
          traces:
            exporters: [otlp]
  daemon:
    env:
      - name: LS_TOKEN
        valueFrom:
          secretKeyRef:
            key: LS_TOKEN
            name: otel-collector-secret
    config:
      exporters:
        otlp:
          endpoint: ingest.lightstep.com:443
          headers:
            "lightstep-access-token": "${LS_TOKEN}"
      service:
        pipelines:
          metrics:
            exporters: [otlp]
          logs:
            exporters: [otlp]
  cluster:
    env:
      - name: LS_TOKEN
        valueFrom:
          secretKeyRef:
            key: LS_TOKEN
            name: otel-collector-secret
    config:
      exporters:
        otlp:
          endpoint: ingest.lightstep.com:443
          headers:
            "lightstep-access-token": "${LS_TOKEN}"
      service:
        pipelines:
          metrics:
            exporters: [otlp]
opAMPBridge:
  enabled: false
```
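The samples above reference a Kubernetes Secret for the access token. A minimal manifest for it might look like this (hypothetical; the secret name matches the samples, and the token value is a placeholder). With the secret in place, a values file like the one above can be applied with `helm install -f values.yaml`:

```yaml
# Hypothetical Secret referenced by the collectors' env entries
apiVersion: v1
kind: Secret
metadata:
  name: otel-collector-secret
type: Opaque
stringData:
  LS_TOKEN: <your-access-token>
```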

For a config that's a near drop-in for kube-prometheus-stack:

```yaml
clusterName: jacob-kind
opentelemetry-operator:
  enabled: true
  manager:
    serviceMonitor:
      enabled: true
    extraArgs:
      - "--feature-gates=operator.observability.prometheus"
    image:
      tag: "v0.95.0"
collectors:
  tracing:
    enabled: false
  daemon:
    enabled: false
  cluster:
    enabled: false
  metrics:
    enabled: true
    targetAllocator:
      observability:
        metrics:
          enableMetrics: true
    observability:
      metrics:
        enableMetrics: true
    env:
      - name: LS_TOKEN
        valueFrom:
          secretKeyRef:
            key: LS_TOKEN
            name: otel-collector-secret
    config:
      exporters:
        otlp:
          endpoint: ingest.lightstep.com:443
          headers:
            "lightstep-access-token": "${LS_TOKEN}"
      service:
        pipelines:
          metrics:
            exporters: [otlp]
opAMPBridge:
  enabled: false
prometheus:
  customResources:
    enabled: true
kubernetesServiceMonitors:
  enabled: true
kubeApiServer:
  enabled: true
kubelet:
  enabled: true
kubeControllerManager:
  enabled: true
kubeDns:
  enabled: true
kubeEtcd:
  enabled: true
kubeScheduler:
  enabled: true
kubeProxy:
  enabled: true
kubeStateMetrics:
  enabled: true
nodeExporter:
  enabled: true
```

@jaronoff97 (Contributor, Author) commented on Mar 1, 2024

Okay, so now this thing actually does something! Next week I have to test out the kube-otel-stack functionality. But you can try it for yourself by layering this values file over the existing values. (Obviously this works for LS; you'll need to change it to your vendor of choice. ;))

The config is the same values file as the first sample above.

gen_schema/main.go (review thread, outdated)
jaronoff97 marked this pull request as ready for review on March 4, 2024.
jaronoff97 requested a review from a team on March 4, 2024.
@@ -0,0 +1,3 @@
apiVersion: v2
jaronoff97 (Contributor, Author): This matches what's done in kube-prometheus-stack so that we can install the necessary Prometheus CRDs in the right order. Installing this is entirely optional and disabled by default.
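A sketch of how such an optional CRD subchart is typically wired up in Chart.yaml (hypothetical names; the condition matches the prometheus.customResources.enabled flag that appears elsewhere in this discussion):

```yaml
# Hypothetical Chart.yaml dependency: the CRD subchart installs first
# and only when the condition flag is set.
dependencies:
  - name: prometheus-crds
    version: "0.0.0"
    repository: "file://../prometheus-crds"
    condition: prometheus.customResources.enabled
```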

@bismarck commented Mar 5, 2024: @jaronoff97 it would be nice to have some self-monitoring of the operator and collectors. I'm not sure what form that would take (e.g. a ServiceMonitor or a static scrape job).

@jaronoff97 (Contributor, Author): This is actually already possible; take a look at my sample configs in the "For a config that's a near drop-in for kube-prometheus-stack" section. :)

@bismarck commented Mar 6, 2024: @jaronoff97 I see you can bring your own ServiceAccount for your collector and target allocator. However, the ClusterRoleBinding uses the ServiceAccount that the operator generates, not your override. (Illustrated below.)
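A simplified sketch of the reported behavior (hypothetical resource names; only the subjects entry matters here):

```yaml
# The generated binding points at the operator-generated ServiceAccount...
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: daemon-collector
subjects:
  - kind: ServiceAccount
    name: daemon-collector   # ...but should honor the user-supplied override
    namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: daemon-collector
```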

@jaronoff97 (Contributor, Author): Ah, great catch! Thank you.

jaronoff97 requested a review from Allex1 on March 11, 2024.
@TylerHelmuth (Member) left a comment:

@jaronoff97 thanks for following through with this; I am very excited about it.

This is a large PR adding a lot of features. Is there anything we can do to make it smaller? My initial thought is we could start with it only installing the operator/collector, and then introduce the target allocator and all the Prometheus pieces in follow-up PRs.

Please add:

  1. a README for the chart
  2. a ci folder with an empty values.yaml so we can test the default install
  3. an update to the GitHub workflow CI to test the chart

Member: I'm guessing you used this to generate the schema file; do we need to check it in?

jaronoff97 (Contributor, Author): Nope, I was thinking it might be helpful for future schema generation! But I'm happy to leave it out. :)

Member: I agree that it could be helpful in the future for all the charts. Let's add that in a separate PR.

jaronoff97 (Contributor, Author): Sounds good!

@@ -0,0 +1,29 @@
apiVersion: v2
name: otel-cloud-stack
version: 0.1.0
Member: I'd rather start with 0.0.1.

jaronoff97 (Contributor, Author): Sounds good!

Comment on lines +10 to +11
- name: jaronoff97
- name: anammedina21
Member: Please list @TylerHelmuth, @dmitryax, and @Allex1. I'd love it if you wanted to help maintain this chart and become an approver after it is merged; we did something similar with @puckpuck and the demo chart.

jaronoff97 (Contributor, Author): Can do!

version: "0.0.0"
condition: prometheus.customResources.enabled
- name: opentelemetry-operator
repository: "file://../opentelemetry-operator"
Member: Let's use https://open-telemetry.github.io/opentelemetry-helm-charts; that's what we do for the demo.

jaronoff97 (Contributor, Author): Can do!

- name: jaronoff97
- name: anammedina21
icon: https://raw.githubusercontent.com/cncf/artwork/a718fa97fffec1b9fd14147682e9e3ac0c8817cb/projects/opentelemetry/icon/color/opentelemetry-icon-color.png
appVersion: 0.94.0
Member: This appVersion corresponds to the collector, but the sources only list the operator.

jaronoff97 (Contributor, Author): Yeah, I was thinking this would determine the version for everything this chart creates (read: collectors), but I should also pipe it into the image used for the operator.

Member: What does this file do? It isn't part of a template, so it isn't included in any install of the chart. Is it an expected scrape config to use with the chart?

jaronoff97 (Contributor, Author): This is actually included in the chart via the logic in collector.yaml. The chart dynamically pulls these values in and appends them to (or creates) a Prometheus scrape config in the prometheus receiver. (Sketched below.)
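A minimal sketch of what that logic might look like (hypothetical; the file name and value paths are assumptions, not the chart's actual template):

```yaml
# Hypothetical collector.yaml fragment: read a bundled scrape-config file
# and append its entries to the prometheus receiver's scrape_configs.
{{- $bundled := .Files.Get "default-scrape-configs.yaml" | fromYamlArray }}
{{- $config := .Values.collectors.metrics.config | default dict }}
{{- $existing := dig "receivers" "prometheus" "config" "scrape_configs" list $config }}
receivers:
  prometheus:
    config:
      scrape_configs:
        {{- concat $existing $bundled | toYaml | nindent 8 }}
```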

Contributor: Should we move them into a configs folder?


# Top level field related to the OpenTelemetry Operator
opentelemetry-operator:
  # Field indicating whether the operator is enabled or not
  enabled: false
Member: Should this be enabled by default? The goal of the chart is to install an operator/collector all in one go, right?

jaronoff97 (Contributor, Author): Yep, probably! For my testing it was easier not to have it on by default initially, but I can change that.


# This is the default configuration for all collectors generated by the chart.
# Any collectors in the `collectors` are overlayed on top of this configuration.
defaultCollectorConfig:
Member: I think this name is misleading: it implies that it configures the collector's configuration, when it actually configures the OpenTelemetryCollector resource. Could we name it opentelemetry-collector?

jaronoff97 (Contributor, Author): See the comment below for why this is done. I agree it could probably have a better name; baseCollectorCRConfig?

# mountPath: /etc/config

# Top level field specifying collectors configuration
collectors:
Member: I'd prefer to nest all things related to the collector under one root section.

jaronoff97 (Contributor, Author): The benefit of doing it this way is that anyone can add any collectors they want to this key, and they will be created as part of the chart. It also lets them easily overlay values for specific collectors in different environments. For example, if someone wanted to make a custom collector xcol with a different sample rate in each env/region combo ([dev, prod] x [us-east-1, us-east-2]), they could keep a single xcol definition and overlay only the new sample rate rather than redefine the entire configuration. (See the sketch below.)
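Hypothetical values files illustrating that overlay (the collector name, processor choice, and numbers are made up); Helm merges later -f files over earlier ones, so only the changed key needs restating:

```yaml
# base-values.yaml: define the custom collector once
collectors:
  xcol:
    config:
      processors:
        probabilistic_sampler:
          sampling_percentage: 25
      service:
        pipelines:
          traces:
            processors: [probabilistic_sampler]
```

```yaml
# prod-us-east-1.yaml: overlay only the sampling rate, applied with
#   helm install <release> <chart> -f base-values.yaml -f prod-us-east-1.yaml
collectors:
  xcol:
    config:
      processors:
        probabilistic_sampler:
          sampling_percentage: 1
```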

@jaronoff97 (Contributor, Author): @TylerHelmuth I think I could split this up into the following sections:

  • Hierarchical things (Chart.yaml, unfilled values.yaml, some templates with nothing)
  • Embedded operator deployment
  • Default collector configuration
  • Autoinstrumentation
  • OpAMP bridge
  • Prometheus features
  • Some sample collector configurations for a true quickstart

Do I also need to make a rendered folder to test that?

@TylerHelmuth (Member): @jaronoff97 I like that approach.

> some sample collector configurations for a true quickstart

We can handle this via an example folder like the other charts.

@jaronoff97 (Contributor, Author): @TylerHelmuth I see, so we would have some files like example/kube-prom-stack which hold the values-file overrides for doing that? This, rather than baking in these defaults?

@TylerHelmuth (Member):

> I see, so we would have some files like example/kube-prom-stack which hold the values-file overrides for doing that? This, rather than baking in these defaults?

I am not sure yet; I need to understand how the default config files are used. The examples folder is strictly for viewing; its files shouldn't be used or referenced in any templates.

@jaronoff97 (Contributor, Author): I see. In this case the chart is meant to serve as a quickstart with OpenTelemetry, hopefully giving users full visibility into their kube clusters the "otel" way, or giving people on Prometheus with kube-prometheus-stack a near-exact replica but with the collector.

@TylerHelmuth (Member): @jaronoff97 yup, I was referring to those YAML files. I may not understand what you meant by a rendered folder.

Regardless, let's move forward with incremental PRs.

@jaronoff97 (Contributor, Author): Closing this for now in favor of the approach specified here.

Merging this pull request may close: "Helm chart for Kubernetes metrics quickstart"