
Support multiple deployments of the Helm chart to the same K8s cluster #572

Open
matthewmodestino opened this issue Oct 28, 2022 · 4 comments
Labels: enhancement (New feature or request), Stale

Comments


matthewmodestino commented Oct 28, 2022

Hey Team!

I have a client with a use case that requires deploying the Helm chart multiple times to the same cluster to service different business units.

This is to accommodate sending logs to different Splunk instances, as well as configuring the filelog receiver in the chart to focus on only specific namespaces (see the sketch below).

While a single OTel agent likely could be customized with multiple pipelines, using one collector for all of their use cases risks a single point of failure. It also requires complicated customization of our current chart, which I am not sure we even expose, though I assume we can use the agent.config section.
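
For the namespace-scoping piece, a minimal sketch of the idea, assuming the chart merges agent.config into the rendered collector config as it does for the overrides later in this thread (the team-a namespace and the path glob are illustrative):

agent:
  config:
    receivers:
      filelog:
        # Collect only pod logs from the (illustrative) team-a namespace;
        # kubelet writes pod logs under /var/log/pods/<namespace>_<pod>_<uid>/.
        include:
          - /var/log/pods/team-a_*/*/*.log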

Today, you cannot successfully deploy the chart twice due to a few conflicts, mainly:

  • health_check extension defaults to port 13133.
  • zpages extension uses default port 55679.
  • telemetry port 8889.
  • fluentd/OTLP ports - in pure logging scenarios with the OTel agent and filelog, these ports are not needed, so I disabled them.
  • config server port 55554 - this conflict doesn't stop the deployment, but should be resolved.
  • file_storage extension - I didn't need to change this to get OTel to run, but we should double-check that the collector instances don't step on each other's checkpoints, etc. in the /var/addon/splunk/otel_pos files.

After customizing the ports on the conflicting items, I was able to successfully deploy 2 instances of the OTel Helm chart with logging enabled and metrics disabled. I don't expect the need for multiple instances to be as common for metrics, but it would be nice to ensure the user has a way to adjust every conflicting setting.

Whether we document how to customize the Helm chart to accomplish this, or ensure any conflicting settings are exposed in values.yaml, I am looking for guidance on how best to approach this for users looking to run multiple deployments in the same cluster.
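
For concreteness, the per-release overrides a second deployment needs (worked out in the comments below) look roughly like this, assuming agent.config is merged into the rendered collector config; the alternate port numbers are arbitrary picks:

agent:
  config:
    extensions:
      health_check:
        endpoint: "0.0.0.0:13134"   # default 13133 collides with the first release
      zpages:
        endpoint: "0.0.0.0:55680"   # default 55679 collides
    service:
      telemetry:
        metrics:
          address: "0.0.0.0:8890"   # default 8889 collides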

Thanks team!

@matthewmodestino matthewmodestino changed the title Support multiple deployments of the Helm chart to the same K8s cluster by making all colliding configs/ports user customizable in values.yaml Support multiple deployments of the Helm chart to the same K8s cluster Oct 28, 2022
@atoulme atoulme added the enhancement New feature or request label Nov 11, 2022

matthewmodestino commented Nov 16, 2022

I started inserting these customizations into my values.yaml, so I figured I'd capture progress in this thread.

I have hit hardcoded ports in the DaemonSet when updating the health_check extension, which makes exposing these settings in the Helm chart necessary rather than relying on agent-level config alone.

# mattymo splunk custom
agent:
  config:
    extensions:
      health_check:
        endpoint: "0.0.0.0:13134"
      zpages:
        endpoint: "0.0.0.0:55680"

The pod comes up and the changes are visible in the collector startup logs, but it never becomes ready because the port has been changed in the agent config but not in the DaemonSet's probes:

kubectl -n otel logs -f otel-1-splunk-otel-collector-agent-7v4gq

2022-11-16T14:19:04.449Z	info	[email protected]/healthcheckextension.go:44	Starting health_check extension	{"kind": "extension", "name": "health_check", "config": {"Endpoint":"0.0.0.0:13134","TLSSetting":null,"CORS":null,"Auth":null,"MaxRequestBodySize":0,"IncludeMetadata":false,"Path":"/","CheckCollectorPipeline":{"Enabled":false,"Interval":"5m","ExporterFailureThreshold":5}}}
...
...
2022-11-16T14:25:06.479Z	info	zpagesextension/zpagesextension.go:86	Starting zPages extension	{"kind": "extension", "name": "zpages", "config": {"TCPAddr":{"Endpoint":"0.0.0.0:55680"}}}

kubectl -n otel get ds otel-1-splunk-otel-collector-agent -o yaml

...
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /
            port: 13133
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: otel-collector
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /
            port: 13133
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
...

The need for exposure in the Helm chart also affects the ability to customize the OTLP receivers (disable them or change their ports) and to adjust the telemetry port. These are chart-level changes that need to accompany the agent customization.
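
Until the chart exposes the probe port, one possible stopgap is patching the rendered manifests, e.g. with a kustomize overlay fed by helm template (an untested sketch; the DaemonSet name matches the otel-1 release above, and the container index assumes otel-collector is the first container in the pod):

# kustomization.yaml
resources:
  - rendered.yaml   # output of: helm template otel-1 ... > rendered.yaml
patches:
  - target:
      kind: DaemonSet
      name: otel-1-splunk-otel-collector-agent
    patch: |-
      # Point both probes at the overridden health_check port.
      - op: replace
        path: /spec/template/spec/containers/0/livenessProbe/httpGet/port
        value: 13134
      - op: replace
        path: /spec/template/spec/containers/0/readinessProbe/httpGet/port
        value: 13134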


matthewmodestino commented Nov 16, 2022

Figured out how to access the pipeline config... I had forgotten the service field before pipelines. It's handy to work backward from the configmap that is generated by default!

# mattymo splunk custom
agent:
  config:
    extensions:
      health_check:
        endpoint: "0.0.0.0:13134"
      zpages:
        endpoint: "0.0.0.0:55680"
    service:
      telemetry:
        metrics:
          address: "0.0.0.0:8890"
      pipelines:
        logs:
          receivers:
            - filelog

So this turns off the otlp and fluentforward receivers (by listing only filelog in the logs pipeline) and changes the ports for telemetry, health_check, and zpages to avoid conflicts.

matthewmodestino commented

Update on this...

Have gotten 2 approaches working:

  1. Deploy a single Helm chart and use agent.overrides to add a second pipeline (see the sketch below).
     • Still needs K8s secrets implemented to avoid putting the HEC token in the configmap.
     • The default pipeline uses "exclude annotation" logic while the new pipeline uses "include" - minor pipeline edits to avoid collision of filters/features.
  2. Deploy 2 Helm charts to the cluster.
     • Requires resolving all the port collisions mentioned above, as well as the hardcoded health probes and checkpoint file paths.
     • Also must watch for annotation-logic collisions.

Will update further as we progress. Option 1 is looking real slick though! We should think about how we want customers to add or customize the overall pipelines as they get more savvy or want to use contrib features.
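
For reference, a rough sketch of what option 1's second pipeline could look like via the agent config overrides shown earlier (the splunk_hec/bu2 exporter name, endpoint, token, and processor list are illustrative assumptions, not the exact config we landed on):

agent:
  config:
    exporters:
      splunk_hec/bu2:
        # Second Splunk destination for the other business unit. Illustrative
        # values; the token should come from a K8s secret, not the configmap.
        endpoint: "https://splunk-bu2.example.com:8088/services/collector"
        token: "00000000-0000-0000-0000-000000000000"
    service:
      pipelines:
        logs/bu2:
          receivers: [filelog]
          processors: [memory_limiter, batch]
          exporters: [splunk_hec/bu2]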

github-actions bot commented

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@github-actions github-actions bot added the Stale label Oct 10, 2023