
Support multiple deployments of the Helm chart to the same K8s cluster #572

Open
matthewmodestino opened this issue Oct 28, 2022 · 4 comments
Labels: enhancement (New feature or request), Stale

Comments


matthewmodestino commented Oct 28, 2022

Hey Team!

I have a client with a use case that requires deploying the Helm chart multiple times to the same cluster to service different business units.

This is to accommodate sending logs to different Splunk instances, as well as configuring the filelog receiver in the chart to focus on only specific namespaces (see the sketch below).

While a single OTel agent likely could be customized with multiple pipelines, using one collector for all of their use cases risks a single point of failure. It also requires complicated customization of our current chart, which I am not sure we even expose, though I assume we can use the agent.config section.
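
For the namespace-scoping piece, a minimal sketch of the idea, assuming the chart merges agent.config into the rendered collector config as it does for the overrides later in this thread (the team-a namespace and the path glob are illustrative):

agent:
  config:
    receivers:
      filelog:
        # Collect only pod logs from the (illustrative) team-a namespace;
        # kubelet writes pod logs under /var/log/pods/<namespace>_<pod>_<uid>/.
        include:
          - /var/log/pods/team-a_*/*/*.log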

Today, you cannot successfully deploy the chart twice due to a few conflicts, mainly:

  • health_check extension defaults to port 13133.
  • zpages extension uses default port 55679.
  • telemetry port 8889.
  • fluentd/OTLP ports - in pure logging scenarios with the OTel agent and filelog, these ports are not needed, so I disabled them.
  • config server port 55554 - this conflict doesn't stop the deployment, but should be resolved.
  • file_storage extension - I didn't need to change this to get OTel to run, but we should double-check that the collector instances don't step on each other's checkpoints, etc. in the /var/addon/splunk/otel_pos files.

After customizing the ports on the conflicting items, I was able to successfully deploy 2 instances of the OTel Helm chart with logging enabled and metrics disabled. I don't expect the need for multiple instances to be as common for metrics, but it would be nice to ensure the user has a way to adjust every conflicting setting.

Whether we document how to customize the Helm chart to accomplish this, or ensure any conflicting settings are exposed in values.yaml, I am looking for guidance on how best to approach this for users looking to run multiple deployments in the same cluster.
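
For concreteness, the per-release overrides a second deployment needs (worked out in the comments below) look roughly like this, assuming agent.config is merged into the rendered collector config; the alternate port numbers are arbitrary picks:

agent:
  config:
    extensions:
      health_check:
        endpoint: "0.0.0.0:13134"   # default 13133 collides with the first release
      zpages:
        endpoint: "0.0.0.0:55680"   # default 55679 collides
    service:
      telemetry:
        metrics:
          address: "0.0.0.0:8890"   # default 8889 collides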

Thanks team!

@matthewmodestino matthewmodestino changed the title Support multiple deployments of the Helm chart to the same K8s cluster by making all colliding configs/ports user customizable in values.yaml Support multiple deployments of the Helm chart to the same K8s cluster Oct 28, 2022
@atoulme atoulme added the enhancement New feature or request label Nov 11, 2022

matthewmodestino commented Nov 16, 2022

I started inserting these customizations into my values.yaml, so I figured I'd capture progress in this thread.

I have hit hardcoded ports in the DaemonSet when updating the health_check extension, which makes exposing these settings in the Helm chart necessary rather than relying on agent-level config alone.

# mattymo splunk custom
agent:
  config:
    extensions:
      health_check:
        endpoint: "0.0.0.0:13134"
      zpages:
        endpoint: "0.0.0.0:55680"

The pod comes up and the changes are visible in the collector startup logs, but it never becomes ready because the port has been changed in the agent config but not in the DaemonSet's probes:

kubectl -n otel logs -f otel-1-splunk-otel-collector-agent-7v4gq

2022-11-16T14:19:04.449Z	info	[email protected]/healthcheckextension.go:44	Starting health_check extension	{"kind": "extension", "name": "health_check", "config": {"Endpoint":"0.0.0.0:13134","TLSSetting":null,"CORS":null,"Auth":null,"MaxRequestBodySize":0,"IncludeMetadata":false,"Path":"/","CheckCollectorPipeline":{"Enabled":false,"Interval":"5m","ExporterFailureThreshold":5}}}
...
...
2022-11-16T14:25:06.479Z	info	zpagesextension/zpagesextension.go:86	Starting zPages extension	{"kind": "extension", "name": "zpages", "config": {"TCPAddr":{"Endpoint":"0.0.0.0:55680"}}}

kubectl -n otel get ds otel-1-splunk-otel-collector-agent -o yaml

...
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /
            port: 13133
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: otel-collector
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /
            port: 13133
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
...

The need for exposure in the Helm chart also affects the ability to customize the OTLP receivers (disable them or change their ports) and to adjust the telemetry port. These are chart-level changes that need to accompany the agent customization.
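
Until the chart exposes the probe port, one possible stopgap is patching the rendered manifests, e.g. with a kustomize overlay fed by helm template (an untested sketch; the DaemonSet name matches the otel-1 release above, and the container index assumes otel-collector is the first container in the pod):

# kustomization.yaml
resources:
  - rendered.yaml   # output of: helm template otel-1 ... > rendered.yaml
patches:
  - target:
      kind: DaemonSet
      name: otel-1-splunk-otel-collector-agent
    patch: |-
      # Point both probes at the overridden health_check port.
      - op: replace
        path: /spec/template/spec/containers/0/livenessProbe/httpGet/port
        value: 13134
      - op: replace
        path: /spec/template/spec/containers/0/readinessProbe/httpGet/port
        value: 13134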


matthewmodestino commented Nov 16, 2022

Figured out how to access the pipeline config... I had forgotten the service field before pipelines. It's handy to work backward from the configmap that is generated by default!

# mattymo splunk custom
agent:
  config:
    extensions:
      health_check:
        endpoint: "0.0.0.0:13134"
      zpages:
        endpoint: "0.0.0.0:55680"
    service:
      telemetry:
        metrics:
          address: "0.0.0.0:8890"
      pipelines:
        logs:
          receivers:
            - filelog

So this turns off the otlp and fluentforward receivers (by listing only filelog in the logs pipeline) and changes the ports for telemetry, health_check, and zpages to avoid conflicts.

matthewmodestino commented

Update on this...

Have gotten 2 approaches working:

  1. Deploy a single Helm chart and use agent.overrides to add a second pipeline (see the sketch below).
     • Still needs K8s secrets implemented to avoid putting the HEC token in the configmap.
     • The default pipeline uses "exclude annotation" logic while the new pipeline uses "include" - minor pipeline edits to avoid collision of filters/features.
  2. Deploy 2 Helm charts to the cluster.
     • Requires resolving all the port collisions mentioned above, as well as the hardcoded health probes and checkpoint file paths.
     • Also must watch for annotation-logic collisions.

Will update further as we progress. Option 1 is looking real slick though! We should think about how we want customers to add or customize the overall pipelines as they get more savvy or want to use contrib features.
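
For reference, a rough sketch of what option 1's second pipeline could look like via the agent config overrides shown earlier (the splunk_hec/bu2 exporter name, endpoint, token, and processor list are illustrative assumptions, not the exact config we landed on):

agent:
  config:
    exporters:
      splunk_hec/bu2:
        # Second Splunk destination for the other business unit. Illustrative
        # values; the token should come from a K8s secret, not the configmap.
        endpoint: "https://splunk-bu2.example.com:8088/services/collector"
        token: "00000000-0000-0000-0000-000000000000"
    service:
      pipelines:
        logs/bu2:
          receivers: [filelog]
          processors: [memory_limiter, batch]
          exporters: [splunk_hec/bu2]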

github-actions bot commented

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@github-actions github-actions bot added the Stale label Oct 10, 2023