Target allocator #261

Open: okankoAMZ wants to merge 20 commits into main

@okankoAMZ (Contributor) commented Oct 25, 2024

Enhanced CloudWatch Agent Operator for Kubernetes

Summary

This PR extends the CloudWatch Agent Operator for Kubernetes to address limitations of the previous DaemonSet implementation and to improve the scalability and efficiency of metric collection.

Key Changes

  1. Deployment as a StatefulSet
  2. Integration of Target Allocator component
  3. Dynamic sharding of Prometheus targets

Detailed Description

Background

The Amazon CloudWatch Agent allows customers to collect and publish metrics in Prometheus format across various compute environments, including Kubernetes clusters. The CloudWatch Agent Operator simplifies the onboarding process for Prometheus scraping.

Previous Limitations

  • Metric duplication due to multiple agent instances scraping the same endpoints
  • Lack of horizontal scaling capability

New Features

  1. StatefulSet Deployment:

  2. Target Allocator Integration:

    • Watches for Prometheus targets in the cluster
    • Dynamically shards targets across multiple CloudWatch Agent replicas

Benefits:

  • Configurable number of agent replicas
  • Automatic distribution of Prometheus scrape targets
  • Improved efficiency and scalability in metric collection
  • Customizable Prometheus scrape configuration via custom resource
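The sharding idea described above can be illustrated with a small sketch. The PR description does not say which allocation strategy the Target Allocator uses, so the hash-modulo scheme below is an assumption chosen for brevity, and `assignTargets` is a hypothetical name rather than a function from this repository. The point it shows is that each target is owned by exactly one replica, which is what removes duplicate scrapes.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// assignTargets is a hypothetical helper (not part of this PR). It maps each
// scrape target to exactly one replica by hashing the target address.
// The real Target Allocator may use a different strategy.
func assignTargets(targets []string, replicas int) map[int][]string {
	assignment := make(map[int][]string, replicas)
	if replicas <= 0 {
		return assignment
	}
	for _, t := range targets {
		h := fnv.New32a()
		h.Write([]byte(t)) // Write on an FNV hash never returns an error.
		shard := int(h.Sum32() % uint32(replicas))
		assignment[shard] = append(assignment[shard], t)
	}
	// Sort each shard so the result is stable across calls.
	for shard := range assignment {
		sort.Strings(assignment[shard])
	}
	return assignment
}

func main() {
	targets := []string{"10.0.0.1:9090", "10.0.0.2:9090", "10.0.0.3:8080", "10.0.0.4:8080"}
	for shard, ts := range assignTargets(targets, 2) {
		fmt.Printf("replica %d scrapes %v\n", shard, ts)
	}
}
```

A plain modulo reassigns most targets whenever the replica count changes; a consistent-hashing scheme would reduce that churn at the cost of more code.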

Automatic Updates

The operator automatically applies changes to the scrape configuration, updating both the Target Allocator and CloudWatch Agent instances.
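The PR description does not explain how the operator detects a changed scrape configuration. One common operator pattern, shown here purely as an assumption, is to store a hash of the rendered configuration and compare it with the hash of the desired configuration on each reconcile. `configHash` and `needsUpdate` are hypothetical names, not functions from this repository.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// configHash returns a hex-encoded SHA-256 digest of a rendered scrape config.
// Hypothetical helper; the operator's actual change detection may differ.
func configHash(config string) string {
	sum := sha256.Sum256([]byte(config))
	return hex.EncodeToString(sum[:])
}

// needsUpdate reports whether the desired config differs from the one whose
// hash was recorded when it was last applied.
func needsUpdate(appliedHash, desiredConfig string) bool {
	return appliedHash != configHash(desiredConfig)
}

func main() {
	applied := configHash("scrape_interval: 30s")
	fmt.Println(needsUpdate(applied, "scrape_interval: 30s"))
	fmt.Println(needsUpdate(applied, "scrape_interval: 60s"))
}
```

With this pattern, a single hash comparison can gate the update of both the Target Allocator and the agent replicas, so they move to the new configuration together.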

Testing

TBA

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

okankoAMZ and others added 17 commits September 11, 2024 10:45
NodeJS merging-in from main
* Ta https server (#2921)

* Added https server, tests, secret marshalling


---------

Co-authored-by: ItielOlenick <[email protected]>
* Reconciler now removes un-used managed resources for CWA collector
* Adding support for NodeJS auto instrumentation and integ tests (#220)

* Support configurable resources for NodeJS. (#225)

* Supporting JMX annotations (#240)

* Add support for a supplemental YAML configuration for the CloudWatchAgent (#241)

* Changed naming for OTLP container ports from agent JSON (#252)

* Updated Release Notes for 1.8.0 (#251)

* Adjust EKS add-on integration test service count expectations (#256)

* Add integration tests for JMX. (#250)

* Implemented Target Allocator Container (#214)

* Implemented TargetAllocator resource deployments. (#208)

* Update cmd/amazon-cloudwatch-agent-target-allocator/config/config.go

Co-authored-by: Musa <[email protected]>

* Update internal/config/main.go

Co-authored-by: Musa <[email protected]>

---------

Co-authored-by: Parampreet Singh <[email protected]>
Co-authored-by: Musa <[email protected]>
Co-authored-by: Mitali Salvi <[email protected]>
Co-authored-by: Jeffrey Chien <[email protected]>
@okankoAMZ okankoAMZ marked this pull request as ready for review November 12, 2024 20:40
@okankoAMZ okankoAMZ force-pushed the target-allocator branch 3 times, most recently from d59fed3 to a1a0f2c Compare November 13, 2024 16:26
lisguo previously approved these changes Nov 14, 2024
cmd/amazon-cloudwatch-agent-target-allocator/Dockerfile (outdated, resolved)
versions.txt (outdated, resolved)
main.go (resolved)
@@ -9,6 +9,16 @@ func ConfigMap(otelcol string) string {
	return DNSName(Truncate("%s", 63, otelcol))
}

// TAConfigMap returns the name for the config map used in the TargetAllocator.
func TAConfigMap(otelcol string) string {
	return DNSName(Truncate("%s-target-allocator", 63, otelcol))
Contributor

nit: Use %s-targetallocator to be consistent with upstream.

Contributor Author

We are following the cloudwatch-agent format here, but I am open to changing it.


// PrometheusConfigMap returns the name for the prometheus config map.
func PrometheusConfigMap(otelcol string) string {
	return DNSName(Truncate("%s-prometheus-config", 63, otelcol))
Contributor

nit: %s-prometheus is probably good enough to be consistent with the other names, and it's better to keep these short to avoid hitting the maximum length of 63 characters.
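To make the length concern concrete, here is a minimal sketch of how a suffix eats into the 63-character budget of a DNS label. `truncateName` is a hypothetical stand-in written for this illustration; it is not the repository's `Truncate` or `DNSName`, whose exact behavior is not shown in this diff. It assumes ASCII names and a format with a single `%s`.

```go
package main

import (
	"fmt"
	"strings"
)

// maxLabelLen is the maximum length of a DNS label (RFC 1123).
const maxLabelLen = 63

// truncateName is a hypothetical helper, not the repository's Truncate.
// It shortens the value (never the fixed suffix) so that the formatted
// name fits in max bytes. Assumes ASCII input and a single %s verb.
func truncateName(format string, max int, value string) string {
	name := fmt.Sprintf(format, value)
	if len(name) <= max {
		return name
	}
	fixed := len(name) - len(value) // bytes contributed by the format itself
	keep := max - fixed
	if keep < 0 {
		keep = 0
	}
	name = fmt.Sprintf(format, value[:keep])
	if len(name) > max {
		name = name[:max]
	}
	// A DNS label must not end with a hyphen.
	return strings.TrimRight(name, "-")
}

func main() {
	long := strings.Repeat("a", 70)
	for _, format := range []string{"%s", "%s-target-allocator", "%s-prometheus-config"} {
		name := truncateName(format, maxLabelLen, long)
		fmt.Println(len(name), name)
	}
}
```

The longer the suffix, the fewer characters of the instance name survive truncation, which is the practical argument for a short suffix such as the one suggested in the comment above.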

internal/manifests/manifestutils/labels.go (resolved)
)

// Labels return the common labels to all TargetAllocator objects that are part of a managed AmazonCloudWatchAgent.
func Labels(instance v1alpha1.AmazonCloudWatchAgent, name string) map[string]string {
Contributor

nit: Do we need this?
Can't we just call manifestutils.Labels with a new component name like ComponentAmazonCloudWatchAgentTargetAllocator?

@musa-asad (Contributor) commented Nov 17, 2024

We don't need it; we can remove it and use manifestutils.Labels instead if that's preferable.

internal/manifests/targetallocator/volume.go (resolved)
6 participants