Alerts duplicate error messages in slack channel #445

Open
dimakyriakov opened this issue Nov 23, 2022 · 8 comments

Comments

@dimakyriakov

Problem:
I created an alert to monitor all HelmReleases in a specific namespace, and it is flooding a Slack channel with error messages.
The same errors are repeated periodically, and alerts keep firing even though no changes were made to the HelmReleases.

Here is alert file:

apiVersion: notification.toolkit.fluxcd.io/v1beta1
kind: Alert
metadata:
  name: integration
  namespace: flux-system
spec:
  summary: "integration"
  providerRef:
    name: slack
  eventSeverity: info
  eventSources:
    - kind: HelmRelease
      namespace: integration
      name: '*'

Question:
Is it possible to trigger an alert only when we make changes to a HelmRelease, rather than on the status of an existing one?
Is it possible to avoid repeating an alert message that we already received, after a certain period of time?

@darkowlzz
Contributor

darkowlzz commented Nov 23, 2022

Hi, can you share some example error messages to help understand what type of events from helm-controller are causing this?
Notification-controller already has rate limiting to prevent duplicate events for a period of 5 minutes by default. After 5 minutes, you'll receive an alert if the same event is received again. I think that's what's happening in this case.
It may be an issue in helm-controller, which is sending these error events, and that may need attention. A change in helm-controller or in the HelmRelease may help suppress or fix the errors.
If these errors aren't actionable, you can ignore them in notification-controller Alerts by defining an exclusionList; see https://fluxcd.io/flux/components/notification/alert/#specification .
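For the "not actionable" case, a minimal sketch of the Alert from the original report with an exclusionList added. The regular expression is an example only, chosen to match the "install retries exhausted" error reported in this thread; any event whose message matches one of the listed expressions is not sent to the provider.

```yaml
apiVersion: notification.toolkit.fluxcd.io/v1beta1
kind: Alert
metadata:
  name: integration
  namespace: flux-system
spec:
  summary: "integration"
  providerRef:
    name: slack
  eventSeverity: info
  eventSources:
    - kind: HelmRelease
      namespace: integration
      name: '*'
  # Go regular expressions; events whose message matches any of them are dropped
  exclusionList:
    - ".*install retries exhausted.*"  # example pattern, adjust to the errors you want to silence
```

Note that this silences matching events entirely, rather than delivering them once.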

@dimakyriakov
Author

Some of our HelmReleases have the error "reconciliation failed: install retries exhausted", and we are mostly OK with that.
It would be nice to get this error only once, when it first appears.

@stefanprodan
Member

stefanprodan commented Nov 23, 2022

@dimakyriakov by design, error alerts are sent every 5 minutes until they are resolved. You can increase the interval with --rate-limit-interval; the flags are documented here: https://fluxcd.io/flux/components/notification/options/

@dimakyriakov
Author

Thank you for the response, guys. I will close the ticket.

@dimakyriakov
Author

Hey @stefanprodan, where exactly can I set the --rate-limit-interval option?
I created the Provider and Alert in a YAML file, and to me it looks like a CLI option.

@stefanprodan
Member

That’s a controller flag; see here for how to change controller flags: https://fluxcd.io/flux/cheatsheets/bootstrap/
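A minimal sketch of that approach, assuming the default flux-system/kustomization.yaml generated by flux bootstrap: a JSON 6902 patch appends --rate-limit-interval to the notification-controller container args. The 1h value is an arbitrary example, not a recommendation.

```yaml
# flux-system/kustomization.yaml (assumed default layout from flux bootstrap)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - gotk-components.yaml
  - gotk-sync.yaml
patches:
  - patch: |
      # Append the flag to the end of the first container's args list
      - op: add
        path: /spec/template/spec/containers/0/args/-
        value: --rate-limit-interval=1h  # example value: identical events within 1h are discarded
    target:
      kind: Deployment
      name: notification-controller
```

Once the change is committed and reconciled, the Deployment's pod template changes, so Kubernetes rolls out the notification-controller with the new flag.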

@dimakyriakov
Author

This is my kustomization.yaml file. Do you mean I have to increase --rate-limit-interval in the patch for name: notification-controller?

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- gotk-components.yaml
- gotk-sync.yaml
patchesStrategicMerge:  # these are tuned for demonstration and debugging
- |-
  apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
  kind: Kustomization
  metadata:
    name: flux-system
    namespace: flux-system
  spec:
    patches:
    - target:
        version: v1
        group: apps
        kind: Deployment
        name: notification-controller
        namespace: flux-system
      patch: |-
        - op: add
          path: /spec/template/spec/containers/0/args/-
          value: --rate-limit-interval=10s  # do not discard messages that are sent again after 10s+
    - target:
        version: v1
        group: apps
        kind: Deployment
        name: kustomize-controller
        namespace: flux-system
      patch: |-
        - op: add
          path: /spec/template/spec/containers/0/args/0
          value: --concurrent=5             # increase the number of Kustomizations processed at once
        - op: replace
          path: /spec/template/spec/containers/0/resources/limits/cpu
          value: "2"                        # allow KC access to more CPU
        - op: replace
          path: /spec/template/spec/containers/0/resources/limits/memory
          value: "2Gi"                      # allow KC access to more memory
    - target:
        version: v1
        group: apps
        kind: Deployment
        name: source-controller
        namespace: flux-system
      patch: |-
        - op: replace
          path: /spec/template/spec/containers/0/resources/limits/cpu
          value: "2"                        # allow source-controller access to more CPU
        - op: replace
          path: /spec/template/spec/containers/0/resources/limits/memory
          value: "2Gi"                      # allow source-controller access to more memory
    - target:
        version: v1
        group: apps
        kind: Deployment
        name: helm-controller
        namespace: flux-system
      patch: |-
        - op: add
          path: /spec/template/spec/containers/0/args/0
          value: --concurrent=12            # increase the number of HelmReleases processed at once
        - op: replace
          path: /spec/template/spec/containers/0/resources/limits/cpu
          value: "2"                        # allow helm-controller access to more CPU
        - op: replace
          path: /spec/template/spec/containers/0/resources/limits/memory
          value: "2Gi"                      # allow helm-controller access to more memory

@stefanprodan
Member

--rate-limit-interval=10s  # do not discard messages that are sent again after 10s+

No wonder you get alert spam: the default is 5m, and you can increase it to a value that fits your needs.
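Applied to the kustomization.yaml posted above, only the notification-controller entry under spec.patches needs to change. A sketch, with 1h as an arbitrary example interval (indentation kept as in that file):

```yaml
    - target:
        version: v1
        group: apps
        kind: Deployment
        name: notification-controller
        namespace: flux-system
      patch: |-
        - op: add
          path: /spec/template/spec/containers/0/args/-
          value: --rate-limit-interval=1h  # example value: identical events within 1h are discarded
```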
