Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modifications to http output plugin #40

Merged
merged 4 commits into from
Dec 13, 2024
Merged

Conversation

arnelplim
Copy link

Change max_retries=0 t o mean NEVER retry on non-success HTTP response code.
Add HTTP retry list to allow retrying ONLY on specific HTTP response codes.

Summary

Prevent sending certain metrics if server sends back an error. These are typically for recurring stats that would get resent after some period anyway.
Allow for dropping buffer on specific HTTP response codes.

Checklist

  • No AI generated code was used in this PR

Change max_retries=0 t o mean NEVER retry on non-success HTTP response
code.
}

// EXTR - Only retry if specific retry list is provided HTTP response code is present in list
if len(h.RetryableStatusCodes) > 0 {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is backward compatibility handled ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We release telegraf with NOS build, so older NOS builds will use older telegraf binary which retains older behaviour.

`

const (
defaultContentType = "text/plain; charset=utf-8"
defaultMethod = http.MethodPost
defaultUseBatchFormat = true
defaultMaxRetries = 0
defaultMaxRetries = -1

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is backward compatibility handled?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We release telegraf with NOS build, so older NOS builds will use older telegraf binary which retains older behaviour.

h.Log.Errorf("%s Received status %v not found in retryable code list. Metrics are dropped", h.URL, resp.StatusCode)

if h.MaxRetries > 0 {
atomic.StoreInt32(&h.FailCount, 0)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to count the error codes in this failcount category for post-mortem troubleshooting? It would be good to know how many of each status code we are getting if that is not already counted elsewhere.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like a good idea. Given this is going in 33.2, I'll looking into adding failed response code counters in the next iteration .

`

const (
defaultContentType = "text/plain; charset=utf-8"
defaultMethod = http.MethodPost
defaultUseBatchFormat = true
defaultMaxRetries = 0
defaultMaxRetries = -1
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume that defaultMaxRetries is NEVER used? Seems like we wouldn't ever want "retry forever" behavior based on feedback thus far. Seems like defaultMaxRetries=0 is better to safe guard

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

defaultMaxRetries = -1 is equivalent to not setting max_retries field in the telegraf config, which would match the original telegraf behaviour. Recall, original telegraf did not have a max_retries config option.

Telegraf will continue to attempt to send the batch until it receives HTTP success. If the buffer fills up because it keeps getting failed HTTP responses, it throws away the older metrics and adds the newer ones to the end of the buffer. When that happens, the next "batch" telegraf attempts to send will have dropped the oldest metrics.

Copy link

@dgrosser dgrosser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good - a couple of minor comments for troubleshooting.

@arnelplim arnelplim requested a review from vullanatt December 12, 2024 01:45
@arnelplim arnelplim merged commit 723569d into extr-1.21 Dec 13, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants