-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Modifications to http output plugin #40
Conversation
Change max_retries=0 t o mean NEVER retry on non-success HTTP response code.
} | ||
|
||
// EXTR - Only retry if specific retry list is provided HTTP response code is present in list | ||
if len(h.RetryableStatusCodes) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is backward compatibility handled ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We release telegraf with NOS build, so older NOS builds will use older telegraf binary which retains older behaviour.
` | ||
|
||
const ( | ||
defaultContentType = "text/plain; charset=utf-8" | ||
defaultMethod = http.MethodPost | ||
defaultUseBatchFormat = true | ||
defaultMaxRetries = 0 | ||
defaultMaxRetries = -1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is backward compatibility handled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We release telegraf with NOS build, so older NOS builds will use older telegraf binary which retains older behaviour.
h.Log.Errorf("%s Received status %v not found in retryable code list. Metrics are dropped", h.URL, resp.StatusCode) | ||
|
||
if h.MaxRetries > 0 { | ||
atomic.StoreInt32(&h.FailCount, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be good to count the error codes in this failcount category for post-mortem troubleshooting? It would be good to know how many of each status code we are getting if that is not already counted elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like a good idea. Given this is going in 33.2, I'll looking into adding failed response code counters in the next iteration .
` | ||
|
||
const ( | ||
defaultContentType = "text/plain; charset=utf-8" | ||
defaultMethod = http.MethodPost | ||
defaultUseBatchFormat = true | ||
defaultMaxRetries = 0 | ||
defaultMaxRetries = -1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume that defaultMaxRetries is NEVER used? Seems like we wouldn't ever want "retry forever" behavior based on feedback thus far. Seems like defaultMaxRetries=0 is better to safe guard
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
defaultMaxRetries = -1 is equivalent to not setting max_retries field in the telegraf config, which would match the original telegraf behaviour. Recall, original telegraf did not have a max_retries config option.
Telegraf will continue to attempt to send the batch until it receives HTTP success. If the buffer fills up because it keeps getting failed HTTP responses, it throws away the older metrics and adds the newer ones to the end of the buffer. When that happens, the next "batch" telegraf attempts to send will have dropped the oldest metrics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look good - a couple of minor comments for troubleshooting.
Change max_retries=0 t o mean NEVER retry on non-success HTTP response code.
Add HTTP retry list to allow retrying ONLY on specific HTTP response codes.
Summary
Prevent sending certain metrics if server sends back an error. These are typically for recurring stats that would get resent after some period anyway.
Allow for dropping buffer on specific HTTP response codes.
Checklist