Clean up / document metrics monitor fields #39413
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)
// 409 is used to indicate an event with same ID already exists if
// `create` op_type is used.
You now also get a 409 if two events have the same TSDB dimensions, since the dimensions form an implicit ID. This is probably the more common reason to hit this now.
Added
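To make the two 409 cases from this thread concrete, here is a minimal sketch of how a bulk item status might be classified. The names `itemOutcome` and `classifyStatus` are hypothetical and are not the identifiers used in this pull request, and the retry rules (429 and 5xx are retried, other 4xx are fatal) are simplifying assumptions.

```go
package main

import (
	"fmt"
	"net/http"
)

// itemOutcome is a hypothetical classification of one bulk response item.
type itemOutcome int

const (
	outcomeAcked     itemOutcome = iota // indexed successfully
	outcomeDuplicate                    // 409: the document already exists
	outcomeRetry                        // temporary failure, send again
	outcomeFatal                        // permanent failure for this event
)

// classifyStatus maps the HTTP status of a bulk item to an outcome.
// The thresholds are assumptions made for illustration only.
func classifyStatus(status int) itemOutcome {
	switch {
	case status < 300:
		return outcomeAcked
	case status == http.StatusConflict:
		// 409 means an event with the same ID already exists when the
		// `create` op_type is used, or that two events share the same
		// TSDB dimensions, which form an implicit ID. Retrying cannot help.
		return outcomeDuplicate
	case status == http.StatusTooManyRequests || status >= 500:
		return outcomeRetry
	default:
		return outcomeFatal
	}
}

func main() {
	names := map[itemOutcome]string{
		outcomeAcked:     "acked",
		outcomeDuplicate: "duplicate",
		outcomeRetry:     "retry",
		outcomeFatal:     "fatal",
	}
	for _, status := range []int{201, 409, 429, 400} {
		fmt.Println(status, names[classifyStatus(status)])
	}
}
```

Keeping the 409 case separate from other 4xx responses is what lets a duplicate be counted differently from a genuinely rejected event.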
const (
defaultEventType = "doc"
)

var bulkRequestParams = map[string]string{
Can we get some comments explaining this ES param? It's a bit odd without any context.
Done
// which are also reported in events.dropped.
queueACKed: monitoring.NewUint(reg, "queue.acked"),

// (Gauge) queue.filled.pct.events measures the fraction (from 0 to 1)
Can we drop the `.events` from this, ahead of adding support for the queue size in bytes? People will want to alert on this, and they shouldn't have to define it twice.
Already part of #39774
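The request in this thread rests on the fill level being a unitless fraction, so the same gauge can serve a queue sized in events or in bytes. A minimal sketch of that idea, assuming a hypothetical helper `filledPct` (not a function from this pull request):

```go
package main

import "fmt"

// filledPct returns the fraction of the queue in use, from 0 to 1.
// The unit of used and capacity (events or bytes) does not matter as
// long as both arguments use the same one. A zero capacity yields 0
// to avoid dividing by zero.
func filledPct(used, capacity uint64) float64 {
	if capacity == 0 {
		return 0
	}
	pct := float64(used) / float64(capacity)
	if pct > 1 {
		return 1 // clamp in case used briefly exceeds capacity
	}
	return pct
}

func main() {
	// Queue limited by event count: 800 of 3200 events.
	fmt.Printf("%.2f\n", filledPct(800, 3200))
	// Queue limited by size: 16 MiB of 64 MiB.
	fmt.Printf("%.2f\n", filledPct(16<<20, 64<<20))
}
```

Both calls report the same fraction, which is why a single alert threshold would cover either queue configuration.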
Document the semantics of many metrics monitoring variables, and rename some metrics APIs to more clearly indicate their function. As a side effect, fix several metrics reporting / publishing bugs in the Elasticsearch output, including #39146.

Add the `output.events.dead_letter` metric to distinguish events that were ingested to the dead letter index after a fatal error (previously these events were just reported as "acked").

This could have been a shorter fix, but it was hard to properly test since the metrics were changed from two separate functions with a lot of special cases. I ended up reorganizing the Elasticsearch `Publish` helpers to make the logic clearer. The new layout makes it much easier to test the error handling and metrics reporting.

The bugs fixed by this refactor are:

- [...] `RetryableErrors`. This caused `activeEvents` to increase permanently even after the events were handled.
- [...] `Acked` (success). The new logic creates a new `dead_letter` metric specifically for this case.

Checklist

- I have made corresponding changes to the documentation
- I have made corresponding changes to the default configuration files
- I have added an entry in `CHANGELOG.next.asciidoc` or `CHANGELOG-developer.next.asciidoc`
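As an illustration of the `dead_letter` distinction described in this pull request, the sketch below tallies bulk results so that events ingested into the dead letter index are counted separately from ordinary acknowledgements. The `result` and `counts` types and the `tally` function are hypothetical, and treating every status of 300 or above as a failure is a simplifying assumption, not the logic of the actual change.

```go
package main

import "fmt"

// result is a hypothetical summary of one bulk response item.
type result struct {
	status     int  // HTTP status returned for the item
	deadLetter bool // true if the event was sent to the dead letter index
}

// counts is a hypothetical set of per-batch counters.
type counts struct {
	acked      int // ingested into the target index
	deadLetter int // ingested into the dead letter index after a fatal error
	failed     int // not ingested (assumption: any status >= 300)
}

// tally places every result into exactly one counter, so the counters
// always sum to the number of results.
func tally(results []result) counts {
	var c counts
	for _, r := range results {
		switch {
		case r.status >= 300:
			c.failed++
		case r.deadLetter:
			// Previously this case was reported as a plain "acked".
			c.deadLetter++
		default:
			c.acked++
		}
	}
	return c
}

func main() {
	c := tally([]result{
		{status: 201},
		{status: 201, deadLetter: true},
		{status: 400},
	})
	fmt.Printf("%+v\n", c)
}
```

Because each event lands in exactly one counter, a test can assert that the counters add up to the batch size, which is the kind of check the reorganized helpers are meant to make easier.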
Related issues