[elastic_agent] Add additional output metrics to dashboards #8834

Closed

leehinman
Contributor

Proposed commit message

Add dashboards for the following metrics to the [Elastic Agent] Agent metrics Dashboard.

  • beat.stats.libbeat.output.events.failed
  • beat.stats.libbeat.output.events.toomany
  • beat.stats.libbeat.output.events.dropped
  • beat.stats.libbeat.output.batches.split

All four are useful for seeing whether your agent is hitting non-fatal errors when using the output, which can result in degraded performance.
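A minimal sketch of how a rate panel could turn these metrics into something readable, assuming the four fields are cumulative counters that reset when a Beat restarts. The function name `counter_rate` and the sample values are hypothetical, and this is not how Lens computes its series; it only illustrates the idea.

```python
from typing import List, Tuple


def counter_rate(samples: List[Tuple[float, int]]) -> List[float]:
    """Per-second rate between consecutive (timestamp_seconds, counter) samples.

    Assumption: the counter is cumulative. A decrease is treated as a reset
    (for example a Beat restart), so the new value is taken as the increase.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0:
            continue  # skip out-of-order or duplicate samples
        increase = v1 - v0 if v1 >= v0 else v1
        rates.append(increase / elapsed)
    return rates


# Hypothetical samples of beat.stats.libbeat.output.events.dropped
samples = [(0, 0), (10, 0), (20, 30), (30, 30), (40, 5)]
dropped_rates = counter_rate(samples)
print(dropped_rates)
```

A flat counter yields a rate of 0, so a healthy output shows a line at zero rather than a growing total.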

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.

How to test this PR locally

  • check out the PR
  • cd packages/elastic_agent
  • elastic-package stack up --version <8.11.2 or greater> -d

View the [Elastic Agent] Agent metrics Dashboard

Related issues

Screenshots

[Image: Screenshot 2024-01-05 at 11 43 03]

@leehinman leehinman force-pushed the 3854_elastic_agent_output_metrics branch from ea690d4 to 9120e2f Compare January 5, 2024 18:09
@leehinman leehinman marked this pull request as ready for review January 5, 2024 18:47
@leehinman leehinman requested a review from a team as a code owner January 5, 2024 18:47
@leehinman leehinman requested a review from P1llus January 5, 2024 18:47
Review thread on packages/elastic_agent/changelog.yml (outdated, resolved)
@pierrehilbert pierrehilbert added the Team:Elastic-Agent Label for the Agent team label Jan 6, 2024
@elasticmachine

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

Member

@P1llus P1llus left a comment

A few comments even though I approve:

  1. I find it a bit weird that they are all 0 if you are getting data in. Were you able to test that the graphs are actually reading the stats correctly?
  2. Are these metrics from elastic_agent? You only modified the elastic_agent_metrics data stream, but there are other metrics data streams inside the elastic_agent package, like filebeat_metrics.
  3. I would add a filter to exclude (empty) from these; I believe there is an option to exclude null values.

@cmacknz
Member

cmacknz commented Jan 12, 2024

Are these metrics from elastic_agent? You only modified the elastic_agent_metrics data stream, but there are other metrics data streams inside the elastic_agent package, like filebeat_metrics.

Thanks for catching this. These changes should be applied to each individual Beat metrics data stream, because the agent itself doesn't have output metrics.

Member

@cmacknz cmacknz left a comment

See above, these should be on the Beat metrics data streams.

@nimarezainia
Contributor

@leehinman thanks for these dashboards. Is there any chance we could also add the recent output latency metrics (histogram) that were added? (elastic/beats#37258). If not I can create a new issue. thanks

@leehinman
Contributor Author

@leehinman thanks for these dashboards. Is there any chance we could also add the recent output latency metrics (histogram) that were added? (elastic/beats#37258). If not I can create a new issue. thanks

Yep, I'll add it. When I started, the latency histogram wasn't done yet.

@leehinman
Contributor Author

A few comments even though I approve:

1. I find it a bit weird that they are all 0 if you are getting data in. Were you able to test that the graphs are actually reading the stats correctly?

These are error conditions: dropped, failed, toomany, and split. Under normal operation they should be 0. If they go up and then return to 0, you have a transient error you should look at; if they stay up, you have a persistent problem to investigate. I'll see if I can induce some errors to get these populated.
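The reading described above can be sketched roughly as follows. This is only an illustration: the function name `classify_error_rate`, the status labels, and the `sustained` threshold of three consecutive intervals are assumptions, not something the dashboard or the Beats provide.

```python
from typing import Sequence


def classify_error_rate(rates: Sequence[float], sustained: int = 3) -> str:
    """Classify a series of per-interval error rates (oldest first).

    "healthy":    every interval is 0
    "persistent": the most recent `sustained` intervals are all above 0
    "transient":  errors occurred but have not stayed up that long
    """
    if all(r == 0 for r in rates):
        return "healthy"
    trailing = 0  # number of consecutive non-zero intervals at the end
    for r in reversed(rates):
        if r > 0:
            trailing += 1
        else:
            break
    return "persistent" if trailing >= sustained else "transient"


print(classify_error_rate([0, 0, 0, 0]))  # healthy
print(classify_error_rate([0, 3, 0, 0]))  # transient
print(classify_error_rate([0, 2, 2, 2]))  # persistent
```

A spike that has only just started is reported as transient until it has lasted `sustained` intervals, so the threshold decides how quickly a problem is called persistent.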

2. Are these metrics from elastic_agent? You only modified the elastic_agent_metrics data stream, but there are other metrics data streams inside the elastic_agent package, like filebeat_metrics.

So this brings up an interesting question. I cloned the existing "Total events rate /s" panel. Does this mean that the Total events rate doesn't include everything? They both use the filter data_stream.dataset: elastic_agent.*, so it does pick up filebeat & metricbeat.
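To illustrate why the wildcard filter also picks up the Beat data streams, here is a small sketch that approximates the match with Python's `fnmatchcase`. The dataset names in the list are assumed examples, and shell-style matching is only a stand-in for how the Kibana filter evaluates `elastic_agent.*`.

```python
from fnmatch import fnmatchcase

# Pattern taken from the panel filter: data_stream.dataset: elastic_agent.*
PATTERN = "elastic_agent.*"

# Assumed example dataset names, for illustration only
datasets = [
    "elastic_agent.elastic_agent",
    "elastic_agent.filebeat",
    "elastic_agent.metricbeat",
    "system.cpu",
]

matched = [name for name in datasets if fnmatchcase(name, PATTERN)]
print(matched)
```

Every dataset under the `elastic_agent.` prefix matches, which fits the observation that the cloned panel already includes filebeat and metricbeat.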

3. I would add a filter to exclude `(empty)` from these; I believe there is an option to exclude null values.

I didn't see a filter to exclude empty values in Lens, but I did find a visual option to draw a straight line when there is no data, which looks more like what I would expect.

@botelastic

botelastic bot commented Feb 24, 2024

Hi! We just realized that we haven't looked into this PR in a while. We're sorry! We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1:. Thank you for your contribution!

@botelastic botelastic bot added the Stalled label Feb 24, 2024
@botelastic

botelastic bot commented Mar 25, 2024

Hi! This PR has been stale for a while and we're going to close it as part of our cleanup procedure. We appreciate your contribution and would like to apologize if we have not been able to review it, due to the current heavy load of the team. Feel free to re-open this PR if you think it should stay open and is worth rebasing. Thank you for your contribution!

Labels
Stalled Team:Elastic-Agent Label for the Agent team
6 participants