diff --git a/docs/en/observability/apm/apm-performance-diagnostic.asciidoc b/docs/en/observability/apm/apm-performance-diagnostic.asciidoc
index caab1a04f7..b4b554d3c3 100644
--- a/docs/en/observability/apm/apm-performance-diagnostic.asciidoc
+++ b/docs/en/observability/apm/apm-performance-diagnostic.asciidoc
@@ -7,7 +7,7 @@
 When {es} is under excessive load or indexing pressure, APM Server could experience the downstream backpressure when indexing new documents into {es}.
 Most commonly, backpressure from {es} will manifest itself in the form of higher indexing latency and/or rejected requests, which in return could lead APM Server to deny incoming requests.
-As a result APM agents connected to the affected APM Server will suffer from throttling and/or request timeout when shipping APM events.
+As a result, APM agents connected to the affected APM Server will suffer from throttling and/or request timeout when shipping APM events.
 
 To quickly identify possible issues try looking for similar error logs lines in APM Server logs:
 
@@ -19,25 +19,33 @@ To quickly identify possible issues try looking for similar error logs lines in
 ...
 ----
 
-To gain better insight into APM Server health and performance, consider enabling the monitoring feature by following the steps in <>.
-When enabled APM Server will additionally report a set of vital metrics to help you identify any performance degradation.
+To gain better insight into APM Server health and performance, consider enabling the monitoring feature by following the steps in <>.
+When enabled, APM Server will additionally report a set of vital metrics to help you identify any performance degradation.
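+
+For example, on a standalone APM Server, internal monitoring collection can be enabled with a minimal `apm-server.yml` snippet like the one below.
+This is only a sketch, so follow the referenced steps for the full procedure; by default the collected metrics are reported to the same {es} cluster that is configured as the output, and the commented-out monitoring host is a placeholder:
+
+[source,yaml]
+----
+monitoring.enabled: true
+# Optionally report the monitoring metrics to a dedicated monitoring cluster
+# instead of the output cluster (the host shown is a placeholder):
+# monitoring.elasticsearch:
+#   hosts: ["https://monitoring-cluster:9200"]
+----
+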
 Pay careful attention to the next metric fields:
 
-* `beats_stats.metrics.libbeat.output.events.active` that represents the number of buffered pending documents waiting for indexing;
-(_if this value is increasing rapidly it indicates {es} backpressure_)
-* `beats_stats.metrics.libbeat.output.events.acked` that represents the number of indexing operations that have completed successfully;
-* `beats_stats.metrics.libbeat.output.events.failed` that represents the number of indexing operations that failed, it includes all failures;
-(_if this value is increasing rapidly it indicates {es} backpressure_)
-* `beats_stats.metrics.libbeat.output.events.toomany` that represents the number of indexing operations that failed due to {es} responding with 429 Too many Requests;
-(_if this value is increasing rapidly it indicates {es} backpressure_)
+* `beats_stats.metrics.libbeat.output.events.active` that represents the number of buffered pending documents waiting to be ingested;
+(_if this value is increasing rapidly it may indicate {es} backpressure_)
+* `beats_stats.metrics.libbeat.output.events.acked` that represents the total number of documents that have been ingested successfully;
+* `beats_stats.metrics.libbeat.output.events.failed` that represents the total number of documents that failed to ingest;
+(_if this value is increasing rapidly it may indicate {es} backpressure_)
+* `beats_stats.metrics.libbeat.output.events.toomany` that represents the number of documents that failed to ingest due to {es} responding with 429 Too Many Requests;
+(_if this value is increasing rapidly it may indicate {es} backpressure_)
 * `beats_stats.output.elasticsearch.bulk_requests.available` that represents the number of bulk indexers available for making bulk index requests;
-(_if this value is equal to 0 it indicates {es} backpressure_)
+(_if this value is equal to 0 it may indicate {es} backpressure_)
 * `beats_stats.output.elasticsearch.bulk_requests.completed` that represents the number of already completed bulk requests;
 * `beats_stats.metrics.output.elasticsearch.indexers.active` that represents the number of active bulk indexers that are concurrently processing batches;
 
-See https://www.elastic.co/guide/en/beats/metricbeat/current/exported-fields-beat.html[{metricbeat} documentation] for the full list of exported metric fields.
+See {metricbeat-ref}/exported-fields-beat.html[{metricbeat} documentation] for the full list of exported metric fields.
 
 One likely cause of excessive indexing pressure or rejected requests is undersized {es}. To mitigate this, follow the guidance in {ref}/rejected-requests.html[Rejected requests].
 
-If scaling {es} resources up is not an option, you can try to workaround by adjusting `flush_bytes`, `flush_interval`, `max_retries` and `timeout` settings described in <> to reduce APM Server indexing pressure.
-However, consider that increasing number of buffered documents and/or reducing retries may lead to a higher rate of dropped APM events.
\ No newline at end of file
+
+If scaling {es} resources up is not an option, you can adjust the `flush_bytes`, `flush_interval`, `max_retries`, and `timeout` settings described in <> to reduce APM Server indexing pressure. However, consider that increasing the number of buffered documents and/or reducing retries may lead to a higher rate of dropped APM events. The following example roughly doubles the default number of buffered documents and decreases the number of {es} indexing retries and the request timeout. It is a generic example that might not be applicable to your situation, so try adjusting the settings further to see what works for you.
+
+[source,yaml]
+----
+output.elasticsearch:
+  flush_bytes: "2MB" # double the default value
+  flush_interval: "2s" # double the default value
+  max_retries: 1 # decrease the default value
+  timeout: 60 # decrease the default value
+----
\ No newline at end of file