Metrics

lyx edited this page Jan 17, 2025 · 1 revision

This article defines the observability metrics for AutoMQ, to help you understand AutoMQ's performance and operational status.

AutoMQ metrics are defined and presented in Prometheus format. If you need other protocol formats, you will need to convert them yourself.
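
As a starting point for such a conversion, the following minimal sketch parses simple Prometheus text-format sample lines into Python dictionaries and dumps them as JSON. The function name `parse_samples` and the sample values are illustrative assumptions, not part of AutoMQ. The sketch skips comment lines, ignores timestamps, and leaves escape sequences in label values untouched.

```python
import json
import re

# One sample line: name, optional {labels}, value, optional timestamp.
SAMPLE_RE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$'
)
# One label pair: key="value", where the value may contain escaped characters.
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Parse Prometheus text-format samples into a list of dicts."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # skip anything this simple parser does not understand
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append({"name": name, "labels": labels, "value": float(value)})
    return samples


# Illustrative input; the values are made up.
scraped = """\
# TYPE Kafka_server_connection_count gauge
Kafka_server_connection_count 12
Kafka_purgatory_size{type="Produce"} 3
"""
print(json.dumps(parse_samples(scraped), indent=2))
```

Metric names here are copied as written on this page. Prometheus names are case-sensitive, so match them against what your endpoint actually exposes.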

General Metrics

Kafka_server_connection_count

The current number of established connections on a node.

  • Type: Gauge

Kafka_network_threads_idle_rate

Idle rate of Kafka SocketServer network threads, range: [0, 1.0].

  • Type: Gauge

Kafka_io_threads_idle_time_nanoseconds_total

Cumulative idle time of Kafka request handler threads, in nanoseconds. It is the cumulative form of Apache Kafka's native metric RequestHandlerAvgIdlePercent. Dividing the increase in this value by the elapsed time (also in nanoseconds) gives the thread idle rate. Note that on a combined node (i.e., one that serves as both Controller and Broker), the request handler idle rates of the Controller and the Broker are summed, so the maximum thread idle rate is 2.0.

  • Type: Counter
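
The idle rate described above can be derived from two samples of the counter. This is a minimal sketch; the function name and the sample numbers are illustrative assumptions. Both the counter and the timestamps are expected in nanoseconds, so timestamps in seconds must be converted first.

```python
def io_thread_idle_rate(prev_idle_ns, curr_idle_ns, prev_time_ns, curr_time_ns):
    """Return the idle rate between two samples of the idle-time counter.

    The result is normally in [0, 1.0], or in [0, 2.0] on a combined node.
    """
    elapsed_ns = curr_time_ns - prev_time_ns
    if elapsed_ns <= 0:
        raise ValueError("samples must be in increasing time order")
    return (curr_idle_ns - prev_idle_ns) / elapsed_ns


# Two samples taken 10 seconds apart; the counter grew by 7.5 seconds of idle time.
rate = io_thread_idle_rate(
    prev_idle_ns=100_000_000_000,
    curr_idle_ns=107_500_000_000,
    prev_time_ns=0,
    curr_time_ns=10_000_000_000,
)
print(rate)  # 0.75
```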

Controller Metrics

Kafka_controller_active_count

Indicates whether the current Controller node is the active Controller, with a value of 1 indicating active and 0 indicating inactive.

  • Type: Gauge
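
Because each Controller reports its own value, a common health check is to confirm that exactly one Controller reports 1. The sketch below assumes the values have already been collected into a dictionary keyed by node ID. That keying and the function names are assumptions made for illustration.

```python
def active_controllers(active_count_by_node):
    """Return the sorted node IDs whose active-controller gauge equals 1."""
    return sorted(node for node, value in active_count_by_node.items() if value == 1)


def has_single_active_controller(active_count_by_node):
    """True when exactly one Controller reports itself as active."""
    return len(active_controllers(active_count_by_node)) == 1


# Illustrative values for a three-Controller cluster.
values = {0: 0, 1: 1, 2: 0}
print(active_controllers(values))            # [1]
print(has_single_active_controller(values))  # True
```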

Kafka_broker_active_count

The number of active Brokers in the current cluster.

  • Type: Gauge

Kafka_broker_fenced_count

The number of Brokers fenced in the current cluster.

  • Type: Gauge

Kafka_topic_count

The total number of Topics in the current cluster.

  • Type: Gauge

Kafka_partition_total_count

The total number of partitions in the current cluster.

  • Type: Gauge

Kafka_partition_offline_count

The total number of partitions without a leader in the current cluster.

  • Type: Gauge

Kafka_stream_auto_balancer_metrics_time_delay_milliseconds

The latency of AutoBalancer monitoring metrics reported by each Broker node in the cluster. If the latency exceeds a certain threshold, the Broker node is considered out-of-sync by AutoBalancer and will no longer participate in partition reassignment.

  • Type: Gauge

  • Labels:

    • node_id: The node ID reporting AutoBalancer monitoring metrics.
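
To see which Brokers would be treated as out-of-sync, compare each node's delay with a threshold. This page does not state the threshold AutoBalancer uses, so `threshold_ms` in the sketch below is a hypothetical parameter and the 60000 in the example is a placeholder, not an AutoMQ default.

```python
def out_of_sync_brokers(delay_ms_by_node, threshold_ms):
    """Return the sorted node IDs whose metrics delay exceeds threshold_ms."""
    return sorted(
        node for node, delay_ms in delay_ms_by_node.items() if delay_ms > threshold_ms
    )


# Illustrative delays in milliseconds, keyed by the node_id label.
delays = {0: 500, 1: 90_000, 2: 1_200}
print(out_of_sync_brokers(delays, threshold_ms=60_000))  # [1]
```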

Kafka_stream_s3_object_count

The total number of Objects uploaded to Object storage in the current cluster, categorized by Object status.

  • Type: Gauge

  • Labels:

    • state: Object states are categorized into three types:

      • prepared: Objects that have not yet completed writing and have not been committed

      • committed: Objects that have completed writing and have been committed

      • mark_destroyed: Objects marked for deletion, which will be removed from the object storage after a certain delay

Kafka_stream_s3_object_size_bytes

The total size of objects uploaded to object storage by the current cluster.

  • Type: Gauge

Kafka_stream_stream_object_num

The number of StreamObjects uploaded to object storage by the current cluster.

  • Type: Gauge

Kafka_stream_stream_set_object_num

The number of StreamSetObjects uploaded to object storage by each Broker in the current cluster.

  • Type: Gauge

  • Labels:

    • node_id: The corresponding Broker node ID

Broker Metrics

Kafka_message_count_total

The total number of messages received by the Broker node. Its rate of change over time gives the message throughput.

  • Type: Counter

  • Labels:

    • topic
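
Throughput can be computed from two samples of this counter. The sketch below also handles a counter that goes backwards, which it treats as a restart from zero, similar in spirit to how Prometheus handles counter resets. The function name and the sample numbers are illustrative assumptions.

```python
def counter_rate(prev_value, curr_value, elapsed_seconds):
    """Return the per-second increase of a counter between two samples."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    if curr_value < prev_value:
        # The counter went backwards: assume the process restarted and the
        # counter began again at zero, so the current value is the increase.
        increase = curr_value
    else:
        increase = curr_value - prev_value
    return increase / elapsed_seconds


# 3000 messages received over a 30-second window.
print(counter_rate(1_000, 4_000, 30))  # 100.0
```

The same calculation applies to the other Counter metrics on this page, such as the byte counters, where the result is bytes per second.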

Kafka_network_io_bytes_total

The total size, in bytes, of messages received and sent by the Broker node. Its rate of change over time gives the byte throughput.

  • Type: Counter

  • Labels:

    • topic

    • partition

    • direction:

      • in: indicates receiving messages

      • out: indicates sending messages

Kafka_topic_request_count_total

The total number of requests received for each Topic on the Broker node, including only produce and fetch request types.

  • Type: Counter

  • Labels:

    • topic

    • type: Request Type

      • produce

      • fetch

Kafka_topic_request_failed_total

The total number of failed requests for each Topic on the Broker node, including only produce and fetch request types.

  • Type: Counter

  • Labels:

    • topic

    • type: Request Type

      • produce

      • fetch

Kafka_request_count_total

The total number of requests received by the Broker nodes.

  • Type: Counter

  • Labels:

    • type: Request Type

    • version: The API version of the request type.

Kafka_request_error_count_total

The total number of requests at the Broker nodes, broken down by error code. Despite the metric name, successful requests are also counted, with the error code NONE.

  • Type: Counter

  • Labels:

    • type: Request Type

    • error: Error code, NONE indicates that the request was successful.
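
Because successful requests are included under the error code NONE, an error ratio has to exclude that code from the numerator. A minimal sketch follows, assuming the counter increases over some window have been collected into a dictionary keyed by the `type` and `error` labels. The function name and the label values in the example are illustrative.

```python
def request_error_ratio(count_increase):
    """Return failed requests divided by all requests.

    count_increase maps (request type, error code) to the counter increase
    over the window of interest. The error code "NONE" marks success.
    """
    total = sum(count_increase.values())
    if total == 0:
        return 0.0
    failed = sum(
        value for (_, error), value in count_increase.items() if error != "NONE"
    )
    return failed / total


# Illustrative increases over one window.
window = {
    ("Produce", "NONE"): 990,
    ("Produce", "NOT_LEADER_OR_FOLLOWER"): 10,
}
print(request_error_ratio(window))  # 0.01
```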

Kafka_request_size_bytes_total

The total size of requests received by the Broker nodes.

  • Type: Counter

  • Labels:

    • type: Request Type

Kafka_request_size_50p(99p/mean/max)_bytes

The size of requests received by Broker nodes, reported as the 50th percentile, 99th percentile, mean, and maximum.

  • Type: Gauge

  • Labels:

    • type: Request Type
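
Headings of the form `50p(99p/mean/max)` on this page read as shorthand for a family of metrics, one per statistic. The sketch below expands such a heading into individual names. It assumes each statistic simply replaces `50p` in the name; verify the result against the names your endpoint actually exposes.

```python
import re

# prefix_ + first statistic + (other/statistics) + _suffix
SHORTHAND_RE = re.compile(
    r"^(?P<prefix>.*_)(?P<first>[^_()]+)\((?P<rest>[^()]+)\)(?P<suffix>_.*)$"
)


def expand_metric_shorthand(heading):
    """Expand 'a_50p(99p/mean/max)_b' into one metric name per statistic."""
    match = SHORTHAND_RE.match(heading)
    if match is None:
        return [heading]  # not shorthand, return unchanged
    stats = [match["first"]] + match["rest"].split("/")
    return [f"{match['prefix']}{stat}{match['suffix']}" for stat in stats]


for name in expand_metric_shorthand("Kafka_request_size_50p(99p/mean/max)_bytes"):
    print(name)
```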

Kafka_request_time_milliseconds_total

The total time consumed by Broker nodes to process requests.

  • Type: Counter

  • Labels:

    • type: Request Type
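
One use of this counter is to estimate the mean processing time per request over a window, by dividing its increase by the increase in Kafka_request_count_total for the same request type. The sketch below assumes both counters cover the same set of requests, which this page does not state explicitly. The function name and the sample values are illustrative.

```python
def mean_request_time_ms(time_ms_increase, count_increase_by_version):
    """Return the mean processing time per request, in milliseconds.

    time_ms_increase is the increase of the time counter for one request type.
    count_increase_by_version maps each API version to the increase of the
    request counter for the same request type over the same window.
    """
    total_requests = sum(count_increase_by_version.values())
    if total_requests == 0:
        return None  # no requests in the window, so the mean is undefined
    return time_ms_increase / total_requests


# 300 requests across two API versions took 1500 ms in total.
print(mean_request_time_ms(1500.0, {"9": 200, "11": 100}))  # 5.0
```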

Kafka_request_time_50p(99p/mean/max)_milliseconds

The processing time of requests by Broker nodes, reported as the 50th percentile, 99th percentile, mean, and maximum.

  • Type: Gauge

  • Labels:

    • type: Request Type

Kafka_request_queue_time_milliseconds_total

The total queuing time of requests at Broker nodes, which increases when Kafka IO threads are busy.

  • Type: Counter

  • Labels:

    • type: Request Type

Kafka_request_queue_time_50p(99p/mean/max)_milliseconds

The queuing time of requests at Broker nodes, reported as the 50th percentile, 99th percentile, mean, and maximum.

  • Type: Gauge

  • Labels:

    • type: Request Type

Kafka_response_queue_time_milliseconds_total

The total queuing time of responses at Broker nodes, which increases when Kafka network threads are busy.

  • Type: Counter

  • Labels:

    • type: Request Type

Kafka_response_queue_time_50p(99p/mean/max)_milliseconds

The queuing time of responses at Broker nodes, reported as the 50th percentile, 99th percentile, mean, and maximum.

  • Type: Gauge

  • Labels:

    • type: Request Type

Kafka_request_queue_size

Request queue size at Broker nodes.

  • Type: Gauge

Kafka_response_queue_size

The response queue size of the Broker node.

  • Type: Gauge

Kafka_purgatory_size

The number of requests waiting in the produce or fetch purgatory on the Broker node.

  • Type: Gauge

  • Labels:

    • type:

      • Produce

      • Fetch

Kafka_partition_count

The number of partitions currently assigned to the Broker node.

  • Type: Gauge

Kafka_logs_flush_time_50p(99p/mean/max)_milliseconds

The log flush time on the Broker node. In AutoMQ, this represents the flush time of the Delta WAL, reported as the 50th percentile, 99th percentile, mean, and maximum.

  • Type: Gauge

Kafka_log_end_offset

The maximum logical offset of each partition on the Broker node.

  • Type: Gauge

  • Labels:

    • topic

    • partition

Kafka_log_size

The message size of each partition on the Broker node.

  • Type: Gauge

  • Labels:

    • topic

    • partition

Kafka_group_commit_offset

The consumption offsets for each Consumer Group on the corresponding partitions. Note that this metric is reported by the Broker where the Group Coordinator of each Consumer Group resides.

  • Type: Gauge

  • Labels:

    • consumer_group

    • topic

    • partition
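
Combined with Kafka_log_end_offset, this metric can be used to estimate consumer lag per partition. Since the committed offset is reported by the Group Coordinator's Broker and the log end offset by the Broker hosting the partition, the two series should be joined on topic and partition only, not on the reporting node. The following is a minimal sketch with illustrative names and values.

```python
def consumer_lag(log_end_offsets, committed_offsets):
    """Return lag per (consumer_group, topic, partition).

    log_end_offsets maps (topic, partition) to the log end offset.
    committed_offsets maps (consumer_group, topic, partition) to the
    committed offset.
    """
    lag = {}
    for (group, topic, partition), committed in committed_offsets.items():
        end_offset = log_end_offsets.get((topic, partition))
        if end_offset is None:
            continue  # no end offset sampled for this partition
        # Samples are taken at different moments, so clamp negatives to zero.
        lag[(group, topic, partition)] = max(end_offset - committed, 0)
    return lag


# Illustrative samples.
ends = {("orders", 0): 120, ("orders", 1): 80}
commits = {("billing", "orders", 0): 100, ("billing", "orders", 1): 80}
print(consumer_lag(ends, commits))
```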

Kafka_group_count

The number of Consumer Groups managed by the Group Coordinator on the Broker node.

  • Type: Gauge

Kafka_group_preparing_rebalance_count

The number of Consumer Groups preparing for a rebalance.

  • Type: Gauge

Kafka_group_completing_rebalance_count

The number of Consumer Groups waiting for state assignment from the group Leader.

  • Type: Gauge

Kafka_group_stable_count

The number of Consumer Groups in a stable state.

  • Type: Gauge

Kafka_group_empty_count

The number of Consumer Groups with no members but not yet expired.

  • Type: Gauge

Kafka_group_dead_count

The number of Consumer Groups with no members and whose metadata has been removed.

  • Type: Gauge

Kafka_stream_upload_size_bytes_total

The total size of data uploaded by Broker nodes to object storage.

  • Type: Counter

Kafka_stream_download_size_bytes_total

The total size of data downloaded by Broker nodes from object storage.

  • Type: Counter

Kafka_stream_network_inbound_usage_bytes_total

The total inbound bandwidth usage of a Broker node, including received messages and data downloaded from object storage. Its rate of change over time gives the inbound throughput.

  • Type: Counter

Kafka_stream_network_outbound_usage_bytes_total

The total outbound bandwidth usage of a Broker node, including messages sent to consumers and data uploaded to object storage. Its rate of change over time gives the outbound throughput.

  • Type: Counter

Kafka_stream_network_inbound_available_bandwidth_bytes

The inbound traffic throughput reserved for cold reads and compaction on a Broker node. If this value is less than the inbound traffic that cold reads and compaction require, the corresponding requests are placed into a throttling queue. Normal message sending and receiving traffic is not affected by this throttling. Note that this metric is an instantaneous value taken at sampling time. Because of the sampling interval and the specific implementation of the throttling strategy, treat it as a reference only.

  • Type: Gauge

Kafka_stream_network_outbound_available_bandwidth_bytes

The outbound traffic throughput reserved for cold reads and compaction on a Broker node. If this value is less than the outbound traffic that cold reads and compaction require, the corresponding requests are placed into a throttling queue. Normal message sending and receiving traffic is not affected by this throttling. Note that this metric is an instantaneous value taken at sampling time. Because of the sampling interval and the specific implementation of the throttling strategy, treat it as a reference only.

  • Type: Gauge

Kafka_stream_network_inbound_limiter_queue_time_50p(99p/mean/max)_nanoseconds

The time that inbound traffic requests for cold reads and compaction spent in the throttling queue before being executed, reported as the 50th percentile, 99th percentile, mean, and maximum.

  • Type: Gauge

Kafka_stream_network_outbound_limiter_queue_time_50p(99p/mean/max)_nanoseconds

The time that outbound traffic requests for cold reads and compaction spent in the throttling queue before being executed, reported as the 50th percentile, 99th percentile, mean, and maximum.

  • Type: Gauge

Kafka_stream_operation_latency_50p(99p/mean/max)_nanoseconds

The operation duration at each stage of the AutoMQ S3Stream module.

  • Type: Gauge

  • Labels:

    • operation_type

    • operation_name
