Metrics
This article defines the observability metrics that AutoMQ exposes, to help you understand AutoMQ's performance and operational status.
AutoMQ metrics are defined and presented in Prometheus format. If you need another format, you must convert the metrics yourself.
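As a rough illustration of such a conversion, the sketch below parses Prometheus text exposition lines into plain dictionaries and prints them as JSON. The metric names in the sample are placeholders, not real AutoMQ metric names, and the parser is deliberately minimal: it ignores timestamps and does not unescape label values.

```python
import json
import re

# One sample line: metric name, optional {labels}, then the value.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# One label pair inside the braces: key="value" (escaped characters are kept as-is).
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_prometheus_text(text):
    """Parse Prometheus text exposition lines into a list of dicts."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank lines and HELP/TYPE comments
            continue
        match = SAMPLE_RE.match(line)
        if match is None:  # skip anything that does not look like a sample
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append({"name": name, "labels": labels, "value": float(value)})
    return samples


# Hypothetical scrape output; the metric names are placeholders.
example = """\
# TYPE example_connection_count gauge
example_connection_count 42
example_request_total{type="produce",version="9"} 1200
"""

samples = parse_prometheus_text(example)
print(json.dumps(samples, indent=2))
```

From the resulting list of dictionaries, the samples can be re-emitted in whatever format the target monitoring system expects.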
The current number of established connections on a node.
- Type: Gauge
Idle rate of Kafka SocketServer network threads, range: [0, 1.0].
- Type: Gauge
Idle time of Kafka request handler threads, measured in nanoseconds. This is a cumulative form of Apache Kafka's native metric RequestHandlerAvgIdlePercent. Dividing the increase of this value by the elapsed time (also in nanoseconds) gives the thread idle rate. Note that on a combined node (one that serves as both Controller and Broker), the request handler idle rates of the Controller and the Broker are summed, so the maximum thread idle rate is 2.0.
- Type: Counter
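The idle rate calculation described above can be sketched as follows. The function name and the sample values are illustrative only. Treating a decrease as a counter reset follows the common Prometheus convention and is an assumption here, not something the metric definition states.

```python
def idle_rate(idle_ns_start, idle_ns_end, t_start_ns, t_end_ns):
    """Return idle nanoseconds accumulated per nanosecond of elapsed time."""
    elapsed = t_end_ns - t_start_ns
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = idle_ns_end - idle_ns_start
    if delta < 0:
        # Assumed reset handling: the counter restarted from zero,
        # so the new value is taken as the increase.
        delta = idle_ns_end
    return delta / elapsed


# Two hypothetical samples taken 10 seconds apart.
rate = idle_rate(
    idle_ns_start=120_000_000_000,
    idle_ns_end=127_500_000_000,
    t_start_ns=0,
    t_end_ns=10_000_000_000,
)
```

Here the counter grew by 7.5 seconds of idle time over a 10 second window, so `rate` is 0.75. On a combined node the same calculation can return values up to 2.0.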
Indicates whether the current Controller node is the active Controller, with a value of 1 indicating active and 0 indicating inactive.
- Type: Gauge
The number of active Brokers in the current cluster.
- Type: Gauge
The number of Brokers fenced in the current cluster.
- Type: Gauge
The total number of Topics in the current cluster.
- Type: Gauge
The total number of partitions in the current cluster.
- Type: Gauge
The total number of partitions without a leader in the current cluster.
- Type: Gauge
The latency of AutoBalancer monitoring metrics reported by each Broker node in the cluster. If the latency exceeds a certain threshold, the Broker node is considered out-of-sync by AutoBalancer and will no longer participate in partition reassignment.
- Type: Gauge
- Labels:
  - node_id: The node ID reporting AutoBalancer monitoring metrics.
The total number of Objects uploaded to Object storage in the current cluster, categorized by Object status.
- Type: Gauge
- Labels:
  - state: Object states are categorized into three types:
    - prepared: Objects that have not yet completed writing and have not been committed
    - committed: Objects that have completed writing and have been committed
    - mark_destroyed: Objects marked for deletion, which will be removed from the object storage after a certain delay
The total size of objects uploaded to object storage by the current cluster.
- Type: Gauge
The number of StreamObjects uploaded to object storage by the current cluster.
- Type: Gauge
The number of StreamSetObjects uploaded to object storage by each Broker in the current cluster.
- Type: Gauge
- Labels:
  - node_id: The corresponding Broker node ID
The total number of messages received by the Broker node; measuring this over time provides the message throughput.
- Type: Counter
- Labels:
  - topic
The total size of messages received and sent by the Broker node; measuring this over time provides the message size throughput.
- Type: Counter
- Labels:
  - topic
  - partition
  - direction:
    - in: indicates receiving messages
    - out: indicates sending messages
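To turn this counter into a throughput figure, one approach is to take two scrapes, compute the increase of each series, and sum the increases per direction. The sketch below assumes the samples have already been collected into dictionaries keyed by (topic, partition, direction). The topic name, the values, and the reset handling are illustrative assumptions.

```python
from collections import defaultdict


def throughput_by_direction(prev, curr, interval_seconds):
    """Sum per-series counter increases into a per-direction rate.

    prev and curr map (topic, partition, direction) -> cumulative message size.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    totals = defaultdict(float)
    for key, value in curr.items():
        # A series missing from prev is assumed to have started at zero.
        before = prev.get(key, 0)
        # A decrease is treated as a counter reset, so the new value is the increase.
        increase = value - before if value >= before else value
        totals[key[2]] += increase
    return {direction: total / interval_seconds for direction, total in totals.items()}


# Hypothetical samples from two scrapes taken 10 seconds apart.
prev = {("orders", 0, "in"): 1_000, ("orders", 1, "in"): 2_000, ("orders", 0, "out"): 500}
curr = {("orders", 0, "in"): 4_000, ("orders", 1, "in"): 3_000, ("orders", 0, "out"): 2_500}
rates = throughput_by_direction(prev, curr, interval_seconds=10)
```

The result is expressed in the counter's size unit per second, with one entry per direction value.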
The total number of requests received for each Topic on the Broker node, including only produce and fetch request types.
- Type: Counter
- Labels:
  - topic
  - type: Request Type
    - produce
    - fetch
The total number of failed requests for each Topic on the Broker node, including only produce and fetch request types.
- Type: Counter
- Labels:
  - topic
  - type: Request Type
    - produce
    - fetch
The total number of requests received by the Broker nodes.
- Type: Counter
- Labels:
  - type: Request Type
  - version: The API version of the request type.
The total number of failed requests at the Broker nodes. Note that successful requests are also counted in this metric, under the error code NONE, so exclude that label value when counting failures.
- Type: Counter
- Labels:
  - type: Request Type
  - error: Error code, NONE indicates that the request was successful.
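Because successful requests appear under the error code NONE, a failure ratio can be derived by comparing the non-NONE increases against the total. The sketch below assumes the per-error increases over some window have already been computed. The error code strings in the sample are illustrative guesses at label values, not confirmed AutoMQ output.

```python
def error_ratio(increase_by_error):
    """Share of requests whose error label is not NONE.

    increase_by_error maps error code -> increase of the counter over a window.
    """
    total = sum(increase_by_error.values())
    if total == 0:
        return 0.0  # no requests in the window
    failed = sum(count for code, count in increase_by_error.items() if code != "NONE")
    return failed / total


# Hypothetical per-error increases for one request type over one window.
window = {"NONE": 980, "NOT_LEADER_OR_FOLLOWER": 15, "REQUEST_TIMED_OUT": 5}
ratio = error_ratio(window)
```

With these sample values, 20 of 1000 requests carry a non-NONE error code, so `ratio` is 0.02.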
The total size of requests received by the Broker nodes.
- Type: Counter
- Labels:
  - type: Request Type
The size of requests received by Broker nodes, represented by different percentiles.
- Type: Gauge
- Labels:
  - type: Request Type
The total time consumed by Broker nodes to process requests.
- Type: Counter
- Labels:
  - type: Request Type
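One way to use this counter is to divide its increase by the increase of the total request count for the same request type, which gives a mean processing time per request over the window. This is a sketch under that assumption. The function and parameter names are hypothetical, and the result is in whatever time unit the counter uses.

```python
def mean_time_per_request(time_prev, time_curr, count_prev, count_curr):
    """Mean processing time per request between two scrapes.

    Returns None when no requests were counted in the window
    (or when the request counter went backwards, for example after a restart).
    """
    requests = count_curr - count_prev
    if requests <= 0:
        return None
    return (time_curr - time_prev) / requests


# Hypothetical counter values from two consecutive scrapes.
mean_time = mean_time_per_request(time_prev=1_000, time_curr=1_600, count_prev=50, count_curr=250)
```

In this example 200 requests consumed 600 time units, so `mean_time` is 3.0. A mean hides outliers, so the percentile metrics remain the better signal for tail latency.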
The processing time of requests by Broker nodes, represented by different percentiles.
- Type: Gauge
- Labels:
  - type: Request Type
The total queuing time of requests at Broker nodes, which increases when Kafka IO threads are busy.
- Type: Counter
- Labels:
  - type: Request Type
Queue time of requests at Broker nodes, presented by different percentiles.
- Type: Gauge
- Labels:
  - type: Request Type
Response queue time at Broker nodes. When Kafka Network threads are busy, response queue time increases.
- Type: Counter
- Labels:
  - type: Request Type
Response queue time at Broker nodes, presented by different percentiles.
- Type: Gauge
- Labels:
  - type: Request Type
Request queue size at Broker nodes.
- Type: Gauge
The response queue size of the Broker node.
- Type: Gauge
The number of requests waiting in the Produce or Fetch purgatory on the Broker node.
- Type: Gauge
- Labels:
  - type:
    - Produce
    - Fetch
The number of partitions currently assigned to the Broker node.
- Type: Gauge
The log flush time on the Broker node; in AutoMQ, this represents the flush time of the Delta WAL, shown in different percentiles.
- Type: Gauge
The maximum logical offset of each partition on the Broker node.
- Type: Gauge
- Labels:
  - topic
  - partition
The message size of each partition on the Broker node.
- Type: Gauge
- Labels:
  - topic
  - partition
The consumption offsets for each Consumer Group on the corresponding partitions. Note that this metric is reported by the Broker where the Group Coordinator of each Consumer Group resides.
- Type: Gauge
- Labels:
  - consumer_group
  - topic
  - partition
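Combining this metric with the maximum logical offset of each partition gives an estimate of consumer lag. The sketch below assumes that the maximum logical offset can be treated as the end of the partition log, and that samples from both metrics have been collected into dictionaries. The group name, topic name, and values are illustrative.

```python
def consumer_lag(end_offsets, group_offsets):
    """Estimate per-partition lag for each consumer group.

    end_offsets maps (topic, partition) -> maximum logical offset.
    group_offsets maps (consumer_group, topic, partition) -> consumption offset.
    """
    lag = {}
    for (group, topic, partition), consumed in group_offsets.items():
        end = end_offsets.get((topic, partition))
        if end is None:
            continue  # no offset sample for this partition yet
        # Clamp at zero: the two metrics may be sampled at slightly different times.
        lag[(group, topic, partition)] = max(end - consumed, 0)
    return lag


# Hypothetical samples.
end_offsets = {("orders", 0): 1_500, ("orders", 1): 900}
group_offsets = {
    ("billing", "orders", 0): 1_200,
    ("billing", "orders", 1): 900,
    ("billing", "payments", 0): 10,
}
lag = consumer_lag(end_offsets, group_offsets)
```

Since the two metrics may be reported by different Brokers (the partition's Broker and the Group Coordinator's Broker), the inputs usually have to be gathered across all nodes before the subtraction.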
The number of Consumer Groups managed by the Group Coordinator on this Broker node.
- Type: Gauge
The number of Consumer Groups preparing for a rebalance.
- Type: Gauge
The number of Consumer Groups waiting for the Leader to assign state.
- Type: Gauge
The number of Consumer Groups in a stable state.
- Type: Gauge
The number of Consumer Groups with no members but not yet expired.
- Type: Gauge
The number of Consumer Groups with no members and whose metadata has been removed.
- Type: Gauge
The total size of data uploaded by Broker nodes to object storage.
- Type: Counter
The total size of data downloaded by Broker nodes from object storage.
- Type: Counter
The total inbound traffic of a Broker node, including received messages and data downloaded from object storage. Its rate of change over time gives the inbound throughput.
- Type: Counter
The total outbound traffic of a Broker node, including message consumption and data uploaded to object storage. Its rate of change over time gives the outbound throughput.
- Type: Counter
The inbound traffic throughput reserved for cold reads and compaction on a Broker node. If this value is less than the required inbound traffic for cold reads and compaction, the corresponding requests will be placed into a throttling queue, and normal message sending and receiving traffic will not be affected by this throttling. Note that this metric value only represents the instantaneous value at the time of sampling. Due to the sampling interval and the specific implementation of the throttling strategy, this metric value is for reference only.
- Type: Gauge
The outbound traffic throughput reserved for cold reads and compaction on a Broker node. If this value is less than the required outbound traffic for cold reads and compaction, the corresponding requests will be placed into a throttling queue, and normal message sending and receiving traffic will not be affected by this throttling. Note that this metric value only represents the instantaneous value at the time of sampling. Due to the sampling interval and the specific implementation of the throttling strategy, this metric value is for reference only.
- Type: Gauge
The time that inbound traffic requests for cold reads and compaction spent in the throttling queue before being executed.
- Type: Gauge
The time that outbound traffic requests for cold reads and compaction spent in the throttling queue before being executed.
- Type: Gauge
The operation duration at each stage of the AutoMQ S3Stream module.
- Type: Gauge
- Labels:
  - operation_type
  - operation_name