What is AutoMQ: Overview
AutoMQ re-engineers Kafka for the cloud by offloading storage to object storage. While maintaining 100% compatibility with Apache Kafka®, it offers users up to 10x better cost efficiency and up to 100x better elasticity.
The advantages of separating compute and storage are widely recognized. However, the industry often implements this separation by moving storage into self-managed distributed storage software, which significantly increases the complexity of deployment, maintenance, and governance. AutoMQ believes that decoupling storage from the software and moving it to shared cloud storage services is the optimal solution in the cloud-native era.
AutoMQ leverages an S3-based stream repository, S3Stream, to offload storage to shared cloud storage services provided by cloud providers, such as EBS and S3. By fully utilizing the storage characteristics of both, AutoMQ offers low-cost, low-latency, highly available, highly durable, and virtually infinite streaming storage capabilities. For more technical details, please refer to the technical architecture chapter.
Apache Kafka® uses local disks to achieve high-durability storage, providing an abstraction of infinite streaming storage for business logic. All data is stored on the disks of each node according to specific logic, a design commonly referred to as Shared Nothing architecture.
Local disks lack scalability, so the Shared Nothing architecture typically achieves higher throughput through horizontal scaling. However, shared cloud storage has become highly elastic with near "infinite" capacity, making it easier to fully leverage the capabilities of cloud storage at a lower cost by adopting a shared storage architecture.
Apache Kafka® natively supports a multi-replica mechanism to ensure data durability through replica redundancy. It enhances system availability via a fast failover mechanism achieved through leader election among different replicas.
In cloud environments, EBS has a built-in 3-replica storage mechanism. If Kafka also keeps 3 replicas on top of it, each piece of data is physically stored 3 x 3 = 9 times, significantly increasing storage, bandwidth, and compute costs. AutoMQ believes that cloud-native Kafka no longer needs a multi-replica mechanism to provide both reliability and availability. By delegating reliability to cloud storage and providing availability independently, AutoMQ achieves a truly cloud-native implementation.
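The replica arithmetic above can be checked with a short sketch. The function `physical_copies` and its parameter names are illustrative assumptions, not part of Kafka or AutoMQ; the only figures taken from the text are the 3 Kafka replicas and the 3 built-in EBS replicas.

```python
def physical_copies(app_replicas: int, storage_replicas: int) -> int:
    """Physical copies of each byte when application-level replication
    is layered on storage that already replicates internally."""
    if app_replicas < 1 or storage_replicas < 1:
        raise ValueError("replica counts must be at least 1")
    return app_replicas * storage_replicas


# Kafka with 3 replicas, each on an EBS volume that keeps 3 replicas itself.
kafka_on_ebs = physical_copies(app_replicas=3, storage_replicas=3)

# One logical copy that relies on the storage layer alone for durability.
single_copy_on_ebs = physical_copies(app_replicas=1, storage_replicas=3)

print(kafka_on_ebs)        # 9
print(single_copy_on_ebs)  # 3
```

Under these assumptions, dropping application-level replication cuts the physical footprint from 9 copies to 3 while the storage layer still holds every byte in triplicate.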
AutoMQ has the following advantages compared to Apache Kafka®:
The new cloud-native architecture of AutoMQ fully leverages the high availability and elastic provisioning capabilities of object storage, offering customers a 10x cost advantage over Apache Kafka®.
- Using object storage as the primary storage significantly reduces storage costs.
- High availability is achieved without maintaining multiple replicas, saving 2/3 of the traffic and replication costs.
- Native support for Spot instances and auto scaling means there is no need to reserve resources for peak loads.
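One plausible reading of the 2/3 figure in the list above, assuming a replication factor of 3 (the list does not state one): of the three copies written for each message, two exist only because of replication. The helper `replication_share` is a hypothetical name used for illustration.

```python
from fractions import Fraction


def replication_share(replication_factor: int) -> Fraction:
    """Fraction of total write traffic spent on follower copies,
    assuming every replica receives one full copy of each message."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return Fraction(replication_factor - 1, replication_factor)


print(replication_share(3))  # 2/3
print(replication_share(1))  # 0
```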
AutoMQ offloads state to object storage services, leaving the business logic layer completely stateless. AutoMQ clusters can complete partition reassignment and traffic self-balancing within seconds, which addresses the slow rebalancing and difficult partition reassignment encountered when scaling Apache Kafka. Integrating with cloud providers' elastic scaling group policies makes adaptive elastic scaling of the cluster straightforward.
Each Apache Kafka server provides a fixed amount of IOPS for message reads and writes. If a large-scale cold read pushes IOPS to the limit, message writes may queue up and time out. This is a limitation of Kafka's integrated storage and compute architecture. In contrast, AutoMQ adopts a storage-compute separation architecture in which cold reads and hot reads/writes do not interfere with each other. Cold read throughput depends on the throughput capacity of object storage, and EBS is used exclusively as the WAL on the message write path. As a result, the AutoMQ architecture effectively avoids this issue.
AutoMQ stores all data on S3, so scaling the cluster requires no data replication and the cluster can respond quickly to sudden traffic spikes. In comparison, Apache Kafka requires substantial bandwidth for data replication after scaling, which makes sudden traffic surges hard to handle. Through features such as auto scaling, automatic traffic balancing, and automatic fault recovery, AutoMQ achieves a high level of system autonomy and higher availability without manual intervention.
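To illustrate why copying data slows down scaling, here is a back-of-the-envelope sketch. The data volume and bandwidth are made-up inputs, and `copy_time_seconds` is a hypothetical helper; it models only the time to copy partition data to a new node at a fixed rate and ignores throttling, concurrent traffic, and protocol overhead.

```python
def copy_time_seconds(data_bytes: int, bandwidth_bytes_per_s: int) -> float:
    """Time to copy data_bytes at a constant bandwidth, in seconds."""
    if data_bytes < 0:
        raise ValueError("data_bytes must not be negative")
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth_bytes_per_s must be positive")
    return data_bytes / bandwidth_bytes_per_s


GIB = 1024 ** 3
MIB = 1024 ** 2

# Hypothetical example: 500 GiB of partition data copied at 100 MiB/s.
seconds = copy_time_seconds(500 * GIB, 100 * MIB)
print(seconds)       # 5120.0
print(seconds / 60)  # roughly 85 minutes
```

When the data already lives in shared object storage, as described above, this copy step is not needed, so the time to scale no longer grows with the amount of stored data.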
Unlike vendors that reimplement the Kafka protocol, AutoMQ takes a minimal storage layer replacement approach. By modifying only the underlying LogSegment implementation and keeping the main Apache Kafka code unchanged, AutoMQ achieves 100% compatibility with Apache Kafka and can adapt quickly to new versions.
- To try AutoMQ quickly, see Deploy Locally.