-
Notifications
You must be signed in to change notification settings - Fork 235
100 Lines of Code to Implement Kafka on S3
Yes, you read it right. AutoMQ[1] now supports running completely on top of object storage like S3. By extending the top-level WAL's abstraction, AutoMQ can achieve some features that other competitors pride themselves on, that is, building the streaming system entirely on S3 object storage, with a minimum amount of code changes over existing stream storage engines. Worth mentioning is that we have made this part of the source code completely open on GitHub. You can follow the quick start tutorial[3] to create an AutoMQ cluster directly on S3 and test it. Developers can use the S3Stream[2] stream storage engine to easily have a Kafka service fully deployed on object storage in their environment, offering extremely low storage costs and operational complexity.
The seamless ability of AutoMQ's core stream storage engine is inseparable from its excellent top-level abstraction designed around WAL and shared storage architecture. Based on this superior abstraction, we implemented the highly innovative S3Stream[2] stream storage engine. In this article, we will share the design details, underlying thoughts, and evolution process of AutoMQ's shared stream storage engine. After reading the previous content, you will truly understand why we say that only hundreds of lines of code are needed to implement Kafka on top of S3 for AutoMQ.
Over a decade ago, Kafka was born in an era where the IDC (Internet Data Center) was the main scenario. At that time, computing and storage resources were usually tightly coupled together, forming a shared-nothing architecture of integrated storage and computation. This architecture was very effective in the physical data center environment at that time, but with the maturity of public cloud technology, its limitations gradually became apparent. The shared-nothing architecture's storage and computation are strongly coupled, making it impossible to completely decouple the storage layer and offload capabilities such as durability and high availability of cloud storage services. This also means that shared-nothing architectures cannot take advantage of mature cloud storage services' elasticity and cost benefits. In addition, this integrated architecture makes Kafka lack elasticity and difficult to scale. When adjusting the capacity of a Kafka cluster, a large amount of data replication will be involved. This will affect the efficiency of its capacity adjustment, and it will also impact normal read and write requests during the adjustment period.
AutoMQ is dedicated to fully leveraging the advantages of the cloud, implementing a Cloud-First philosophy. Through shared storage architecture, AutoMQ decouples data durability and offloads it to mature cloud storage services like S3 and EBS, fully tapping into their potential. The lack of elasticity, high cost, and complex operation issues brought about by Kafka's Share-Nothing architecture are no longer problems under AutoMQ's new shared storage architecture implementation.
The core architecture of AutoMQ's shared storage is built on Shared WAL and Shared Object. Within this shared storage abstraction, various implementations can be achieved. The abstraction of Shared WAL allows us to migrate the WAL implementation to any shared storage medium, leveraging the inherent advantages of different shared storage mediums. Readers familiar with software engineering will understand that every software design involves trade-offs. Different shared storage mediums come with their own sets of advantages and disadvantages based on their trade-offs. The top-level abstraction based on Shared WAL in AutoMQ allows for adaptability across various scenarios. AutoMQ can freely reassign the Shared WAL implementation to any shared storage service and even combine them together. The Shared Object is primarily constructed on mature cloud object storage services, enabling extremely low storage costs while benefiting from the technological cost advantages of large-scale cloud object storage services. With the S3 API becoming the de facto standard for object storage protocols, AutoMQ can also adapt to various object storage services through Shared Object, providing users with multi-cloud storage solutions. Shared WAL can adapt to low-latency storage mediums like EBS and S3E1Z, offering users low-latency stream services.
WAL was initially used in relational databases to achieve data atomicity and consistency. With the maturity of cloud storage services like S3 and EBS, combining WAL with low-latency storage and then asynchronously writing data to low-cost storage like S3 allows for a balance between latency and cost. AutoMQ is the first player in the stream domain to use WAL in this way based on a shared storage architecture, thoroughly leveraging the advantages of different cloud storage services. We also believe that EBS WAL is the best implementation of a cloud stream storage engine because it combines the low-latency and high-durability advantages of EBS with the low-cost benefits of object storage. Through ingenious design, it also mitigates the expensive drawbacks of EBS.
The following diagram illustrates the core implementation process of EBS WAL:
- The producer writes data to the EBS WAL through the S3Stream stream storage engine. Once the data is successfully written to the disk, a successful response is immediately returned to the client, fully utilizing the low-latency and high-durability characteristics of EBS.
- Consumers can directly read newly written data from the cache.
- The cached data becomes invalid after being asynchronously written in bulk and in parallel to S3.
- Consumers can directly read historical data from object storage.
A common misconception is confusing the Shared WAL built on EBS with Kafka's tiered storage. The main way to distinguish them is by determining if the compute node Broker is completely stateless. For tiered storage implementations by Confluent and Aiven, their Brokers are still stateful. Kafka's tiered storage requires that the last log segment of its partitions be on a local disk, hence the data on local storage is tightly coupled with the compute layer's Broker. In contrast, AutoMQ's implementation of EBS WAL does not have this limitation. When a Broker node crashes, other healthy Broker nodes can take over the EBS volume within milliseconds using Multi Attach, write the small fixed-size (typically 500MB) WAL data to S3, and then delete the volume.
S3 WAL is the natural evolution of the Shared WAL architecture. Currently, AutoMQ supports building the storage layer entirely on S3, which is a specific implementation of Shared WAL. This WAL implementation directly built on S3 is what we call S3 WAL. Thanks to the top-level abstraction of Shared WAL and the foundational implementation of EBS WAL, the core process of S3 WAL is identical to EBS WAL, allowing the AutoMQ team to complete support for S3 WAL within weeks.
Implementing S3 WAL is both a natural progression of the AutoMQ Shared WAL architecture and a means to expand AutoMQ's capability boundaries. When using S3 WAL, all user data is written entirely to object storage, leading to some increase in latency compared to EBS WAL. However, with this trade-off, the architecture becomes more streamlined and efficient as it relies on fewer services. In "special" cloud providers like AWS that do not offer cross-AZ EBS, and in private IDC scenarios using self-built object storage services like minio, the S3 WAL architecture provides stronger cross-AZ availability guarantees and flexibility.
AutoMQ has made numerous optimizations to the performance of S3 WAL, particularly its latency performance. In our test scenarios, the average latency for S3 WAL Append is 168ms, with P99 at 296ms.
Kafka Produce request processing latency averages 170ms, with P99 at 346ms.
The average send latency is 230ms, with P99 at 489ms.
In the AutoMQ GitHub repository, you can find the core stream repository S3Stream[2]. The class com.automq.stream.s3.wal.WriteAheadLog contains the top-level abstraction of the WAL, while the implementation class ObjectWALService includes over 100 lines of implementation code for S3 WAL. In this sense, we have indeed utilized over 100 lines of implementation code along with the existing EBS WAL code infrastructure to fully build AutoMQ on S3.
Of course, implementing a few hundred lines of code does not mean that you can achieve running Kafka on S3 with just over 100 lines of code. This is merely the surface. The key lies in fully understanding the concept of AutoMQ's WAL-based shared storage architecture. Within this framework, whether you aim to implement a fully S3-based shared storage or on other shared storage mediums in the future, the approach remains consistent. In AutoMQ's architecture, Shared WAL is one of the core components. By abstracting and organizing the code for Shared WAL at a high level, we can reassign the implementation method of Shared WAL to any other shared storage medium. Specifically, when implementing a shared storage WAL on AutoMQ, the actual workload and complexity have already been absorbed by the underlying architecture. You only need to focus on efficiently writing and reading the WAL to and from the target storage medium. Since AutoMQ's stream storage engine has already paved the way for you, once you fully understand the concept of Shared WAL and the S3Stream storage engine, implementing an entirely S3-based S3WAL is as simple as writing 100 lines of code.
This article has revealed the core concept of the shared storage architecture based on Shared WAL by introducing the thoughts and evolution behind AutoMQ's storage architecture. In the future, AutoMQ will continue to optimize the capabilities of its abstracted stream storage engine foundation, building more robust Kafka stream services on top of it. Soon, the S3E1Z WAL will also be officially introduced, so please stay tuned for updates from us.
- What is automq: Overview
- Difference with Apache Kafka
- Difference with WarpStream
- Difference with Tiered Storage
- Compatibility with Apache Kafka
- Licensing
- Deploy Locally
- Cluster Deployment on Linux
- Cluster Deployment on Kubernetes
- Example: Produce & Consume Message
- Example: Simple Benchmark
- Example: Partition Reassignment in Seconds
- Example: Self Balancing when Cluster Nodes Change
- Example: Continuous Data Self Balancing
-
S3stream shared streaming storage
-
Technical advantage
- Deployment: Overview
- Runs on Cloud
- Runs on CEPH
- Runs on CubeFS
- Runs on MinIO
- Runs on HDFS
- Configuration
-
Data analysis
-
Object storage
-
Kafka ui
-
Observability
-
Data integration