Palmpay Uses AutoMQ to Replace Kafka, Optimizing Costs by 50%+
Palmpay is a leading fintech company in Africa, offering convenient mobile payment and financial services. As one of the fastest-growing fintech enterprises on the continent, Palmpay is dedicated to providing users and merchants with an inclusive, secure, and flexible digital payment experience, thereby promoting financial inclusion in Africa. Palmpay extensively utilizes Kafka for metrics logging and CDC (Change Data Capture) transmission scenarios.
Palmpay employs Kafka for two workloads: real-time computing and mobile application metrics collection. For real-time computing, database change events from online applications are captured and stored centrally in Kafka, where downstream services subscribe to them for real-time dashboards and risk control detection. For mobile application metrics, the gateway asynchronously distributes reported metrics data via Kafka for offline cleansing and storage.
Initially, Palmpay selected Kafka for its robust architecture. Over time, however, Kafka’s limitations in resource overhead and elastic scalability became apparent. After an in-depth evaluation, Palmpay adopted AutoMQ, a redesign of Kafka built on object storage. AutoMQ is a Kafka replacement that separates storage from compute and moves Kafka’s storage layer onto object storage. It offers stateless compute nodes, partition reassignment in seconds, automatic elasticity, and traffic self-balancing.
AutoMQ restructures the storage layer of Apache Kafka, achieving significant cost benefits:
- Object Storage, Cost-Effective: AutoMQ stores data in object storage and consumes it on demand, with no reserved space. The cost per GB in object storage can be as low as one-tenth of that in Kafka deployments based on cloud disks.
- Storage and Compute Without Idleness: AutoMQ separates storage and compute, so either can be scaled on demand. This accommodates workloads such as low traffic with high storage or high traffic with low storage without leaving resources idle.
AutoMQ, built on separated storage and compute, achieves automatic traffic self-balancing. The read and write pressure within the cluster is automatically scheduled and balanced at the partition level, addressing Kafka’s limitations:
- No Hotspots, More Balanced: With automatic traffic balancing, the load across nodes in an AutoMQ cluster is consistent, eliminating hotspots and reducing risk.
- Scaling Without Manual Reassignment: AutoMQ users find that after scaling, nodes automatically balance traffic, eliminating the need for manual partition reassignment and simplifying maintenance.
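To make partition-level balancing concrete, here is a toy sketch of a greedy placement that puts the heaviest partitions first on the least-loaded broker. It is not AutoMQ's actual balancing algorithm, which the text above does not describe. The broker names, partition names, and traffic figures are hypothetical. The sketch only illustrates why rebalancing can be cheap when compute is stateless: changing the assignment moves no stored data.

```python
from typing import Dict, List


def rebalance(traffic: Dict[str, float], brokers: List[str]) -> Dict[str, str]:
    """Assign partitions to brokers greedily, heaviest partition first.

    traffic maps a partition name to its throughput (hypothetical units).
    Returns a mapping of partition name -> broker name.
    """
    load = {broker: 0.0 for broker in brokers}
    assignment: Dict[str, str] = {}
    # Sort by descending traffic; break ties by name so the result is stable.
    for partition in sorted(traffic, key=lambda p: (-traffic[p], p)):
        target = min(brokers, key=lambda b: (load[b], b))
        assignment[partition] = target
        load[target] += traffic[partition]
    return assignment


def broker_load(assignment: Dict[str, str], traffic: Dict[str, float],
                brokers: List[str]) -> Dict[str, float]:
    """Sum the traffic carried by each broker under an assignment."""
    load = {broker: 0.0 for broker in brokers}
    for partition, broker in assignment.items():
        load[broker] += traffic[partition]
    return load


if __name__ == "__main__":
    # Hypothetical per-partition traffic.
    traffic = {"p0": 50.0, "p1": 30.0, "p2": 20.0,
               "p3": 20.0, "p4": 10.0, "p5": 10.0}
    # Scale out from two brokers to three and recompute the placement.
    for brokers in (["b0", "b1"], ["b0", "b1", "b2"]):
        assignment = rebalance(traffic, brokers)
        print(brokers, broker_load(assignment, traffic, brokers))
```

In a disk-based Kafka cluster, applying a new assignment like this would also require copying partition data between brokers. With data held in object storage, only the ownership of each partition changes.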
AutoMQ’s architecture, which replaces only the storage layer while fully adopting Apache Kafka’s compute layer code, ensures compatibility when replacing Kafka with AutoMQ. Palmpay’s migration process utilized a dual-write traffic switching approach:
- Kafka Upstream Dual-Write: Using Flink tasks and other tools, data is written simultaneously to AutoMQ and the original Kafka cluster to verify data consistency.
- Downstream Grayscale Switching: A portion of downstream services is switched to consume from AutoMQ, verifying that existing business logic still works.
- Upstream Stop Writing to Original Cluster: After all downstream services have been switched, upstream producers gradually stop writing to the original cluster, completing the switch.
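The first two steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Palmpay's implementation: the source used Flink tasks and other tools, whereas this sketch uses a hypothetical in-memory `InMemoryCluster` in place of real Kafka and AutoMQ clients. The helpers `dual_write`, `is_consistent`, and `pick_cluster` are likewise invented for illustration.

```python
import hashlib
from collections import Counter
from typing import Iterable, List, Tuple

Record = Tuple[bytes, bytes]  # (key, value)


class InMemoryCluster:
    """Hypothetical stand-in for a Kafka-compatible cluster."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.records: List[Record] = []

    def produce(self, key: bytes, value: bytes) -> None:
        self.records.append((key, value))


def dual_write(records: Iterable[Record], old: InMemoryCluster,
               new: InMemoryCluster) -> None:
    """Step 1: write every record to both the original and the new cluster."""
    for key, value in records:
        old.produce(key, value)
        new.produce(key, value)


def fingerprint(cluster: InMemoryCluster) -> Counter:
    """Count record digests so the comparison ignores ordering."""
    digests = Counter()
    for key, value in cluster.records:
        # Length-prefix the key so (key, value) boundaries stay unambiguous.
        payload = len(key).to_bytes(4, "big") + key + value
        digests[hashlib.sha256(payload).hexdigest()] += 1
    return digests


def is_consistent(old: InMemoryCluster, new: InMemoryCluster) -> bool:
    """Both clusters hold the same multiset of records."""
    return fingerprint(old) == fingerprint(new)


def pick_cluster(consumer_id: str, grayscale_percent: int) -> str:
    """Step 2: deterministically route a share of consumers to the new cluster."""
    bucket = int(hashlib.sha256(consumer_id.encode()).hexdigest(), 16) % 100
    return "automq" if bucket < grayscale_percent else "kafka"


if __name__ == "__main__":
    kafka, automq = InMemoryCluster("kafka"), InMemoryCluster("automq")
    events = [(b"user-%d" % i, b"event-%d" % i) for i in range(5)]
    dual_write(events, kafka, automq)
    print("consistent:", is_consistent(kafka, automq))
    print("risk-control consumer reads from:", pick_cluster("risk-control", 20))
```

Hashing the consumer identifier keeps each consumer on the same cluster across restarts while the grayscale percentage is raised. Setting it to 100 corresponds to the point at which upstream writes to the original cluster can stop.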
Within one month, Palmpay seamlessly migrated its business, including metrics tracking and real-time computing, to AutoMQ. AutoMQ now processes and distributes hundreds of billions of messages and events daily. The new solution reduced costs by over 50% compared to the original setup, with no negative impact on the business.