Palmpay Uses AutoMQ to Replace Kafka, Optimizing Costs by 50%+

lyx edited this page Jan 17, 2025 · 1 revision

About Palmpay

Palmpay is a leading fintech company in Africa, offering convenient mobile payment and financial services. As one of the fastest-growing fintech enterprises on the continent, Palmpay is dedicated to providing users and merchants with an inclusive, secure, and flexible digital payment experience, thereby promoting financial inclusion in Africa. Palmpay makes extensive use of Kafka for metrics logging and for transporting CDC (Change Data Capture) events.

Business Background of Palmpay

Palmpay uses Kafka in two scenarios: real-time computing and the collection of metrics data from its mobile applications. For real-time computing, database change events from online applications are captured and stored centrally in Kafka, where downstream services subscribe to them to drive real-time dashboards and risk control detection. For mobile application metrics, the gateway hands reported metrics data to Kafka asynchronously so that it can be cleansed and stored offline.

Why Choose AutoMQ?

Initially, Palmpay selected Kafka for its robust architecture. Over time, however, Kafka’s limitations in resource overhead and elastic scalability became apparent. After an in-depth evaluation, Palmpay adopted AutoMQ, a redesign of Kafka built on object storage. AutoMQ is a Kafka replacement that separates storage from compute by moving Kafka’s storage layer onto object storage. This brings benefits such as stateless compute nodes, partition reassignment in seconds, automatic elasticity, and traffic self-balancing.

Significant Cost Advantages

AutoMQ restructures the storage layer of Apache Kafka, achieving significant cost benefits:

  • Object Storage, Cost-Effective : AutoMQ stores data in object storage, which is consumed on demand with no reserved capacity. The cost per GB of object storage can be as low as one-tenth of that of Kafka deployments backed by cloud disks.

  • Storage and Compute Without Idleness : AutoMQ separates storage and compute, allowing on-demand scaling of either. This enables scenarios like low traffic with high storage or high traffic with low storage, ensuring no resource idleness.
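The capacity side of this argument can be made concrete with a small calculation. The sketch below is illustrative only: the replication factor of 3, the 50% disk utilization target, and the idea that object storage bills a single logical copy are assumptions chosen for the example, not figures from Palmpay's deployment, and prices are left as caller-supplied parameters.

```python
def provisioned_disk_gb(retained_gb: float,
                        replication_factor: int = 3,
                        target_utilization: float = 0.5) -> float:
    """Disk capacity a cluster with broker-local storage reserves up front.

    Assumptions (hypothetical): every byte is replicated
    `replication_factor` times, and disks are sized so that they stay
    below `target_utilization` to leave headroom for growth.
    """
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    return retained_gb * replication_factor / target_utilization


def object_storage_gb(retained_gb: float) -> float:
    """Capacity billed when data is stored once and paid for on demand.

    Assumption: the object store handles durability internally, so only
    one logical copy is billed.
    """
    return retained_gb


def monthly_storage_cost(capacity_gb: float, price_per_gb_month: float) -> float:
    """Cost for a given capacity; the unit price is supplied by the caller."""
    return capacity_gb * price_per_gb_month
```

Under these assumptions, retaining 1,000 GB of messages means reserving 6,000 GB of disk but only 1,000 GB of object storage, before any difference in unit price is taken into account.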

Automatic Traffic Self-Balancing

AutoMQ, built on separated storage and compute, achieves automatic traffic self-balancing. The read and write pressure within the cluster is automatically scheduled and balanced at the partition level, addressing Kafka’s limitations:

  • No Hotspots, More Balanced : With automatic traffic balancing, the load across nodes in an AutoMQ cluster is consistent, eliminating hotspots and reducing risk.

  • Scaling Without Manual Reassignment : After an AutoMQ cluster is scaled, traffic rebalances across the nodes automatically, which removes the need for manual partition reassignment and simplifies maintenance.
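To illustrate what partition-level balancing means, here is a minimal greedy sketch. It is not AutoMQ's scheduler: the function name `rebalance`, the `tolerance` parameter, and the traffic figures are all hypothetical. The idea is to keep moving a partition onto the least-loaded broker as long as a move makes the load distribution more even.

```python
import copy
from typing import Dict, List, Tuple

Assignment = Dict[str, Dict[str, float]]  # broker -> {partition: traffic}
Move = Tuple[str, str, str]               # (partition, from_broker, to_broker)


def rebalance(assignment: Assignment, tolerance: float = 0.1,
              max_moves: int = 1000) -> Tuple[Assignment, List[Move]]:
    """Greedy partition-level balancing (illustrative, hypothetical)."""
    state = copy.deepcopy(assignment)  # leave the caller's data untouched
    moves: List[Move] = []
    if len(state) < 2:
        return state, moves
    while len(moves) < max_moves:
        loads = {b: sum(parts.values()) for b, parts in state.items()}
        cold = min(loads, key=loads.get)
        spread = max(loads.values()) - loads[cold]
        mean = sum(loads.values()) / len(loads)
        if spread <= tolerance * mean:
            break  # already balanced closely enough
        best = None  # (benefit, partition, source_broker)
        for broker, parts in state.items():
            gap = loads[broker] - loads[cold]
            for partition, traffic in parts.items():
                # A move only helps if the partition is smaller than the gap.
                if 0 < traffic < gap:
                    benefit = traffic * (gap - traffic)
                    if best is None or benefit > best[0]:
                        best = (benefit, partition, broker)
        if best is None:
            break  # no single move improves the balance
        _, partition, source = best
        state[cold][partition] = state[source].pop(partition)
        moves.append((partition, source, cold))
    return state, moves


# A freshly added, empty broker "b2" picks up traffic without manual steps.
cluster = {"b0": {"p0": 50, "p1": 30, "p2": 20}, "b1": {"p3": 10}, "b2": {}}
balanced, plan = rebalance(cluster)
```

Each move strictly reduces the sum of squared broker loads, so the loop always terminates. With local disks, every such move would also mean copying the partition's data. The source's claim is that on shared object storage reassignment takes seconds, which is what makes continuous balancing of this kind practical.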

Migration Plan and Overall Benefits

Because AutoMQ replaces only the storage layer and fully reuses Apache Kafka’s compute layer code, it remains compatible with Kafka, which keeps the replacement straightforward. Palmpay migrated using a dual-write, traffic-switching approach:

  1. Kafka Upstream Dual-Write : Using Flink tasks and other tools, data is written simultaneously to AutoMQ and the original Kafka cluster to verify data consistency.

  2. Downstream Grayscale Switching : A portion of the downstream services is switched to consume from AutoMQ, verifying that existing business logic still works.

  3. Upstream Stop Writing to Original Cluster : After all downstream switching is completed, upstream gradually stops writing to the original cluster, completing the switch.
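The first two steps can be sketched in a few lines. This is a simplified, in-memory illustration rather than Palmpay's Flink jobs: the functions `dual_write`, `diff_clusters`, and `routed_to_new_cluster` are hypothetical names, and the two "clusters" are stand-in callables and record lists where a real setup would use Kafka producers and consumers.

```python
import hashlib
from collections import Counter
from typing import Callable, Iterable, Tuple

Record = Tuple[str, bytes, bytes]  # (topic, key, value)


def dual_write(records: Iterable[Record],
               write_old: Callable[[Record], None],
               write_new: Callable[[Record], None]) -> None:
    """Step 1: send every record to both the old and the new cluster."""
    for record in records:
        write_old(record)
        write_new(record)


def fingerprint(records: Iterable[Record]) -> Counter:
    """Count records by (topic, content digest), ignoring order and offsets."""
    counts: Counter = Counter()
    for topic, key, value in records:
        digest = hashlib.sha256(key + b"\x00" + value).hexdigest()
        counts[(topic, digest)] += 1
    return counts


def diff_clusters(old: Iterable[Record],
                  new: Iterable[Record]) -> Tuple[Counter, Counter]:
    """Return (records missing from new, records only in new)."""
    old_counts, new_counts = fingerprint(old), fingerprint(new)
    return old_counts - new_counts, new_counts - old_counts


def routed_to_new_cluster(consumer_group: str, percent: int) -> bool:
    """Step 2: stable grayscale routing for a given rollout percentage."""
    bucket = int(hashlib.sha256(consumer_group.encode()).hexdigest(), 16) % 100
    return bucket < percent
```

Comparing order-insensitive fingerprints rather than offsets is a deliberate choice in this sketch, since the two clusters may assign different offsets to the same record. Raising `percent` in stages from 0 to 100 moves consumer groups over gradually, and a group that has switched stays switched because its bucket is derived from a stable hash.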

Within one month, Palmpay seamlessly migrated its business, including metrics tracking and real-time computing, to AutoMQ. AutoMQ now processes and distributes hundreds of billions of messages and events daily. The new solution reduced costs by more than 50% compared with the original setup, with no negative impact on the business.
