MinIO
MinIO is a high-performance, distributed object storage system that runs on standard hardware, which makes it cost-effective and broadly applicable. It is designed for high-performance private cloud environments and uses a simple, efficient architecture to deliver comprehensive object storage functionality without sacrificing performance. MinIO is used in a wide range of scenarios, from traditional secondary storage, disaster recovery, and archiving to emerging areas such as machine learning, big data, private cloud, and hybrid cloud.
Because MinIO is fully compatible with the S3 API, you can deploy an AutoMQ cluster in a private data center and obtain a Kafka-compatible streaming system with better cost efficiency, extreme elasticity, and single-digit millisecond latency. This article shows how to deploy an AutoMQ cluster on top of MinIO in a private data center.
- A functional MinIO environment. If you do not have one, refer to the official MinIO installation guide to set it up.
- Prepare 5 hosts for deploying the AutoMQ cluster. We recommend Linux amd64 hosts with 2 cores and 16GB of RAM, each with two virtual storage volumes (a system volume and a data volume). For example:
Role | IP | Node ID | System Volume | Data Volume |
---|---|---|---|---|
CONTROLLER | 192.168.0.1 | 0 | EBS 20GB | EBS 20GB |
CONTROLLER | 192.168.0.2 | 1 | EBS 20GB | EBS 20GB |
CONTROLLER | 192.168.0.3 | 2 | EBS 20GB | EBS 20GB |
BROKER | 192.168.0.4 | 3 | EBS 20GB | EBS 20GB |
BROKER | 192.168.0.5 | 4 | EBS 20GB | EBS 20GB |
Tips:
- Ensure that these machines are on the same subnet and can communicate with each other.
- In non-production environments, you can deploy only one Controller, which also serves as a Broker by default.
- Download the latest stable AutoMQ binary package from AutoMQ GitHub Releases.
- Create two custom-named object storage buckets on MinIO: automq-data and automq-ops.
- Configure the Access Key and Secret Key required by the AWS CLI through environment variables:
export AWS_ACCESS_KEY_ID=X1J0E1EC3KZMQUZCVHED
export AWS_SECRET_ACCESS_KEY=Hihmu8nIDN1F7wshByig0dwQ235a0WAeUvAEiWSD
- Use the AWS CLI to create the buckets, replacing the endpoint with the API address of your MinIO server:
aws s3api create-bucket --bucket automq-data --endpoint-url=http://127.0.0.1:80
aws s3api create-bucket --bucket automq-ops --endpoint-url=http://127.0.0.1:80
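Running create-bucket against a bucket that already exists may return an error, so a re-runnable variant can be convenient when you repeat the setup. The following is a minimal bash sketch, assuming the AWS CLI and curl are installed. `minio_alive` probes MinIO's liveness endpoint `/minio/health/live`, and `ensure_bucket` runs `aws s3api head-bucket` and creates the bucket only if that check fails. The helper names and the `AWS` and `ENDPOINT` variables are hypothetical; set `ENDPOINT` to your MinIO API address.

```shell
#!/usr/bin/env bash
# Hypothetical settings: the AWS CLI command and the MinIO API endpoint.
AWS=${AWS:-aws}
ENDPOINT=${ENDPOINT:-http://127.0.0.1:80}

# minio_alive URL: succeeds if MinIO's liveness probe answers with HTTP 2xx.
minio_alive() {
  curl -fsS --max-time 5 -o /dev/null "$1/minio/health/live"
}

# ensure_bucket NAME: create the bucket only when head-bucket fails.
# If head-bucket failed for another reason (e.g. access denied),
# create-bucket will report the underlying error.
ensure_bucket() {
  if "$AWS" s3api head-bucket --bucket "$1" --endpoint-url "$ENDPOINT" >/dev/null 2>&1; then
    echo "bucket $1 already exists"
  else
    "$AWS" s3api create-bucket --bucket "$1" --endpoint-url "$ENDPOINT"
  fi
}

# Usage:
# minio_alive "$ENDPOINT" && for b in automq-data automq-ops; do ensure_bucket "$b"; done
```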
AutoMQ provides the automq-kafka-admin.sh tool for quickly starting AutoMQ. Provide an S3 URL that contains the required S3 endpoint and authentication information, and you can start AutoMQ with one click, without manually generating a cluster ID or formatting storage.
### Command Line Usage Example
bin/automq-kafka-admin.sh generate-s3-url \
--s3-access-key=xxx \
--s3-secret-key=yyy \
--s3-region=cn-northwest-1 \
--s3-endpoint=s3.cn-northwest-1.amazonaws.com.cn \
--s3-data-bucket=automq-data \
--s3-ops-bucket=automq-ops
When using MinIO, use the following parameter values to generate the s3url.
Parameter Name | Default Value in This Example | Description |
---|---|---|
--s3-access-key | minioadmin | Environment variable MINIO_ROOT_USER |
--s3-secret-key | minio-secret-key-CHANGE-ME | Environment variable MINIO_ROOT_PASSWORD |
--s3-region | us-west-2 | Has no effect in MinIO and can be set to any value, such as us-west-2 |
--s3-endpoint | http://10.1.0.240:9000 | You can obtain the endpoint by running sudo systemctl status minio.service |
--s3-data-bucket | automq-data | - |
--s3-ops-bucket | automq-ops | - |
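The values in the table map directly onto the generate-s3-url flags shown in the usage example. The following is a minimal bash sketch that reads the credentials from MinIO's `MINIO_ROOT_USER` and `MINIO_ROOT_PASSWORD` environment variables, as the table suggests, and takes the endpoint from a hypothetical `MINIO_ENDPOINT` variable. `ADMIN` defaults to `bin/automq-kafka-admin.sh` and can be overridden (for example with `echo` for a dry run); the function name `generate_minio_s3url` is illustrative only.

```shell
#!/usr/bin/env bash
# The admin tool to call; override with ADMIN=echo for a dry run.
ADMIN=${ADMIN:-bin/automq-kafka-admin.sh}

# generate_minio_s3url: call generate-s3-url with MinIO-specific values.
# Fails (in a non-interactive shell, exits) if a required variable is unset.
generate_minio_s3url() {
  : "${MINIO_ROOT_USER:?set MINIO_ROOT_USER (MinIO access key)}"
  : "${MINIO_ROOT_PASSWORD:?set MINIO_ROOT_PASSWORD (MinIO secret key)}"
  : "${MINIO_ENDPOINT:?set MINIO_ENDPOINT, e.g. http://10.1.0.240:9000}"
  "$ADMIN" generate-s3-url \
    --s3-access-key="$MINIO_ROOT_USER" \
    --s3-secret-key="$MINIO_ROOT_PASSWORD" \
    --s3-region=us-west-2 \
    --s3-endpoint="$MINIO_ENDPOINT" \
    --s3-data-bucket=automq-data \
    --s3-ops-bucket=automq-ops
}

# Usage:
# export MINIO_ROOT_USER=minioadmin
# export MINIO_ROOT_PASSWORD='<your MinIO root password>'
# export MINIO_ENDPOINT=http://10.1.0.240:9000
# generate_minio_s3url
```

The region is fixed to us-west-2 because, per the table, MinIO ignores it.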
After you run this command, it automatically performs the following stages:
- Probes the core S3 features with the provided accessKey and secretKey to verify compatibility between AutoMQ and the S3 service.
- Generates the s3url from the identity and endpoint information.
- Prints an example command for obtaining the AutoMQ startup command from the s3url. In that command, replace --controller-list and --broker-list with the CONTROLLER and BROKER hosts you actually plan to deploy.
An example of the execution result is as follows:
############ Ping S3 ########################
[ OK ] Write s3 object
[ OK ] Read s3 object
[ OK ] Delete s3 object
[ OK ] Write s3 object
[ OK ] Upload s3 multipart object
[ OK ] Read s3 multipart object
[ OK ] Delete s3 object
############ String of S3url ################
Your s3url is:
s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=xxx&s3-secret-key=yyy&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA
############ Usage of S3url ################
To start AutoMQ, generate the start commandline using s3url.
bin/automq-kafka-admin.sh generate-start-command \
--s3-url="s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=XXX&s3-secret-key=YYY&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA" \
--controller-list="192.168.0.1:9093;192.168.0.2:9093;192.168.0.3:9093" \
--broker-list="192.168.0.4:9092;192.168.0.5:9092"
TIPS: Please replace the controller-list and broker-list with your actual IP addresses.
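The s3url is a single string that carries the endpoint, credentials, bucket names, and cluster ID as query parameters, as the output above shows. The following minimal bash sketch reads one parameter back out (for example, to confirm the cluster-id before starting nodes) and masks s3-secret-key before the URL is pasted into tickets or logs. `s3url_param` and `s3url_redact` are hypothetical helpers, not part of AutoMQ, and parameter values are not URL-decoded.

```shell
#!/usr/bin/env bash
# s3url_param URL KEY: print the value of query parameter KEY; fail if absent.
s3url_param() {
  local query pair
  local -a pairs
  query=${1#*\?}                       # drop everything up to the first '?'
  IFS='&' read -r -a pairs <<< "$query"
  for pair in "${pairs[@]}"; do
    if [ "${pair%%=*}" = "$2" ]; then  # compare the part before the first '='
      printf '%s\n' "${pair#*=}"
      return 0
    fi
  done
  return 1
}

# s3url_redact URL: replace the s3-secret-key value with ****.
s3url_redact() {
  printf '%s\n' "$1" | sed -E 's/(s3-secret-key=)[^&]*/\1****/'
}

# Usage (S3_URL is a hypothetical variable holding the generated s3url):
# s3url_param "$S3_URL" cluster-id
# s3url_redact "$S3_URL"
```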
Replace --controller-list and --broker-list in the command generated in the previous step with your host information: the IP addresses of the 3 CONTROLLERS and 2 BROKERS from the environment preparation, using the default ports 9093 for CONTROLLERS and 9092 for BROKERS.
bin/automq-kafka-admin.sh generate-start-command \
--s3-url="s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=XXX&s3-secret-key=YYY&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA" \
--controller-list="192.168.0.1:9093;192.168.0.2:9093;192.168.0.3:9093" \
--broker-list="192.168.0.4:9092;192.168.0.5:9092"
Parameter Name | Required | Description |
---|---|---|
--s3-url | Yes | Generated by the bin/automq-kafka-admin.sh generate-s3-url command-line tool; includes authentication, cluster ID, and other information. |
--controller-list | Yes | At least one address is required. The IP and port list of the CONTROLLER hosts, in the format IP1:PORT1;IP2:PORT2;IP3:PORT3 |
--broker-list | Yes | At least one address is required. The IP and port list of the BROKER hosts, in the format IP1:PORT1;IP2:PORT2;IP3:PORT3 |
--controller-only-mode | No | Determines whether CONTROLLER nodes act only as CONTROLLERS. Defaults to false, meaning each deployed CONTROLLER node also acts as a BROKER. |
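Both lists are plain strings of IP:PORT entries separated by semicolons. The following minimal bash sketch builds them from the host table in the environment preparation step, so the two flags stay consistent with your inventory. The `CONTROLLERS` and `BROKERS` arrays and the `join_with_port` helper are hypothetical names; edit the arrays to match your hosts.

```shell
#!/usr/bin/env bash
# Hypothetical inventory taken from the example host table.
CONTROLLERS=(192.168.0.1 192.168.0.2 192.168.0.3)
BROKERS=(192.168.0.4 192.168.0.5)

# join_with_port PORT HOST...: print "HOST1:PORT;HOST2:PORT;..."
join_with_port() {
  local port=$1 out="" sep=";" host
  shift
  for host in "$@"; do
    out+="${out:+$sep}${host}:${port}"   # add ';' only between entries
  done
  printf '%s\n' "$out"
}

CONTROLLER_LIST=$(join_with_port 9093 "${CONTROLLERS[@]}")
BROKER_LIST=$(join_with_port 9092 "${BROKERS[@]}")

# Usage (S3_URL holds the s3url printed by generate-s3-url):
# bin/automq-kafka-admin.sh generate-start-command \
#   --s3-url="$S3_URL" \
#   --controller-list="$CONTROLLER_LIST" \
#   --broker-list="$BROKER_LIST"
```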
After executing the command, a command for starting AutoMQ will be generated.
############ Start Commandline ##############
To start an AutoMQ Kafka server, please navigate to the directory where your AutoMQ tgz file is located and run the following command.
Before running the command, make sure that Java 17 is installed on your host. You can verify the Java version by executing 'java -version'.
bin/kafka-server-start.sh --s3-url="s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=XXX&s3-secret-key=YYY&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA" --override process.roles=broker,controller --override node.id=0 --override controller.quorum.voters=0@192.168.0.1:9093,1@192.168.0.2:9093,2@192.168.0.3:9093 --override listeners=PLAINTEXT://192.168.0.1:9092,CONTROLLER://192.168.0.1:9093 --override advertised.listeners=PLAINTEXT://192.168.0.1:9092
bin/kafka-server-start.sh --s3-url="s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=XXX&s3-secret-key=YYY&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA" --override process.roles=broker,controller --override node.id=1 --override controller.quorum.voters=0@192.168.0.1:9093,1@192.168.0.2:9093,2@192.168.0.3:9093 --override listeners=PLAINTEXT://192.168.0.2:9092,CONTROLLER://192.168.0.2:9093 --override advertised.listeners=PLAINTEXT://192.168.0.2:9092
bin/kafka-server-start.sh --s3-url="s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=XXX&s3-secret-key=YYY&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA" --override process.roles=broker,controller --override node.id=2 --override controller.quorum.voters=0@192.168.0.1:9093,1@192.168.0.2:9093,2@192.168.0.3:9093 --override listeners=PLAINTEXT://192.168.0.3:9092,CONTROLLER://192.168.0.3:9093 --override advertised.listeners=PLAINTEXT://192.168.0.3:9092
bin/kafka-server-start.sh --s3-url="s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=XXX&s3-secret-key=YYY&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA" --override process.roles=broker --override node.id=3 --override controller.quorum.voters=0@192.168.0.1:9093,1@192.168.0.2:9093,2@192.168.0.3:9093 --override listeners=PLAINTEXT://192.168.0.4:9092 --override advertised.listeners=PLAINTEXT://192.168.0.4:9092
bin/kafka-server-start.sh --s3-url="s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=XXX&s3-secret-key=YYY&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA" --override process.roles=broker --override node.id=4 --override controller.quorum.voters=0@192.168.0.1:9093,1@192.168.0.2:9093,2@192.168.0.3:9093 --override listeners=PLAINTEXT://192.168.0.5:9092 --override advertised.listeners=PLAINTEXT://192.168.0.5:9092
TIPS: Start controllers first and then the brokers.
node.id is automatically generated starting from 0.
To start the cluster, sequentially execute the command list from the previous step on the designated CONTROLLER or BROKER hosts. For example, to start the first CONTROLLER process on 192.168.0.1, execute the first command from the generated startup command list.
bin/kafka-server-start.sh --s3-url="s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=XXX&s3-secret-key=YYY&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA" --override process.roles=broker,controller --override node.id=0 --override controller.quorum.voters=0@192.168.0.1:9093,1@192.168.0.2:9093,2@192.168.0.3:9093 --override listeners=PLAINTEXT://192.168.0.1:9092,CONTROLLER://192.168.0.1:9093 --override advertised.listeners=PLAINTEXT://192.168.0.1:9092
When using the startup command, unspecified parameters will adopt Apache Kafka's default configurations. For parameters newly added by AutoMQ, AutoMQ's default values will be used. To override the default configurations, you can add additional --override key=value parameters at the end of the command.
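The generated startup commands follow a regular pattern: node IDs start at 0 with the CONTROLLERS first, controllers listen on 9093 in addition to 9092, and every node receives the same controller.quorum.voters string. The following minimal bash sketch reproduces that pattern so you can double-check a generated command or append extra `--override key=value` pairs, such as `autobalancer.controller.enable=true`. It is not a replacement for generate-start-command: it does not handle `--controller-only-mode`, and the arrays and function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical inventory from the example host table.
CONTROLLERS=(192.168.0.1 192.168.0.2 192.168.0.3)
BROKERS=(192.168.0.4 192.168.0.5)

# quorum_voters: print "0@IP0:9093,1@IP1:9093,..." for all controllers.
quorum_voters() {
  local i out="" sep=","
  for i in "${!CONTROLLERS[@]}"; do
    out+="${out:+$sep}${i}@${CONTROLLERS[$i]}:9093"
  done
  printf '%s\n' "$out"
}

# node_overrides ID [KEY=VALUE...]: print the --override flags for one node,
# followed by any extra KEY=VALUE pairs as additional --override flags.
node_overrides() {
  local id=$1 ip flags extra
  shift
  if (( id < 0 || id >= ${#CONTROLLERS[@]} + ${#BROKERS[@]} )); then
    echo "unknown node id: $id" >&2
    return 1
  fi
  if (( id < ${#CONTROLLERS[@]} )); then
    # Controller nodes also act as brokers (default, non controller-only mode).
    ip=${CONTROLLERS[$id]}
    flags="--override process.roles=broker,controller --override node.id=$id"
    flags+=" --override controller.quorum.voters=$(quorum_voters)"
    flags+=" --override listeners=PLAINTEXT://$ip:9092,CONTROLLER://$ip:9093"
  else
    ip=${BROKERS[$((id - ${#CONTROLLERS[@]}))]}
    flags="--override process.roles=broker --override node.id=$id"
    flags+=" --override controller.quorum.voters=$(quorum_voters)"
    flags+=" --override listeners=PLAINTEXT://$ip:9092"
  fi
  flags+=" --override advertised.listeners=PLAINTEXT://$ip:9092"
  for extra in "$@"; do
    flags+=" --override $extra"
  done
  printf '%s\n' "$flags"
}

# Usage (relies on word splitting; none of the values contain spaces):
# bin/kafka-server-start.sh --s3-url="$S3_URL" \
#   $(node_overrides 0 autobalancer.controller.enable=true)
```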
Parameter Name | Required | Description |
---|---|---|
--s3-url | Yes | Generated by the bin/automq-kafka-admin.sh generate-s3-url command-line tool; contains authentication, cluster ID, and other information. |
process.roles | Yes | Either controller or broker. If a host serves as both CONTROLLER and BROKER, set the value to broker,controller. |
node.id | Yes | An integer that uniquely identifies a BROKER or CONTROLLER within the Kafka cluster. It must be unique within the cluster. |
controller.quorum.voters | Yes | The hosts participating in KRaft elections, each given as nodeid@ip:port. For example: 0@192.168.0.1:9093,1@192.168.0.2:9093,2@192.168.0.3:9093 |
listeners | Yes | The IP addresses and ports to listen on. |
advertised.listeners | Yes | The addresses the BROKER advertises to clients. |
log.dirs | No | Directory that stores KRaft and BROKER metadata. |
s3.wal.path | No | In production, we recommend storing AutoMQ WAL data on a newly mounted raw block device. AutoMQ can write directly to raw devices, which reduces latency. Make sure this path points to where the WAL data should be stored. |
autobalancer.controller.enable | No | Defaults to false, which disables traffic self-balancing. When enabled, AutoMQ's auto balancer component automatically reassigns partitions to keep overall traffic balanced. |
Tips: To enable continuous traffic self-balancing or run Example: Self-Balancing When Cluster Nodes Change, it is recommended to explicitly specify the parameter --override autobalancer.controller.enable=true when starting the Controller.
To run AutoMQ in the background, append an output redirection and & to the end of the startup command, where command stands for the full startup command:
command > /dev/null 2>&1 &
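Redirecting to /dev/null discards the server log, which makes startup failures hard to diagnose. The following minimal bash alternative assumes you prefer to keep output in a file. `run_bg` is a hypothetical helper that starts a command under nohup, appends its output to a log file, and records the PID so the process can be stopped later.

```shell
#!/usr/bin/env bash
# run_bg LOGFILE PIDFILE CMD [ARGS...]: run CMD detached from the terminal,
# append stdout and stderr to LOGFILE, and write its PID to PIDFILE.
run_bg() {
  local log=$1 pidfile=$2
  shift 2
  nohup "$@" >>"$log" 2>&1 &
  echo $! >"$pidfile"
}

# Usage (the s3url and overrides are the ones generated for this node):
# run_bg automq.log automq.pid bin/kafka-server-start.sh --s3-url="..." --override ...
# Stop it later with:
# kill "$(cat automq.pid)"
```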
You can view the local data volumes with the lsblk command in Linux. The unpartitioned block device is the data volume. In the following example, vdb is the unpartitioned raw block device.
vda 253:0 0 20G 0 disk
├─vda1 253:1 0 2M 0 part
├─vda2 253:2 0 200M 0 part /boot/efi
└─vda3 253:3 0 19.8G 0 part /
vdb 253:16 0 20G 0 disk
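Pointing s3.wal.path at the wrong device would overwrite its contents, so it can help to list candidate devices programmatically. The following is a minimal bash sketch, assuming util-linux lsblk with the PKNAME (parent device) column: it prints every disk that has no child devices such as partitions or LVM volumes. `find_raw_disks` is a hypothetical helper. A disk without children may still hold a filesystem directly, so confirm with `lsblk -f` before using it.

```shell
#!/usr/bin/env bash
# find_raw_disks: read "NAME TYPE PKNAME" rows on stdin and print every
# disk that no other row names as its parent.
find_raw_disks() {
  awk '
    # Remember every whole disk, in input order.
    $2 == "disk" { disks[++n] = $1 }
    # Any row with a parent name marks that parent as having children.
    NF >= 3 { has_child[$3] = 1 }
    END {
      for (i = 1; i <= n; i++)
        if (!(disks[i] in has_child)) print disks[i]
    }
  '
}

# Usage:
# lsblk -nr -o NAME,TYPE,PKNAME | find_raw_disks
```

For the example output above, this prints only vdb.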
By default, AutoMQ stores metadata and WAL data in the /tmp directory. If /tmp is mounted on tmpfs, this default is not suitable for a production environment, because tmpfs is memory-backed and its contents are lost on reboot.
For production or formal testing environments, we recommend changing the configuration as follows: point the metadata directory log.dirs and the WAL data path s3.wal.path (the raw block device of the data volume) to other locations.
bin/kafka-server-start.sh ... \
--override s3.telemetry.metrics.exporter.type=prometheus \
--override s3.metrics.exporter.prom.host=0.0.0.0 \
--override s3.metrics.exporter.prom.port=9090 \
--override log.dirs=/root/kraft-logs \
--override s3.wal.path=/dev/vdb \
> /dev/null 2>&1 &
Tips:
Change s3.wal.path to the name of the actual local raw device. To place AutoMQ's Write-Ahead Log (WAL) on local SSD storage instead, make sure the specified file path is on an SSD with more than 10GB of available space, for example --override s3.wal.path=/home/admin/automq-wal. When deploying AutoMQ for production in a private data center, ensure the reliability of the local SSD, for example by using RAID.
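The tip above describes two acceptable WAL targets: a raw block device, or a file path on an SSD with more than 10GB free. log.dirs, in turn, should not sit on tmpfs. The following is a minimal preflight sketch for Linux with GNU coreutils (`stat -f`, `df -P`) that checks both before you start a node. It cannot tell whether a path is on an SSD; the helper names are hypothetical, and 10GB is interpreted here as 10 GiB.

```shell
#!/usr/bin/env bash
# nearest_existing PATH: walk up to the closest path that already exists.
nearest_existing() {
  local p=$1
  while [ ! -e "$p" ]; do p=$(dirname "$p"); done
  printf '%s\n' "$p"
}

# check_log_dirs DIR: fail if DIR is (or would be created) on tmpfs.
check_log_dirs() {
  local fstype
  fstype=$(stat -f -c %T "$(nearest_existing "$1")")
  if [ "$fstype" = tmpfs ]; then
    echo "log.dirs $1 is on tmpfs; its contents are lost on reboot" >&2
    return 1
  fi
}

# check_wal_path PATH [MIN_KB]: accept a raw block device, or a file path
# whose filesystem has more than MIN_KB KiB available (default 10 GiB).
check_wal_path() {
  local path=$1 min_kb=${2:-$((10 * 1024 * 1024))} avail_kb
  if [ -b "$path" ]; then
    return 0
  fi
  # Column 4 of POSIX df output is the available space in KiB.
  avail_kb=$(df -Pk "$(nearest_existing "$path")" | awk 'NR == 2 { print $4 }')
  if [ "$avail_kb" -le "$min_kb" ]; then
    echo "s3.wal.path $path: only ${avail_kb} KiB available" >&2
    return 1
  fi
}

# Usage:
# check_log_dirs /root/kraft-logs && check_wal_path /dev/vdb
```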
You have now deployed an AutoMQ cluster on MinIO: a low-cost, low-latency Kafka cluster that scales elastically in seconds. To further explore AutoMQ's partition reassignment in seconds and continuous self-balancing features, refer to the official examples.