Apache Doris
Apache Doris is a high-performance, real-time analytical database based on MPP architecture, known for its exceptional ease of use and sub-second response times for queries on massive datasets. It supports both high-concurrency point query scenarios and high-throughput complex analytical scenarios. Consequently, Apache Doris is well-suited for use cases such as report analysis, ad-hoc queries, unified data warehouse construction, and data lake federation query acceleration. Users can build applications on top of this, such as user behavior analysis, A/B testing platforms, log search analysis, user profiling, and order analysis.
This article will introduce how to use Apache Doris Routine Load to import data from AutoMQ into Apache Doris. For detailed information about Routine Load, please refer to the Routine Load Basic Principles documentation.
Ensure an available Apache Doris cluster is prepared. For demonstration purposes, we have deployed a test Apache Doris environment on Linux, following the Docker Deployment of Doris documentation.
Create a database and test table:
create database automq_db;
CREATE TABLE automq_db.users (
id bigint NOT NULL,
name string NOT NULL,
timestamp string NULL,
status string NULL
) DISTRIBUTED BY hash (id) PROPERTIES ('replication_num' = '1');
Download the latest TGZ package from AutoMQ Releases and extract it. Assuming the extraction directory is $AUTOMQ_HOME, this article will use the tools in $AUTOMQ_HOME/bin to create topics and generate test data.
Refer to the AutoMQ official deployment documentation to deploy a functional cluster. Ensure that the network connectivity between AutoMQ and Apache Doris is maintained.
Create a topic named example_topic in AutoMQ and write a test JSON record into it by following the steps below.
Use the Apache Kafka command-line tool to create the topic. Make sure you have access to the Kafka environment and that the Kafka service is running. Here is an example command for creating a topic:
$AUTOMQ_HOME/bin/kafka-topics.sh --create --topic example_topic --bootstrap-server 127.0.0.1:9092 --partitions 1 --replication-factor 1
When executing the command, replace the topic and bootstrap-server values with your actual topic name and AutoMQ Bootstrap Server address.
After creating the topic, use the following command to verify that it was created successfully.
$AUTOMQ_HOME/bin/kafka-topics.sh --describe --topic example_topic --bootstrap-server 127.0.0.1:9092
Generate a JSON-formatted test record whose fields match the table created earlier.
{
"id": 1,
"name": "Test User",
"timestamp": "2023-11-10T12:00:00",
"status": "active"
}
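If more than one record is needed for testing, the sample above can be generated in a loop. The following is a minimal sketch; the function name gen_user_records is a hypothetical helper, not part of AutoMQ or Kafka, and every record reuses the same name, timestamp, and status values as the sample.

```shell
# Hypothetical helper (not part of AutoMQ or Kafka): print one JSON record
# per line for ids 1..N. Field names match the automq_db.users table.
gen_user_records() {
  count="$1"
  i=1
  while [ "$i" -le "$count" ]; do
    printf '{"id": %d, "name": "Test User", "timestamp": "2023-11-10T12:00:00", "status": "active"}\n' "$i"
    i=$((i + 1))
  done
}

# Example: print three records to the terminal.
gen_user_records 3
```

Because each record occupies exactly one line, the output should be suitable for piping into the console producer, which treats each input line as one message.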
Use Kafka command line tools or programming methods to write the test data into a Topic named example_topic. Below is an example using the command line tool:
echo '{"id": 1, "name": "Test User", "timestamp": "2023-11-10T12:00:00", "status": "active"}' | sh $AUTOMQ_HOME/bin/kafka-console-producer.sh --bootstrap-server 127.0.0.1:9092 --topic example_topic
Use the following command to view the data just written to the topic:
sh $AUTOMQ_HOME/bin/kafka-console-consumer.sh --bootstrap-server 127.0.0.1:9092 --topic example_topic --from-beginning
When executing the command, replace the topic and bootstrap-server values with your actual topic name and AutoMQ Bootstrap Server address.
In the Apache Doris command line, create a Routine Load job that receives JSON data to continuously import data from an AutoMQ Kafka topic. For detailed parameters of Routine Load, please refer to Doris Routine Load.
CREATE ROUTINE LOAD automq_db.automq_example_load ON users
COLUMNS(id, name, timestamp, status)
PROPERTIES
(
"format" = "json",
"jsonpaths" = "[\"$.id\",\"$.name\",\"$.timestamp\",\"$.status\"]"
)
FROM KAFKA
(
"kafka_broker_list" = "127.0.0.1:9092",
"kafka_topic" = "example_topic",
"property.kafka_default_offsets" = "OFFSET_BEGINNING"
);
When executing the command, replace kafka_broker_list with the actual AutoMQ Bootstrap Server address being used.
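To avoid editing the statement by hand for each environment, the broker address and topic can be passed in as parameters. The sketch below only prints the SQL text. The function name render_routine_load_sql is hypothetical, the job name is qualified with automq_db so that it does not depend on the current database, and the commented mysql invocation assumes a Doris FE reachable at 127.0.0.1 on query port 9030 with the root user and no password.

```shell
# Hypothetical helper: print the CREATE ROUTINE LOAD statement with the
# broker list ($1) and topic ($2) filled in. It does not contact Doris.
render_routine_load_sql() {
  brokers="$1"
  topic="$2"
  # In an unquoted heredoc, \\ yields a literal backslash and \$ a literal $.
  cat <<EOF
CREATE ROUTINE LOAD automq_db.automq_example_load ON users
COLUMNS(id, name, timestamp, status)
PROPERTIES
(
    "format" = "json",
    "jsonpaths" = "[\\"\$.id\\",\\"\$.name\\",\\"\$.timestamp\\",\\"\$.status\\"]"
)
FROM KAFKA
(
    "kafka_broker_list" = "${brokers}",
    "kafka_topic" = "${topic}",
    "property.kafka_default_offsets" = "OFFSET_BEGINNING"
);
EOF
}

# Print the statement for review.
render_routine_load_sql "127.0.0.1:9092" "example_topic"

# Assumed submission command (adjust host, port, and credentials):
# render_routine_load_sql "127.0.0.1:9092" "example_topic" | mysql -h 127.0.0.1 -P 9030 -uroot
```

Printing the statement first makes it easy to review the substituted values before submitting anything to the cluster.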
First, check the status of the Routine Load job to ensure the task is running.
show routine load\G
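When the check needs to be scripted, the State field can be extracted from the vertical output. This is a rough sketch: the function name routine_load_state is hypothetical, the two-line input in the example is an abbreviated illustration and not real cluster output, and the commented mysql invocation assumes a Doris FE at 127.0.0.1 on query port 9030 with the root user.

```shell
# Hypothetical helper: read vertical (\G) SHOW ROUTINE LOAD output on stdin
# and print the value of the State field of the first job, e.g. RUNNING.
routine_load_state() {
  awk -F': ' '$1 ~ /^[[:space:]]*State$/ { print $2; exit }'
}

# Example with an abbreviated, illustrative excerpt of the output.
printf '%s\n' \
  '                Name: automq_example_load' \
  '               State: RUNNING' | routine_load_state

# Assumed usage against a live cluster (adjust host, port, and credentials):
# mysql -h 127.0.0.1 -P 9030 -uroot -e 'SHOW ROUTINE LOAD FOR automq_db.automq_example_load\G' | routine_load_state
```

A state other than RUNNING, such as PAUSED, suggests inspecting the full output of the job before querying the table.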
Then, query the relevant table in the Apache Doris database to confirm that the data has been successfully imported.
select * from automq_db.users;
+------+--------------+---------------------+--------+
| id   | name         | timestamp           | status |
+------+--------------+---------------------+--------+
|    1 | Test User    | 2023-11-10T12:00:00 | active |
|    2 | Test User    | 2023-11-10T12:00:00 | active |
+------+--------------+---------------------+--------+
2 rows in set (0.01 sec)