Apache Doris

lyx edited this page Jan 17, 2025 · 1 revision

Apache Doris is a high-performance, real-time analytical database based on MPP architecture, known for its exceptional ease of use and sub-second response times for queries on massive datasets. It supports both high-concurrency point query scenarios and high-throughput complex analytical scenarios. Consequently, Apache Doris is well-suited for use cases such as report analysis, ad-hoc queries, unified data warehouse construction, and data lake federation query acceleration. Users can build applications on top of this, such as user behavior analysis, A/B testing platforms, log search analysis, user profiling, and order analysis.

This article will introduce how to use Apache Doris Routine Load to import data from AutoMQ into Apache Doris. For detailed information about Routine Load, please refer to the Routine Load Basic Principles documentation.

Environment Preparation

Prepare Apache Doris and Test Data

Ensure an available Apache Doris cluster is prepared. For demonstration purposes, we have deployed a test Apache Doris environment on Linux, following the Docker Deployment of Doris documentation.

Create a database and test table:



CREATE DATABASE automq_db;
CREATE TABLE automq_db.users (
    id bigint NOT NULL,
    name string NOT NULL,
    timestamp string NULL,
    status string NULL
) DISTRIBUTED BY HASH (id) PROPERTIES ('replication_num' = '1');

Prepare Kafka Command-line Tools

Download the latest TGZ package from AutoMQ Releases and extract it. Assuming the extraction directory is $AUTOMQ_HOME, this article will use the tools in $AUTOMQ_HOME/bin to create topics and generate test data.

Prepare AutoMQ and Test Data

Refer to the AutoMQ official deployment documentation to deploy a functional cluster. Ensure that Apache Doris can reach AutoMQ over the network.

Create a topic named example_topic in AutoMQ and write a test JSON record into it by following the steps below.

Create Topic

Use the Apache Kafka command-line tool to create the topic. Make sure you have access to the Kafka environment and that the Kafka service is running. Here is an example command for creating a topic:


$AUTOMQ_HOME/bin/kafka-topics.sh --create --topic example_topic --bootstrap-server 127.0.0.1:9092 --partitions 1 --replication-factor 1

When executing the command, replace the values of --topic and --bootstrap-server with your topic name and the actual AutoMQ Bootstrap Server address.

After creating the topic, you can use the following command to verify if the topic was successfully created.


$AUTOMQ_HOME/bin/kafka-topics.sh --describe --topic example_topic --bootstrap-server 127.0.0.1:9092

Generate Test Data

Generate a JSON-formatted test record that corresponds to the table created earlier.


{
  "id": 1,
  "name": "Test User",
  "timestamp": "2023-11-10T12:00:00",
  "status": "active"
}
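
To test with more than one record, the sample object can be generated in bulk. The following is a minimal sketch, assuming a POSIX shell. The function name gen_users and the generated field values are illustrative assumptions, not part of AutoMQ or Apache Doris. It prints one JSON object per line, because the console producer treats each input line as a separate message.

```shell
#!/bin/sh
# gen_users N: print N single-line JSON records shaped like automq_db.users.
# The function name and the generated values are illustrative assumptions.
gen_users() {
  count="$1"
  i=1
  while [ "$i" -le "$count" ]; do
    printf '{"id": %d, "name": "Test User %d", "timestamp": "2023-11-10T12:00:00", "status": "active"}\n' "$i" "$i"
    i=$((i + 1))
  done
}

# Print three sample records to standard output.
gen_users 3
```

The output of gen_users can be piped into the kafka-console-producer.sh command from the Write Test Data step in place of the echo command.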

Write Test Data

Use the Kafka command-line tools or a client program to write the test data into the topic named example_topic. Below is an example using the command-line tool:


echo '{"id": 1, "name": "Test User", "timestamp": "2023-11-10T12:00:00", "status": "active"}' | sh $AUTOMQ_HOME/bin/kafka-console-producer.sh --bootstrap-server 127.0.0.1:9092 --topic example_topic

Use the following command to view the data just written to the topic:


sh $AUTOMQ_HOME/bin/kafka-console-consumer.sh --bootstrap-server 127.0.0.1:9092 --topic example_topic --from-beginning

When executing the command, replace the values of --topic and --bootstrap-server with your topic name and the actual AutoMQ Bootstrap Server address.

Create Routine Load Import Job

In the Apache Doris command line, create a Routine Load job that continuously imports JSON data from the AutoMQ topic into the users table. For detailed parameters of Routine Load, please refer to Doris Routine Load.


CREATE ROUTINE LOAD automq_db.automq_example_load ON users
COLUMNS(id, name, timestamp, status)
PROPERTIES
(
    "format" = "json",
    "jsonpaths" = "[\"$.id\",\"$.name\",\"$.timestamp\",\"$.status\"]"
)
FROM KAFKA
(
    "kafka_broker_list" = "127.0.0.1:9092",
    "kafka_topic" = "example_topic",
    "property.kafka_default_offsets" = "OFFSET_BEGINNING"
);

When executing the command, replace kafka_broker_list with the actual AutoMQ Bootstrap Server address being used.
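
The jsonpaths property lists one JSONPath expression per entry in COLUMNS, in the same order, so the two lists must stay in sync when the table changes. As a minimal sketch, assuming only top-level JSON fields whose names match the column names, the array can be generated from the column list. The helper name jsonpaths_for is hypothetical and not part of Doris.

```shell
#!/bin/sh
# jsonpaths_for COL...: print a JSON array of top-level JSONPath expressions,
# one per column name, in the order given. The helper name is hypothetical.
jsonpaths_for() {
  out=""
  for col in "$@"; do
    # Add a comma separator before every element except the first.
    out="${out:+$out,}\"\$.${col}\""
  done
  printf '[%s]\n' "$out"
}

# Columns in the same order as COLUMNS(...) in the Routine Load statement.
jsonpaths_for id name timestamp status
```

When the result is pasted into the "jsonpaths" property, each inner double quote must be escaped with a backslash, as in the statement above.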

Verify Data Import

First, check the status of the Routine Load job to ensure the task is running.


show routine load\G

Then, query the relevant table in the Apache Doris database to confirm that the data has been successfully imported.


select * from automq_db.users;
+------+--------------+---------------------+--------+
| id   | name         | timestamp           | status |
+------+--------------+---------------------+--------+
|    1 | Test User    | 2023-11-10T12:00:00 | active |
|    2 | Test User    | 2023-11-10T12:00:00 | active |
+------+--------------+---------------------+--------+
2 rows in set (0.01 sec)
