Skip to content

Commit

Permalink
Merge pull request #1490 from dingxiaobo/starrocks_plugin
Browse files Browse the repository at this point in the history
Merge StarRocks plugin
  • Loading branch information
TrafalgarLuo authored Aug 26, 2022
2 parents 2151860 + b49ceb1 commit ced5a45
Show file tree
Hide file tree
Showing 20 changed files with 1,538 additions and 2 deletions.
7 changes: 7 additions & 0 deletions package.xml
Original file line number Diff line number Diff line change
Expand Up @@ -231,6 +231,13 @@
</includes>
<outputDirectory>datax</outputDirectory>
</fileSet>
<fileSet>
<directory>starrockswriter/target/datax/</directory>
<includes>
<include>**/*.*</include>
</includes>
<outputDirectory>datax</outputDirectory>
</fileSet>
<fileSet>
<directory>drdswriter/target/datax/</directory>
<includes>
Expand Down
5 changes: 3 additions & 2 deletions pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -80,15 +80,14 @@

<!-- writer -->
<module>mysqlwriter</module>
<module>starrockswriter</module>
<module>drdswriter</module>
<module>oraclewriter</module>
<module>sqlserverwriter</module>
<module>postgresqlwriter</module>
<module>kingbaseeswriter</module>
<module>adswriter</module>
<module>oceanbasev10writer</module>
<module>cassandrawriter</module>
<module>clickhousewriter</module>
<module>adbpgwriter</module>
<module>hologresjdbcwriter</module>
<module>rdbmswriter</module>
Expand All @@ -114,6 +113,8 @@
<module>tsdbwriter</module>
<module>gdbwriter</module>
<module>oscarwriter</module>
<module>cassandrawriter</module>
<module>clickhousewriter</module>

<!-- common support module -->
<module>plugin-rdbms-util</module>
Expand Down
218 changes: 218 additions & 0 deletions starrockswriter/doc/starrockswriter.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,218 @@
# DataX StarRocksWriter


---


## 1 快速介绍

StarRocksWriter 插件实现了写入数据到 StarRocks 主库的目的表的功能。在底层实现上, StarRocksWriter 通过Streamload以csv格式导入数据至StarRocks。


## 2 实现原理

StarRocksWriter 通过Streamload以csv格式导入数据至StarRocks, 内部将`reader`读取的数据进行缓存后批量导入至StarRocks,以提高写入性能。


## 3 功能说明

### 3.1 配置样例

* 这里使用一份从内存Mysql读取数据后导入至StarRocks。

```json
{
"job": {
"setting": {
"speed": {
"channel": 1
},
"errorLimit": {
"record": 0,
"percentage": 0
}
},
"content": [
{
"reader": {
"name": "mysqlreader",
"parameter": {
"username": "xxxx",
"password": "xxxx",
"column": [ "k1", "k2", "v1", "v2" ],
"connection": [
{
"table": [ "table1", "table2" ],
"jdbcUrl": [
"jdbc:mysql://127.0.0.1:3306/datax_test1"
]
},
{
"table": [ "table3", "table4" ],
"jdbcUrl": [
"jdbc:mysql://127.0.0.1:3306/datax_test2"
]
}
]
}
},
"writer": {
"name": "starrockswriter",
"parameter": {
"username": "xxxx",
"password": "xxxx",
"database": "xxxx",
"table": "xxxx",
"column": ["k1", "k2", "v1", "v2"],
"preSql": [],
"postSql": [],
"jdbcUrl": "jdbc:mysql://172.28.17.100:9030/",
"loadUrl": ["172.28.17.100:8030", "172.28.17.100:8030"],
"loadProps": {}
}
}
}
]
}
}

```


### 3.2 参数说明

* **username**

* 描述:StarRocks数据库的用户名 <br />

* 必选:是 <br />

* 默认值:无 <br />

* **password**

* 描述:StarRocks数据库的密码 <br />

* 必选:是 <br />

* 默认值:无 <br />

* **database**

* 描述:StarRocks表的数据库名称。

* 必选:是 <br />

* 默认值:无 <br />

* **table**

* 描述:StarRocks表的表名称。

* 必选:是 <br />

* 默认值:无 <br />

* **loadUrl**

* 描述:StarRocks FE的地址用于Streamload,可以为多个fe地址,`fe_ip:fe_http_port`。

* 必选:是 <br />

* 默认值:无 <br />

* **column**

* 描述:目的表需要写入数据的字段,字段之间用英文逗号分隔。例如: "column": ["id","name","age"]。

**column配置项必须指定,不能留空!**

注意:我们强烈不推荐你这样配置,因为当你目的表字段个数、类型等有改动时,你的任务可能运行不正确或者失败

* 必选:是 <br />

* 默认值:否 <br />

* **preSql**

* 描述:写入数据到目的表前,会先执行这里的标准语句。 <br />

* 必选:否 <br />

* 默认值:无 <br />

* **postSql**

* 描述:写入数据到目的表后,会执行这里的标准语句。 <br />

* 必选:否 <br />

* 默认值:无 <br />

* **jdbcUrl**

* 描述:目的数据库的 JDBC 连接信息,用于执行`preSql`及`postSql`。 <br />

* 必选:否 <br />

* 默认值:无 <br />

* **maxBatchRows**

* 描述:单次StreamLoad导入的最大行数 <br />

* 必选:否 <br />

* 默认值:500000 (50W) <br />

* **maxBatchSize**

* 描述:单次StreamLoad导入的最大字节数。 <br />

* 必选:否 <br />

* 默认值:104857600 (100M)

* **flushInterval**

* 描述:上一次StreamLoad结束至下一次开始的时间间隔(单位:ms)。 <br />

* 必选:否 <br />

* 默认值:300000 (ms)

* **loadProps**

* 描述:StreamLoad 的请求参数,详情参照StreamLoad介绍页面。 <br />

* 必选:否 <br />

* 默认值:无 <br />


### 3.3 类型转换

默认传入的数据均会被转为字符串,并以`\t`作为列分隔符,`\n`作为行分隔符,组成`csv`文件进行StreamLoad导入操作。
如需更改列分隔符, 则正确配置 `loadProps` 即可:
```json
"loadProps": {
"column_separator": "\\x01",
"row_delimiter": "\\x02"
}
```

如需更改导入格式为`json`, 则正确配置 `loadProps` 即可:
```json
"loadProps": {
"format": "json",
"strip_outer_array": true
}
```

## 4 性能报告


## 5 约束限制


## FAQ
Loading

0 comments on commit ced5a45

Please sign in to comment.