fix(s3): add doc path & format content #617

Merged
merged 1 commit into from
Sep 20, 2024
8 changes: 8 additions & 0 deletions directory.json
@@ -467,6 +467,10 @@
{
"title": "GreptimeDB",
"path": "data_integration/greptimedb"
},
{
"title": "Amazon S3",
"path": "data_integration/s3"
}
]
},
@@ -1196,6 +1200,10 @@
{
"title": "Azure Event Hubs",
"path": "data_integration/azure_event_hubs"
},
{
"title": "Amazon S3",
"path": "data_integration/s3"
}
]
},
73 changes: 33 additions & 40 deletions en_US/data_integration/s3.md
@@ -8,7 +8,7 @@ This page provides a detailed introduction to the data integration between EMQX

Amazon S3 data integration in EMQX Platform is a ready-to-use feature that can be easily configured for complex business development. In a typical IoT application, EMQX Platform acts as the IoT platform responsible for device connectivity and message transmission, while Amazon S3 serves as the data storage platform, handling message data storage.

![EMQX Platform-integration-s3](./_assets/EMQX%20Platform-integration-s3.jpg)
![EMQX Platform-integration-s3](./_assets/data-integration-s3.jpg)

EMQX Platform uses rules and Sinks to forward device events and data to Amazon S3. Applications can then read the data from Amazon S3 for further use. The specific workflow is as follows:

@@ -43,6 +43,7 @@ This section introduces the preparations required before creating an Amazon S3 S
- Understand [data integration](./introduction.md).

### Network settings

Because EMQX accesses Amazon S3 over the public network, you need to enable the [NAT Gateway](../vas/nat-gateway.md) in your deployment. Click **VAS** in the top menu bar and select the NAT Gateway card, or select **Enable NAT Gateway service** in the bottom tab bar of the Deployment Overview page.

### Prepare an S3 Bucket
@@ -51,39 +52,32 @@ EMQX Platform supports Amazon S3 and other S3-compatible storage services. Here

1. In the [AWS S3 Console](https://console.amazonaws.cn/s3/home), click the **Create bucket** button. Follow the instructions to enter the relevant information, such as bucket name and region, to create an S3 bucket. For detailed operations, refer to the [AWS Documentation](https://docs.amazonaws.cn/AmazonS3/latest/userguide/creating-bucket.html).
2. Set bucket permissions. After the bucket is created, select it and click the **Permissions** tab. Depending on your needs, you can set the bucket to public read/write, private, or other permissions. You can use the following JSON as a reference for the bucket access policy:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt1ListBucket",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::emqx-cloud-s3-connector-test"
      ]
    },
    {
      "Sid": "Stmt2GetAndPutObject",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::emqx-cloud-s3-connector-test/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": "s3:ListAllMyBuckets",
      "Resource": "*"
    }
  ]
}
```

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt1ListBucket",
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": ["arn:aws:s3:::emqx-cloud-s3-connector-test"]
    },
    {
      "Sid": "Stmt2GetAndPutObject",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": ["arn:aws:s3:::emqx-cloud-s3-connector-test/*"]
    },
    {
      "Effect": "Allow",
      "Action": "s3:ListAllMyBuckets",
      "Resource": "*"
    }
  ]
}
```

3. Obtain access keys. In the AWS Console, search for and select the **IAM** service. Create a new user for S3 and obtain the Access Key and Secret Key.

With the Amazon S3 bucket created and configured, you are now ready to create an Amazon S3 Sink in EMQX Platform.
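
The reference policy hard-codes the bucket name `emqx-cloud-s3-connector-test`. If you use a different bucket, the same document can be generated rather than edited by hand. The following is a minimal sketch only: `build_bucket_policy` and its `partition` parameter are hypothetical helpers introduced for illustration, not part of EMQX Platform or any AWS SDK, and the default `aws` partition is simply copied from the ARNs in the policy above.

```python
import json


def build_bucket_policy(bucket: str, partition: str = "aws") -> dict:
    """Return a policy document mirroring the reference JSON for one bucket.

    `partition` is an illustrative assumption; the reference policy uses "aws".
    """
    bucket_arn = f"arn:{partition}:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Allows listing the contents of the bucket itself.
                "Sid": "Stmt1ListBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Allows reading and writing objects inside the bucket.
                "Sid": "Stmt2GetAndPutObject",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
            {
                # Allows listing all buckets owned by the account.
                "Effect": "Allow",
                "Action": "s3:ListAllMyBuckets",
                "Resource": "*",
            },
        ],
    }


if __name__ == "__main__":
    policy = build_bucket_policy("emqx-cloud-s3-connector-test")
    print(json.dumps(policy, indent=2))
```

Running the script prints a policy document that can be pasted into the console; review it against your own security requirements before applying it.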
@@ -175,17 +169,16 @@ This section demonstrates how to create a rule in EMQX Platform to process messa
- **Max Records**: When the maximum number of records is reached, aggregation of the current file completes, the file is uploaded, and the time interval is reset.

- **Time Interval**: When the time interval is reached, the current file is completed and uploaded even if the maximum number of records has not been reached, and the record count is reset.
- **Min Part Size**: The minimum chunk size for part uploads after aggregation is complete. The data to be uploaded will accumulate in memory until it reaches this size.
- **Max Part Size**: The maximum chunk size for part uploads. The S3 Sink will not attempt to upload parts exceeding this size.
- **Min Part Size**: The minimum chunk size for part uploads after aggregation is complete. The data to be uploaded will accumulate in memory until it reaches this size.
- **Max Part Size**: The maximum chunk size for part uploads. The S3 Sink will not attempt to upload parts exceeding this size.

:::
:::

::::

8. Expand **Advanced Settings** and configure the advanced setting options as needed (optional). For more details, refer to [Advanced Settings](#advanced-settings).

8. Expand **Advanced Settings** and configure the advanced setting options as needed (optional). For more details, refer to [Advanced Settings](#advanced-settings).

9. Click the **Confirm** button to complete the rule creation.
9. Click the **Confirm** button to complete the rule creation.

10. In the **Successful new rule** pop-up, click **Back to Rules** to complete the entire data integration configuration.
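
The interaction between **Max Records** and **Time Interval** described for aggregated upload can be illustrated with a small simulation. This is a sketch under stated assumptions, not the EMQX implementation: the `Aggregator` class, its `tick` method, and the choice to start the time window at the first buffered record are all hypothetical and exist only to show that whichever limit is hit first triggers the upload and resets the other.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Record = Dict[str, object]


@dataclass
class Aggregator:
    """Hypothetical buffer that flushes on max records or elapsed time."""

    max_records: int
    time_interval: float  # seconds
    upload: Callable[[List[Record]], None]
    clock: Callable[[], float] = time.monotonic
    _records: List[Record] = field(default_factory=list, init=False)
    _window_start: Optional[float] = field(default=None, init=False)

    def add(self, record: Record) -> None:
        """Buffer one record; flush when the record limit is reached."""
        if self._window_start is None:
            # Assumption: the window starts with the first buffered record.
            self._window_start = self.clock()
        self._records.append(record)
        if len(self._records) >= self.max_records:
            self._flush()

    def tick(self) -> None:
        """Call periodically; flush when the time interval has elapsed."""
        if self._window_start is None:
            return
        if self.clock() - self._window_start >= self.time_interval:
            self._flush()

    def _flush(self) -> None:
        """Upload the buffered records and reset both the count and the window."""
        if self._records:
            self.upload(self._records)
        self._records = []
        self._window_start = None


if __name__ == "__main__":
    agg = Aggregator(max_records=2, time_interval=60.0, upload=print)
    agg.add({"clientid": "c1", "payload": "a"})
    agg.add({"clientid": "c2", "payload": "b"})  # record limit reached, uploads
```

Injecting `clock` keeps the sketch testable without waiting for real time to pass.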

103 changes: 46 additions & 57 deletions zh_CN/data_integration/s3.md
@@ -4,12 +4,11 @@

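This page provides a detailed introduction to the data integration between EMQX Platform and Amazon S3, with practical guidance on creating rules and actions.


## How It Works

Amazon S3 data integration is an out-of-the-box feature in EMQX Platform that supports complex business development through simple configuration. In a typical IoT application, EMQX Platform acts as the IoT platform responsible for device connectivity and message transmission, while Amazon S3 serves as the data storage platform responsible for storing message data.

![EMQX Platform Amazon S3 data integration](./_assets/EMQX%20Platform-integration-s3.jpg)
![EMQX Platform Amazon S3 data integration](./_assets/data-integration-s3.jpg)

EMQX Platform forwards device events and data to Amazon S3 through rules and actions. Applications can then read the data from Amazon S3 for further use. The specific workflow is as follows: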

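@@ -42,55 +41,47 @@ EMQX Platform forwards device events and data to Amazon S3 through rules and actions

- Understand [data integration](./introduction.md).
- Understand [rules](./rules.md).

### Network settings
Because EMQX accesses Amazon S3 over the public network, you need to enable the [NAT Gateway](../vas/nat-gateway.md) in your deployment. Select the NAT Gateway card under **VAS** in the top menu bar, or select the option to enable the NAT Gateway service in the bottom tab bar of the deployment overview page.

Because EMQX accesses Amazon S3 over the public network, you need to enable the [NAT Gateway](../vas/nat-gateway.md) in your deployment. Select the NAT Gateway card under **VAS** in the top menu bar, or select the option to enable the NAT Gateway service in the bottom tab bar of the deployment overview page.

### Prepare an S3 Bucket

EMQX Platform supports Amazon S3 and other S3-compatible storage services. You can create an S3 bucket using AWS cloud services.

1. In the [AWS S3 Console](https://console.amazonaws.cn/s3/home), click the **Create bucket** button. Then follow the wizard to enter the relevant information, such as the bucket name (for example, `emqx-cloud-s3-connector-test`) and region, to create an S3 bucket. For detailed operations, refer to the [AWS documentation](https://docs.amazonaws.cn/AmazonS3/latest/userguide/creating-bucket.html).
2. Set bucket permissions: After the bucket is created, select it and click the **Permissions** tab. Depending on your needs, you can set the bucket to public read/write, private, or other permissions. You can use the following JSON as a reference for the bucket access policy: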
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt1ListBucket",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::emqx-cloud-s3-connector-test"
      ]
    },
    {
      "Sid": "Stmt2GetAndPutObject",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::emqx-cloud-s3-connector-test/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": "s3:ListAllMyBuckets",
      "Resource": "*"
    }
  ]
}
```

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt1ListBucket",
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": ["arn:aws:s3:::emqx-cloud-s3-connector-test"]
    },
    {
      "Sid": "Stmt2GetAndPutObject",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": ["arn:aws:s3:::emqx-cloud-s3-connector-test/*"]
    },
    {
      "Effect": "Allow",
      "Action": "s3:ListAllMyBuckets",
      "Resource": "*"
    }
  ]
}
```

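3. Obtain access keys: In the AWS Console, search for and select the **IAM** service, create a new user for S3, and obtain the Access Key and Secret Key.

You have now created and configured the S3 bucket. Next, you will create an Amazon S3 action in EMQX Platform.


## Create a Connector

Before creating a data integration rule, you need to create an Amazon S3 connector to access the S3 service.
@@ -132,10 +123,8 @@ EMQX Platform supports Amazon S3 and other S3-compatible storage services; you can use

3. Click **Next** to start creating an action that includes the Amazon S3 Sink.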


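4. From the **Connector** drop-down box, select the connector you created earlier.


5. Set the **Bucket**. Enter `emqx-cloud-s3-connector-test` here. This field also supports `${var}` placeholders, but note that a bucket with the corresponding name must be created in S3 in advance.

6. Select an **ACL** as needed to specify the access permission of the uploaded object.
@@ -144,27 +133,27 @@ EMQX Platform supports Amazon S3 and other S3-compatible storage services; you can use

- **Direct Upload**: Each time the rule is triggered, data is uploaded directly to S3 according to the preset object key and content. This method is suitable for storing binary or large text data, but it may generate a large number of files.
- **Aggregated Upload**: The results of multiple rule triggers are packaged into a single file (such as a CSV file) and uploaded to S3. This method is suitable for storing structured data; it reduces the number of files and improves write efficiency.

The two methods are configured with different parameters. Configure them according to the method you selected: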

:::: tabs type:card

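::: tab Direct Upload

Direct upload requires the following fields:

- **Object Key**: Defines the location of the object to be uploaded to the bucket. It supports `${var}` placeholders and can use `/` to specify a storage directory. You usually also need to set a file extension for the object to make it easier to manage and distinguish. Here, enter `msgs/${clientid}_${timestamp}.json`, where `${clientid}` is the client ID and `${timestamp}` is the timestamp of the message. This ensures that the messages of each device are written to different objects.

- **Object Content**: By default, this is JSON text containing all fields. It supports `${var}` placeholders. Here, enter `${payload}` to use the message body as the object content. In this case, the storage format of the object depends on the format of the message body, which can be a compressed archive, an image, or another binary format.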

:::

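::: tab Aggregated Upload

The following parameters need to be set:

- **Object Key**: Specifies the storage path of the object. The following variables can be used:

- **`${action}`**: The action name (required).
- **`${node}`**: The name of the EMQX node performing the upload (required).
- **`${datetime.{format}}`**: The date and time when aggregation started, formatted according to the `{format}` string (required):
@@ -173,15 +162,15 @@ EMQX Platform supports Amazon S3 and other S3-compatible storage services; you can use
- **`${datetime.unix}`**: Unix timestamp.
- **`${datetime_until.{format}}`**: The date and time when aggregation ended, with the same format options as above.
- **`${sequence}`**: The sequence number of aggregated uploads within the same time interval (required).

Note that if the template does not use all placeholders marked as required, those placeholders are automatically appended to the S3 object key as a path suffix to avoid duplicates. All other placeholders are considered invalid.

- **Aggregation Type**: Currently only CSV is supported. Data is written to S3 in comma-separated CSV format.

- **Column Order**: Use the drop-down selection to adjust the order of the rule result columns. The generated CSV file is ordered by the selected columns first, with unselected columns following in lexicographic order.

- **Max Records**: When the maximum number of records is reached, aggregation of the current file completes, the file is uploaded, and the time interval is reset.

- **Time Interval**: When the time interval is reached, the current file is completed and uploaded even if the maximum number of records has not been reached, and the record count is reset.

- **Min Part Size**: The minimum chunk size for part uploads after aggregation is complete. The data to be uploaded accumulates in memory until it reaches this size. Default: `5MB`.
@@ -192,10 +181,10 @@ EMQX Platform supports Amazon S3 and other S3-compatible storage services; you can use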

::::

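8. Expand **Advanced Settings** and configure the advanced setting options as needed (optional). For more details, refer to [Advanced Settings](#advanced-settings).
8. Expand **Advanced Settings** and configure the advanced setting options as needed (optional). For more details, refer to [Advanced Settings](#advanced-settings).

9. Click the **Confirm** button to complete the action configuration.

10. In the **Successful new rule** pop-up, click **Back to Rules** to complete the entire data integration configuration.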
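
To see how a direct-upload object key such as `msgs/${clientid}_${timestamp}.json` expands into a concrete path, the substitution can be mimicked locally. The sketch below only illustrates the `${var}` syntax using Python's standard `string.Template`; it is not EMQX's template engine, `render_object_key` is a hypothetical helper, and the sample client ID and timestamp are made-up values.

```python
from string import Template
from typing import Mapping


def render_object_key(template: str, fields: Mapping[str, object]) -> str:
    """Substitute ${var} placeholders in an object key template.

    Raises KeyError if the template references a field that is not provided.
    """
    return Template(template).substitute(fields)


if __name__ == "__main__":
    # Illustrative values only; real values come from the rule output.
    key = render_object_key(
        "msgs/${clientid}_${timestamp}.json",
        {"clientid": "device-001", "timestamp": 1726790400000},
    )
    print(key)
```

Because the client ID and timestamp both appear in the key, messages from different devices, or from the same device at different times, map to different objects. Note that this helper does not handle the dotted aggregated-upload placeholders such as `${datetime.unix}`.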

## Test the Rule