forked from tbs005/DataX

add s3 writer #1 (Open)

Chyroc-MD wants to merge 2 commits into master from Add/s3writer
@@ -0,0 +1,207 @@

# DataX S3Writer Guide

------------

## 1 Quick Introduction

S3Writer writes one or more CSV-like table files to S3.

**The content written to an S3 file is a logical two-dimensional table, for example CSV-formatted text.**
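
## 2 Features and Limitations

S3Writer converts data from the DataX protocol into text files on S3. Because S3 itself stores unstructured data, S3Writer adopts the following conventions:

1. It supports writing TXT files only, and the schema of each TXT file must be a two-dimensional table.

2. It supports CSV-like files with a custom delimiter.

3. It supports text compression; the currently supported formats are gzip and bzip2.

4. It supports multithreaded writing; each thread writes to a different sub-file.

5. It supports file rolling: when a file exceeds a given size or row count, writing switches to a new file. [Not yet supported]

What it cannot do:

1. A single file cannot be written concurrently.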
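Feature 3 (compression) amounts to wrapping a task's output in a compressor before it is uploaded. The Java sketch below shows only the gzip case, using nothing but `java.util.zip`; it assumes the whole task file is buffered in memory, which is a simplification, and the class name `GzipLines` is hypothetical rather than part of this PR. bzip2 is not in the JDK and would need a third-party codec such as Apache Commons Compress.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.Charset;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper (not part of this PR): gzip a task's lines in memory.
final class GzipLines {

    private GzipLines() {}

    // Encodes each line with the given charset (cf. the encoding option),
    // terminates it with '\n', and gzips the result.
    static byte[] compress(Iterable<String> lines, String encoding) {
        Charset charset = Charset.forName(encoding);
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            OutputStream gzip = new GZIPOutputStream(buffer);
            try {
                for (String line : lines) {
                    gzip.write(line.getBytes(charset));
                    gzip.write('\n');
                }
            } finally {
                gzip.close(); // flushes remaining data and writes the gzip trailer
            }
        } catch (IOException e) {
            // In-memory streams should not fail; rethrow unchecked to keep the sketch short.
            throw new IllegalStateException(e);
        }
        return buffer.toByteArray();
    }

    // Inverse of compress, handy for checking what would be uploaded.
    static String decompress(byte[] data, String encoding) {
        try {
            InputStream in = new GZIPInputStream(new ByteArrayInputStream(data));
            try {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] chunk = new byte[4096];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    out.write(chunk, 0, n);
                }
                return new String(out.toByteArray(), Charset.forName(encoding));
            } finally {
                in.close();
            }
        } catch (IOException e) {
            throw new IllegalStateException("not valid gzip data", e);
        }
    }

    public static void main(String[] args) {
        byte[] packed = compress(Arrays.asList("1,a", "2,b"), "UTF-8");
        System.out.println(decompress(packed, "UTF-8"));
    }
}
```

The try/finally blocks (instead of try-with-resources) keep the sketch compatible with the Java 1.6 source level set in this PR's pom.xml.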

## 3 Feature Description

### 3.1 Sample Configuration

```json
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "column": ["*"],
                        "connection": [
                            {
                                "jdbcUrl": ["jdbc:mysql://xxx:3306/xxx"],
                                "table": ["yyy"]
                            }
                        ],
                        "password": "root",
                        "username": "root",
                        "where": ""
                    }
                },
                "writer": {
                    "name": "s3writer",
                    "parameter": {
                        "s3Bucket": "xxx",
                        "s3AccessKey": "xxx",
                        "s3SecretKey": "xxx+",
                        "s3Endpoint": "s3.cn-north-1.amazonaws.com.cn",
                        "dateFormat": "",
                        "fieldDelimiter": ",",
                        "fileName": "yyy",
                        "path": "xxx/xxx",
                        "writeMode": "truncate"
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": 10
            }
        }
    }
}
```
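The sample sets `channel` to 10, and in DataX each channel is typically served by its own writer task. Because a single S3 file cannot be written concurrently, every task needs its own object name; the fileName parameter below states that a random suffix is appended per thread. The Java sketch illustrates that idea only: the class name `PerThreadFileNames`, the `__` separator and the UUID-based suffix are assumptions, not the plugin's confirmed naming scheme.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.UUID;

// Hypothetical helper (not part of this PR): one distinct file name per writer
// task, so that no two tasks ever write to the same S3 object.
final class PerThreadFileNames {

    private PerThreadFileNames() {}

    // Returns taskCount names of the form "<fileName>__<random suffix>".
    // The "__" separator and the UUID-based suffix are illustrative assumptions.
    static List<String> build(String fileName, int taskCount) {
        if (taskCount < 1) {
            throw new IllegalArgumentException("taskCount must be >= 1");
        }
        Set<String> seen = new HashSet<String>();
        List<String> names = new ArrayList<String>(taskCount);
        while (names.size() < taskCount) {
            // Dashes are swapped for underscores purely as a stylistic choice.
            String suffix = UUID.randomUUID().toString().replace('-', '_');
            String candidate = fileName + "__" + suffix;
            // Set.add returns false on a duplicate; a UUID clash is practically
            // impossible, but retrying keeps the uniqueness guarantee explicit.
            if (seen.add(candidate)) {
                names.add(candidate);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // e.g. fileName "yyy" and channel 10 from the sample job
        for (String name : build("yyy", 10)) {
            System.out.println(name);
        }
    }
}
```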

### 3.2 Parameters

* **path**

    * Description: Path in the S3 file system. S3Writer writes multiple files under this path. <br />

    * Required: yes <br />

    * Default: none <br />

* **fileName**

    * Description: Name of the file written by S3Writer. A random suffix is appended to this name to form the actual file name written by each thread. <br />

    * Required: yes <br />

    * Default: none <br />

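* **writeMode**

    * Description: How S3Writer handles existing data before writing: <br />

        * truncate: before writing, delete all files in the directory whose names start with fileName.
        * append: do nothing before writing; DataX S3Writer writes directly with fileName and ensures the file names do not conflict.
        * nonConflict: if the directory already contains files whose names start with fileName, fail with an error.

    * Required: yes <br />

    * Default: none <br />
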
* **fieldDelimiter**

    * Description: Field delimiter used in the written files. <br />

    * Required: no <br />

    * Default: , <br />

* **compress**

    * Description: Text compression type. Leaving it unset means no compression. Supported types: zip, lzo, lzop, tgz, bzip2. <br />

    * Required: no <br />

    * Default: no compression <br />

* **encoding**

    * Description: Character encoding of the written files. <br />

    * Required: no <br />

    * Default: utf-8 <br />

* **nullFormat**

    * Description: Plain text has no standard string for null, so DataX lets nullFormat define which string represents null. <br />

        For example, with nullFormat="\N", a null field is written to the file as "\N".

    * Required: no <br />

    * Default: \N <br />

* **dateFormat**

    * Description: Format used when serializing date values to the file, e.g. "dateFormat": "yyyy-MM-dd". <br />

    * Required: no <br />

    * Default: none <br />

* **fileFormat**

    * Description: Output file format, either csv (http://zh.wikipedia.org/wiki/%E9%80%97%E5%8F%B7%E5%88%86%E9%9A%94%E5%80%BC) or text. csv is strict CSV: if a value contains the column delimiter, it is escaped using CSV syntax, with the double quote (") as the escape character. text simply joins values with the column delimiter and does no escaping, even when a value contains the delimiter. <br />

    * Required: no <br />

    * Default: text <br />

* **header**

    * Description: Header row written to the txt file, e.g. ["id", "name", "age"]. <br />

    * Required: no <br />

    * Default: none <br />
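The options fieldDelimiter, nullFormat, dateFormat and fileFormat all govern how one record becomes one line of text. A minimal Java sketch of that step follows, assuming a record arrives as a plain `List<Object>`; the class `LineSerializer` is hypothetical. The csv branch uses RFC 4180-style quoting (wrap the field in double quotes and double any embedded quote), which is an assumption about what "strict CSV" means here, and writing nulls as the nullFormat string reflects the usual writer-side use of that option rather than confirmed plugin behavior.

```java
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.Date;
import java.util.List;

// Hypothetical sketch (not part of this PR): turn one record into one output line.
final class LineSerializer {

    private final char fieldDelimiter; // fieldDelimiter, e.g. ','
    private final String nullFormat;   // nullFormat, e.g. "\N"
    private final String dateFormat;   // dateFormat; null or "" means Date.toString()
    private final boolean csv;         // fileFormat: "csv" -> quoting, anything else -> plain text

    LineSerializer(char fieldDelimiter, String nullFormat, String dateFormat, String fileFormat) {
        this.fieldDelimiter = fieldDelimiter;
        this.nullFormat = nullFormat;
        this.dateFormat = dateFormat;
        this.csv = "csv".equalsIgnoreCase(fileFormat);
    }

    String serialize(List<Object> row) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < row.size(); i++) {
            if (i > 0) {
                line.append(fieldDelimiter);
            }
            line.append(render(row.get(i)));
        }
        return line.toString();
    }

    private String render(Object value) {
        if (value == null) {
            return nullFormat; // nulls become the nullFormat marker and are never quoted
        }
        String text;
        if (value instanceof Date && dateFormat != null && !dateFormat.isEmpty()) {
            // SimpleDateFormat is not thread-safe, so this sketch builds one per call.
            text = new SimpleDateFormat(dateFormat).format((Date) value);
        } else {
            text = String.valueOf(value);
        }
        return csv ? quoteCsv(text) : text; // text mode: no escaping at all
    }

    // RFC 4180-style quoting: quote only when needed, doubling embedded quotes.
    private String quoteCsv(String text) {
        boolean needsQuotes = text.indexOf(fieldDelimiter) >= 0
                || text.indexOf('"') >= 0
                || text.indexOf('\n') >= 0
                || text.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return text;
        }
        return "\"" + text.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        List<Object> row = Arrays.<Object>asList(1L, "a,b", null);
        System.out.println(new LineSerializer(',', "\\N", "yyyy-MM-dd", "csv").serialize(row));  // 1,"a,b",\N
        System.out.println(new LineSerializer(',', "\\N", "yyyy-MM-dd", "text").serialize(row)); // 1,a,b,\N
    }
}
```

The two lines printed by `main` show why text output is ambiguous when a value contains the delimiter: `1,a,b,\N` looks like four fields, while the csv form `1,"a,b",\N` keeps three.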

### 3.3 Type Conversion

S3 files carry no data types of their own; the types are defined by DataX S3Writer:

| DataX internal type | S3 file data type |
| -------- | ----- |
| Long | Long |
| Double | Double |
| String | String |
| Boolean | Boolean |
| Date | Date |

Where:

* S3 file Long: an integer written as a string, e.g. "19901219".
* S3 file Double: a Double written as a string, e.g. "3.1415".
* S3 file Boolean: a Boolean written as a string, e.g. "true" or "false"; case-insensitive.
* S3 file Date: a Date written as a string, e.g. "2014-12-31"; the date format can be specified.
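When such a file is read back (for example by a downstream job or a test), each cell has to be mapped back to the types above. A minimal Java sketch follows; the helper name `S3TextTypes` is hypothetical. The strict Boolean check implements the case-insensitive true/false rule, and the Date parser takes the same pattern that was configured as dateFormat.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical helper (not part of this PR): parse the textual forms from the type table.
final class S3TextTypes {

    private S3TextTypes() {}

    static long parseLong(String s) {
        return Long.parseLong(s.trim());     // e.g. "19901219"
    }

    static double parseDouble(String s) {
        return Double.parseDouble(s.trim()); // e.g. "3.1415"
    }

    // Case-insensitive but strict: unlike Boolean.parseBoolean, which maps any
    // unrecognized text to false, this rejects values other than true/false.
    static boolean parseBoolean(String s) {
        String t = s.trim();
        if (t.equalsIgnoreCase("true")) {
            return true;
        }
        if (t.equalsIgnoreCase("false")) {
            return false;
        }
        throw new IllegalArgumentException("not a boolean: " + s);
    }

    // Parses a date with the same pattern that was configured as dateFormat.
    static Date parseDate(String s, String pattern) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern);
        fmt.setLenient(false); // reject impossible dates such as "2014-13-45"
        try {
            return fmt.parse(s.trim());
        } catch (ParseException e) {
            throw new IllegalArgumentException("not a date for pattern " + pattern + ": " + s, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseLong("19901219"));
        System.out.println(parseDouble("3.1415"));
        System.out.println(parseBoolean("TRUE"));
        System.out.println(parseDate("2014-12-31", "yyyy-MM-dd"));
    }
}
```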

## 4 Performance Report

## 5 Constraints

Omitted.

## 6 FAQ

Omitted.
@@ -0,0 +1,88 @@
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>com.alibaba.datax</groupId>
        <artifactId>datax-all</artifactId>
        <version>0.0.1-SNAPSHOT</version>
    </parent>

    <artifactId>s3writer</artifactId>
    <name>s3writer</name>
    <description>S3Writer writes CSV-like TEXT files to S3.</description>
    <packaging>jar</packaging>

    <dependencies>
        <dependency>
            <groupId>com.alibaba.datax</groupId>
            <artifactId>datax-common</artifactId>
            <version>${datax-project-version}</version>
            <exclusions>
                <exclusion>
                    <artifactId>slf4j-log4j12</artifactId>
                    <groupId>org.slf4j</groupId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>com.alibaba.datax</groupId>
            <artifactId>plugin-unstructured-storage-util</artifactId>
            <version>${datax-project-version}</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-api</artifactId>
        </dependency>
        <dependency>
            <groupId>ch.qos.logback</groupId>
            <artifactId>logback-classic</artifactId>
        </dependency>
        <dependency>
            <groupId>com.google.guava</groupId>
            <artifactId>guava</artifactId>
            <version>16.0.1</version>
        </dependency>
        <dependency>
            <groupId>com.amazonaws</groupId>
            <artifactId>aws-java-sdk-core</artifactId>
            <version>1.11.52</version>
        </dependency>
        <dependency>
            <groupId>com.amazonaws</groupId>
            <artifactId>aws-java-sdk-s3</artifactId>
            <version>1.11.52</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <!-- compiler plugin -->
            <plugin>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>1.6</source>
                    <target>1.6</target>
                    <encoding>${project-sourceEncoding}</encoding>
                </configuration>
            </plugin>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <descriptors>
                        <descriptor>src/main/assembly/package.xml</descriptor>
                    </descriptors>
                    <finalName>datax</finalName>
                </configuration>
                <executions>
                    <execution>
                        <id>dwzip</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
@@ -0,0 +1,35 @@
<assembly
        xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0 http://maven.apache.org/xsd/assembly-1.1.0.xsd">
    <id></id>
    <formats>
        <format>dir</format>
    </formats>
    <includeBaseDirectory>false</includeBaseDirectory>
    <fileSets>
        <fileSet>
            <directory>src/main/resources</directory>
            <includes>
                <include>plugin.json</include>
                <include>plugin_job_template.json</include>
            </includes>
            <outputDirectory>plugin/writer/s3writer</outputDirectory>
        </fileSet>
        <fileSet>
            <directory>target/</directory>
            <includes>
                <include>s3writer-0.0.1-SNAPSHOT.jar</include>
            </includes>
            <outputDirectory>plugin/writer/s3writer</outputDirectory>
        </fileSet>
    </fileSets>

    <dependencySets>
        <dependencySet>
            <useProjectArtifact>false</useProjectArtifact>
            <outputDirectory>plugin/writer/s3writer/libs</outputDirectory>
            <scope>runtime</scope>
        </dependencySet>
    </dependencySets>
</assembly>
s3writer/src/main/java/com/alibaba/datax/plugin/writer/s3writer/Key.java
6 changes: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
package com.alibaba.datax.plugin.writer.s3writer;

public class Key {
    // required parameters
    public static final String PATH = "path";
}
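The PR's Key.java declares only PATH. A possible fuller version is sketched below: it collects the parameter names used in the README's sample job and parameter list, and checks that the three required parameters (path, fileName, writeMode) are present and that writeMode is one of the three documented modes. The class name `S3WriterKey`, the `problems` helper and the use of a plain `Map` instead of DataX's own configuration object are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical fuller Key class (the PR's Key.java only declares PATH).
// Parameter names are taken from the README's sample job and parameter list.
final class S3WriterKey {

    // required according to the README
    static final String PATH = "path";
    static final String FILE_NAME = "fileName";
    static final String WRITE_MODE = "writeMode";

    // S3 connection settings used in the sample job
    static final String S3_BUCKET = "s3Bucket";
    static final String S3_ACCESS_KEY = "s3AccessKey";
    static final String S3_SECRET_KEY = "s3SecretKey";
    static final String S3_ENDPOINT = "s3Endpoint";

    // optional formatting settings
    static final String FIELD_DELIMITER = "fieldDelimiter";
    static final String COMPRESS = "compress";
    static final String ENCODING = "encoding";
    static final String NULL_FORMAT = "nullFormat";
    static final String DATE_FORMAT = "dateFormat";
    static final String FILE_FORMAT = "fileFormat";
    static final String HEADER = "header";

    static final List<String> REQUIRED = Arrays.asList(PATH, FILE_NAME, WRITE_MODE);
    static final List<String> WRITE_MODES = Arrays.asList("truncate", "append", "nonConflict");

    private S3WriterKey() {}

    // Returns a list of problems; an empty list means the required parameters are usable.
    // A plain Map stands in for DataX's configuration object in this sketch.
    static List<String> problems(Map<String, String> params) {
        List<String> errors = new ArrayList<String>();
        for (String key : REQUIRED) {
            String value = params.get(key);
            if (value == null || value.trim().isEmpty()) {
                errors.add("missing required parameter: " + key);
            }
        }
        String mode = params.get(WRITE_MODE);
        // Exact, case-sensitive match against the three documented modes (an assumption).
        if (mode != null && !mode.trim().isEmpty() && !WRITE_MODES.contains(mode.trim())) {
            errors.add("unsupported writeMode: " + mode);
        }
        return errors;
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<String, String>();
        params.put(PATH, "xxx/xxx");
        params.put(FILE_NAME, "yyy");
        params.put(WRITE_MODE, "truncate");
        System.out.println(problems(params)); // []
    }
}
```

Returning a list rather than throwing on the first problem lets a job report every missing parameter at once.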
Review comment: This version feels a bit too low.