Add distributed word-segmentation preprocessing program
JeremySun1224 authored Mar 28, 2020
1 parent cd9997f commit 30ff8f6
Showing 1 changed file with 2 additions and 1 deletion.
README.md: 3 changes (2 additions, 1 deletion)
@@ -1,4 +1,4 @@
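-## The code mainly splits large datasets (over 5 GB) into sentences
+## The code mainly performs distributed cleaning, sentence splitting, and word segmentation on a fairly large corpus (about 14 GB)
 #### The code covers:
 ##### How to batch-read data from a folder and its subfolders
 ##### How to merge, in batch, the data from a folder and its subfolders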
@@ -8,4 +8,5 @@
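 ##### Sentence splitting with the *PyLTP* module
 ##### A function that removes blank lines from text
 ##### A timing decorator and a progress bar for the code
+##### Add a class file for distributed word-segmentation processing
 **Stars and forks are welcome**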
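The README lists several preprocessing utilities without showing them. As a rough illustration only, and not the repository's actual code, the sketch below combines four of them using just the Python standard library: recursive file discovery with `os.walk`, a blank-line filter, a timing decorator, and sentence splitting. PyLTP offers `SentenceSplitter.split` for the splitting step; here a simple punctuation regex stands in for it so the example runs without LTP installed. All function names (`timer`, `iter_text_files`, `remove_blank_lines`, `split_sentences`, `merge_and_split`) are hypothetical. The progress bar and the distributed segmentation class are not covered.

```python
import functools
import os
import re
import time
from typing import Callable, Iterator, List


def timer(func: Callable) -> Callable:
    """Decorator that prints how long each call to the wrapped function took."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        print(f"{func.__name__} took {time.perf_counter() - start:.3f}s")
        return result
    return wrapper


def iter_text_files(root: str, suffix: str = ".txt") -> Iterator[str]:
    """Yield paths of files under root, including all subfolders, ending in suffix."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make traversal order deterministic
        for name in sorted(filenames):
            if name.endswith(suffix):
                yield os.path.join(dirpath, name)


def remove_blank_lines(text: str) -> str:
    """Drop lines that are empty or contain only whitespace."""
    return "\n".join(line for line in text.splitlines() if line.strip())


# Hypothetical stand-in for PyLTP sentence splitting: split after sentence-ending
# punctuation (。！？!?) unless another terminator follows, so "？！" stays together.
_SENT_END = re.compile(r"(?<=[。！？!?])(?![。！？!?])")


def split_sentences(text: str) -> List[str]:
    """Split text into sentences, keeping the terminating punctuation."""
    return [s.strip() for s in _SENT_END.split(text) if s.strip()]


@timer
def merge_and_split(root: str, out_path: str) -> int:
    """Read every .txt file under root, drop blank lines, split into sentences,
    and write one sentence per line to out_path. Returns the sentence count."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for path in iter_text_files(root):
            with open(path, encoding="utf-8") as f:
                text = remove_blank_lines(f.read())
            for line in text.splitlines():
                for sent in split_sentences(line):
                    out.write(sent + "\n")
                    count += 1
    return count
```

Write the merged output outside `root` (or give it a different suffix) so the walk does not pick up the file being written. Each file is read whole, which assumes individual files fit in memory even if the corpus as a whole (about 14 GB here) does not.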
