
Commit

update to paddle 2.0
Meiyim committed Feb 7, 2021
1 parent 881bd97 commit 3b9d92f
Showing 62 changed files with 4,672 additions and 3,185 deletions.
1 change: 0 additions & 1 deletion .github/stale.yml
@@ -15,4 +15,3 @@ markComment: >
Thank you for your contributions.
# Comment to post when closing a stale issue. Set to `false` to disable
closeComment: false

17 changes: 17 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,17 @@
# See https://pre-commit.com for more information
# See https://pre-commit.com/hooks.html for more hooks
repos:
- repo: https://github.com/PaddlePaddle/mirrors-yapf.git
rev: 0d79c0c469bab64f7229c9aca2b1186ef47f0e37
hooks:
- id: yapf
files: (.*\.(py|bzl)|BUILD|.*\.BUILD|WORKSPACE)$
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: 5bf6c09bfa1297d3692cadd621ef95f1284e33c0
hooks:
- id: check-added-large-files
- id: check-merge-conflict
- id: check-symlinks
- id: detect-private-key
files: (?!.*third_party)^.*$ | (?!.*book)^.*$
- id: end-of-file-fixer
66 changes: 37 additions & 29 deletions README.en.md
@@ -11,7 +11,13 @@ ERNIE 2.0 builds a strong basis for nearly every NLP task: Text Classification,
[\[more information\]](https://wenxin.baidu.com/)

# News

- Dec.29.2020:
- Pretrain and finetune ERNIE with [PaddlePaddle v2.0](https://github.com/PaddlePaddle/Paddle/tree/release/2.0-rc).
    - New AMP (automatic mixed precision) support for every demo in this repo.
    - Introducing `gradient accumulation`: run `ERNIE-large` with only 8 GB of GPU memory.

- Sept.24.2020:
- [`ERNIE-ViL`](https://github.com/PaddlePaddle/ERNIE/tree/repro/ernie-vil) is **available** now!
- A **knowledge-enhanced** joint representation for vision-language tasks.
- Constructing three **Scene Graph Prediction** tasks utilizing structured knowledge.
@@ -20,28 +26,27 @@ ERNIE 2.0 builds a strong basis for nearly every NLP task: Text Classification,
- May.20.2020:

- Try ERNIE in `dygraph` mode, with:
- Pretrain and finetune ERNIE with [PaddlePaddle v1.8](https://github.com/PaddlePaddle/Paddle/tree/release/1.8).
- Eager execution with `paddle.fluid.dygraph`.
- Distributed training.
- Easy deployment.
- Learn NLP in AIStudio tutorials.
- Backward compatibility for old-style checkpoints.

- [`ERNIE-GEN`](https://github.com/PaddlePaddle/ERNIE/tree/repro/ernie-gen) is **available** now!
- The **state-of-the-art** pre-trained model for generation tasks, accepted by `IJCAI-2020`.
- A novel **span-by-span generation pre-training task**.
- An **infilling generation** mechanism and a **noise-aware generation** method.
- Implemented by a carefully designed **Multi-Flow Attention** architecture.
- You are able to `download` all models including `base/large/large-430G`.

- Apr.30.2020: Release [ERNIESage](https://github.com/PaddlePaddle/PGL/tree/master/examples/erniesage), a novel Graph Neural Network model using ERNIE as its aggregator. It is implemented through [PGL](https://github.com/PaddlePaddle/PGL).
- Mar.27.2020: [Champion on 5 SemEval2020 sub tasks](https://www.jiqizhixin.com/articles/2020-03-27-8)
- Dec.26.2019: [1st place on GLUE leaderboard](https://www.technologyreview.com/2019/12/26/131372/ai-baidu-ernie-google-bert-natural-language-glue/)
- Nov.6.2019: [Introducing ERNIE-tiny](https://www.jiqizhixin.com/articles/2019-11-06-9)
- Jul.7.2019: [Introducing ERNIE2.0](https://www.jiqizhixin.com/articles/2019-07-31-10)
- Mar.16.2019: [Introducing ERNIE1.0](https://www.jiqizhixin.com/articles/2019-03-16-3)


# Table of contents
* [Tutorials](#tutorials)
* [Setup](#setup)
@@ -54,18 +59,16 @@ ERNIE 2.0 builds a strong basis for nearly every NLP task: Text Classification,

```python
import numpy as np
import paddle as P
from ernie.tokenizing_ernie import ErnieTokenizer
from ernie.modeling_ernie import ErnieModel

model = ErnieModel.from_pretrained('ernie-1.0') # Try to get pretrained model from server, make sure you have network connection
model.eval()
tokenizer = ErnieTokenizer.from_pretrained('ernie-1.0')

ids, _ = tokenizer.encode('hello world')
ids = P.to_tensor(np.expand_dims(ids, 0)) # insert extra `batch` dimension
pooled, encoded = model(ids) # eager execution
print(pooled.numpy()) # convert results to numpy

@@ -95,7 +98,7 @@ This repo requires PaddlePaddle 1.7.0+, please see [here](https://www.paddlepadd
pip install paddle-ernie
```

or

```shell
git clone https://github.com/PaddlePaddle/ERNIE.git --depth 1
@@ -117,10 +120,10 @@ pip install -e .
| [ERNIE Gen Large 430G for English](https://ernie-github.cdn.bcebos.com/model-ernie-gen-large-430g-en.1.tar.gz)| Layer:24, Hidden:1024, Heads:16 + 430G pretrain corpus | ernie-gen-large-430g-en |

##### 4. download datasets

**English Datasets**

Download the [GLUE datasets](https://gluebenchmark.com/tasks) by running [this script](https://gist.github.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e).

The `--data_dir` option in the following section assumes a directory tree like this:

@@ -152,11 +155,16 @@ see [demo](https://ernie-github.cdn.bcebos.com/data-mnli-m.tar.gz) data for MNLI
- Try eager execution with the `dygraph` model:

```script
python3 ./demo/finetune_classifier.py \
--from_pretrained ernie-1.0 \
	--data_dir ./data/xnli
```

- Specify `--use_amp` to activate AMP training.
- `--bsz` denotes the global batch size for one optimization step; `--micro_bsz` denotes the maximum batch size for each GPU device.
  If `--micro_bsz < --bsz`, gradient accumulation is activated automatically, as shown in the sketch below.
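
Gradient accumulation itself is a small loop. The following is a minimal, self-contained sketch of the idea under PaddlePaddle 2.0; the model, optimizer, and data here are illustrative stand-ins, not the demo script's actual objects:

```python
import numpy as np
import paddle

BSZ, MICRO_BSZ = 64, 8                 # illustrative values, not the demo defaults
ACC_STEPS = BSZ // MICRO_BSZ           # micro-batches accumulated per optimizer step

model = paddle.nn.Linear(128, 2)       # stand-in for the finetuning model
opt = paddle.optimizer.AdamW(learning_rate=5e-5, parameters=model.parameters())
loss_fn = paddle.nn.CrossEntropyLoss()

def micro_batches(n):                  # stand-in data loader yielding micro-batches
    for _ in range(n):
        x = paddle.to_tensor(np.random.randn(MICRO_BSZ, 128).astype('float32'))
        y = paddle.to_tensor(np.random.randint(0, 2, size=(MICRO_BSZ,)).astype('int64'))
        yield x, y

for step, (x, y) in enumerate(micro_batches(4 * ACC_STEPS)):
    loss = loss_fn(model(x), y) / ACC_STEPS  # scale so summed grads match one full batch
    loss.backward()                          # grads accumulate until clear_grad()
    if (step + 1) % ACC_STEPS == 0:
        opt.step()                           # one optimization step per BSZ samples
        opt.clear_grad()
```

This is why `--bsz` controls the optimization dynamics while `--micro_bsz` only controls peak GPU memory.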


- Distributed finetune

`paddle.distributed.launch` is a process manager; we use it to launch a Python process on each available GPU device:
@@ -165,15 +173,15 @@ When in distributed training, `max_steps` is used as the stopping criterion rather than `epoch`.
You can calculate `max_steps` as `EPOCH * NUM_TRAIN_EXAMPLES / TOTAL_BATCH`; a worked example follows below.
Also note that we shard the training data according to device ID to prevent overfitting.
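
For instance, taking 3 epochs over MNLI (about 393k training examples; the exact count is an assumption here) with a global batch size of 64:

```python
EPOCH, NUM_TRAIN_EXAMPLES, TOTAL_BATCH = 3, 392702, 64  # MNLI size is approximate
max_steps = EPOCH * NUM_TRAIN_EXAMPLES // TOTAL_BATCH   # = 18407 optimizer steps
print(max_steps)
```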

demo:
(make sure you have more than 2 GPUs;
online model download does not work under `paddle.distributed.launch`,
so you need to run single-card finetuning first to get the pretrained model, or download and extract one manually from [here](#section-pretrained-models)):


```script
python3 -m paddle.distributed.launch \
./demo/finetune_classifier_distributed.py \
--data_dir data/mnli \
--max_steps 10000 \
--from_pretrained ernie-2.0-en
@@ -182,11 +190,12 @@

Many other demo Python scripts:

1. [Sentiment Analysis](./demo/finetune_sentiment_analysis.py)
1. [Semantic Similarity](./demo/finetune_classifier.py)
1. [Named Entity Recognition (NER)](./demo/finetune_ner.py)
1. [Machine Reading Comprehension](./demo/finetune_mrc.py)
1. [Text generation](./demo/seq2seq/README.md)
1. [Text classification with the `paddle.static` API](./demo/finetune_classifier_static.py)



@@ -220,7 +229,7 @@ see [here](./demo/pretrain/README.md)

# Online inference

If `--inference_model_dir` is passed to `finetune_classifier.py`,
a deployable model will be generated at the end of finetuning and your model is ready to serve.

For details about online inference, see [C++ inference API](./inference/README.md),
@@ -244,14 +253,14 @@ sids = np.expand_dims(sids, 0)
result = client(ids, sids)
```

A pre-made `inference model` for ernie-1.0 can be downloaded [here](https://ernie.bj.bcebos.com/ernie1.0_zh_inference_model.tar.gz).
It can be used for feature-based finetuning or feature extraction.

# Distillation

Knowledge distillation is a good way to compress and accelerate ERNIE.

For details about distillation, see [here](./demo/distill/README.md).

# Citation

@@ -271,7 +280,7 @@ title={ERNIE 2.0: A Continual Pre-training Framework for Language Understanding},
title={ERNIE 2.0: A Continual Pre-training Framework for Language Understanding},
author={Sun, Yu and Wang, Shuohuan and Li, Yukun and Feng, Shikun and Tian, Hao and Wu, Hua and Wang, Haifeng},
journal={arXiv preprint arXiv:1907.12412},
year={2019}
}
```

@@ -306,4 +315,3 @@ For full reproduction of paper results, please check out the `repro` branch of this repo.
- QQ discussion group: 760439550 (ERNIE discussion group).
- QQ discussion group: 958422639 (ERNIE discussion group-v2).
- [Forums](http://ai.baidu.com/forum/topic/list/168?pageNo=1): discuss implementations, research, etc.

49 changes: 28 additions & 21 deletions README.zh.md
@@ -10,16 +10,20 @@ ERNIE is Baidu's pioneering continual-learning framework for semantic understanding, built on knowledge enhancement

# News

- 2020.12.29:
    - The `ERNIE` toolkit has been fully upgraded to [PaddlePaddle v2.0](https://github.com/PaddlePaddle/Paddle/tree/release/2.0-rc).
    - All demo tutorials now use AMP (mixed-precision training), for an average speedup of 2.3x.
    - Introduced `gradient accumulation`: `ERNIE-large` now runs on 8 GB of GPU memory.

- 2020.9.24:
    - `ERNIE-ViL` is now open source! ([link](https://github.com/PaddlePaddle/ERNIE/tree/repro/ernie-vil))
    - A knowledge-enhanced pre-training framework for vision-language tasks, the first to introduce structured knowledge into vision-language pre-training.
    - Uses the knowledge in scene graphs to build object, attribute, and relation prediction tasks that capture fine-grained cross-modal semantic alignment.
    - Achieves state-of-the-art results on five vision-language downstream tasks, and ranks first on the [Visual Commonsense Reasoning leaderboard](https://visualcommonsense.com/).

- 2020.5.20:
    - Try the `dygraph` implementation of ERNIE:
        - Pretrain and finetune ERNIE with [PaddlePaddle v1.8](https://github.com/PaddlePaddle/Paddle/tree/release/1.8).
        - Eager execution: what you see is what you get.
        - Large-scale distributed training.
        - Easy deployment.
@@ -52,26 +56,24 @@ ERNIE is Baidu's pioneering continual-learning framework for semantic understanding, built on knowledge enhancement
# Quick Start
```python
import numpy as np
import paddle as P
from ernie.tokenizing_ernie import ErnieTokenizer
from ernie.modeling_ernie import ErnieModel

model = ErnieModel.from_pretrained('ernie-1.0') # Try to get pretrained model from server, make sure you have network connection
model.eval()
tokenizer = ErnieTokenizer.from_pretrained('ernie-1.0')

ids, _ = tokenizer.encode('hello world')
ids = P.to_tensor(np.expand_dims(ids, 0)) # insert extra `batch` dimension
pooled, encoded = model(ids) # eager execution
print(pooled.numpy()) # convert results to numpy

```

# Tutorials

No GPU at hand? Try ERNIE on [AIStudio](https://aistudio.baidu.com/aistudio/index)!
(Choose the latest version of the tutorials and apply for a GPU runtime.)

1. [Learn ERNIE from scratch](https://aistudio.baidu.com/studio/edu/group/quick/join/314947)
Expand Down Expand Up @@ -159,11 +161,16 @@ data/xnli
- Finetune with the `dygraph` model:

```script
python3 ./ernie_d/demo/finetune_classifier.py \
--from_pretrained ernie-1.0 \
	--data_dir ./data/xnli
```

- Add `--use_amp` to enable AMP training (enable AMP only on devices with `TensorCore` support); see the sketch below.
- `--bsz` sets the global batch size (the number of samples the model sees in one optimization step); `--micro_bsz` sets the number of samples fed to each GPU card.
  When `--bsz > --micro_bsz`, the script enables gradient accumulation automatically.
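
What `--use_amp` enables is essentially the standard PaddlePaddle 2.0 AMP recipe (`auto_cast` plus dynamic loss scaling). Below is a minimal self-contained sketch; the model and data are illustrative stand-ins, not the demo script's actual objects:

```python
import numpy as np
import paddle

model = paddle.nn.Linear(128, 2)    # stand-in for the finetuning model
opt = paddle.optimizer.AdamW(learning_rate=5e-5, parameters=model.parameters())
scaler = paddle.amp.GradScaler(init_loss_scaling=2.**15)  # dynamic loss scaling

x = paddle.to_tensor(np.random.randn(8, 128).astype('float32'))
y = paddle.to_tensor(np.random.randint(0, 2, size=(8,)).astype('int64'))

with paddle.amp.auto_cast():        # run eligible ops in float16
    loss = paddle.nn.functional.cross_entropy(model(x), y)

scaled = scaler.scale(loss)         # scale the loss to avoid float16 underflow
scaled.backward()
scaler.minimize(opt, scaled)        # unscale grads, step, and adjust the loss scale
opt.clear_grad()
```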


- 分布式 finetune

`paddle.distributed.launch` is a process manager; we use it to launch one Python process per GPU card and set the environment variables needed for distributed training:
@@ -177,7 +184,7 @@

```script
python3 -m paddle.distributed.launch \
./demo/finetune_classifier_distributed.py \
--data_dir data/mnli \
--max_steps 10000 \
	--from_pretrained ernie-2.0-en
@@ -186,11 +193,12 @@

More demo scripts:

1. [Sentiment Analysis](./demo/finetune_sentiment_analysis.py)
1. [Semantic Matching](./demo/finetune_classifier.py)
1. [Named Entity Recognition (NER)](./demo/finetune_ner.py)
1. [Machine Reading Comprehension](./demo/finetune_mrc.py) (requires a multi-GPU environment; see the "Distributed finetune" section above)
1. [Text Summarization](./demo/seq2seq/README.md)
1. [Text classification with the static-graph API](./demo/finetune_classifier_static.py)


**Recommended hyperparameters:**
@@ -221,7 +229,7 @@

# Online Inference

如果`finetune_classifier_dygraph.py`中指定了`--inference_model_dir`参数,funetune脚本会将你的模型序列化并产出可以直接部署线上预测的`inference_model`.
If `--inference_model_dir` is specified for `finetune_classifier.py`, the finetune script will serialize your model and produce an `inference_model` that can be deployed directly for online inference.

For details on how to use the online-inference code in production, see the [C++ inference API](./inference/README.md).
Alternatively, you can launch a multi-GPU inference service with `propeller` (a GPU environment is required) by simply running:
@@ -254,7 +262,7 @@ ids = np.expand_dims(ids, -1) # ids.shape==[BATCH, SEQLEN, 1]

# Distillation

Knowledge distillation is an effective way to compress and accelerate ERNIE; for implementation details, see [here](./demo/distill/README.md).

# Citation

@@ -274,7 +282,7 @@ title={ERNIE 2.0: A Continual Pre-training Framework for Language Understanding},
title={ERNIE 2.0: A Continual Pre-training Framework for Language Understanding},
author={Sun, Yu and Wang, Shuohuan and Li, Yukun and Feng, Shikun and Tian, Hao and Wu, Hua and Wang, Haifeng},
journal={arXiv preprint arXiv:1907.12412},
year={2019}
}
```

@@ -309,4 +317,3 @@ ids = np.expand_dims(ids, -1) # ids.shape==[BATCH, SEQLEN, 1]
- QQ discussion group: 760439550 (ERNIE discussion group).
- QQ discussion group 2: 958422639 (ERNIE discussion group-v2).
- [Forums](http://ai.baidu.com/forum/topic/list/168?pageNo=1): discuss implementations, research, etc.

Empty file added demo/__init__.py
11 changes: 5 additions & 6 deletions distill/README.md → demo/distill/README.md
@@ -9,7 +9,7 @@
# ERNIE Slim Data Distillation
Behind ERNIE's powerful semantic understanding lies the equally large amount of compute needed to train and serve a model of this scale. Many industrial scenarios have strict performance requirements, and a model that cannot be compressed effectively cannot be deployed in practice.

![ernie_distill](../../.metas/ernie_distill.png)

Therefore, as shown above, we built the **ERNIE Slim data distillation system** on top of [data distillation](https://arxiv.org/pdf/1712.04440.pdf). It uses data as a bridge to transfer ERNIE's knowledge into a small model, delivering prediction speedups of up to a thousand times at only a small cost in accuracy.

@@ -18,11 +18,11 @@

- **Step 1.** Finetune ERNIE on the labeled input data to obtain the Teacher Model
- **Step 2.** Use ERNIE Service to predict labels for the following unlabeled data (a sketch of all three steps follows this list):

  1. Large-scale unlabeled data provided by the user, which must come from the same source as the labeled data
  2. Augmented copies of the labeled data; the augmentation strategies are described in the next section
  3. A mixture of the unlabeled and augmented data at a fixed ratio

- **Step 3.** Train the Student Model on the data produced in Step 2
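
Conceptually, the three steps form a short pipeline. Here is an illustrative Python sketch; `finetune`, `augment`, `teacher`, and `train_student` are hypothetical stand-ins, not the actual API of `./distill/distill.py`:

```python
def data_distill(finetune, augment, train_student, labeled, unlabeled, mix_ratio=1.0):
    """Illustrative sketch of ERNIE Slim data distillation; all names are hypothetical."""
    teacher = finetune(labeled)                       # Step 1: finetune ERNIE as the Teacher
    augmented = [augment(x) for x, _ in labeled]      # Step 2: augment the labeled texts
    pool = unlabeled + augmented[:int(len(augmented) * mix_ratio)]  # mix at a fixed ratio
    silver = [(x, teacher(x)) for x in pool]          # the teacher labels the mixed pool
    return train_student(labeled + silver)            # Step 3: train the small Student
```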


Expand Down Expand Up @@ -59,7 +59,6 @@ python ./distill/distill.py
|---|---|
|ERNIE-Finetune|95.4%|
|non-ERNIE baseline (BOW)|90.1%|
|**+ data distillation**|91.4%|
|non-ERNIE baseline (LSTM)|91.2%|
|**+ data distillation**|93.9%|
