diff --git a/.gitignore b/.gitignore
index e13d7d768..e8f8cc26e 100644
--- a/.gitignore
+++ b/.gitignore
@@ -10,3 +10,4 @@
 test_data/
 download_glue_data.py
 data/
+output/
diff --git a/examples/language_modeling/tensorflow/bert_large/training/fp32/README.md b/examples/language_modeling/tensorflow/bert_large/training/fp32/README.md
index bf0eb6cc3..85cb87b70 100644
--- a/examples/language_modeling/tensorflow/bert_large/training/fp32/README.md
+++ b/examples/language_modeling/tensorflow/bert_large/training/fp32/README.md
@@ -9,15 +9,175 @@ should be downloaded as mentioned in the [Google bert repo](https://github.com/g
 Refer to google reference page for [checkpoints](https://github.com/google-research/bert#pre-trained-models).
 
+## Datasets
+
+### Pretrained models
+
+Download and extract the bert pretrained model checkpoints from the
+[google bert repo](https://github.com/google-research/bert#pre-trained-models).
+The extracted directory should be set as the `CHECKPOINT_DIR` environment
+variable when running the example scripts.
+
+For training from scratch, Wikipedia and BookCorpus need to be downloaded
+and pre-processed.
+
+### GLUE data
+
+[GLUE data](https://gluebenchmark.com/tasks) is used when running BERT
+classification training. Download and unpack the GLUE data by running
+[this script](https://gist.github.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e).
+
+### SQuAD data
+
+The Stanford Question Answering Dataset (SQuAD) files can be downloaded
+from the [Google bert repo](https://github.com/google-research/bert#squad-11).
+The three files (`train-v1.1.json`, `dev-v1.1.json`, and `evaluate-v1.1.py`)
+should be downloaded to the same directory. Set the `DATASET_DIR` to point to
+that directory when running bert fine-tuning using the SQuAD data. An
+illustrative set of staging commands is shown after the example scripts table
+below.
+
 ## Example Scripts
 
 | Script name | Description |
 |-------------|-------------|
-| [`fp32_training_multi_node.sh`](fp32_training_multi_node.sh) | This script is used by the Kubernetes pods to run training across multiple nodes using mpirun and horovod. |
+| [`fp32_classifier_training.sh`](fp32_classifier_training.sh) | This script fine-tunes the bert base model on the Microsoft Research Paraphrase Corpus (MRPC), which contains only 3,600 examples. Download the [bert base pretrained model](https://github.com/google-research/bert#pre-trained-models) and set the `CHECKPOINT_DIR` to that directory. The `DATASET_DIR` should point to the [GLUE data](#glue-data). |
+| [`fp32_squad_training.sh`](fp32_squad_training.sh) | This script fine-tunes bert using SQuAD data. Download the [bert large pretrained model](https://github.com/google-research/bert#pre-trained-models) and set the `CHECKPOINT_DIR` to that directory. The `DATASET_DIR` should point to the [SQuAD data files](#squad-data). |
+| [`fp32_training_single_node.sh`](fp32_training_single_node.sh) | This script is used by the single node Kubernetes job to run bert classifier training. |
+| [`fp32_training_multi_node.sh`](fp32_training_multi_node.sh) | This script is used by the Kubernetes pods to run bert classifier training across multiple nodes using mpirun and horovod. |
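+
+The commands below are an illustrative sketch (not part of the original
+instructions) of one way to stage the pretrained checkpoints and the SQuAD
+files described in the [datasets](#datasets) section and point the environment
+variables at them. The checkpoint and SQuAD URLs are the ones linked from the
+Google bert repo and the SQuAD site and may change; the local paths are
+arbitrary examples.
+
+```
+# Illustrative only: BERT-Large, Uncased checkpoint linked from the Google bert repo
+wget https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-24_H-1024_A-16.zip
+unzip uncased_L-24_H-1024_A-16.zip -d $HOME/bert
+export CHECKPOINT_DIR=$HOME/bert/uncased_L-24_H-1024_A-16
+
+# Illustrative only: SQuAD v1.1 files, all in one directory
+mkdir -p $HOME/squad
+wget -P $HOME/squad https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json
+wget -P $HOME/squad https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json
+# evaluate-v1.1.py should also be copied into this directory (see the Google bert repo for the link)
+export DATASET_DIR=$HOME/squad
+```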
 
 These examples can be run in the following environments:
+* [Bare metal](#bare-metal)
+* [Docker](#docker)
 * [Kubernetes](#kubernetes)
 
+## Bare Metal
+
+To run on bare metal, the following prerequisites must be installed in your environment:
+* Python 3
+* [intel-tensorflow==2.1.0](https://pypi.org/project/intel-tensorflow/)
+* numactl
+* git
+
+Once the above dependencies have been installed, download and untar the model
+package, set environment variables, and then run an example script. See the
+[datasets](#datasets) and [list of example scripts](#example-scripts) for more
+details on the different options.
+
+The snippet below shows an example running with a single instance (illustrative
+values for the environment variable placeholders are noted at the end of the
+snippet):
+```
+wget https://ubit-artifactory-or.intel.com/artifactory/list/cicd-or-local/model-zoo/bert-large-fp32-training.tar.gz
+tar -xvf bert-large-fp32-training.tar.gz
+cd bert-large-fp32-training
+
+CHECKPOINT_DIR=
+DATASET_DIR=
+OUTPUT_DIR=
+
+# Run a script for your desired usage
+./examples/
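+
+# Illustrative only (not part of the original snippet): if the checkpoints and
+# SQuAD data were staged as in the sketch under the example scripts table, the
+# placeholders above might be filled in as:
+#   CHECKPOINT_DIR=$HOME/bert/uncased_L-24_H-1024_A-16
+#   DATASET_DIR=$HOME/squad
+#   OUTPUT_DIR=$HOME/bert-output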