Add Contribute.md doc with instructions on adding a new model (#266)
Also, updated the launch script documentation with instructions on using the `--debug` flag.
dmsuehir authored Apr 4, 2019
1 parent d8f9014 commit 8802bc6
Showing 7 changed files with 275 additions and 1 deletion.
176 changes: 176 additions & 0 deletions Contribute.md
@@ -0,0 +1,176 @@
# Contributing to the Model Zoo for Intel® Architecture

## Adding benchmarking scripts for a new TensorFlow model

### Code updates

In order to add a new model to the zoo, there are a few things that are
required:

1. Set up the directory structure to allow the
[launch script](/docs/general/tensorflow/LaunchBenchmark.md) to find
your model. This involves creating folders for:
`/benchmarks/<use case>/<framework>/<model name>/<mode>/<precision>`.
Note that you will need to add an `__init__.py` file in each new
directory that you add, so that Python can find the code.

![Benchmarks Directory Structure](benchmarks_directory_structure.png)
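
For example, a hypothetical FP32 inference model under the
`image_recognition` use case might use a layout like the following
(the model name is just a placeholder, and each newly created
directory gets its own `__init__.py`):

```
benchmarks/image_recognition/tensorflow/
└── <model name>/
    ├── __init__.py
    └── inference/
        ├── __init__.py
        └── fp32/
            ├── __init__.py
            └── model_init.py
```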

2. Next, in the leaf folder that was created in the previous step, you
will need to create a `model_init.py` file:

![Add model init](add_model_init.png)

This file is used to initialize the best known configuration for the
model, and then start executing inference or training. When the
[launch script](/docs/general/tensorflow/LaunchBenchmark.md) is run,
it will look for the appropriate `model_init.py` file to use
according to the model name, framework, mode, and precision that are
specified by the user.

The contents of the `model_init.py` file will vary by framework. For
TensorFlow models, we typically use the
[base model init class](/benchmarks/common/base_model_init.py), which
includes functions for common tasks such as setting up the best
known environment variables (like `KMP_BLOCKTIME`, `KMP_SETTINGS`,
`KMP_AFFINITY`, and `OMP_NUM_THREADS`) and the number of intra-op
and inter-op threads. The `model_init.py` file also builds the command
string that will ultimately be used to run inference or model training,
which normally includes the use of `numactl` and passes all of the
appropriate arguments to the model's script. Also, if your model
requires any non-standard arguments (arguments that are not part of
the [launch script flags](/docs/general/tensorflow/LaunchBenchmark.md#launch_benchmarkpy-flags)),
the `model_init.py` file is where you would define and parse those
args. A simplified sketch of a `model_init.py` file is shown below.
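
Below is a simplified, self-contained sketch of what a `model_init.py`
file typically does. It does not use the repo's actual base class
helpers, and the `--steps` flag, the `eval.py` script name, and the
attribute names on `args` are illustrative assumptions only:

```python
# Illustrative sketch only -- real model_init.py files in this repo build on
# the common base model init class; the names and flags below are hypothetical.
import argparse
import os
import subprocess


class ModelInitializer:
    """Sets a best known configuration and builds the run command."""

    def __init__(self, args, custom_args=None):
        self.args = args

        # Parse model-specific args that are not standard launch script flags.
        parser = argparse.ArgumentParser()
        parser.add_argument("--steps", type=int, default=50,
                            help="number of benchmark steps (hypothetical flag)")
        self.custom_args = parser.parse_args(custom_args or [])

        # Best known environment variables for CPU performance.
        os.environ["KMP_BLOCKTIME"] = "1"
        os.environ["KMP_SETTINGS"] = "1"
        os.environ["KMP_AFFINITY"] = "granularity=fine,verbose,compact,1,0"
        os.environ["OMP_NUM_THREADS"] = str(args.num_intra_threads)

        # Build the command that runs the model script, pinned with numactl.
        self.run_cmd = [
            "numactl", "--cpunodebind=0", "--membind=0",
            "python", os.path.join(args.model_source_dir, "eval.py"),
            "--batch-size", str(args.batch_size),
            "--num-inter-threads", str(args.num_inter_threads),
            "--num-intra-threads", str(args.num_intra_threads),
            "--steps", str(self.custom_args.steps),
        ]

    def run(self):
        # Execute the command that was built above.
        subprocess.check_call(self.run_cmd)
```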

3. [start.sh](/benchmarks/common/tensorflow/start.sh) is a shell script
that is called by the `launch_benchmark.py` script in the docker
container. This script installs dependencies that are required by
the model, sets up the `PYTHONPATH` environment variable, and then
calls the [run_tf_benchmark.py](/benchmarks/common/tensorflow/run_tf_benchmark.py)
script with the appropriate args. That run script will end up calling
the `model_init.py` file that you have defined in the previous step.

To add support for a new model in the `start.sh` script, you will
need to add a function with the same name as your model. Note that
this function name should match the `<model name>` folder from the
first step where you set up the directories for your model. In this
function, add commands to install any third-party dependencies within
an `if [ ${NOINSTALL} != "True" ]; then` conditional block. The
purpose of the `NOINSTALL` flag is to be able to skip the installs
for quicker iteration when running on bare metal or debugging. If
your model requires the `PYTHONPATH` environment variable to be set up
to find model code or dependencies, that should be done in the
model's function. Next, set up the command that will be run. The
standard launch script args are already added to the `CMD` variable,
so your model function will only need to add on more args if you have
model-specific args defined in your `model_init.py`. Lastly, call the
`run_model` function with the `PYTHONPATH` and the `CMD` string.

Below is a sample template of a `start.sh` model function that
installs dependencies from a `requirements.txt` file, sets up the
`PYTHONPATH` to find model source files, adds a custom steps flag
to the run command, and then runs the model:
```bash
function <model_name>() {
  if [ ${PRECISION} == "fp32" ]; then
    if [ ${NOINSTALL} != "True" ]; then
      pip install -r ${MOUNT_EXTERNAL_MODELS_SOURCE}/requirements.txt
    fi

    export PYTHONPATH=${PYTHONPATH}:${MOUNT_EXTERNAL_MODELS_SOURCE}
    CMD="${CMD} $(add_steps_args)"
    PYTHONPATH=${PYTHONPATH} CMD=${CMD} run_model
  else
    echo "PRECISION=${PRECISION} is not supported for ${MODEL_NAME}"
    exit 1
  fi
}
```

Optional step:
* If there is CPU-optimized model code that has not been upstreamed to
the original repository, then it can be added to the
[models](/models) directory in the zoo repo. As with the first step
above, the directory structure should be set up like:
`/models/<use case>/<framework>/<model name>/<mode>/<precision>`:

![Models Directory Structure](models_directory_structure.png)

If there are model files that can be shared by multiple modes or
precisions, they can be placed in a higher-level directory. For
example, if a file could be shared by both `FP32` and `Int8`
precisions, then it could be placed in the directory at:
`/models/<use case>/<framework>/<model name>/<mode>` (omitting the
`<precision>` directory). Note that if this is being done, you need to
ensure that the license that is associated with the original model
repository is compatible with the license of the model zoo.

### Debugging

There are a couple of options for debugging and quicker iteration when
developing new scripts:
* Use the `--debug` flag in the launch_benchmark.py script, which will
give you a shell into the docker container. See the
[debugging section](/docs/general/tensorflow/LaunchBenchmark.md#debugging)
of the launch script documentation for more information on using this
flag.
* Run the launch script on bare metal (without a docker container). The
launch script documentation also has a
[section](/docs/general/tensorflow/LaunchBenchmark.md#alpha-feature-running-on-bare-metal)
with instructions on how to do this. Note that when running without
docker, you are responsible for installing all dependencies on your
system before running the launch script. If you are using this option
during development, be sure to also test _with_ a docker container to
ensure that the `start.sh` script dependency installation is working
properly for your model.

### Documentation updates

1. Create a `README.md` file in the
`/benchmarks/<use case>/<framework>/<model name>` directory:

![Add README file](add_readme.png)

This README file should describe all of the steps necessary to run
the model, including downloading and preprocessing the dataset,
downloading the pretrained model, cloning repositories, and running
the benchmarking script with the appropriate arguments. Most models
have best known settings for throughput and latency performance
testing, as well as for accuracy testing. The README file should
specify how to set these configs using the `launch_benchmark.py` script.

2. Update the table in the [benchmarks README](/benchmarks/README.md)
with a link to the model that you are adding. Note that the models
in this table are ordered alphabetically by use case, framework, and
model name. The model name should link to the original paper for the
model. The benchmarking instructions column should link to the README
file that you created in the previous step.

### Testing

1. After you've completed the above steps, run the model according to
the instructions in the README file for the new model. Ensure that the
performance and accuracy metrics are on par with what you would
expect.

2. Add unit tests to cover the new model.
* For TensorFlow models, there is a
[parameterized test](/tests/unit/common/tensorflow/test_run_tf_benchmarks.py#L80)
that checks the flow running from `run_tf_benchmark.py` to the
inference command that is executed by the `model_init.py` file. The
test ensures that the inference command has all of the expected
arguments.

To add a new parameterized instance of the test for your
new model, update the [tf_model_args.txt](/tests/unit/common/tensorflow/tf_model_args.txt)
file. This file has comma-separated values where each row has two
items: (1) the `run_tf_benchmark.py` command with the appropriate
flags to run the model and (2) the expected inference or training
command that should get run by the `model_init.py` file. An
illustrative example row is sketched at the end of this section.
* If any launch script or base class files were changed, then
additional unit tests should be added.
* Unit tests and style checks are run when you post a GitHub PR, and
the tests must be passing before the PR is merged.
* For information on how to run the unit tests and style checks
locally, see the [tests documentation](/tests/README.md).
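
To illustrate the row format described in the testing step above, a
hypothetical entry might look like the following. Both the flags and
the expected command are illustrative only, not copied from the real
`tf_model_args.txt` file:

```
run_tf_benchmark.py --framework=tensorflow --use-case=image_recognition --model-name=<model name> --precision=fp32 --mode=inference --batch-size=1,numactl --cpunodebind=0 --membind=0 python eval.py --batch-size=1 --steps=50
```
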
5 changes: 4 additions & 1 deletion README.md
@@ -29,4 +29,7 @@ We hope this structure is intuitive and helps you find what you are looking for;

![Repo Structure](repo_structure.png)

*Note: For model quantization and optimization tools, see [https://github.com/IntelAI/tools](https://github.com/IntelAI/tools)*.

## How to Contribute
If you would like to add a new benchmarking script, please use [this guide](/Contribute.md).
Binary file added add_model_init.png
Binary file added add_readme.png
Binary file added benchmarks_directory_structure.png
95 changes: 95 additions & 0 deletions docs/general/tensorflow/LaunchBenchmark.md
@@ -106,6 +106,101 @@ optional arguments:
--debug Launches debug mode which doesn't execute start.sh
```

## Debugging

The `--debug` flag in the `launch_benchmark.py` script gives you a
shell into the docker container with the volumes mounted for any
dataset, pretrained model, model source code, etc. that have been
provided by the other flags. It does not execute the `start.sh` script,
and is intended as a way to set up an environment for quicker iteration
when debugging and doing development. From the shell, you can manually
execute the `start.sh` script and choose not to re-install dependencies
on each re-run, so that the script takes less time to run.

Below is an example showing how to use the `--debug` flag:

1. Run the model using your model's `launch_benchmark.py` command, but
add on the `--debug` flag, which will take you to a shell. If you
list the files in the directory at that prompt, you will see the
`start.sh` file:

```
$ python launch_benchmark.py \
--in-graph /home/<user>/resnet50_fp32_pretrained_model.pb \
--model-name resnet50 \
--framework tensorflow \
--precision fp32 \
--mode inference \
--batch-size=1 \
--socket-id 0 \
--data-location /home/<user>/Imagenet_Validation \
--docker-image intelaipg/intel-optimized-tensorflow:latest-devel-mkl \
--debug
# ls
__init__.py logs run_tf_benchmark.py start.sh
```

2. Flags that were passed to the launch script are set as environment
variables in the container:

```
# env
EXTERNAL_MODELS_SOURCE_DIRECTORY=None
IN_GRAPH=/in_graph/resnet50_fp32_pretrained_model.pb
WORKSPACE=/workspace/benchmarks/common/tensorflow
MODEL_NAME=resnet50
PRECISION=fp32
BATCH_SIZE=1
MOUNT_EXTERNAL_MODELS_SOURCE=/workspace/models
DATASET_LOCATION=/dataset
BENCHMARK_ONLY=True
ACCURACY_ONLY=False
...
```
3. Run the `start.sh` script, which will set up the `PYTHONPATH`, install
dependencies, and then run the model:
```
# bash start.sh
...
Iteration 48: 0.011513 sec
Iteration 49: 0.011664 sec
Iteration 50: 0.011802 sec
Average time: 0.011650 sec
Batch size = 1
Latency: 11.650 ms
Throughput: 85.833 images/sec
Ran inference with batch size 1
Log location outside container: <output directory>/benchmark_resnet50_inference_fp32_20190403_212048.log
```

4. Code changes that are made locally will also be reflected in the container
(and vice versa), since the directories are mounted in the docker
container. Once code changes are made, you can re-run the start
script; this time, set the `NOINSTALL` variable so that dependencies
are not installed again, since that was already done in the previous
run. You can also change the environment variable values for other
settings, like the batch size.

```
# export NOINSTALL=True
# export BATCH_SIZE=128
# bash start.sh
...
Iteration 48: 0.631819 sec
Iteration 49: 0.625606 sec
Iteration 50: 0.618813 sec
Average time: 0.625285 sec
Batch size = 128
Throughput: 204.707 images/sec
Ran inference with batch size 128
Log location outside container: <output directory>/benchmark_resnet50_inference_fp32_20190403_212310.log
```

5. Once you are done with the session, exit out of the docker container:
```
# exit
```

## Alpha feature: Running on bare metal

We recommend using [Docker](https://www.docker.com) to run the
Binary file added models_directory_structure.png
