Update PyTorch containers and replace bf16 with bfloat16
ashahba committed Feb 13, 2021
1 parent f221d4e commit 1474177
Showing 53 changed files with 125 additions and 57 deletions.
2 changes: 1 addition & 1 deletion benchmarks/README.md
Original file line number Diff line number Diff line change
@@ -58,7 +58,7 @@ dependencies to be installed:
| Use Case | Framework | Model | Mode | oneContainer Portal | Run from the Model Zoo repository |
| ----------------------- | ------------ | ------------------ | --------- | ------------------- | --------------------------------- |
| Image Recognition | PyTorch | [ResNet 50](https://arxiv.org/pdf/1512.03385.pdf) | Inference | Model Containers: [FP32](https://software.intel.com/content/www/us/en/develop/articles/containers/resnet50-fp32-inference-pytorch-container.html) [BFloat16**](https://software.intel.com/content/www/us/en/develop/articles/containers/resnet50-bfloat16-inference-pytorch-container.html) <br> Model Packages: [FP32](https://software.intel.com/content/www/us/en/develop/articles/containers/resnet50-fp32-inference-pytorch-model.html) [BFloat16**](https://software.intel.com/content/www/us/en/develop/articles/containers/resnet50-bfloat16-inference-pytorch-model.html) | [FP32](/quickstart/image_recognition/pytorch/resnet50/inference/fp32/README.md) [BFloat16**](/quickstart/image_recognition/pytorch/resnet50/inference/bf16/README.md) |
| Recommendation | PyTorch | [DLRM](https://arxiv.org/pdf/1906.00091.pdf) | Training | Model Containers: [BFloat16**](https://software.intel.com/content/www/us/en/develop/articles/containers/dlrm-bfloat16-training-pytorch-container.html) <br> Model Packages: [BFloat16**](https://software.intel.com/content/www/us/en/develop/articles/containers/dlrm-bfloat16-training-pytorch-model.html) | [BFloat16**](../models/recommendation/pytorch/dlrm/training/bf16/README.md#dlrm-mlperf-bf16-training-v07-intel-submission) |
| Recommendation | PyTorch | [DLRM](https://arxiv.org/pdf/1906.00091.pdf) | Training | Model Containers: [BFloat16**](https://software.intel.com/content/www/us/en/develop/articles/containers/dlrm-bfloat16-training-pytorch-container.html) <br> Model Packages: [BFloat16**](https://software.intel.com/content/www/us/en/develop/articles/containers/dlrm-bfloat16-training-pytorch-model.html) | [BFloat16**](../models/recommendation/pytorch/dlrm/training/bfloat16/README.md#dlrm-mlperf-bf16-training-v07-intel-submission) |

*Means the model belongs to [MLPerf](https://mlperf.org/) models and will be supported long-term.

70 changes: 70 additions & 0 deletions dockerfiles/pytorch/pytorch-resnet50-bfloat16-inference.Dockerfile
@@ -0,0 +1,70 @@
# Copyright (c) 2020 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
#
# THIS IS A GENERATED DOCKERFILE.
#
# This file was assembled from multiple pieces, whose use is documented
# throughout. Please refer to the TensorFlow dockerfiles documentation
# for more information.

ARG PYTORCH_IMAGE="intel/intel-optimized-pytorch"
ARG PYTORCH_TAG

FROM ${PYTORCH_IMAGE}:${PYTORCH_TAG}

ARG PACKAGE_DIR=model_packages

ARG PACKAGE_NAME

ARG MODEL_WORKSPACE

# ${MODEL_WORKSPACE} and below needs to be owned by root:root rather than the current UID:GID
# this allows the default user (root) to work in k8s single-node, multi-node
RUN umask 002 && mkdir -p ${MODEL_WORKSPACE} && chgrp root ${MODEL_WORKSPACE} && chmod g+s+w,o+s+r ${MODEL_WORKSPACE}

ADD --chown=0:0 ${PACKAGE_DIR}/${PACKAGE_NAME}.tar.gz ${MODEL_WORKSPACE}

RUN chown -R root ${MODEL_WORKSPACE}/${PACKAGE_NAME} && chgrp -R root ${MODEL_WORKSPACE}/${PACKAGE_NAME} && chmod -R g+s+w ${MODEL_WORKSPACE}/${PACKAGE_NAME} && find ${MODEL_WORKSPACE}/${PACKAGE_NAME} -type d | xargs chmod o+r+x

WORKDIR ${MODEL_WORKSPACE}/${PACKAGE_NAME}

ENV USER_ID=0

ENV USER_NAME=root

ENV GROUP_ID=0

ENV GROUP_NAME=root

RUN apt-get update && \
apt-get install --no-install-recommends --fix-missing -y gosu

RUN echo '#!/bin/bash\n\
USER_ID=$USER_ID\n\
USER_NAME=$USER_NAME\n\
GROUP_ID=$GROUP_ID\n\
GROUP_NAME=$GROUP_NAME\n\
if [[ $GROUP_NAME != root ]]; then\n\
groupadd -r -g $GROUP_ID $GROUP_NAME\n\
fi\n\
if [[ $USER_NAME != root ]]; then\n\
useradd --no-log-init -r -u $USER_ID -g $GROUP_NAME -s /bin/bash -M $USER_NAME\n\
fi\n\
exec /usr/sbin/gosu $USER_NAME:$GROUP_NAME "$@"\n '\
>> /tmp/entrypoint.sh

RUN chmod u+x,g+x /tmp/entrypoint.sh

ENTRYPOINT ["/tmp/entrypoint.sh"]
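The entrypoint script that the `RUN echo` step writes is hard to read in its escaped form. Below is a minimal dry-run sketch of the same branching. It assumes a hypothetical helper, `plan_entrypoint`, which is not part of the repository; it prints the commands instead of executing them, so it can be tried without root privileges or `gosu`.

```bash
#!/bin/bash
# Hypothetical dry-run helper (not in the repository): prints the commands the
# generated /tmp/entrypoint.sh would run for a given user and group.
plan_entrypoint() {
  local user_id=$1 user_name=$2 group_id=$3 group_name=$4
  shift 4  # the remaining arguments are the command to run in the container

  # A group is only created when the requested group is not root.
  if [[ $group_name != root ]]; then
    echo "groupadd -r -g $group_id $group_name"
  fi

  # Likewise, a user is only created when the requested user is not root.
  if [[ $user_name != root ]]; then
    echo "useradd --no-log-init -r -u $user_id -g $group_name -s /bin/bash -M $user_name"
  fi

  # The command is always handed to gosu under the chosen identity.
  echo "exec /usr/sbin/gosu $user_name:$group_name $*"
}

# Defaults from the Dockerfile (root:root): only the gosu line is printed.
plan_entrypoint 0 root 0 root echo hi

# Illustrative non-root identity: group and user creation come first.
plan_entrypoint 1000 dev 1000 dev echo hi
```

With the Dockerfile defaults (`USER_NAME=root`, `GROUP_NAME=root`) neither `groupadd` nor `useradd` runs, so overriding the four environment variables at `docker run` time is what triggers user creation.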
Original file line number Diff line number Diff line change
@@ -10,7 +10,7 @@ This document has instructions for running ResNet50 BFloat16 inference using
<!--- 20. Download link -->
## Download link

[pytorch-resnet50-bf16-inference.tar.gz](https://storage.googleapis.com/intel-optimized-tensorflow/models/v2_3_0/pytorch-resnet50-bf16-inference.tar.gz)
[pytorch-resnet50-bfloat16-inference.tar.gz](https://storage.googleapis.com/intel-optimized-tensorflow/models/v2_3_0/pytorch-resnet50-bfloat16-inference.tar.gz)

<!--- 30. Datasets -->
## Datasets
@@ -71,9 +71,9 @@ Download and untar the model package and then run a [quickstart script](#quick-s
export DATASET_DIR=<path to the preprocessed imagenet dataset>
# Download and extract the model package
wget https://storage.googleapis.com/intel-optimized-tensorflow/models/v2_3_0/pytorch-resnet50-bf16-inference.tar.gz
tar -xzf pytorch-resnet50-bf16-inference.tar.gz
cd pytorch-resnet50-bf16-inference
wget https://storage.googleapis.com/intel-optimized-tensorflow/models/v2_3_0/pytorch-resnet50-bfloat16-inference.tar.gz
tar -xzf pytorch-resnet50-bfloat16-inference.tar.gz
cd pytorch-resnet50-bfloat16-inference
bash quickstart/<script name>.sh
```
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
<!--- 40. Bare Metal -->
## Bare Metal

To run on bare metal first, follow the [instruction described here](/models/recommendation/pytorch/dlrm/training/bf16/README.md#1-install-anaconda-30) until section 4.
To run on bare metal first, follow the [instruction described here](/models/recommendation/pytorch/dlrm/training/bfloat16/README.md#1-install-anaconda-30) until section 4.

After installing the prerequisites, set environment variables
for the path to your `DATA_PATH`, then run a
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
<!--- 20. Datasets -->
## Datasets

Prepare your dataset according to the [instruction described here](/models/recommendation/pytorch/dlrm/training/bf16/README.md#4-prepare-dataset)
Prepare your dataset according to the [instruction described here](/models/recommendation/pytorch/dlrm/training/bfloat16/README.md#4-prepare-dataset)

Set the `DATA_PATH` to point to "<dir/to/save/dlrm_data>" directory when running DLRM.
Original file line number Diff line number Diff line change
@@ -16,6 +16,6 @@ docker run \
--volume ${DATA_PATH}:${DATA_PATH} \
--volume ${OUTPUT_DIR}:${OUTPUT_DIR} \
--privileged --init -t \
model-zoo:intel-python-dlrm-bf16-training \
/bin/bash quickstart/recommendation/pytorch/dlrm/training/bf16/train_single_node.sh
intel/recommendation:pytorch-1.5.0-rc3-dlrm-bfloat16-training \
/bin/bash quickstart/<script name>.sh
```
Original file line number Diff line number Diff line change
@@ -10,7 +10,7 @@ Intel-optimized PyTorch.
<!--- 20. Datasets -->
## Datasets

Prepare your dataset according to the [instruction described here](/models/recommendation/pytorch/dlrm/training/bf16/README.md#4-prepare-dataset)
Prepare your dataset according to the [instruction described here](/models/recommendation/pytorch/dlrm/training/bfloat16/README.md#4-prepare-dataset)

Set the `DATA_PATH` to point to "<dir/to/save/dlrm_data>" directory when running DLRM.

@@ -28,7 +28,7 @@ These quickstart scripts can be run in different environments:
<!--- 40. Bare Metal -->
## Bare Metal

To run on bare metal first, follow the [instruction described here](/models/recommendation/pytorch/dlrm/training/bf16/README.md#1-install-anaconda-30) until section 4.
To run on bare metal first, follow the [instruction described here](/models/recommendation/pytorch/dlrm/training/bfloat16/README.md#1-install-anaconda-30) until section 4.

After installing the prerequisites, set environment variables
for the path to your `DATA_PATH`, then run a
@@ -58,8 +58,8 @@ docker run \
--volume ${DATA_PATH}:${DATA_PATH} \
--volume ${OUTPUT_DIR}:${OUTPUT_DIR} \
--privileged --init -t \
model-zoo:intel-python-dlrm-bf16-training \
/bin/bash quickstart/recommendation/pytorch/dlrm/training/bf16/train_single_node.sh
intel/recommendation:pytorch-1.5.0-rc3-dlrm-bfloat16-training \
/bin/bash quickstart/<script name>.sh
```

<!--- 70. License -->
Original file line number Diff line number Diff line change
@@ -37,6 +37,6 @@ fi

# TODO: Fill in the launch_benchmark.py command with the recommended args

${MODEL_DIR}/models/recommendation/pytorch/dlrm/training/bf16/bench/cleanup.sh
${MODEL_DIR}/models/recommendation/pytorch/dlrm/training/bf16/bench/dlrm_mlperf_4s_1n_cpx.sh
${MODEL_DIR}/models/recommendation/pytorch/dlrm/training/bfloat16/bench/cleanup.sh
${MODEL_DIR}/models/recommendation/pytorch/dlrm/training/bfloat16/bench/dlrm_mlperf_4s_1n_cpx.sh

4 changes: 2 additions & 2 deletions tools/ModelBuilderAdvanced.md
@@ -128,9 +128,9 @@ The `--framework` (or `-f`) flag applies to the following model-builder subcomma
* generate-deployment (e.g. `model-builder generate-deployment -f k8s`)
* images (e.g. `model-builder images -f pytorch` or `model-builder images -f ml`)
* init-spec (e.g. `model-builder init-spec -f tensorflow inceptionv4-fp32-inference`)
* make (e.g. `model-builder make -f pytorch pytorch-resnet50-bf16-inference`)
* make (e.g. `model-builder make -f pytorch pytorch-resnet50-bfloat16-inference`)
* models (e.g. `model-builder models -f pytorch`)
* package (e.g. `model-builder package -f pytorch pytorch-resnet50-bf16-inference`)
* package (e.g. `model-builder package -f pytorch pytorch-resnet50-bfloat16-inference`)
* packages (e.g. `model-builder packages -f tensorflow`)
* run-test-suite (e.g. `model-builder run-test-suite -c generate-dockerfile -f pytorch`)
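Because this commit renames every `bf16` identifier to `bfloat16`, any local script that still passes an old package name to these subcommands needs the same substitution. A minimal sketch follows, assuming a hypothetical helper `to_bfloat16_name` (not part of `model-builder`) built on bash parameter expansion:

```bash
#!/bin/bash
# Hypothetical helper (not part of model-builder): rewrite the old "bf16"
# token in a package or spec name to "bfloat16".
to_bfloat16_name() {
  local name=$1
  # ${var//pattern/replacement} replaces every occurrence of the pattern.
  echo "${name//bf16/bfloat16}"
}

# Old names as they appear on the removed lines of this commit.
to_bfloat16_name pytorch-resnet50-bf16-inference
to_bfloat16_name dlrm-bf16-training
```

The substitution is safe to apply twice: the string `bfloat16` does not itself contain `bf16`, so names that are already converted pass through unchanged.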

2 changes: 1 addition & 1 deletion tools/docker/README.md
@@ -20,7 +20,7 @@ tools/docker/specs/
│   ├── scikit-learn-census_spec.yml
│   └── ...
├── pytorch
│   ├── pytorch-resnet50-bf16-inference_spec.yml
│   ├── pytorch-resnet50-bfloat16-inference_spec.yml
│   └── ...
└── tensorflow
├── bert-large-bfloat16-inference_spec.yml
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
releases:
versioned:
tag_specs:
- '{ubuntu}{intel-python}{pytorch-conda}{ipex-conda}{torch_ccl-conda}{dlrm-bf16-training}'
- '{ubuntu}{intel-python}{pytorch-conda}{ipex-conda}{torch_ccl-conda}{dlrm-bfloat16-training}'
slice_sets:
pytorch-conda:
- add_to_name: "pytorch"
@@ -23,44 +23,44 @@ slice_sets:
- pytorch/torchccl-conda
tests:
- pytorch/import-torchccl.sh
dlrm-bf16-training:
- add_to_name: -dlrm-bf16-training
dlrm-bfloat16-training:
- add_to_name: -dlrm-bfloat16-training
args:
- PACKAGE_NAME=dlrm-bf16-training
- PACKAGE_NAME=dlrm-bfloat16-training
dockerfile_subdirectory: pytorch
documentation:
docs:
- name: Title
uri: models/quickstart/recommendation/pytorch/dlrm/training/bf16/.docs/title.md
uri: models/quickstart/recommendation/pytorch/dlrm/training/bfloat16/.docs/title.md
- name: Description
uri: models/quickstart/recommendation/pytorch/dlrm/training/bf16/.docs/description.md
uri: models/quickstart/recommendation/pytorch/dlrm/training/bfloat16/.docs/description.md
- name: Datasets
uri: models/quickstart/recommendation/pytorch/dlrm/training/bf16/.docs/datasets.md
uri: models/quickstart/recommendation/pytorch/dlrm/training/bfloat16/.docs/datasets.md
- name: Quick Start Scripts
uri: models/quickstart/recommendation/pytorch/dlrm/training/bf16/.docs/quickstart.md
uri: models/quickstart/recommendation/pytorch/dlrm/training/bfloat16/.docs/quickstart.md
- name: Bare Metal
uri: models/quickstart/recommendation/pytorch/dlrm/training/bf16/.docs/baremetal.md
uri: models/quickstart/recommendation/pytorch/dlrm/training/bfloat16/.docs/baremetal.md
- name: Docker
uri: models/quickstart/recommendation/pytorch/dlrm/training/bf16/.docs/docker.md
uri: models/quickstart/recommendation/pytorch/dlrm/training/bfloat16/.docs/docker.md
- name: License
uri: models/quickstart/recommendation/pytorch/dlrm/training/bf16/.docs/license.md
uri: models/quickstart/recommendation/pytorch/dlrm/training/bfloat16/.docs/license.md
name: README.md
text_replace:
<docker image>: ''
<mode>: training
<model name>: Dlrm
<package dir>: dlrm-bf16-training
<package name>: dlrm-bf16-training.tar.gz
<package dir>: dlrm-bfloat16-training
<package name>: dlrm-bfloat16-training.tar.gz
<package url>: ''
<precision>: bf16
<precision>: bfloat16
<use case>: recommendation
uri: models/quickstart/recommendation/pytorch/dlrm/training/bf16
uri: models/quickstart/recommendation/pytorch/dlrm/training/bfloat16
downloads: []
files:
- destination: models/recommendation/pytorch/dlrm/training/bf16
source: models/recommendation/pytorch/dlrm/training/bf16
- destination: models/recommendation/pytorch/dlrm/training/bfloat16
source: models/recommendation/pytorch/dlrm/training/bfloat16
- destination: quickstart
source: quickstart/recommendation/pytorch/dlrm/training/bf16
source: quickstart/recommendation/pytorch/dlrm/training/bfloat16
partials:
- recommendation/dlrm
- model_package
Original file line number Diff line number Diff line change
@@ -1,51 +1,51 @@
releases:
versioned:
tag_specs:
- '{pytorch}{pytorch-resnet50-bf16-inference}'
- '{pytorch}{pytorch-resnet50-bfloat16-inference}'
slice_sets:
pytorch-resnet50-bf16-inference:
- add_to_name: -resnet50-bf16-inference
pytorch-resnet50-bfloat16-inference:
- add_to_name: -resnet50-bfloat16-inference
args:
- PYTORCH_TAG=1.5.0-rc3-ipex-latest
- PACKAGE_NAME=pytorch-resnet50-bf16-inference
- PACKAGE_NAME=pytorch-resnet50-bfloat16-inference
dockerfile_subdirectory: pytorch
documentation:
docs:
- name: Title
uri: models/quickstart/image_recognition/pytorch/resnet50/inference/bf16/.docs/title.md
uri: models/quickstart/image_recognition/pytorch/resnet50/inference/bfloat16/.docs/title.md
- name: Description
uri: models/quickstart/image_recognition/pytorch/resnet50/inference/bf16/.docs/description.md
uri: models/quickstart/image_recognition/pytorch/resnet50/inference/bfloat16/.docs/description.md
- name: Download link
uri: models/quickstart/image_recognition/pytorch/resnet50/inference/bf16/.docs/download.md
uri: models/quickstart/image_recognition/pytorch/resnet50/inference/bfloat16/.docs/download.md
- name: Datasets
uri: models/quickstart/image_recognition/pytorch/resnet50/inference/bf16/.docs/datasets.md
uri: models/quickstart/image_recognition/pytorch/resnet50/inference/bfloat16/.docs/datasets.md
- name: Quick Start Scripts
uri: models/quickstart/image_recognition/pytorch/resnet50/inference/bf16/.docs/quickstart.md
uri: models/quickstart/image_recognition/pytorch/resnet50/inference/bfloat16/.docs/quickstart.md
- name: Bare Metal
uri: models/quickstart/image_recognition/pytorch/resnet50/inference/bf16/.docs/baremetal.md
uri: models/quickstart/image_recognition/pytorch/resnet50/inference/bfloat16/.docs/baremetal.md
- name: Docker
uri: models/quickstart/image_recognition/pytorch/resnet50/inference/bf16/.docs/docker.md
uri: models/quickstart/image_recognition/pytorch/resnet50/inference/bfloat16/.docs/docker.md
- name: License
uri: models/quickstart/image_recognition/pytorch/resnet50/inference/bf16/.docs/license.md
uri: models/quickstart/image_recognition/pytorch/resnet50/inference/bfloat16/.docs/license.md
name: README.md
text_replace:
<docker image>: 'intel/image-recognition:pytorch-1.5.0-rc3-resnet50-bfloat16-inference'
<mode>: inference
<model name>: ResNet50
<package dir>: pytorch-resnet50-bf16-inference
<package name>: pytorch-resnet50-bf16-inference.tar.gz
<package url>: 'https://storage.googleapis.com/intel-optimized-tensorflow/models/v2_3_0/pytorch-resnet50-bf16-inference.tar.gz'
<package dir>: pytorch-resnet50-bfloat16-inference
<package name>: pytorch-resnet50-bfloat16-inference.tar.gz
<package url>: 'https://storage.googleapis.com/intel-optimized-tensorflow/models/v2_3_0/pytorch-resnet50-bfloat16-inference.tar.gz'
<precision>: BFloat16
<use case>: image_recognition
uri: models/quickstart/image_recognition/pytorch/resnet50/inference/bf16
uri: models/quickstart/image_recognition/pytorch/resnet50/inference/bfloat16
downloads: []
files:
- destination: models/image_recognition/pytorch/common
source: models/image_recognition/pytorch/common
- destination: quickstart/common
source: quickstart/common
- destination: quickstart
source: quickstart/image_recognition/pytorch/resnet50/inference/bf16
source: quickstart/image_recognition/pytorch/resnet50/inference/bfloat16
partials:
- model_package
- entrypoint
Original file line number Diff line number Diff line change
@@ -76,4 +76,4 @@ slice_sets:
- opencv
- model_package
- entrypoint
- object_detection/ssdresnet34_bf16_patch
- object_detection/ssdresnet34_bfloat16_patch
4 changes: 1 addition & 3 deletions tools/scripts/model-builder
@@ -1507,9 +1507,7 @@ _model_builder._parse_model_name()
declare -n __precision=$4
declare -n __mode=$5
declare -n __dir=$6
if [[ $_model =~ bf16 ]]; then
__precision=bf16
elif [[ $_model =~ bfloat16 ]]; then
if [[ $_model =~ bfloat16 ]]; then
__precision=bfloat16
elif [[ $_model =~ fp32 ]]; then
__precision=fp32
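
With the `bf16` branch removed, only model names containing `bfloat16` resolve to that precision in the lines shown. Below is a minimal sketch of that matching order. It assumes a hypothetical standalone function `parse_precision` that echoes its result instead of writing through a nameref; the `unknown` fallback is also an assumption, since the real function continues past this hunk.

```bash
#!/bin/bash
# Hypothetical standalone version of the precision check (not the real
# _model_builder._parse_model_name, which sets namerefs and has more branches).
parse_precision() {
  local model=$1
  if [[ $model =~ bfloat16 ]]; then
    echo bfloat16
  elif [[ $model =~ fp32 ]]; then
    echo fp32
  else
    # Assumption: the branches after fp32 are not visible in the diff.
    echo unknown
  fi
}

parse_precision pytorch-resnet50-bfloat16-inference
parse_precision resnet50-fp32-inference
# An old-style name no longer matches the bfloat16 branch in this sketch.
parse_precision pytorch-resnet50-bf16-inference
```

This is why the commit renames the spec files and package names in the same change: a name that still uses `bf16` would fall through the `bfloat16` test.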
