
Add docker image for BERT e2e inference task #455

Merged
merged 8 commits into aws:main from e2e2-bert-inference on Jul 16, 2024

Conversation

@mattcjo (Contributor) commented Jun 27, 2024

Issue #, if available:

Description of changes:
An inference script (e2e2/test/images/bert-inference/infer.py) has been added, along with its dependencies (e2e2/test/images/bert-inference/requirements.txt) and a new Dockerfile (e2e2/test/images/bert-training/Dockerfile.bert-inference). Building the Dockerfile produces an image that runs a BERT inference job on a GPU.

The docker image was tested on a g5.2xlarge instance using the AMI ami-05e885690ca33b527. The goal of the image is to run an inference workload through a BERT model. There are two inference modes, latency and throughput, each optimizing the workload for the corresponding metric.
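
For context, below is a minimal sketch of what the mode switch inside infer.py could look like. This is an illustrative assumption rather than the PR's actual script: the model/tokenizer names and the timing loop are hypothetical, and only the INFERENCE_MODE environment variable, the batch sizes (1 for latency, 8 for throughput), the fixed sequence length of 128, and the reported metrics come from this PR.

# Illustrative sketch only; the actual infer.py in this PR may differ.
import os
import time

import torch
from transformers import BertForSequenceClassification, BertTokenizer


def run_inference(mode: str) -> None:
    # Select the GPU if present; the PR output prints "GPU is available".
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print("GPU is available" if device.type == "cuda" else "GPU is not available")

    # Hypothetical model choice, for illustration only.
    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForSequenceClassification.from_pretrained("bert-base-uncased")
    model = model.to(device).eval()

    # Batch size depends on the selected mode: 1 optimizes latency, 8 optimizes throughput.
    batch_size = 1 if mode == "latency" else 8
    print(f"Running inference in {mode} mode with batch size {batch_size}")

    # Fixed sequence length of 128, per the discussion later in this PR.
    inputs = tokenizer(
        ["This is a sample input sentence."] * batch_size,
        padding="max_length",
        truncation=True,
        max_length=128,
        return_tensors="pt",
    ).to(device)

    # Time repeated forward passes and report the average per-batch latency.
    timings = []
    with torch.no_grad():
        for _ in range(100):
            start = time.time()
            model(**inputs)
            if device.type == "cuda":
                torch.cuda.synchronize()
            timings.append(time.time() - start)

    avg_time = sum(timings) / len(timings)
    print(f"Inference Mode: {mode}")
    print(f"Average time per batch: {avg_time:.4f} seconds")
    print(f"Throughput: {batch_size / avg_time:.2f} samples/second")


if __name__ == "__main__":
    run_inference(os.environ.get("INFERENCE_MODE", "latency"))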

The test results show that the docker image starts up and executes the BERT inference job in latency mode as expected:

docker run --rm --gpus all -e INFERENCE_MODE=latency aws-bert-inference:latest python infer.py

==========
== CUDA ==
==========

CUDA Version 12.5.0

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

/usr/local/lib/python3.11/dist-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
GPU is available
Running inference in latency mode with batch size 1
Inference Mode: latency
Average time per batch: 0.0075 seconds
Throughput: 132.61 samples/second

The test results show that the docker image starts up and executes the BERT inference job in throughput mode as expected:

docker run --rm --gpus all -e INFERENCE_MODE=throughput aws-bert-inference:latest python infer.py

==========
== CUDA ==
==========

CUDA Version 12.5.0

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

/usr/local/lib/python3.11/dist-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
GPU is available
Running inference in throughput mode with batch size 8
Inference Mode: throughput
Average time per batch: 0.0129 seconds
Throughput: 619.77 samples/second
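
For reference, the reported throughput is consistent with batch size divided by average time per batch: 1 / 0.0075 s is roughly 133 samples/second in latency mode, and 8 / 0.0129 s is roughly 620 samples/second in throughput mode.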

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@weicongw (Contributor) commented:

I'm not an expert on inference tasks, but can we get the Sequence Length in the performance data? Also, how long does each inference take? Can we run multiple inferences and calculate the p50 and p99 latency?

@weicongw (Contributor) commented:

Overall, this PR looks good to me.

@mattcjo (Contributor, Author) commented Jul 11, 2024, quoting @weicongw:

I'm not an expert on inference tasks, but can we get the Sequence Length in the performance data? Also, how long does each inference take? Can we run multiple inferences and calculate the p50 and p99 latency?

@weicongw So the sequence length will always be 128. As for p50 and p99 latency, I'm not sure how much insight we'd gain from those metrics given how short-running the test is.
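
For reference, if p50/p99 latency were added later, it could be computed from recorded per-batch timings roughly as follows. This is a hypothetical sketch; the timings values below are made up and are not measurements from this PR.

import numpy as np

# Hypothetical per-batch timings in seconds; not actual measurements from this PR.
timings = [0.0074, 0.0075, 0.0075, 0.0076, 0.0081]

# np.percentile accepts a list of percentiles and returns one value per percentile.
p50, p99 = np.percentile(timings, [50, 99])
print(f"p50 latency: {p50 * 1000:.2f} ms")
print(f"p99 latency: {p99 * 1000:.2f} ms")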

Contributor (review comment):

Thinking just name this as Dockerfile, following the current pattern of the neuron and nvidia images.

@mattcjo (Contributor, Author) replied:

I'm good with that. Added on a prefix as I wasn't sure what the directory structure would look like at first.

@mattcjo (Contributor, Author) replied:

Done :)

Diff context (Dockerfile):

# Install Python 3.11
RUN apt-get update && apt-get install -y \
software-properties-common && \
add-apt-repository ppa:deadsnakes/ppa && \

Contributor (review comment):

@mattcjo @weicongw what are your opinions on different ways of installing dependencies like Python in these images? I see weicong's is building from source in the neuron image:

# install Python
RUN wget -q https://www.python.org/ftp/python/$PYTHON_VERSION/Python-$PYTHON_VERSION.tgz \
&& tar -xzf Python-$PYTHON_VERSION.tgz \
&& cd Python-$PYTHON_VERSION \
&& ./configure --enable-shared --prefix=/usr/local \
&& make -j $(nproc) && make install \
&& cd .. && rm -rf ../Python-$PYTHON_VERSION* \
&& ln -s /usr/local/bin/pip3 /usr/bin/pip \
&& ln -s /usr/local/bin/$PYTHON /usr/local/bin/python \
&& ${PIP} --no-cache-dir install --upgrade \
pip \
setuptools

do you see a benefit in being consistent for all our images?

@mattcjo (Contributor, Author) replied on Jul 12, 2024:

So, in theory, I prefer installing directly from source as that gives us the most control and consistency. My main issue with it is that you need to specify the minor version as well. I feel like the system package manager does a good job and is far easier to use, but would like to hear everyone's take. I can make arguments for both.

@mattcjo (Contributor, Author) added:

One other thing to add on... My main point in support of installing from source is that there are sometimes new features and optimizations introduced in minor releases, which could lead to different performance results between minor versions. The likelihood of this happening, or being of any significance? I think pretty low, but it's something worth noting.

@ndbaker1 (Contributor) replied on Jul 12, 2024:

+1 to those points, and irrespective of execution I think having shared expectations across our images is also a good idea (unless there are obvious requirements derived from software/hardware differences).

My 2 cents is that I like building from source if we're already doing it.

@mattcjo (Contributor, Author) replied:

Sounds good to me. I'll make the changes now

@mattcjo (Contributor, Author) replied:

Done

Diff context (Dockerfile):

ca-certificates \
cmake \
curl \
emacs \

Contributor (review comment):

If you can trim any of these out I would, but nbd 🤔

@mattcjo (Contributor, Author) replied:

I actually only included those because they were in our other images - https://github.com/aws/aws-k8s-tester/blob/cc66356a28d11fdd4a60573d2a2bbe502a14dbab/e2e2/test/images/neuron/Dockerfile#L39C1-L43C10

Was just trying to keep things consistent and start to identify common patterns amongst our images. Maybe these are required in the other image...? If not, I would definitely advocate for removing them.

@ndbaker1 (Contributor) replied on Jul 15, 2024:

Yea agreed, it's not blocking me though.

Diff context (GitHub Actions workflow):

steps:
- uses: actions/checkout@v3
- run: docker build --file e2e2/test/images/bert-inference/Dockerfile e2e2/test/images/bert-inference

Contributor (review comment):

As much as I prefer setting the build workspace to the exact path, I think it's set to . in the other images because it's built from the root in other workflows. Can you double-check that this will work?

@mattcjo (Contributor, Author) replied:

This doesn't work because the image build references e2e2/test/images/bert-inference/requirements.txt. If we set the path to . instead of the full path, then it doesn't know it exists. This is actually something I wanted to discuss, Python dependency declaration/management, but I was hoping it'd be a down-the-line thing (not a short convo).

Contributor (review comment):

I agree, I don't love the full path, but we'll need to make sure it works with our current setup.

@cartermckinnon merged commit f5c1831 into aws:main on Jul 16, 2024
5 checks passed
@mattcjo deleted the e2e2-bert-inference branch on July 16, 2024 at 17:10