Add docker image for BERT e2e inference task #455
Conversation
I'm not an expert on inference tasks, but can we get the sequence length and the p50/p99 latency reported as well?
Overall, this PR looks good to me.
@weicongw So the sequence length will always be 128. As far as p50 and p99 latency, I'm not sure how much insight we'd gain from those metrics given how short-running the test is.
Thinking we should just name this Dockerfile, following the current pattern of the neuron and nvidia images.
I'm good with that. I added the prefix because I wasn't sure what the directory structure would look like at first.
Done :)
# Install Python 3.11
RUN apt-get update && apt-get install -y \
    software-properties-common && \
    add-apt-repository ppa:deadsnakes/ppa && \
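For context, if the apt route were taken end to end, the step might look roughly like this; everything after the add-apt-repository line (package names, cleanup) is an assumption rather than the PR's actual Dockerfile content:

# Hypothetical completion of a deadsnakes-based install
RUN apt-get update && apt-get install -y \
    software-properties-common && \
    add-apt-repository -y ppa:deadsnakes/ppa && \
    apt-get update && apt-get install -y \
    python3.11 python3.11-venv python3.11-dev && \
    rm -rf /var/lib/apt/lists/*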
@mattcjo @weicongw what are your opinions on the different ways of installing dependencies like Python in these images? I see weicong's neuron image builds it from source:
aws-k8s-tester/e2e2/test/images/neuron/Dockerfile, lines 90 to 101 in cc66356:
# install Python
RUN wget -q https://www.python.org/ftp/python/$PYTHON_VERSION/Python-$PYTHON_VERSION.tgz \
    && tar -xzf Python-$PYTHON_VERSION.tgz \
    && cd Python-$PYTHON_VERSION \
    && ./configure --enable-shared --prefix=/usr/local \
    && make -j $(nproc) && make install \
    && cd .. && rm -rf ../Python-$PYTHON_VERSION* \
    && ln -s /usr/local/bin/pip3 /usr/bin/pip \
    && ln -s /usr/local/bin/$PYTHON /usr/local/bin/python \
    && ${PIP} --no-cache-dir install --upgrade \
    pip \
    setuptools
do you see a benefit in being consistent for all our images?
So, in theory, I prefer installing directly from source as that gives us the most control and consistency. My main issue with it is that you need to specify the minor version as well. I feel like the system package manager does a good job and is far easier to use, but would like to hear everyone's take. I can make arguments for both.
One other thing to add on... My main point in support of installing from source is that there are sometimes new features and optimizations introduced in minor releases. This has the potential to lead to different performance results between minor versions. The likelihood of this happening, or being of any significance? I think pretty low, but it's worth noting.
+1 to those points, and irrespective of execution i think having shared expectations across our images is also a good idea (unless there are obvious requirements derived from software/hardware differences)
my 2 cents is that i like building from source if we're already doing it
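For what it's worth, the "have to specify the minor version" concern can be contained by pinning it once as a build argument, along the lines of the $PYTHON_VERSION variable in the neuron snippet above; the concrete value below is only an example, not taken from any of our Dockerfiles:

# Pin the interpreter down to an exact patch release (example value only)
ARG PYTHON_VERSION=3.10.12
# Override at build time when bumping:
#   docker build --build-arg PYTHON_VERSION=3.10.13 ...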
Sounds good to me. I'll make the changes now
Done
    ca-certificates \
    cmake \
    curl \
    emacs \
if you can trim any of these out I would but nbd 🤔
I actually only included those because they were in our other images - https://github.com/aws/aws-k8s-tester/blob/cc66356a28d11fdd4a60573d2a2bbe502a14dbab/e2e2/test/images/neuron/Dockerfile#L39C1-L43C10
I was just trying to keep things consistent and start to identify common patterns amongst our images. Maybe these are required in the other image...? If not, I'd definitely advocate for removing them.
yea agreed, its not blocking me though
steps:
  - uses: actions/checkout@v3
  - run: docker build --file e2e2/test/images/bert-inference/Dockerfile e2e2/test/images/bert-inference
As much as I prefer setting the build workspace to the exact path, I think it's set to "." in the other images because they're built from the repo root in the other workflows. Can you double check this will work?
This doesn't work because the image build references e2e2/test/images/bert-inference/requirements.txt. If we set the path to "." instead of the full path, then it doesn't know that file exists. This is actually something I was wanting to discuss (Python dependency declaration/management), but I was hoping it'd be a down-the-line thing (not a short convo).
i agree, i don't love the full path but we'll need to make sure it works with our current setup
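To make the trade-off concrete, the build context and the Dockerfile's COPY paths just need to agree; the COPY lines below are illustrative assumptions, not the PR's actual contents:

# Option A: context is the image directory (what the workflow step above does)
#   docker build --file e2e2/test/images/bert-inference/Dockerfile e2e2/test/images/bert-inference
COPY requirements.txt /tmp/requirements.txt

# Option B: context is the repo root, matching the other image workflows
#   docker build --file e2e2/test/images/bert-inference/Dockerfile .
COPY e2e2/test/images/bert-inference/requirements.txt /tmp/requirements.txt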
Issue #, if available:
Description of changes:
An inference script (e2e2/test/images/bert-inference/infer.py) has been added, along with its dependencies (e2e2/test/images/bert-inference/requirements.txt) and a new Dockerfile (e2e2/test/images/bert-training/Dockerfile.bert-inference). Building the Dockerfile produces an image that runs a BERT inference job on a GPU.
The docker image was tested on a g5.2xlarge instance using the AMI ami-05e885690ca33b527. The goal of the image is to run an inference workload through a BERT model. There are two inference modes, latency and throughput, each optimizing for the metric it is named after.
The test results show that the docker image starts up and executes the BERT inference job with latency optimization as expected.
The test results show that the docker image starts up and executes the BERT inference job with throughput optimization as expected.
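For anyone trying to reproduce this locally, a rough sketch of building and running both modes is below; the image tag, entrypoint, and --mode flag are assumptions, since infer.py's CLI isn't shown here:

# Build the image (tag is hypothetical)
docker build --file e2e2/test/images/bert-inference/Dockerfile -t bert-inference e2e2/test/images/bert-inference
# Run on a GPU instance, once per mode (flag name is an assumption)
docker run --gpus all bert-inference python infer.py --mode latency
docker run --gpus all bert-inference python infer.py --mode throughput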
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.