Add docker image for BERT e2e inference task #455
Dockerfile
@@ -0,0 +1,31 @@
# Use the NVIDIA CUDA devel image as the parent image
FROM nvidia/cuda:12.5.0-devel-ubuntu22.04

# Set environment variable to disable interactive prompts
ENV DEBIAN_FRONTEND=noninteractive

# Install Python 3.11
RUN apt-get update && apt-get install -y \
    software-properties-common && \
    add-apt-repository ppa:deadsnakes/ppa && \

Review thread (on the deadsnakes PPA install above):

@mattcjo @weicongw, what are your opinions on the different ways of installing dependencies like Python in these images? I see weicong's neuron image builds Python from source (aws-k8s-tester/e2e2/test/images/neuron/Dockerfile, lines 90 to 101 in cc66356). Do you see a benefit in being consistent across all our images?

So, in theory, I prefer installing directly from source, as that gives us the most control and consistency. My main issue with it is that you need to specify the minor version as well. I feel like the system package manager does a good job and is far easier to use, but I would like to hear everyone's take. I can make arguments for both.

One other thing to add on: my main point in support of installing from source is that new features and optimizations are sometimes introduced in minor releases, which has the potential to lead to different performance results between minor versions. The likelihood of that happening, or being of any significance, is probably low, but it is worth noting.

+1 to those points, and irrespective of execution I think having shared expectations across our images is also a good idea (unless there are obvious requirements derived from software or hardware differences). My two cents: I like building from source if we're already doing it.

Sounds good to me. I'll make the changes now.

Done.

(A sketch of what a from-source install might look like appears after the Dockerfile below.)

    apt-get update && \
    apt-get install -y \
    python3.11 \
    python3.11-dev \
    python3.11-distutils \
    python3-pip && \
    rm -rf /var/lib/apt/lists/*

# Create a symbolic link to use python3.11 as python
RUN ln -sf /usr/bin/python3.11 /usr/bin/python

# Set the working directory in the container
WORKDIR /app

# Copy only the necessary files into the container at /app
COPY infer.py /app/
COPY requirements.txt /app/

# Install any needed packages specified in requirements.txt
RUN python -m pip install --upgrade pip && \
    pip install --no-cache-dir -r requirements.txt
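
For reference, a from-source install along the lines discussed in the thread above might look roughly like the sketch below. This is only an illustration, not the actual neuron Dockerfile: the pinned version 3.11.9, the build-dependency list, and the configure flags are assumptions, and aws-k8s-tester/e2e2/test/images/neuron/Dockerfile (lines 90 to 101) should be treated as the source of truth.

```dockerfile
# Sketch only: build a specific CPython from source instead of using the
# deadsnakes PPA. The exact minor/patch version (3.11.9 here) is an assumption
# and is precisely the detail the thread notes you must pin explicitly.
ARG PYTHON_VERSION=3.11.9

RUN apt-get update && apt-get install -y \
        build-essential wget libssl-dev zlib1g-dev libbz2-dev \
        libreadline-dev libsqlite3-dev libffi-dev liblzma-dev && \
    wget -q https://www.python.org/ftp/python/${PYTHON_VERSION}/Python-${PYTHON_VERSION}.tgz && \
    tar -xzf Python-${PYTHON_VERSION}.tgz && \
    cd Python-${PYTHON_VERSION} && \
    ./configure --enable-optimizations && \
    make -j"$(nproc)" && \
    # altinstall avoids overwriting the system python3 binary
    make altinstall && \
    cd .. && rm -rf Python-${PYTHON_VERSION} Python-${PYTHON_VERSION}.tgz && \
    rm -rf /var/lib/apt/lists/*
```

The trade-off the reviewers describe is visible here: the full minor/patch version must be pinned and build dependencies installed, in exchange for an interpreter that is identical across images regardless of what the PPA currently ships.
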
infer.py
@@ -0,0 +1,112 @@
import os
import time
import torch
from transformers import BertForPreTraining, BertTokenizer
from torch.utils.data import DataLoader, TensorDataset
import numpy as np


def create_dummy_data(tokenizer, num_samples=100, max_length=128):
    # Create dummy input data
    sentences = [
        "This is a dummy sentence number {}".format(i) for i in range(num_samples)
    ]
    tokenized_inputs = tokenizer(
        sentences,
        max_length=max_length,
        padding="max_length",
        truncation=True,
        return_tensors="pt",
    )
    labels = tokenized_inputs.input_ids.detach().clone()

    # MLM task: randomly mask some tokens
    mlm_probability = 0.15
    input_ids, labels = mask_tokens(
        tokenized_inputs.input_ids, tokenizer, mlm_probability
    )

    # NSP task: create dummy pairs
    next_sentence_labels = torch.randint(0, 2, (num_samples,))

    return TensorDataset(
        input_ids, tokenized_inputs.attention_mask, next_sentence_labels
    )


def mask_tokens(inputs, tokenizer, mlm_probability):
    labels = inputs.clone()
    probability_matrix = torch.full(labels.shape, mlm_probability)
    special_tokens_mask = [
        tokenizer.get_special_tokens_mask(val, already_has_special_tokens=True)
        for val in labels.tolist()
    ]
    probability_matrix.masked_fill_(
        torch.tensor(special_tokens_mask, dtype=torch.bool), value=0.0
    )
    masked_indices = torch.bernoulli(probability_matrix).bool()
    labels[~masked_indices] = -100  # We only compute loss on masked tokens

    inputs[masked_indices] = tokenizer.convert_tokens_to_ids(tokenizer.mask_token)

    return inputs, labels


def run_inference(model, tokenizer, batch_size, mode):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)
    model.eval()

    dataset = create_dummy_data(tokenizer)
    dataloader = DataLoader(dataset, batch_size=batch_size)

    total_time = 0
    total_batches = len(dataloader)

    with torch.no_grad():
        for batch in dataloader:
            inputs, masks, next_sentence_labels = batch
            inputs, masks, next_sentence_labels = (
                inputs.to(device),
                masks.to(device),
                next_sentence_labels.to(device),
            )

            start_time = time.time()
            outputs = model(
                input_ids=inputs,
                attention_mask=masks,
                next_sentence_label=next_sentence_labels,
            )
            end_time = time.time()

            total_time += end_time - start_time

    avg_time_per_batch = total_time / total_batches
    throughput = (total_batches * batch_size) / total_time

    print(f"Inference Mode: {mode}")
    print(f"Average time per batch: {avg_time_per_batch:.4f} seconds")
    print(f"Throughput: {throughput:.2f} samples/second")


def main():
    # Verify GPU availability
    if not torch.cuda.is_available():
        raise RuntimeError("GPU is not available. Exiting")

    print("GPU is available")

    # Pre-download model and tokenizer
    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForPreTraining.from_pretrained("bert-base-uncased")

    mode = os.environ.get("INFERENCE_MODE", "throughput").lower()
    batch_size = 1 if mode == "latency" else 8

    print(f"Running inference in {mode} mode with batch size {batch_size}")
    run_inference(model, tokenizer, batch_size, mode)


if __name__ == "__main__":
    main()
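
As a usage illustration (not part of the PR), building and running the image locally might look like the following. The image tag is made up, the directory path is taken from the review discussion below, and INFERENCE_MODE is the environment variable read by infer.py above; a GPU host with the NVIDIA container toolkit is assumed for --gpus all.

```bash
# Hypothetical local build; the tag name is illustrative only.
docker build -t bert-e2e-inference e2e2/test/images/bert-inference/

# Throughput mode (the default in infer.py, batch size 8)
docker run --rm --gpus all bert-e2e-inference python infer.py

# Latency mode (batch size 1)
docker run --rm --gpus all -e INFERENCE_MODE=latency bert-e2e-inference python infer.py
```
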
requirements.txt
@@ -0,0 +1,3 @@
torch==2.3
transformers==4.29
numpy==1.23

Review thread (on the image build workspace path):

As much as I prefer setting the build workspace to the exact path, I think it's set to "." in the other images because they're built from the repository root in other workflows. Can you double check this will work?

This doesn't work, because the image build references e2e2/test/images/bert-inference/requirements.txt. If we set the path to "." instead of the full path, the build doesn't know that file exists. This is actually something I wanted to discuss, Python dependency declaration and management, but I was hoping it'd be a down-the-line thing (not a short conversation).

I agree. I don't love the full path, but we'll need to make sure it works with our current setup.
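
To make the "." build-context option from this thread concrete: building from the repository root generally means pointing at the Dockerfile with -f and making the COPY paths repo-root-relative. The sketch below is one way that could look, not something the PR does; the exact workflow invocation is an assumption.

```bash
# Hypothetical: build from the repository root with context "." so shared
# workflows can reuse one invocation; -f locates the Dockerfile explicitly.
docker build -f e2e2/test/images/bert-inference/Dockerfile -t bert-e2e-inference .

# With context ".", the Dockerfile's COPY lines would need repo-root-relative
# paths, for example:
#   COPY e2e2/test/images/bert-inference/infer.py /app/
#   COPY e2e2/test/images/bert-inference/requirements.txt /app/
```

Either arrangement can work; the trade-off is whether the path knowledge lives in the workflow (full-path build context, as in this PR) or in the Dockerfile (repo-root-relative COPY paths).
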