
Add Batch Optimization Scripts for Neuron Instances #500

Open
mattcjo wants to merge 24 commits into main

Conversation

@mattcjo (Contributor) commented Oct 25, 2024

This pull request introduces training and inference scripts, along with a supporting Dockerfile, used to find the optimal batch size for Neuron instances, ensuring efficient utilization of the instances' accelerators.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

mattcjo and others added 23 commits July 11, 2024 17:36
…ce to be consistent with the other test images
@mattcjo mattcjo changed the title Batch optimization neuron Add Batch Optimization Scripts for Neuron Instances Oct 25, 2024
@cartermckinnon (Member) commented Oct 25, 2024

Is this going to be used to tune our test cases? https://github.com/aws/aws-k8s-tester/tree/main/e2e2/test/cases/neuron

I'm not clear on the goal

Comment on lines +71 to +72
COPY train_bert_neuron.py /app/train_bert_neuron.py
COPY infer_bert_neuron.py /app/infer_bert_neuron.py
@ndbaker1 (Contributor) Oct 25, 2024

so this image supports inference and training for neuron? should we just put it under e2e2's images folder rather than hack?

these python scripts you could leave in /hack and then just volume mount them into the container
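
For example (a hypothetical invocation; the image name is made up, and it assumes the scripts stay in /hack, get mounted over the image's /app, and python is on the image's PATH):

```
docker run --rm -v "$PWD/hack:/app" neuron-batch-opt python /app/train_bert_neuron.py
```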

@mattcjo (Contributor, Author)

Correct. Yeah, honestly I struggled with where to put these, and someone recommended hack a couple weeks ago. The main use case right now is just to get the optimal batch size to support upcoming benchmarking efforts for our e2e tests.

I could see it evolving in the future to being automatically run when certain dependencies are updated, or as new instance types become available.

@ndbaker1 (Contributor) Oct 25, 2024

so IIUC we can use the neuron test for inference tuning, but you need an image for neuron here that supports training as well? I'm trying to decouple the test image from the optimization suite/framework.

@mattcjo (Contributor, Author)

@ndbaker1 The Dockerfile was just to make things more portable across instances as I did testing. Also, while it probably made no difference, there is slight overhead introduced by running in a container versus just a script. There are additional dependencies as well (e.g. the Neuron container runtime), which makes the optimization environment closer to the tests' runtime environment.

@mattcjo (Contributor, Author) Oct 25, 2024

@ndbaker1 @cartermckinnon Not sure I have a perfect answer for where these scripts/Dockerfile should go, but here's the full context...

  • The training and inference tests that are part of e2e2 currently have suboptimal values for their batch parameter.

  • A standard batch value is hardcoded for all of them, leaving many of the instance's GPUs underutilized.

  • A major goal moving forward is to be able to benchmark these tests on all instances, and to gain an understanding of what full peak performance looks like for each instance type.

  • These new optimization scripts target a single GPU on an instance (even if the instance has multiple GPUs) and determine the max batch size that a GPU of a given type can handle (see the sketch after this list).

  • The optimal batch value will then be used to determine the total batch size per instance (batch_size * num_gpus), enabling us to run benchmarks for each instance at full GPU utilization (like our customers would).

  • Separate training and inference scripts are needed because, depending on the "mode" of a model, more or less memory might be utilized.

  • Memory utilization differs significantly by mode because training requires large amounts of temporary state to be held in memory (as weights/parameters get updated during the training process), while inference does not (parameter values are static).

  • The scripts were containerized to more closely mirror the tests' runtime environment of running on Kubernetes.

  • A single Dockerfile was used for simplicity.
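
For concreteness, here is a minimal sketch of the search loop described above (not the PR's actual script; try_batch is a hypothetical hook that would run one training or inference step at the given batch size on a single device and report whether it fit in memory):

```python
from typing import Callable


def find_max_batch_size(try_batch: Callable[[int], bool], start: int = 1) -> int:
    """Find the largest batch size a single device can handle.

    Doubles the batch size until a step fails (e.g. runs out of memory),
    then binary-searches the gap between the last success and the first
    failure.
    """
    # Phase 1: exponential growth until the first failure.
    lo, hi = 0, start
    while try_batch(hi):
        lo, hi = hi, hi * 2
    # Phase 2: binary search between last success (lo) and first failure (hi).
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if try_batch(mid):
            lo = mid
        else:
            hi = mid
    return lo  # largest batch size that succeeded


if __name__ == "__main__":
    # Stand-in limit for demonstration only; a real hook would attempt an
    # actual step on the device and catch out-of-memory errors.
    capacity = 512
    per_device = find_max_batch_size(lambda b: b <= capacity)
    num_devices = 4  # would come from the instance type
    print("total batch size per instance:", per_device * num_devices)
```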

@cartermckinnon (Member)

Makes sense. Can we include this script in our existing test images so we don't need a separate pipeline for it? It will be easier to set up a periodic job for this as well if it's all the same spec with a different command.

@mattcjo (Contributor, Author)

I like this; dependencies should be kept consistent anyway. Can't do this for Neuron yet, though. I'm just now noticing that the PR for Neuron BERT training/inference was closed and never merged. Will need to get that merged in first.

@mattcjo (Contributor, Author) commented Oct 25, 2024

> Is this going to be used to tune our test cases? https://github.com/aws/aws-k8s-tester/tree/main/e2e2/test/cases/neuron
>
> I'm not clear on the goal

@cartermckinnon Yes, these are used to determine optimal batch size for Neuron instances for both training and inference e2e tests. There's one for NVIDIA instances as well - #498
