Unexpected throughput results: increasing instance group count vs. distributing the same instance count across pods on the same card using shared computing windows #7956

ariel291888 opened this issue Jan 21, 2025 · 0 comments


Description
Hey there, we are working on OpenShift.
When increasing the instance group count beyond a certain value X (in my case it was 6), there is no improvement in throughput or GPU utilization, and memory usage doesn't even exceed half of the card's capacity.
On the other hand, when deploying two pods on the same GPU (using shared computing windows and fractional memory partitioning, i.e. Run:ai fractions), both throughput and GPU utilization were much higher.
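Concretely, the comparison is between one Triton pod that owns the whole GPU and scales instances in config.pbtxt, versus two Triton pods sharing the GPU through Run:ai fractions, each loading the model with a smaller instance count (the counts below are illustrative, not the exact sweep we ran):

# Setup A: a single pod, all instances in one Triton server
instance_group {
  count: 6
  kind: KIND_GPU
}

# Setup B: two pods on the same GPU via Run:ai fractions,
# each pod's config.pbtxt using a smaller count
instance_group {
  count: 3
  kind: KIND_GPU
}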

Triton Information
Triton r23.08.
We were using Model Analyzer (24.10) to check the throughput.
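For reference, Model Analyzer drives perf_analyzer under the hood; a standalone sweep along these lines can be used to sanity-check the throughput numbers (the model name, shapes, and concurrency range below are placeholders, not the exact values we used):

# gRPC endpoint on the default port; "my_model" is a placeholder name.
# --shape pins the variable first dimension of "input" ([-1, 1] in the config).
perf_analyzer -m my_model -u localhost:8001 -i grpc \
    --concurrency-range 4:64:4 \
    --shape input:64,1 \
    -b 8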

Are you using the Triton container or did you build it yourself?
Prebuilt containers in both cases (we did not build them ourselves).

To Reproduce
Steps to reproduce the behavior.

Describe the models (framework, inputs, outputs), ideally include the model configuration file (if using an ensemble include the model configuration file for that as well):
platform: "pytorch_libtorch"
max_batch_size: 128
input {
  name: "input"
  data_type: TYPE_FP32
  dims: [ -1, 1 ]
}
input {
  name: "lengths"
  data_type: TYPE_INT64
  dims: [ 1 ]
}
output {
  name: "output"
  data_type: TYPE_FP32
  dims: [ -1, -1 ]
}
instance_group {
  count: 3
  kind: KIND_GPU
}
dynamic_batching {
  max_queue_delay_microseconds: 4000
}
backend: "pytorch"
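For completeness, a minimal Python client matching this config looks roughly like the following (model name, server URL, batch size, and sequence length are placeholders):

import numpy as np
import tritonclient.grpc as grpcclient

# Placeholder endpoint and model name.
client = grpcclient.InferenceServerClient(url="localhost:8001")

batch, seq_len = 8, 64  # first dim of "input" is variable ([-1, 1] in the config)

inp = grpcclient.InferInput("input", [batch, seq_len, 1], "FP32")
inp.set_data_from_numpy(np.random.rand(batch, seq_len, 1).astype(np.float32))

lengths = grpcclient.InferInput("lengths", [batch, 1], "INT64")
lengths.set_data_from_numpy(np.full((batch, 1), seq_len, dtype=np.int64))

result = client.infer(
    model_name="my_model",
    inputs=[inp, lengths],
    outputs=[grpcclient.InferRequestedOutput("output")],
)
print(result.as_numpy("output").shape)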

Expected behavior
My expectation is that the instance-count approach would achieve at least the same throughput as the approach of running multiple pods on the same GPU.
