Description
We have been using Triton in Kubernetes for nearly 2 years. We recently introduced a new model and immediately started noticing instability: Triton would simply stop after ~20 minutes of running, with nothing in the logs to indicate why.
This occurs even if Triton receives no inference requests during this period.
The new model is a python backend model which uses CPU for inference.
We are deploying Triton in Kubernetes on AWS, on a single NVIDIA T4 GPU, specifically a g4dn.2xlarge instance.
The Triton K8s deployment has more than sufficient memory (25GB). Checking the pod's memory in Grafana shows it is underutilized.
Running kubectl describe on the pod after Triton stops shows:
Last State: Terminated
Reason: Error
Exit Code: 137
Models are hosted in S3 and we are using poll mode. We authenticate with S3 via AWS environment variables. Model repo:
$ aws s3 ls s3://bucket/model_repo/model/
PRE 1/
PRE assets/
2024-10-22 20:21:21 778 config.pbtxt
2024-10-22 20:21:22 1023 config.yaml
$ aws s3 ls s3://bucket/model_repo/model/1/
2024-10-22 20:21:21 291 model.py
$ aws s3 ls s3://bucket/model_repo/model/assets/
PRE kenlms/
PRE tokenizers/
$ aws s3 ls s3://bucket/model_repo/model/assets/kenlms/
2024-10-22 20:21:22 12975096709 model.trie
$ aws s3 ls s3://bucket/model_repo/model/assets/tokenizers/
2024-10-22 20:21:21 37919 tokenizer.model
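For reference, the relevant part of our Triton container spec looks roughly like the sketch below; the image name, bucket path, secret names, poll interval and resource values are placeholders rather than our exact values:

containers:
  - name: triton
    # 24.01-py3 base image plus our small Python-backend support layer
    image: our-registry/tritonserver-custom:24.01
    args:
      - tritonserver
      - --model-repository=s3://bucket/model_repo
      - --model-control-mode=poll
      - --repository-poll-secs=60
      - --log-verbose=10
    env:
      # Triton's S3 support reads the standard AWS environment variables.
      - name: AWS_ACCESS_KEY_ID
        valueFrom:
          secretKeyRef:
            name: aws-credentials
            key: access-key-id
      - name: AWS_SECRET_ACCESS_KEY
        valueFrom:
          secretKeyRef:
            name: aws-credentials
            key: secret-access-key
      - name: AWS_DEFAULT_REGION
        value: us-east-1
    resources:
      limits:
        memory: 25Gi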
I have attached logs captured with --log-verbose=10. In this log, only the new model we have introduced is present in the model repo: triton-verbose.log
Some analysis I have done:
The model.trie asset is optional and is configured in config.yaml. If I remove model.trie and its path from config.yaml, Triton remains stable and does not shut down. The file is 12.9GB.
If I keep model.trie and turn off the Kubernetes liveness and readiness probes, we do not see the problem: Triton remains stable and does not shut down.
If I keep model.trie, turn the Kubernetes liveness and readiness probes back on, and change the K8s deployment to use an initContainer that pulls the models into an emptyDir (also mounted into the Triton container), pointing Triton at that path via --model-store=, I do not see the issue: Triton remains stable and does not shut down. In this setup Triton is only polling a local model repo, not S3. A sketch of this workaround follows below.
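The workaround deployment fragment looks roughly like this; the fetch image, bucket path and mount path are placeholders:

initContainers:
  - name: fetch-models
    # any image with the aws CLI works; it needs the same AWS credential env vars as the Triton container
    image: amazon/aws-cli
    command: ["aws", "s3", "sync", "s3://bucket/model_repo", "/models"]
    volumeMounts:
      - name: model-store
        mountPath: /models
containers:
  - name: triton
    # same Triton container as before, but pointed at the local copy
    args: ["tritonserver", "--model-store=/models", "--model-control-mode=poll"]
    volumeMounts:
      - name: model-store
        mountPath: /models
volumes:
  - name: model-store
    emptyDir: {}

With this layout Triton never talks to S3 at runtime; the poll cycle only scans the local emptyDir.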
Changing --repository-poll-secs has no impact; Triton still restarts every ~20 minutes.
Widening the interval at which K8s runs the liveness and readiness checks does seem to change how quickly it stops.
The combination of the large model.trie, polling S3, and the liveness and readiness probes is causing Triton to stop (exit code 137 is a SIGKILL, which is consistent with the kubelet killing the container after failed liveness checks). Every time it stops, the last log line in verbose mode is always:
2024-10-22T20:45:09Z I 1 model_lifecycle.cc:265] ModelStates()
Liveness and readiness probes are configured as below in the K8s deployment:
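They hit Triton's standard HTTP health endpoints on the default HTTP port; the exact timing values below are placeholders rather than our precise settings:

livenessProbe:
  httpGet:
    path: /v2/health/live
    port: 8000
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /v2/health/ready
    port: 8000
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3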
Has this been seen before?
How can I investigate further?
Triton Information
What version of Triton are you using?
2.42.0
Upgrading to the latest Triton version does not fix the issue.
Are you using the Triton container or did you build it yourself?
We are using the 24.01-py3 Triton docker image as a base image and adding a small layer with our own code to support Python backend models. This layer is not new and has remained the same for 12+ months.
To Reproduce
Configure Triton to use poll mode.
Configure Triton to use S3 for its model repository.
Add a large model (12.9GB) to that S3 model repository.
Deploy to Kubernetes with liveness and readiness probes configured as above.
Do not make any inference requests
Wait ~20 minutes.
Expected behavior
Triton remains stable and does not stop unexpectedly.