Triton stops unexpectedly and without logging when using a large model, S3, and periodic checks to the ready and live endpoints #7728

smcbn commented Oct 22, 2024

Description
We have been using Triton in Kubernetes for nearly two years. We recently introduced a new model and immediately started noticing instability: Triton would simply stop after ~20 minutes of running, with nothing in the logs to indicate why.

This occurs even if Triton receives no inference requests during this period.

The new model is a Python backend model that uses the CPU for inference.
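
For reference, the model follows the standard Python backend layout. Our actual model.py is not reproduced here; the skeleton below is only illustrative, and the tensor names (INPUT0/OUTPUT0) and asset-loading comments are placeholders:

    import json

    import triton_python_backend_utils as pb_utils


    class TritonPythonModel:
        def initialize(self, args):
            # args["model_config"] is the serialized config.pbtxt for this model.
            self.model_config = json.loads(args["model_config"])
            # The real model also loads the optional model.trie and the tokenizer
            # from the assets/ directory here (paths come from config.yaml).

        def execute(self, requests):
            responses = []
            for request in requests:
                in_tensor = pb_utils.get_input_tensor_by_name(request, "INPUT0")
                # CPU-only inference happens here in the real model.
                out_tensor = pb_utils.Tensor("OUTPUT0", in_tensor.as_numpy())
                responses.append(
                    pb_utils.InferenceResponse(output_tensors=[out_tensor]))
            return responses

        def finalize(self):
            pass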

We are deploying Triton in Kubernetes on AWS on a single NVIDIA T4 GPU (a g4dn.2xlarge instance).

The Triton Kubernetes deployment has more than sufficient memory (25 GB). Checking the pod's memory in Grafana shows memory being underutilized.

Running kubectl describe on the pod after Triton stops shows:

    Last State:     Terminated
      Reason:       Error
      Exit Code:    137

Models are hosted in S3 and we are using poll mode. We authenticate with S3 using the standard AWS environment variables.
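
Concretely, the container is started with the usual AWS credential variables, along these lines (values redacted; the region shown is only an example):

    export AWS_ACCESS_KEY_ID=<redacted>
    export AWS_SECRET_ACCESS_KEY=<redacted>
    export AWS_DEFAULT_REGION=us-east-1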

Triton command line:

tritonserver \
--model-store=s3://bucket/model_repo/ \
--model-control-mode=poll \
--repository-poll-secs=500 \
--exit-timeout-secs=900 \
--log-verbose=0 \
--log-format=ISO8601 \
--exit-on-error=true \
--rate-limit=execution_count \
--rate-limit-resource=r_gpu:1 \
--rate-limit-resource=r_cpu:6

Model repo:

$ aws s3 ls s3://bucket/model_repo/model/
                           PRE 1/
                           PRE assets/
2024-10-22 20:21:21        778 config.pbtxt
2024-10-22 20:21:22       1023 config.yaml

$ aws s3 ls s3://bucket/model_repo/model/1/
2024-10-22 20:21:21        291 model.py

$ aws s3 ls s3://bucket/model_repo/model/assets/
                           PRE kenlms/
                           PRE tokenizers/

$ aws s3 ls s3://bucket/model_repo/model/assets/kenlms/
2024-10-22 20:21:22 12975096709 model.trie

$ aws s3 ls s3://bucket/model_repo/model/assets/tokenizers/
2024-10-22 20:21:21      37919 tokenizer.model
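
The config.pbtxt itself is small (778 bytes) and unremarkable. An abridged, purely illustrative version is shown below; the tensor names, types and dimensions are simplified placeholders, not the real ones:

    name: "model"
    backend: "python"
    max_batch_size: 0
    input [
      {
        name: "INPUT0"
        data_type: TYPE_STRING
        dims: [ 1 ]
      }
    ]
    output [
      {
        name: "OUTPUT0"
        data_type: TYPE_STRING
        dims: [ 1 ]
      }
    ]
    instance_group [
      {
        kind: KIND_CPU
      }
    ]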

I have attached a log captured with --log-verbose=10. In this log, only the new model we have introduced is present in the model repository.
triton-verbose.log

Some analysis I have done:

  1. The model.trie file is optional and is configured in config.yaml. If I remove model.trie and its path from config.yaml, Triton remains stable and does not shut down. The file is 12.9 GB.
  2. If I keep model.trie and turn off the Kubernetes liveness and readiness probes, we do not see the problem; Triton remains stable and does not shut down.
  3. If I keep model.trie and turn the Kubernetes liveness and readiness probes back on, but change the K8s deployment to have an initContainer that pulls the models into an emptyDir (also mounted into the Triton container) and point --model-store= at that emptyDir path, I do not see the issue; Triton remains stable and does not shut down. In this setup Triton only polls a local model repository, not S3 (see the sketch after this list).
  4. Changing --repository-poll-secs has no impact; it still restarts every ~20 minutes.
  5. Widening the period at which K8s runs the liveness and readiness checks does seem to change how quickly it stops.
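
The workaround described in item 3 looks roughly like this in the Deployment spec (abridged; the aws-cli image tag, secret name and paths are illustrative, and the Triton container keeps the same image, arguments and probes as above, except --model-store=/models):

      initContainers:
      - name: fetch-models
        image: amazon/aws-cli:2.15.0
        args: ["s3", "sync", "s3://bucket/model_repo/", "/models"]
        envFrom:
        - secretRef:
            name: aws-credentials    # hypothetical secret holding the AWS variables
        volumeMounts:
        - name: model-repo
          mountPath: /models
      containers:
      - name: tritonserver
        # ... same image, args, ports and probes as above, but --model-store=/models
        volumeMounts:
        - name: model-repo
          mountPath: /models
      volumes:
      - name: model-repo
        emptyDir: {}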

The combination of the large model.trie, polling S3, and the liveness and readiness probes is causing Triton to stop. Every time it stops, the last log line in verbose mode is always:

2024-10-22T20:45:09Z I 1 model_lifecycle.cc:265] ModelStates()

Liveness and readiness probes are configured as below in the K8s deployment:

        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /v2/health/live
            port: http
            scheme: HTTP
          initialDelaySeconds: 180
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /v2/health/ready
            port: http
            scheme: HTTP
          initialDelaySeconds: 180
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
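
For reference, these endpoints can also be checked manually from inside the pod; they return 200 while Triton is healthy (assuming Triton's default HTTP port, 8000):

    curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8000/v2/health/live
    curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8000/v2/health/ready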

Has this been seen before?

How can I investigate further?

Triton Information
What version of Triton are you using?
2.42.0

Upgrading to the latest Triton version does not fix the issue.

Are you using the Triton container or did you build it yourself?
We are using the 24.01-py3 Triton Docker image as a base image and adding a small layer with our own code to support Python backend models. This layer is not new and has remained unchanged for 12+ months.

To Reproduce

  1. Configure Triton to use poll mode.
  2. Configure Triton to use S3 for its model repository.
  3. Add a large model (12.9 GB) to that S3 model repository.
  4. Deploy to Kubernetes with liveness and readiness probes configured as above.
  5. Do not make any inference requests.
  6. Wait ~20 minutes.

Expected behavior
Triton remains stable and does not stop unexpectedly.
