Failed to launch triton-server: "error: creating server: Internal - failed to load all models" #7950
Hi @pzydzh, from the server logs, it looks like there are some template values in the model config files that were never filled in. You can try looking into those files manually and searching for the remaining placeholders.

Something that may help avoid this issue: if you're running the 24.03 version of tritonserver / 0.10.0 trtllm, make sure to clone the matching versions of any git repositories (such as tensorrtllm_backend) so that everything lines up correctly.

You can also try following the latest README quickstart steps as a sanity check: https://github.com/triton-inference-server/tensorrtllm_backend/tree/main?tab=readme-ov-file#quick-start
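For reference, a minimal sketch of checking for and filling those placeholders, assuming the tensorrtllm_backend layout with a `triton_model_repo` directory; the exact template keys and the release tag depend on your version, so treat the values below as illustrative rather than exact:

```bash
# Clone the backend repo at the tag matching the 24.03 / v0.10.0 stack
# (tag name assumed; check the repo's releases for the exact tag).
git clone -b v0.10.0 https://github.com/triton-inference-server/tensorrtllm_backend.git
cd tensorrtllm_backend

# Unfilled template values appear as ${...} in the config.pbtxt files;
# any hits here mean the model configs were never filled in.
grep -rn '\${' triton_model_repo/*/config.pbtxt

# Fill the placeholders with the repo's helper script.
# The key:value pairs are examples only; use the keys listed in the README
# for your backend version, and point engine_dir at your built engines.
python3 tools/fill_template.py -i triton_model_repo/tensorrt_llm/config.pbtxt \
    triton_max_batch_size:64,decoupled_mode:false,engine_dir:/path/to/engines,batching_strategy:inflight_fused_batching
```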
Thank you for responding. I have tried downloading the latest image and following the steps to deploy the model, but I still got the error:

I noticed an error message saying 'failed to load 'tensorrt_llm' version 1'. I have already installed tensorrt_llm, and it shows up in pip. Why did the loading fail? Thank you.
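As a sanity check for the "failed to load 'tensorrt_llm' version 1" message, it may help to confirm that the pip wheel, the built engines, and the backend shipped in the container all come from the same release. A rough sketch, with container paths assumed for the NGC *-trtllm-python-py3 images:

```bash
# Check which tensorrt_llm wheel is installed inside the container.
pip show tensorrt_llm
python3 -c "import tensorrt_llm; print(tensorrt_llm.__version__)"

# The Triton "tensorrt_llm" model is served by the tensorrtllm backend,
# not by the pip package alone; check that the backend is present
# (path assumed for the NGC trtllm containers).
ls /opt/tritonserver/backends/tensorrtllm/
```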
Environment
CPU architecture: x86_64
CPU/Host memory size: 1.0Ti
GPU properties: 8.0
GPU name: NVIDIA A800-SXM4-80GB
GPU memory size: 81920 MiB
Clock frequencies used: 210 MHz
Libraries
TensorRT-LLM: v0.10.0
CUDA: 12.3
Container used: 24.03-trtllm-python-py3
NVIDIA driver version: 535.183.06
OS: Ubuntu 22.04
Reproduction Steps
1. Docker image: nvcr.io/nvidia/tensorrt:24.12-py
2. Run (successful)
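The "Run" step is not shown in detail; for reference, launching the server with this stack typically looks roughly like the sketch below. The script path, model repository path, and world size are placeholders, not values taken from this issue:

```bash
# Launch Triton with the TensorRT-LLM model repository using the helper
# script that ships with tensorrtllm_backend (paths are placeholders).
python3 scripts/launch_triton_server.py \
    --world_size 1 \
    --model_repo /path/to/triton_model_repo

# Or start tritonserver directly against the same model repository.
tritonserver --model-repository /path/to/triton_model_repo
```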
ERROR log