Why do I get this error after executing the command python3 scripts/launch_triton_server.py --world_size 1 --model_repo=llama_ifb/? #405
Asked by SevenEmotion in Q&A. Unanswered; 0 replies.
I0412 01:49:31.086437 386 pinned_memory_manager.cc:241] Pinned memory pool is created at '0x7f3bec000000' with size 268435456
I0412 01:49:31.088762 386 cuda_memory_manager.cc:107] CUDA memory pool is created on device 0 with size 67108864
I0412 01:49:31.093652 386 model_lifecycle.cc:461] loading: postprocessing:1
I0412 01:49:31.093690 386 model_lifecycle.cc:461] loading: preprocessing:1
I0412 01:49:31.093763 386 model_lifecycle.cc:461] loading: tensorrt_llm:1
I0412 01:49:31.093790 386 model_lifecycle.cc:461] loading: tensorrt_llm_bls:1
I0412 01:49:31.146171 386 python_be.cc:2199] TRITONBACKEND_ModelInstanceInitialize: postprocessing_0_0 (CPU device 0)
I0412 01:49:31.146238 386 python_be.cc:2199] TRITONBACKEND_ModelInstanceInitialize: preprocessing_0_0 (CPU device 0)
[TensorRT-LLM][WARNING] batch_scheduler_policy parameter was not found or is invalid (must be max_utilization or guaranteed_no_evict)
[TensorRT-LLM][WARNING] max_num_sequences is not specified, will be set to the TRT engine max_batch_size
[TensorRT-LLM][WARNING] enable_trt_overlap is not specified, will be set to true
E0412 01:49:31.247550 386 backend_model.cc:634] ERROR: Failed to create instance: unexpected error when creating modelInstanceState: [json.exception.out_of_range.403] key 'builder_config' not found
E0412 01:49:31.247624 386 model_lifecycle.cc:621] failed to load 'tensorrt_llm' version 1: Internal: unexpected error when creating modelInstanceState: [json.exception.out_of_range.403] key 'builder_config' not found
I0412 01:49:31.247646 386 model_lifecycle.cc:756] failed to load 'tensorrt_llm'
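
Note: the tensorrt_llm failure above is separate from the tokenizer errors that follow. The backend parses the engine's config.json and raises json.exception.out_of_range.403 when the 'builder_config' key is absent, which usually means the engine was built with a TensorRT-LLM release whose config schema does not match the installed backend. A minimal sketch to check this, assuming the engine directory is the one `gpt_model_path` points to in llama_ifb/tensorrt_llm/config.pbtxt; the path below is hypothetical:

```python
import json

# Hypothetical path -- substitute the directory that `gpt_model_path`
# references in llama_ifb/tensorrt_llm/config.pbtxt.
engine_dir = "/tensorrtllm_backend/llama_ifb/tensorrt_llm/1"

with open(f"{engine_dir}/config.json") as f:
    cfg = json.load(f)

print("top-level keys:", sorted(cfg))
if "builder_config" not in cfg:
    # This backend build expects 'builder_config'; if it is missing, the
    # engine was likely produced by a mismatched TensorRT-LLM version.
    # Rebuilding the engine with the version matching the backend
    # container is the usual fix.
    print("'builder_config' not found -- engine/backend version mismatch?")
```
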
I0412 01:49:31.247701 386 python_be.cc:2199] TRITONBACKEND_ModelInstanceInitialize: tensorrt_llm_bls_0_0 (CPU device 0)
I0412 01:49:31.492028 386 model_lifecycle.cc:818] successfully loaded 'tensorrt_llm_bls'
I0412 01:49:31.736910 386 pb_stub.cc:325] Failed to initialize Python stub: ValueError: Couldn't instantiate the backend tokenizer from one of:
(1) a `tokenizers` library serialization file,
(2) a slow tokenizer instance to convert or
(3) an equivalent slow tokenizer class to instantiate and convert.
You need to have sentencepiece installed to convert a slow tokenizer to a fast one.
At:
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_fast.py(120): __init__
/usr/local/lib/python3.10/dist-packages/transformers/models/llama/tokenization_llama_fast.py(122): __init__
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py(2250): _from_pretrained
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py(2017): from_pretrained
/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py(751): from_pretrained
/tensorrtllm_backend/llama_ifb/postprocessing/1/model.py(64): initialize
I0412 01:49:31.738283 386 pb_stub.cc:325] Failed to initialize Python stub: ValueError: Couldn't instantiate the backend tokenizer from one of:
(1) a `tokenizers` library serialization file,
(2) a slow tokenizer instance to convert or
(3) an equivalent slow tokenizer class to instantiate and convert.
You need to have sentencepiece installed to convert a slow tokenizer to a fast one.
At:
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_fast.py(120): __init__
/usr/local/lib/python3.10/dist-packages/transformers/models/llama/tokenization_llama_fast.py(122): __init__
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py(2250): _from_pretrained
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py(2017): from_pretrained
/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py(751): from_pretrained
/tensorrtllm_backend/llama_ifb/preprocessing/1/model.py(65): initialize
E0412 01:49:31.829928 386 backend_model.cc:634] ERROR: Failed to create instance: ValueError: Couldn't instantiate the backend tokenizer from one of:
(1) a `tokenizers` library serialization file,
(2) a slow tokenizer instance to convert or
(3) an equivalent slow tokenizer class to instantiate and convert.
You need to have sentencepiece installed to convert a slow tokenizer to a fast one.
At:
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_fast.py(120): __init__
/usr/local/lib/python3.10/dist-packages/transformers/models/llama/tokenization_llama_fast.py(122): __init__
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py(2250): _from_pretrained
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py(2017): from_pretrained
/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py(751): from_pretrained
/tensorrtllm_backend/llama_ifb/preprocessing/1/model.py(65): initialize
E0412 01:49:31.830012 386 model_lifecycle.cc:621] failed to load 'preprocessing' version 1: Internal: ValueError: Couldn't instantiate the backend tokenizer from one of:
(1) a `tokenizers` library serialization file,
(2) a slow tokenizer instance to convert or
(3) an equivalent slow tokenizer class to instantiate and convert.
You need to have sentencepiece installed to convert a slow tokenizer to a fast one.
At:
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_fast.py(120): __init__
/usr/local/lib/python3.10/dist-packages/transformers/models/llama/tokenization_llama_fast.py(122): __init__
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py(2250): _from_pretrained
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py(2017): from_pretrained
/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py(751): from_pretrained
/tensorrtllm_backend/llama_ifb/preprocessing/1/model.py(65): initialize
I0412 01:49:31.830029 386 model_lifecycle.cc:756] failed to load 'preprocessing'
E0412 01:49:31.831569 386 backend_model.cc:634] ERROR: Failed to create instance: ValueError: Couldn't instantiate the backend tokenizer from one of:
(1) a `tokenizers` library serialization file,
(2) a slow tokenizer instance to convert or
(3) an equivalent slow tokenizer class to instantiate and convert.
You need to have sentencepiece installed to convert a slow tokenizer to a fast one.
At:
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_fast.py(120): __init__
/usr/local/lib/python3.10/dist-packages/transformers/models/llama/tokenization_llama_fast.py(122): __init__
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py(2250): _from_pretrained
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py(2017): from_pretrained
/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py(751): from_pretrained
/tensorrtllm_backend/llama_ifb/postprocessing/1/model.py(64): initialize
E0412 01:49:31.831641 386 model_lifecycle.cc:621] failed to load 'postprocessing' version 1: Internal: ValueError: Couldn't instantiate the backend tokenizer from one of:
(1) a `tokenizers` library serialization file,
(2) a slow tokenizer instance to convert or
(3) an equivalent slow tokenizer class to instantiate and convert.
You need to have sentencepiece installed to convert a slow tokenizer to a fast one.
At:
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_fast.py(120): __init__
/usr/local/lib/python3.10/dist-packages/transformers/models/llama/tokenization_llama_fast.py(122): __init__
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py(2250): _from_pretrained
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py(2017): from_pretrained
/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py(751): from_pretrained
/tensorrtllm_backend/llama_ifb/postprocessing/1/model.py(64): initialize
I0412 01:49:31.831668 386 model_lifecycle.cc:756] failed to load 'postprocessing'
E0412 01:49:31.831731 386 model_repository_manager.cc:563] Invalid argument: ensemble 'ensemble' depends on 'postprocessing' which has no loaded version. Model 'postprocessing' loading failed with error: version 1 is at UNAVAILABLE state: Internal: ValueError: Couldn't instantiate the backend tokenizer from one of:
(1) a `tokenizers` library serialization file,
(2) a slow tokenizer instance to convert or
(3) an equivalent slow tokenizer class to instantiate and convert.
You need to have sentencepiece installed to convert a slow tokenizer to a fast one.
At:
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_fast.py(120): __init__
/usr/local/lib/python3.10/dist-packages/transformers/models/llama/tokenization_llama_fast.py(122): __init__
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py(2250): _from_pretrained
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py(2017): from_pretrained
/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py(751): from_pretrained
/tensorrtllm_backend/llama_ifb/postprocessing/1/model.py(64): initialize
;
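
Note: the preprocessing/postprocessing failures are raised by transformers, not by Triton: the fast LLaMA tokenizer needs either a tokenizer.json file or the sentencepiece package to convert the slow tokenizer.model. The same load can be reproduced outside the server; a minimal sketch, assuming the container from the log and a hypothetical tokenizer path -- use whatever `tokenizer_dir` is set to in llama_ifb/preprocessing/config.pbtxt:

```python
from transformers import AutoTokenizer

# Hypothetical path -- substitute the `tokenizer_dir` parameter from
# llama_ifb/preprocessing/config.pbtxt.
tokenizer_dir = "/tensorrtllm_backend/llama_hf"

try:
    tok = AutoTokenizer.from_pretrained(tokenizer_dir)
    print("tokenizer loaded:", type(tok).__name__)
except ValueError as err:
    # Same ValueError as in the log; `pip install sentencepiece` (or
    # pointing tokenizer_dir at a checkpoint that ships tokenizer.json)
    # normally resolves it.
    print("tokenizer failed to load:", err)
```
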
I0412 01:49:31.831783 386 server.cc:592]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+
I0412 01:49:31.831832 386 server.cc:619]
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+
| Backend | Path | Config |
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+
| python | /opt/tritonserver/backends/python/libtriton_python.so | {"cmdline":{"auto-complete-config":"false","backend-directory":"/opt/tritonserver/backends","min-compute-capability |
| | | ":"6.000000","shm-region-prefix-name":"prefix0_","default-max-batch-size":"4"}} |
| tensorrtllm | /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so | {"cmdline":{"auto-complete-config":"false","backend-directory":"/opt/tritonserver/backends","min-compute-capability |
| | | ":"6.000000","default-max-batch-size":"4"}} |
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+
I0412 01:49:31.831930 386 server.cc:662]
+------------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------+
| Model | Version | Status |
+------------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------+
| postprocessing | 1 | UNAVAILABLE: Internal: ValueError: Couldn't instantiate the backend tokenizer from one of: |
| | | (1) a `tokenizers` library serialization file, |
| | | (2) a slow tokenizer instance to convert or |
| | | (3) an equivalent slow tokenizer class to instantiate and convert. |
| | | You need to have sentencepiece installed to convert a slow tokenizer to a fast one. |
| | | |
| | | At: |
| | | /usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_fast.py(120): init |
| | | /usr/local/lib/python3.10/dist-packages/transformers/models/llama/tokenization_llama_fast.py(122): init |
| | | /usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py(2250): _from_pretrained |
| | | /usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py(2017): from_pretrained |
| | | /usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py(751): from_pretrained |
| | | /tensorrtllm_backend/llama_ifb/postprocessing/1/model.py(64): initialize |
| preprocessing | 1 | UNAVAILABLE: Internal: ValueError: Couldn't instantiate the backend tokenizer from one of: |
| | | (1) a `tokenizers` library serialization file, |
| | | (2) a slow tokenizer instance to convert or |
| | | (3) an equivalent slow tokenizer class to instantiate and convert. |
| | | You need to have sentencepiece installed to convert a slow tokenizer to a fast one. |
| | | |
| | | At: |
| | | /usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_fast.py(120): init |
| | | /usr/local/lib/python3.10/dist-packages/transformers/models/llama/tokenization_llama_fast.py(122): init |
| | | /usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py(2250): _from_pretrained |
| | | /usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py(2017): from_pretrained |
| | | /usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py(751): from_pretrained |
| | | /tensorrtllm_backend/llama_ifb/preprocessing/1/model.py(65): initialize |
| tensorrt_llm | 1 | UNAVAILABLE: Internal: unexpected error when creating modelInstanceState: [json.exception.out_of_range.403] key 'builder_config' not found |
| tensorrt_llm_bls | 1 | READY |
+------------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------+
I0412 01:49:31.862367 386 metrics.cc:817] Collecting metrics for GPU 0: NVIDIA A100-PCIE-40GB
I0412 01:49:31.862637 386 metrics.cc:710] Collecting CPU metrics
I0412 01:49:31.862782 386 tritonserver.cc:2458]
+----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.39.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_te |
| | nsor_data parameters statistics trace logging |
| model_repository_path[0] | llama_ifb/ |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
| cache_enabled | 0 |
+----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+
I0412 01:49:31.862790 386 server.cc:293] Waiting for in-flight requests to complete.
I0412 01:49:31.862795 386 server.cc:309] Timeout 30: Found 0 model versions that have in-flight inferences
I0412 01:49:31.862836 386 server.cc:324] All models are stopped, unloading models
I0412 01:49:31.862841 386 server.cc:331] Timeout 30: Found 1 live models and 0 in-flight non-inference requests
I0412 01:49:32.862929 386 server.cc:331] Timeout 29: Found 1 live models and 0 in-flight non-inference requests
Cleaning up...
I0412 01:49:33.161251 386 model_lifecycle.cc:603] successfully unloaded 'tensorrt_llm_bls' version 1
I0412 01:49:33.863024 386 server.cc:331] Timeout 28: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models
Primary job terminated normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted.
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[57830,1],0]
Exit code: 1