I was able to create the checkpoint for the Meta-Llama-3-8B-Instruct model successfully. However, when I downloaded meta-llama/Llama-3.2-1B and the 3B version and tried to convert them, the conversion failed with the following error:
agxuser@ubuntu:~/Documents/projects/tensorrt_llm$ python3 TensorRT-LLM/examples/llama/convert_checkpoint.py --model_dir models/base/Llama-3.2-3B/ --output_dir models/converted/checkpoints/llama-3.2-3B/ --dtype bfloat16
[TensorRT-LLM] TensorRT-LLM version: 0.12.0
0.12.0
201it [00:00, 390.69it/s]
Traceback (most recent call last):
  File "/home/agxuser/Documents/projects/tensorrt_llm/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 487, in <module>
    main()
  File "/home/agxuser/Documents/projects/tensorrt_llm/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 479, in main
    convert_and_save_hf(args)
  File "/home/agxuser/Documents/projects/tensorrt_llm/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 421, in convert_and_save_hf
    execute(args.workers, [convert_and_save_rank] * world_size, args)
  File "/home/agxuser/Documents/projects/tensorrt_llm/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 428, in execute
    f(args, rank)
  File "/home/agxuser/Documents/projects/tensorrt_llm/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 410, in convert_and_save_rank
    llama = LLaMAForCausalLM.from_hugging_face(
  File "/home/agxuser/.local/lib/python3.10/site-packages/tensorrt_llm/models/llama/model.py", line 363, in from_hugging_face
    loader.generate_tllm_weights(model)
  File "/home/agxuser/.local/lib/python3.10/site-packages/tensorrt_llm/models/model_weights_loader.py", line 329, in generate_tllm_weights
    tllm_weights.update(self.load(tllm_key))
  File "/home/agxuser/.local/lib/python3.10/site-packages/tensorrt_llm/models/model_weights_loader.py", line 271, in load
    v = sub_module.postprocess(tllm_key, v, **postprocess_kwargs)
  File "/home/agxuser/.local/lib/python3.10/site-packages/tensorrt_llm/layers/linear.py", line 380, in postprocess
    weights = weights.to(str_dtype_to_torch(self.dtype))
AttributeError: 'NoneType' object has no attribute 'to'
Exception ignored in: <function PretrainedModel.__del__ at 0xfffe07666170>
Traceback (most recent call last):
  File "/home/agxuser/.local/lib/python3.10/site-packages/tensorrt_llm/models/modeling_utils.py", line 449, in __del__
    self.release()
  File "/home/agxuser/.local/lib/python3.10/site-packages/tensorrt_llm/models/modeling_utils.py", line 446, in release
    release_gc()
  File "/home/agxuser/.local/lib/python3.10/site-packages/tensorrt_llm/_utils.py", line 469, in release_gc
    torch.cuda.ipc_collect()
  File "/home/agxuser/.local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 962, in ipc_collect
    _lazy_init()
  File "/home/agxuser/.local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 334, in _lazy_init
    raise DeferredCudaCallError(msg) from e
torch.cuda.DeferredCudaCallError: CUDA call failed lazily at initialization with error: 'NoneType' object is not iterable
CUDA call was originally invoked at:
  File "/home/agxuser/Documents/projects/tensorrt_llm/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 8, in <module>
    from transformers import AutoConfig
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/home/agxuser/.local/lib/python3.10/site-packages/transformers/__init__.py", line 26, in <module>
    from . import dependency_versions_check
  File "<frozen importlib._bootstrap>", line 1078, in _handle_fromlist
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/home/agxuser/.local/lib/python3.10/site-packages/transformers/dependency_versions_check.py", line 16, in <module>
    from .utils.versions import require_version, require_version_core
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 992, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/home/agxuser/.local/lib/python3.10/site-packages/transformers/utils/__init__.py", line 27, in <module>
    from .chat_template_utils import DocstringParsingException, TypeHintParsingException, get_json_schema
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/home/agxuser/.local/lib/python3.10/site-packages/transformers/utils/chat_template_utils.py", line 40, in <module>
    from torch import Tensor
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/home/agxuser/.local/lib/python3.10/site-packages/torch/__init__.py", line 1903, in <module>
    _C._initExtension(_manager_path())
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/home/agxuser/.local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 1528, in <module>
    _lazy_call(_register_triton_kernels)
  File "/home/agxuser/.local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 257, in _lazy_call
    _queued_calls.append((callable, traceback.format_stack()))
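For what it's worth, the crash happens because postprocess receives weights=None before the dtype cast. My guess (an assumption, not verified) is that the loader expects a separate lm_head tensor that the Llama 3.2 1B/3B checkpoints do not ship, since those models tie the input and output embeddings. A quick sketch to check this against the downloaded checkpoint (paths are from my setup):

```python
# Sketch: list the tensors in the HF checkpoint and check whether a separate
# lm_head.weight exists. If tie_word_embeddings is true, the checkpoint is
# expected to ship only model.embed_tokens.weight.
import json
from pathlib import Path

from safetensors import safe_open

model_dir = Path("models/base/Llama-3.2-3B")

# config.json records whether the embeddings are tied
config = json.loads((model_dir / "config.json").read_text())
print("tie_word_embeddings:", config.get("tie_word_embeddings"))

# collect all tensor names across the safetensors shards
keys = []
for shard in sorted(model_dir.glob("*.safetensors")):
    with safe_open(shard, framework="pt") as f:
        keys.extend(f.keys())

print("lm_head.weight present:", "lm_head.weight" in keys)
print("embed_tokens present:", "model.embed_tokens.weight" in keys)
```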
I also tried to convert the models in the dustynv/tensorrt_llm:0.12-r36.4.0 jetson-container and hit the same error. This looks related to #1634, but that fix does not appear to have landed in the jetson branch. Any ideas on how to resolve this?
Who can help?
No response
Information
The official example scripts
My own modified scripts
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
System Info
TensorRT: 10.3.0.30-1+cuda12.5 (arm64)
Cuda: 12.6.11-1 (arm64)
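For completeness, the Python-side package versions can be dumped with a small sketch like this (run inside the same environment used for the conversion):

```python
# Sketch: print the versions of the Python packages involved.
import torch
import transformers
import tensorrt_llm

print("tensorrt_llm:", tensorrt_llm.__version__)
print("torch:", torch.__version__, "built for CUDA", torch.version.cuda)
print("transformers:", transformers.__version__)
```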
Reproduction
python3 TensorRT-LLM/examples/llama/convert_checkpoint.py --model_dir models/base/Llama-3.2-3B/ --output_dir models/converted/checkpoints/llama-3.2-3B/ --dtype bfloat16
Expected behavior
Successful checkpoint creation for Llama 3.2 1B and 3B.
actual behavior
Error message (see above)
additional notes
I did not test the Llama 3.1 models; I can try that if it helps with fixing the problem. One workaround I am considering is sketched below.
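The sketch (untested, and it assumes the failure is caused only by the missing tied lm_head tensor) saves a copy of the checkpoint with an explicit, untied lm_head.weight cloned from the embedding, so convert_checkpoint.py can be pointed at the new directory instead:

```python
# Untested workaround sketch: materialize an explicit lm_head.weight by
# cloning the tied embedding, then save the model as an untied checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

src = "models/base/Llama-3.2-3B"
dst = "models/base/Llama-3.2-3B-untied"  # hypothetical output directory

model = AutoModelForCausalLM.from_pretrained(src, torch_dtype=torch.bfloat16)
model.config.tie_word_embeddings = False
# clone the tied input embedding into a standalone output projection
model.lm_head.weight = torch.nn.Parameter(
    model.model.embed_tokens.weight.clone())
model.save_pretrained(dst)
AutoTokenizer.from_pretrained(src).save_pretrained(dst)
```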