
Cannot create checkpoint for llama-3.2 (1B, 3B) #2772

Open
3 of 4 tasks
falkbene opened this issue Feb 11, 2025 · 0 comments
Labels
bug Something isn't working

Comments

@falkbene

System Info

  • CPU architecture : aarch64
  • Host memory: 64GB (128GB swap on SSD)
  • GPU name: Nvidia Jetson Orin AGX (Ampere)
  • clock frequencies used: 1.3GHz
  • branch: v0.12.0-jetson
  • commit: 20a7cec
  • versions:
    TensorRT: 10.3.0.30-1+cuda12.5 (arm64)
    Cuda: 12.6.11-1 (arm64)
  • Nvidia driver: 540.4.0
  • OS: Ubuntu 22.04

I was able to create the checkpoint for the Meta-Llama-3-8B-Instruct model successfully. However, when I downloaded the meta-llama/Llama-3.2-1B and Llama-3.2-3B models and tried to convert them, the conversion fails with the following error:

agxuser@ubuntu:~/Documents/projects/tensorrt_llm$ python3 TensorRT-LLM/examples/llama/convert_checkpoint.py --model_dir models/base/Llama-3.2-3B/ --output_dir models/converted/checkpoints/llama-3.2-3B/ --dtype bfloat16
[TensorRT-LLM] TensorRT-LLM version: 0.12.0
0.12.0
201it [00:00, 390.69it/s]
Traceback (most recent call last):
  File "/home/agxuser/Documents/projects/tensorrt_llm/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 487, in <module>
    main()
  File "/home/agxuser/Documents/projects/tensorrt_llm/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 479, in main
    convert_and_save_hf(args)
  File "/home/agxuser/Documents/projects/tensorrt_llm/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 421, in convert_and_save_hf
    execute(args.workers, [convert_and_save_rank] * world_size, args)
  File "/home/agxuser/Documents/projects/tensorrt_llm/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 428, in execute
    f(args, rank)
  File "/home/agxuser/Documents/projects/tensorrt_llm/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 410, in convert_and_save_rank
    llama = LLaMAForCausalLM.from_hugging_face(
  File "/home/agxuser/.local/lib/python3.10/site-packages/tensorrt_llm/models/llama/model.py", line 363, in from_hugging_face
    loader.generate_tllm_weights(model)
  File "/home/agxuser/.local/lib/python3.10/site-packages/tensorrt_llm/models/model_weights_loader.py", line 329, in generate_tllm_weights
    tllm_weights.update(self.load(tllm_key))
  File "/home/agxuser/.local/lib/python3.10/site-packages/tensorrt_llm/models/model_weights_loader.py", line 271, in load
    v = sub_module.postprocess(tllm_key, v, **postprocess_kwargs)
  File "/home/agxuser/.local/lib/python3.10/site-packages/tensorrt_llm/layers/linear.py", line 380, in postprocess
    weights = weights.to(str_dtype_to_torch(self.dtype))
AttributeError: 'NoneType' object has no attribute 'to'
Exception ignored in: <function PretrainedModel.__del__ at 0xfffe07666170>
Traceback (most recent call last):
  File "/home/agxuser/.local/lib/python3.10/site-packages/tensorrt_llm/models/modeling_utils.py", line 449, in __del__
    self.release()
  File "/home/agxuser/.local/lib/python3.10/site-packages/tensorrt_llm/models/modeling_utils.py", line 446, in release
    release_gc()
  File "/home/agxuser/.local/lib/python3.10/site-packages/tensorrt_llm/_utils.py", line 469, in release_gc
    torch.cuda.ipc_collect()
  File "/home/agxuser/.local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 962, in ipc_collect
    _lazy_init()
  File "/home/agxuser/.local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 334, in _lazy_init
    raise DeferredCudaCallError(msg) from e
torch.cuda.DeferredCudaCallError: CUDA call failed lazily at initialization with error: 'NoneType' object is not iterable

CUDA call was originally invoked at:

File "/home/agxuser/Documents/projects/tensorrt_llm/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 8, in
from transformers import AutoConfig
File "", line 1027, in _find_and_load
File "", line 1006, in _find_and_load_unlocked
File "", line 688, in _load_unlocked
File "", line 883, in exec_module
File "", line 241, in _call_with_frames_removed
File "/home/agxuser/.local/lib/python3.10/site-packages/transformers/init.py", line 26, in
from . import dependency_versions_check
File "", line 1078, in _handle_fromlist
File "", line 241, in _call_with_frames_removed
File "", line 1027, in _find_and_load
File "", line 1006, in _find_and_load_unlocked
File "", line 688, in _load_unlocked
File "", line 883, in exec_module
File "", line 241, in _call_with_frames_removed
File "/home/agxuser/.local/lib/python3.10/site-packages/transformers/dependency_versions_check.py", line 16, in
from .utils.versions import require_version, require_version_core
File "", line 1027, in _find_and_load
File "", line 992, in _find_and_load_unlocked
File "", line 241, in _call_with_frames_removed
File "", line 1027, in _find_and_load
File "", line 1006, in _find_and_load_unlocked
File "", line 688, in _load_unlocked
File "", line 883, in exec_module
File "", line 241, in _call_with_frames_removed
File "/home/agxuser/.local/lib/python3.10/site-packages/transformers/utils/init.py", line 27, in
from .chat_template_utils import DocstringParsingException, TypeHintParsingException, get_json_schema
File "", line 1027, in _find_and_load
File "", line 1006, in _find_and_load_unlocked
File "", line 688, in _load_unlocked
File "", line 883, in exec_module
File "", line 241, in _call_with_frames_removed
File "/home/agxuser/.local/lib/python3.10/site-packages/transformers/utils/chat_template_utils.py", line 40, in
from torch import Tensor
File "", line 1027, in _find_and_load
File "", line 1006, in _find_and_load_unlocked
File "", line 688, in _load_unlocked
File "", line 883, in exec_module
File "", line 241, in _call_with_frames_removed
File "/home/agxuser/.local/lib/python3.10/site-packages/torch/init.py", line 1903, in
_C._initExtension(_manager_path())
File "", line 1027, in _find_and_load
File "", line 1006, in _find_and_load_unlocked
File "", line 688, in _load_unlocked
File "", line 883, in exec_module
File "", line 241, in _call_with_frames_removed
File "/home/agxuser/.local/lib/python3.10/site-packages/torch/cuda/init.py", line 1528, in
_lazy_call(_register_triton_kernels)
File "/home/agxuser/.local/lib/python3.10/site-packages/torch/cuda/init.py", line 257, in _lazy_call
_queued_calls.append((callable, traceback.format_stack()))

I also tried converting them in the dustynv/tensorrt_llm:0.12-r36.4.0 jetson-container and got the same error. This looks related to #1634, but that fix does not appear to have landed on the Jetson branch. Any ideas on how to resolve this?
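
One observation that might help with triage: the AttributeError means the loader handed postprocess a None instead of a tensor for some weight key. The Llama 3.2 1B/3B HF configs set tie_word_embeddings to true, so lm_head.weight is not stored in the safetensors shards; my guess (unverified) is that the converter on this branch does not account for the tied embedding and ends up with None for lm_head. A quick check, assuming transformers and safetensors are installed:

# Quick diagnostic (assumption: the None weight is the missing lm_head.weight
# of a tied-embedding checkpoint). Prints whether the config ties embeddings
# and whether lm_head.weight actually exists in the downloaded shards.
import glob

from safetensors import safe_open
from transformers import AutoConfig

model_dir = "models/base/Llama-3.2-3B"  # same path as in the repro command

cfg = AutoConfig.from_pretrained(model_dir)
print("tie_word_embeddings:", cfg.tie_word_embeddings)

keys = set()
for shard in glob.glob(f"{model_dir}/*.safetensors"):
    with safe_open(shard, framework="pt") as f:
        keys.update(f.keys())
print("lm_head.weight present:", "lm_head.weight" in keys)

If this prints tie_word_embeddings: True and lm_head.weight present: False, that would match the NoneType failure above.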

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

python3 TensorRT-LLM/examples/llama/convert_checkpoint.py --model_dir models/base/Llama-3.2-3B/ --output_dir models/converted/checkpoints/llama-3.2-3B/ --dtype bfloat16

Expected behavior

Successful checkpoint creation for Llama 3.2 1B and 3B.

actual behavior

Error message (see above)

additional notes

I did not test the Llama 3.1 models; I can try them if that would help with fixing the problem.
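
If the tied embedding is indeed the cause, one thing that may be worth trying (assuming the option is available on the v0.12.0-jetson branch, which I have not verified) is the converter's embedding-sharing flag:

python3 TensorRT-LLM/examples/llama/convert_checkpoint.py --model_dir models/base/Llama-3.2-3B/ --output_dir models/converted/checkpoints/llama-3.2-3B/ --dtype bfloat16 --use_embedding_sharing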

@falkbene falkbene added the bug Something isn't working label Feb 11, 2025