
Cannot create checkpoint for llama-3.2 (1B, 3B) #2772

Open
3 of 4 tasks
falkbene opened this issue Feb 11, 2025 · 0 comments
Labels
bug Something isn't working

Comments

@falkbene

System Info

  • CPU architecture : aarch64
  • Host memory: 64GB (128GB swap on SSD)
  • GPU name: Nvidia Jetson Orin AGX (Ampere)
  • clock frequencies used: 1.3GHz
  • branch: v0.12.0-jetson
  • commit: 20a7cec
  • versions:
    TensorRT: 10.3.0.30-1+cuda12.5 (arm64)
    Cuda: 12.6.11-1 (arm64)
  • Nvidia driver: 540.4.0
  • OS: Ubuntu 22.04

I was able to create the checkpoint for the Meta-Llama-3-8B-Instruct model successfully. However, when I downloaded the meta-llama/Llama-3.2-1B and Llama-3.2-3B models and tried to convert them, the conversion fails with the following error:

agxuser@ubuntu:~/Documents/projects/tensorrt_llm$ python3 TensorRT-LLM/examples/llama/convert_checkpoint.py --model_dir models/base/Llama-3.2-3B/ --output_dir models/converted/checkpoints/llama-3.2-3B/ --dtype bfloat16
[TensorRT-LLM] TensorRT-LLM version: 0.12.0
0.12.0
201it [00:00, 390.69it/s]
Traceback (most recent call last):
  File "/home/agxuser/Documents/projects/tensorrt_llm/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 487, in <module>
    main()
  File "/home/agxuser/Documents/projects/tensorrt_llm/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 479, in main
    convert_and_save_hf(args)
  File "/home/agxuser/Documents/projects/tensorrt_llm/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 421, in convert_and_save_hf
    execute(args.workers, [convert_and_save_rank] * world_size, args)
  File "/home/agxuser/Documents/projects/tensorrt_llm/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 428, in execute
    f(args, rank)
  File "/home/agxuser/Documents/projects/tensorrt_llm/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 410, in convert_and_save_rank
    llama = LLaMAForCausalLM.from_hugging_face(
  File "/home/agxuser/.local/lib/python3.10/site-packages/tensorrt_llm/models/llama/model.py", line 363, in from_hugging_face
    loader.generate_tllm_weights(model)
  File "/home/agxuser/.local/lib/python3.10/site-packages/tensorrt_llm/models/model_weights_loader.py", line 329, in generate_tllm_weights
    tllm_weights.update(self.load(tllm_key))
  File "/home/agxuser/.local/lib/python3.10/site-packages/tensorrt_llm/models/model_weights_loader.py", line 271, in load
    v = sub_module.postprocess(tllm_key, v, **postprocess_kwargs)
  File "/home/agxuser/.local/lib/python3.10/site-packages/tensorrt_llm/layers/linear.py", line 380, in postprocess
    weights = weights.to(str_dtype_to_torch(self.dtype))
AttributeError: 'NoneType' object has no attribute 'to'
Exception ignored in: <function PretrainedModel.__del__ at 0xfffe07666170>
Traceback (most recent call last):
  File "/home/agxuser/.local/lib/python3.10/site-packages/tensorrt_llm/models/modeling_utils.py", line 449, in __del__
    self.release()
  File "/home/agxuser/.local/lib/python3.10/site-packages/tensorrt_llm/models/modeling_utils.py", line 446, in release
    release_gc()
  File "/home/agxuser/.local/lib/python3.10/site-packages/tensorrt_llm/_utils.py", line 469, in release_gc
    torch.cuda.ipc_collect()
  File "/home/agxuser/.local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 962, in ipc_collect
    _lazy_init()
  File "/home/agxuser/.local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 334, in _lazy_init
    raise DeferredCudaCallError(msg) from e
torch.cuda.DeferredCudaCallError: CUDA call failed lazily at initialization with error: 'NoneType' object is not iterable

CUDA call was originally invoked at:

File "/home/agxuser/Documents/projects/tensorrt_llm/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 8, in
from transformers import AutoConfig
File "", line 1027, in _find_and_load
File "", line 1006, in _find_and_load_unlocked
File "", line 688, in _load_unlocked
File "", line 883, in exec_module
File "", line 241, in _call_with_frames_removed
File "/home/agxuser/.local/lib/python3.10/site-packages/transformers/init.py", line 26, in
from . import dependency_versions_check
File "", line 1078, in _handle_fromlist
File "", line 241, in _call_with_frames_removed
File "", line 1027, in _find_and_load
File "", line 1006, in _find_and_load_unlocked
File "", line 688, in _load_unlocked
File "", line 883, in exec_module
File "", line 241, in _call_with_frames_removed
File "/home/agxuser/.local/lib/python3.10/site-packages/transformers/dependency_versions_check.py", line 16, in
from .utils.versions import require_version, require_version_core
File "", line 1027, in _find_and_load
File "", line 992, in _find_and_load_unlocked
File "", line 241, in _call_with_frames_removed
File "", line 1027, in _find_and_load
File "", line 1006, in _find_and_load_unlocked
File "", line 688, in _load_unlocked
File "", line 883, in exec_module
File "", line 241, in _call_with_frames_removed
File "/home/agxuser/.local/lib/python3.10/site-packages/transformers/utils/init.py", line 27, in
from .chat_template_utils import DocstringParsingException, TypeHintParsingException, get_json_schema
File "", line 1027, in _find_and_load
File "", line 1006, in _find_and_load_unlocked
File "", line 688, in _load_unlocked
File "", line 883, in exec_module
File "", line 241, in _call_with_frames_removed
File "/home/agxuser/.local/lib/python3.10/site-packages/transformers/utils/chat_template_utils.py", line 40, in
from torch import Tensor
File "", line 1027, in _find_and_load
File "", line 1006, in _find_and_load_unlocked
File "", line 688, in _load_unlocked
File "", line 883, in exec_module
File "", line 241, in _call_with_frames_removed
File "/home/agxuser/.local/lib/python3.10/site-packages/torch/init.py", line 1903, in
_C._initExtension(_manager_path())
File "", line 1027, in _find_and_load
File "", line 1006, in _find_and_load_unlocked
File "", line 688, in _load_unlocked
File "", line 883, in exec_module
File "", line 241, in _call_with_frames_removed
File "/home/agxuser/.local/lib/python3.10/site-packages/torch/cuda/init.py", line 1528, in
_lazy_call(_register_triton_kernels)
File "/home/agxuser/.local/lib/python3.10/site-packages/torch/cuda/init.py", line 257, in _lazy_call
_queued_calls.append((callable, traceback.format_stack()))

I also tried converting them in the dustynv/tensorrt_llm:0.12-r36.4.0 jetson-container and got the same error. This looks related to #1634, but that fix does not appear to have landed on the Jetson branch. Any ideas on how to resolve this?
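
One observation that might help with triage: the AttributeError means the loader handed postprocess a None instead of a tensor for some weight key. The Llama 3.2 1B/3B HF configs set tie_word_embeddings to true, so lm_head.weight is not stored in the safetensors shards; my guess (unverified) is that the converter on this branch does not account for the tied embedding and ends up with None for lm_head. A quick check, assuming transformers and safetensors are installed:

# Quick diagnostic (assumption: the None weight is the missing lm_head.weight
# of a tied-embedding checkpoint). Prints whether the config ties embeddings
# and whether lm_head.weight actually exists in the downloaded shards.
import glob

from safetensors import safe_open
from transformers import AutoConfig

model_dir = "models/base/Llama-3.2-3B"  # same path as in the repro command

cfg = AutoConfig.from_pretrained(model_dir)
print("tie_word_embeddings:", cfg.tie_word_embeddings)

keys = set()
for shard in glob.glob(f"{model_dir}/*.safetensors"):
    with safe_open(shard, framework="pt") as f:
        keys.update(f.keys())
print("lm_head.weight present:", "lm_head.weight" in keys)

If this prints tie_word_embeddings: True and lm_head.weight present: False, that would match the NoneType failure above.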

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

python3 TensorRT-LLM/examples/llama/convert_checkpoint.py --model_dir models/base/Llama-3.2-3B/ --output_dir models/converted/checkpoints/llama-3.2-3B/ --dtype bfloat16

Expected behavior

Successful checkpoint creation for Llama 3.2 1B and 3B.

actual behavior

Error message (see above)

additional notes

I did not test the Llama 3.1 models; I can try them if that would help with fixing the problem.
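
If the tied embedding is indeed the cause, one thing that may be worth trying (assuming the option is available on the v0.12.0-jetson branch, which I have not verified) is the converter's embedding-sharing flag:

python3 TensorRT-LLM/examples/llama/convert_checkpoint.py --model_dir models/base/Llama-3.2-3B/ --output_dir models/converted/checkpoints/llama-3.2-3B/ --dtype bfloat16 --use_embedding_sharing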

@falkbene falkbene added the bug Something isn't working label Feb 11, 2025