
Receiving Invalid Buffer size error when running kvcache (CPU) #16

Open
glli01 opened this issue Jan 23, 2025 · 3 comments

Comments

@glli01

glli01 commented Jan 23, 2025

M1 Macbook Pro - CPU
Logs:
```
 Code/current_projects/CAG  main  python3 ./kvcache.py --kvcache file --dataset "squad-train" --similarity bertscore --maxKnowledge 5 --maxParagraph 100 --maxQuestion 1000 --modelname "meta-llama/Llama-3.1-8B-Instruct" --randomSeed 0 --output "./result_kvcache.txt"
2025-01-22 13:43:55,158 - INFO - Using device: cpu
2025-01-22 13:43:55,184 - INFO - Use pytorch device_name: mps
2025-01-22 13:43:55,184 - INFO - Load pretrained SentenceTransformer: all-MiniLM-L6-v2
maxKnowledge 5 maxParagraph 100 maxQuestion 1000 randomeSeed 0
Loading checkpoint shards: 100%|████████████████| 4/4 [00:14<00:00, 3.73s/it]
2025-01-22 13:44:12,200 - WARNING - Some parameters are on the meta device because they were offloaded to the disk.
max_knowledge 5 max_paragraph 100 max_questions 1000
Traceback (most recent call last):
  File "/Users/georgeli/Documents/Code/current_projects/CAG/./kvcache.py", line 500, in <module>
    kvcache_test(args)
  File "/Users/georgeli/Documents/Code/current_projects/CAG/./kvcache.py", line 318, in kvcache_test
    knowledge_cache, prepare_time = prepare_kvcache(knowledges, filepath=kvcache_path, answer_instruction=answer_instruction)
  File "/Users/georgeli/Documents/Code/current_projects/CAG/./kvcache.py", line 190, in prepare_kvcache
    kv = preprocess_knowledge(model, tokenizer, knowledges)
  File "/Users/georgeli/Documents/Code/current_projects/CAG/./kvcache.py", line 118, in preprocess_knowledge
    outputs = model(
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/accelerate/hooks.py", line 170, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 1190, in forward
    outputs = self.model(
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 945, in forward
    layer_outputs = decoder_layer(
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/accelerate/hooks.py", line 170, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 676, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/accelerate/hooks.py", line 170, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 602, in forward
    attn_output = torch.nn.functional.scaled_dot_product_attention(
RuntimeError: Invalid buffer size: 117.59 GB
```
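The error comes from `torch.nn.functional.scaled_dot_product_attention` trying to allocate the full attention-score matrix. A rough back-of-envelope sketch of where a figure like 117.59 GB can come from (the ~30k-token sequence length is a hypothetical assumption that fits the reported number, not something taken from the logs; 32 heads and fp32 scores are assumptions about Llama-3.1-8B's CPU code path):

```python
def attn_buffer_bytes(seq_len: int, n_heads: int = 32, dtype_bytes: int = 4) -> int:
    # One (n_heads, seq_len, seq_len) attention-score matrix in fp32:
    # memory grows quadratically with the prompt length.
    return n_heads * seq_len * seq_len * dtype_bytes

# A knowledge prompt of roughly 30.3k tokens reproduces the ~117.6 GB figure,
# suggesting the concatenated knowledge text was about that long once tokenized.
gb = attn_buffer_bytes(30_310) / 1e9
print(f"{gb:.2f} GB")  # 117.59 GB
```

So even before the KV cache itself, the attention scores alone exceed any laptop's RAM once the knowledge prompt grows into the tens of thousands of tokens.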

@brian030128
Collaborator

brian030128 commented Feb 1, 2025

Hi George, how much RAM does your system have?

@glli01
Author

glli01 commented Feb 1, 2025

Hi Brian, I think 16 GB of RAM

@brian030128
Collaborator

brian030128 commented Feb 1, 2025

It’s most likely an out-of-memory error when running on a Mac with the CPU, as discussed in this issue: huggingface/diffusers#5894

For running a large language model, a dedicated GPU is highly recommended. If you don’t have access to one, I suggest using Google Colab, which offers a free tier with GPU support.
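For scale, here is a quick estimate of just the weight footprint of an 8B-parameter model (the parameter count is approximate, and activations plus the KV cache come on top of this):

```python
def weight_gb(n_params: float, bytes_per_param: int) -> float:
    # Memory needed just to hold the model weights, ignoring
    # activations, the KV cache, and framework overhead.
    return n_params * bytes_per_param / 1e9

params = 8.03e9  # approximate parameter count of Llama-3.1-8B
print(f"fp32: {weight_gb(params, 4):.1f} GB")  # 32.1 GB
print(f"fp16: {weight_gb(params, 2):.1f} GB")  # 16.1 GB
```

Even in fp16 the weights alone roughly match a 16 GB machine's total RAM, which is why `accelerate` was offloading parameters to disk in your logs, and why half precision (or lower) on a GPU with enough VRAM is the practical route.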
