NotImplementedError: Cannot copy out of meta tensor; no data! #87

Closed · akashmittal18 opened this issue Apr 4, 2023 · 13 comments

@akashmittal18

While trying to run Pythia-Chat-Base-7B I am getting this error on the very first command (python inference/bot.py --model togethercomputer/Pythia-Chat-Base-7B) after creating and activating the conda env.
Can anyone help identify what the issue could be?

@koonseng commented Apr 7, 2023

I have the same problem. I'm running this on an AWS g3.4xlarge instance with 128GB of memory.

python3 inference/bot.py --model togethercomputer/Pythia-Chat-Base-7B
Loading togethercomputer/Pythia-Chat-Base-7B to cuda:0...
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████| 2/2 [00:09<00:00, 4.55s/it]
Traceback (most recent call last):
File "inference/bot.py", line 285, in
main()
File "inference/bot.py", line 280, in main
not args.no_stream,
File "/usr/lib64/python3.7/cmd.py", line 105, in cmdloop
self.preloop()
File "inference/bot.py", line 127, in preloop
self._model = ChatModel(self._model_name_or_path, self._gpu_id, self._max_memory)
File "inference/bot.py", line 59, in init
self._model.to(device)
File "/home/ec2-user/.local/lib/python3.7/site-packages/transformers/modeling_utils.py", line 1811, in to
return super().to(*args, **kwargs)
File "/home/ec2-user/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 989, in to
return self._apply(convert)
File "/home/ec2-user/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 641, in _apply
module._apply(fn)
File "/home/ec2-user/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 641, in _apply
module._apply(fn)
File "/home/ec2-user/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 641, in _apply
module._apply(fn)
[Previous line repeated 1 more time]
File "/home/ec2-user/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 664, in _apply
param_applied = fn(param)
File "/home/ec2-user/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 987, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
NotImplementedError: Cannot copy out of meta tensor; no data!

nvidia-smi -L
GPU 0: Tesla M60 (UUID: GPU-db292a1c-442c-5142-97e5-384a4cf4dd07)

pip3 freeze
accelerate==0.18.0
brotlipy==0.7.0
certifi==2022.12.7
cffi @ file:///croot/cffi_1670423208954/work
charset-normalizer==3.1.0
conda==23.1.0
conda-content-trust @ file:///tmp/abs_5952f1c8-355c-4855-ad2e-538535021ba5h26t22e5/croots/recipe/conda-content-trust_1658126371814/work
conda-package-handling @ file:///croot/conda-package-handling_1672865015732/work
conda_package_streaming @ file:///croot/conda-package-streaming_1670508151586/work
cryptography @ file:///croot/cryptography_1673298753778/work
faiss-gpu==1.7.2
filelock==3.11.0
flit_core @ file:///opt/conda/conda-bld/flit-core_1644941570762/work/source/flit_core
huggingface-hub==0.13.4
idna==3.4
importlib-metadata==6.1.0
numpy==1.21.6
nvidia-cublas-cu11==11.10.3.66
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cudnn-cu11==8.5.0.96
packaging==23.0
pandas==1.3.5
Pillow==9.5.0
pluggy @ file:///tmp/build/80754af9/pluggy_1648042572264/work
psutil==5.9.4
pycosat @ file:///croot/pycosat_1666805502580/work
pycparser @ file:///tmp/build/80754af9/pycparser_1636541352034/work
pyOpenSSL @ file:///opt/conda/conda-bld/pyopenssl_1643788558760/work
PySocks @ file:///tmp/build/80754af9/pysocks_1594394576006/work
python-dateutil==2.8.2
pytz==2023.3
PyYAML==6.0
regex==2022.10.31
requests==2.28.2
ruamel.yaml @ file:///croot/ruamel.yaml_1666304550667/work
ruamel.yaml.clib @ file:///croot/ruamel.yaml.clib_1666302247304/work
six==1.16.0
tokenizers==0.13.3

@koonseng commented Apr 7, 2023

OK, solved it. The problem was that the g3.4xlarge instance has only 8GB per GPU, which is clearly not enough. I re-ran this on a g5.2xlarge and the problem disappeared.

@zas97 commented Apr 13, 2023

I have the same problem.

@orangetin (Member)

@zas97 @akashmittal18 Could you please describe your setup? I see that a lot of people have this issue but I'm not able to reproduce it.

@zas97 commented Apr 14, 2023

I used Paperspace Gradient with a P5000.

@orangetin (Member) commented Apr 21, 2023

This error is caused by Accelerate auto-offloading weights to either the CPU or disk because of insufficient memory on the GPU.

@zas97 can you try manually offloading weights using the -g and -r flags as suggested in these docs? You should be able to run it on a P5000 in 8-bit.

So on the g3.4xlarge (8GB VRAM, 122 GB memory) you'd run:
python inference/bot.py --model togethercomputer/Pythia-Chat-Base-7B -g 0:6 -r 120
This will load up to 6 GB of the model onto the GPU and the rest into CPU memory.
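(A hedged sketch, not the exact bot.py internals: these flags roughly correspond to passing a max_memory map when loading with Transformers/Accelerate. The caps below simply mirror the command above.)

```python
# Hedged sketch of what -g 0:6 -r 120 roughly maps to; not the exact bot.py code.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "togethercomputer/Pythia-Chat-Base-7B",
    device_map="auto",
    torch_dtype=torch.float16,
    max_memory={0: "6GiB", "cpu": "120GiB"},  # cap GPU 0 at 6 GB, allow 120 GB of CPU RAM
)
```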

This can work better with #84 as you'd be able to change the 6 to an 8.

@koonseng can you try this too?

@wemoveon2

> This error is caused by Accelerate auto-offloading weights to either the CPU or disk because of insufficient memory on the GPU. […]

@orangetin can you give more details regarding the exact cause of this error?

@orangetin (Member) commented May 6, 2023

> @orangetin can you give more details regarding the exact cause of this error?

Sure @wemoveon2!

When loading the model using device_map="auto" on a GPU with insufficient VRAM, Transformers tries to offload the rest of the model onto the CPU/disk. The problem is, the model is being loaded in float16 which is not supported by CPU/disk (neither is 8-bit). So, torch offloads the model as a meta-tensor (no data). In other words, parts of the model are missing.
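(A minimal diagnostic sketch, not from the original thread, assuming model is the object returned by from_pretrained: it checks whether any parameters were left on the meta device, which is what makes the later .to(device) call fail.)

```python
# Hedged diagnostic: list parameters that ended up on the meta device (i.e. have no data).
meta_params = [name for name, p in model.named_parameters() if p.device.type == "meta"]
if meta_params:
    print(f"{len(meta_params)} parameters are meta tensors, e.g. {meta_params[:3]}")
```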

Solutions:

  • Using the -g and -r arguments gives Accelerate a manual config for where it should offload the model; Accelerate takes care of the dtype.
  • Loading the model in either float32 or bfloat16 should also work (note: I haven't tested this myself); see the sketch after this list.
  • Using a larger GPU, like @koonseng did, prevents offloading in the first place.
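
(A hedged sketch of the second option, calling the Transformers API directly rather than bot.py: a CPU-friendly dtype lets offloaded weights keep real data instead of becoming meta tensors.)

```python
# Hedged sketch: load in a CPU-friendly dtype so offloaded weights are materialized.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "togethercomputer/Pythia-Chat-Base-7B",
    device_map="auto",
    torch_dtype=torch.bfloat16,  # or torch.float32
)
```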

@wemoveon2

@orangetin Not sure if float32 will solve this particular issue, since that has been the cause of my own issue (unrelated to this project, more specific to the accelerate package). I've been trying to load model pipelines in float32 with disk offload and have been getting this error inside accelerate's helper function modeling.py::set_module_tensor_to_device(), at module._parameters[tensor_name] = new_value.

There is another thread documenting this same issue (it occurs at the same line, with a different torch version IIRC) in which it was resolved by using float16, but I think that only worked because there was no longer any offloading going on.

@akashmittal18 did the proposed solution help resolve your issue? And if so, can you confirm whether you are still using CPU/disk offload along with the dtype assigned by accelerate?

@anujsahani01

> When loading the model using device_map="auto" on a GPU with insufficient VRAM, Transformers tries to offload the rest of the model onto the CPU/disk. […]

I am having the same problem. I loaded the model checkpoint shards in both float32 and bfloat16, but it does not work for me and I do not know why.

This is my Google Colab file; please have a look at it:
https://drive.google.com/file/d/1-ccrx1Q5tkLUYtZBGi5lNZGjPMyr_X9U/view?usp=sharing

AN OVERVIEW OF MY CODE:
I am using the https://huggingface.co/HuggingFaceH4/starchat-alpha model and fine-tuning it on my own dataset. First, using the meta device, I made a device_map to load the checkpoint shards onto my device. Then I initialized my model from the checkpoints downloaded to my session storage, loaded the weights and tied them, and finally I used Accelerate's load_checkpoint_and_dispatch, passing the folder containing the checkpoints and .json files, which gives me this error.
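(A hedged sketch of the flow described above, not the poster's exact notebook code; the checkpoint folder path and the max_memory limits are illustrative assumptions.)

```python
# Hedged sketch of: empty (meta) init -> device_map -> load_checkpoint_and_dispatch.
import torch
from accelerate import init_empty_weights, infer_auto_device_map, load_checkpoint_and_dispatch
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("HuggingFaceH4/starchat-alpha")
with init_empty_weights():                     # build the model skeleton on the meta device
    model = AutoModelForCausalLM.from_config(config)
model.tie_weights()                            # tie embeddings before dispatching

device_map = infer_auto_device_map(
    model,
    max_memory={0: "14GiB", "cpu": "30GiB"},   # illustrative limits, adjust to the Colab instance
    dtype=torch.float32,
)
model = load_checkpoint_and_dispatch(
    model,
    "./starchat-alpha",                        # assumed folder with checkpoint shards + index .json
    device_map=device_map,
    offload_folder="offload",                  # required if anything is mapped to "disk"
    dtype=torch.float32,
)
```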

This is the code snippet that gives me the error:
[screenshot: code snippet]

The error:
[screenshot: error traceback]

The checkpoint folder that I am passing:
[screenshot: checkpoint folder contents]

Please correct me if I am conceptually wrong or missing some important step.
I am using Colab Pro to run this code.

Thank you!
Please help me solve this error, @orangetin.
Your input will be highly appreciated.

@orangetin (Member)

@anujsahani01 I can't import your Colab file.

The error is caused by offloading model weights incorrectly. Refer to my previous comments above on how to fix it.

Closing this thread as it is solved. Feel free to continue the conversation if you're still having issues.

@anujsahani01

> The error is caused by offloading model weights incorrectly. Refer to my previous comments above on how to fix it. […]

Thank you!
Can you please tell me how to run these commands in my Google Colab?

@zetyquickly

> When loading the model using device_map="auto" on a GPU with insufficient VRAM, Transformers tries to offload the rest of the model onto the CPU/disk. […]

Based on what was said, reordering the commands might provide a solution:

# first do
pipe = pipe.to(device)
# then do
pipe.enable_sequential_cpu_offload()

Of course, this only works if the model itself (without inference data) can fit into VRAM.
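
(A hedged, self-contained sketch of the ordering suggested above, assuming pipe is a diffusers pipeline, since enable_sequential_cpu_offload comes from diffusers; the model id is a placeholder, not from this thread.)

```python
# Hedged sketch: move the pipeline to the GPU first, then enable sequential CPU offload.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",   # placeholder model id, not from this issue
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")                  # first: move the full pipeline onto the GPU
pipe.enable_sequential_cpu_offload()    # then: let Accelerate offload submodules layer by layer
```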
