Hello. I want to use vramfs as swap space for Nvidia GPU memory.
After reading the README.md file, I set up vramfs with 20 GB of space.
When I ran the nvidia-smi command, I was happy to see that vramfs had grabbed 20 GB, as shown below.
# vramfs /tmp/vram 20G
# nvidia-smi
Tue Jun 18 13:22:00 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.171.04             Driver Version: 535.171.04   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100 80GB PCIe          Off | 00000000:21:00.0 Off |                    0 |
| N/A   40C    P0             65W / 300W  |  76773MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      2867      G   /usr/lib/xorg/Xorg                            4MiB |
|    0   N/A  N/A   1856687      C   bin/vramfs                                20892MiB |  <--- 20GB for OpenCL
|    0   N/A  N/A   1906793      C   /opt/conda/bin/python3.10                 51754MiB |
|    0   N/A  N/A   1988805      C   /usr/bin/python                            2670MiB |
|    0   N/A  N/A   3729345      C   /usr/bin/python                            1418MiB |
+---------------------------------------------------------------------------------------+
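As a side note, nvidia-smi only shows that the vramfs process is holding roughly 20 GB of device memory; to double-check that the FUSE mount itself is usable, the standard tools should be enough (nothing vramfs-specific assumed here):
# df -h /tmp/vram            # total size should be roughly the allocated 20G
# mount | grep /tmp/vram     # should show a FUSE mount at /tmp/vram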
Then I also created a 10 GB swap file at /tmp/vram/swapfile, as follows.
# cd /tmp/vram
# LOOPDEV=$(losetup -f)
# truncate -s 10G swapfile   # replace 10G with the target swap space size; it has to be smaller than the allocated vramfs (e.g. 20G)
# losetup $LOOPDEV swapfile
# mkswap $LOOPDEV
# swapon $LOOPDEV
# cat /proc/swaps
Filename Type Size Used Priority
/dev/loop7 partition 10485756 0 -3
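Since the new device shows up with a negative priority, I assume the kernel will only fall back to it after any higher-priority swap is exhausted. If that matters, the priority can be raised explicitly (just a sketch using standard util-linux options, nothing vramfs-specific):
# swapoff $LOOPDEV
# swapon -p 100 $LOOPDEV     # -p/--priority: prefer this device over the default negative-priority swaps
# swapon --show              # verify the device and its new priority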
# vi /etc/security/limits.conf
leemgs hard memlock unlimited
leemgs soft memlock unlimited
leemgs hard rtprio unlimited
leemgs soft rtprio unlimited
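Because limits.conf only takes effect for new login sessions (via pam_limits), a quick sanity check in a fresh shell as that user would be (plain bash built-ins):
$ ulimit -l                  # max locked memory; should now report "unlimited"
$ ulimit -r                  # real-time priority limit; should also report "unlimited"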
However, when I used the open-source project axolotl (https://github.com/OpenAccess-AI-Collective/axolotl) to run model training, I got a CUDA OOM error (torch.cuda.OutOfMemoryError: CUDA out of memory), as shown below.
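The training command was essentially the following (the module and config path match the failing command reported at the end of the traceback; any additional accelerate options are omitted here):
$ accelerate launch -m axolotl.cli.train examples/openllama-3b/lora.yml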
........... Omission ....................
[2024-06-18 13:11:25,615] [DEBUG] [axolotl.load_tokenizer:216] [PID:3288778] [RANK:0] EOS: 2 / </s>
[2024-06-18 13:11:25,615] [DEBUG] [axolotl.load_tokenizer:217] [PID:3288778] [RANK:0] BOS: 1 / <s>
[2024-06-18 13:11:25,616] [DEBUG] [axolotl.load_tokenizer:218] [PID:3288778] [RANK:0] PAD: 2 / </s>
[2024-06-18 13:11:25,616] [DEBUG] [axolotl.load_tokenizer:219] [PID:3288778] [RANK:0] UNK: 0 / <unk>
[2024-06-18 13:11:25,616] [INFO] [axolotl.load_tokenizer:224] [PID:3288778] [RANK:0] No Chat template selected. Consider adding a chat template for easier inference.
[2024-06-18 13:11:25,616] [DEBUG] [axolotl.train.log:61] [PID:3288778] [RANK:0] loading model and peft_config...
[2024-06-18 13:11:25,862] [INFO] [axolotl.load_model:280] [PID:3288778] [RANK:0] patching with flash attention for sample packing
[2024-06-18 13:11:25,862] [INFO] [axolotl.load_model:366] [PID:3288778] [RANK:0] patching _expand_mask
/home/guest/.local/lib/python3.10/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
return self.fget.__get__(instance, owner)()
[2024-06-18 13:11:32,028] [ERROR] [axolotl.load_model:591] [PID:3288778] [RANK:0] CUDA out of memory. Tried to allocate 196.00 MiB (GPU 0; 79.15 GiB total capacity; 3.20 GiB already allocated; 153.94 MiB free; 3.53 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Traceback (most recent call last):
File "/data/home/guest/fine-tuning-axolotl/src/axolotl/utils/models.py", line 480, in load_model
model = LlamaForCausalLM.from_pretrained(
File "/home/guest/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3852, in from_pretrained
) = cls._load_pretrained_model(
File "/home/guest/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4286, in _load_pretrained_model
new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
File "/home/guest/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 841, in _load_state_dict_into_meta_model
set_module_quantized_tensor_to_device(model, param_name, param_device, value=param)
File "/home/guest/.local/lib/python3.10/site-packages/transformers/integrations/bitsandbytes.py", line 128, in set_module_quantized_tensor_to_device
new_value = value.to(device)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 196.00 MiB (GPU 0; 79.15 GiB total capacity; 3.20 GiB already allocated; 153.94 MiB free; 3.53 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/data/home/guest/fine-tuning-axolotl/src/axolotl/cli/train.py", line 49, in<module>
fire.Fire(do_cli)
File "/home/guest/.local/lib/python3.10/site-packages/fire/core.py", line 143, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/guest/.local/lib/python3.10/site-packages/fire/core.py", line 477, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/home/guest/.local/lib/python3.10/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/data/home/guest/fine-tuning-axolotl/src/axolotl/cli/train.py", line 33, in do_cli
return do_train(parsed_cfg, parsed_cli_args)
File "/data/home/guest/fine-tuning-axolotl/src/axolotl/cli/train.py", line 45, in do_train
return train(cfg=cfg, cli_args=cli_args, dataset_meta=dataset_meta)
File "/data/home/guest/fine-tuning-axolotl/src/axolotl/train.py", line 65, in train
model, peft_config = load_model(cfg, tokenizer, inference=cli_args.inference)
File "/data/home/guest/fine-tuning-axolotl/src/axolotl/utils/models.py", line 592, in load_model
raise err
File "/data/home/guest/fine-tuning-axolotl/src/axolotl/utils/models.py", line 480, in load_model
model = LlamaForCausalLM.from_pretrained(
File "/home/guest/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3852, in from_pretrained
) = cls._load_pretrained_model(
File "/home/guest/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4286, in _load_pretrained_model
new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
File "/home/guest/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 841, in _load_state_dict_into_meta_model
set_module_quantized_tensor_to_device(model, param_name, param_device, value=param)
File "/home/guest/.local/lib/python3.10/site-packages/transformers/integrations/bitsandbytes.py", line 128, in set_module_quantized_tensor_to_device
new_value = value.to(device)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 196.00 MiB (GPU 0; 79.15 GiB total capacity; 3.20 GiB already allocated; 153.94 MiB free; 3.53 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Traceback (most recent call last):
File "/home/guest/.local/bin/accelerate", line 8, in<module>sys.exit(main())
File "/home/guest/.local/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
args.func(args)
File "/home/guest/.local/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1023, in launch_command
simple_launcher(args)
File "/home/guest/.local/lib/python3.10/site-packages/accelerate/commands/launch.py", line 643, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '-m', 'axolotl.cli.train', 'examples/openllama-3b/lora.yml']' returned non-zero exit status 1.
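(As an aside, separate from the swap question: the OOM message itself suggests tuning the allocator via PYTORCH_CUDA_ALLOC_CONF. If I were to try that, it would presumably look like the following, where the 128 value is only a guess:)
$ export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128   # value is a guess; the error only suggests setting max_split_size_mb
$ accelerate launch -m axolotl.cli.train examples/openllama-3b/lora.yml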
# cat /proc/swaps
Filename Type Size Used Priority
/dev/loop7 partition 10485756 0 -2
As you can see, the used swap space on /dev/loop7 is still 0. It's weird.
So I was wondering: is it actually possible to use vramfs as swap space for an Nvidia GPU this way? Any hints or clues are welcome.
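For what it's worth, this is a simple way to watch whether the kernel ever pushes anything to the new swap device while training runs (standard tools only, nothing assumed about vramfs internals):
# watch -n 1 'cat /proc/swaps; free -h'   # does the "Used" column ever move during training?
# vmstat 1                                # the si/so columns show swap-in/swap-out activity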